Aligning Pharmacologic Classes Between MeSH and ATC

Rainer Winnenburg1, Laritza Rodriguez1, Fiona Callaghan1, Alfred Sorbello2, Ana Szarfman2, and Olivier Bodenreider1

1 Lister Hill National Center for Biomedical Communications, National Library of Medicine, Bethesda, MD, USA
2 Center for Drug Evaluation and Research, US Food and Drug Administration, Silver Spring, MD, USA

* To whom correspondence should be addressed: obodenreider@mail.nih.gov

ABSTRACT

Objective: To align pharmacologic classes in ATC and MeSH with lexical and instance-based techniques.

Methods: Lexical alignment: we map the names of ATC classes to MeSH through the UMLS, leveraging normalization and additional synonymy. Instance-based alignment: we associate ATC and MeSH classes through the drugs they share, using the Jaccard coefficient to measure class-class similarity. We use a metric to distinguish between equivalence and inclusion mappings.

Results: We found 221 lexical mappings, as well as 343 instance-based mappings, with a limited overlap (61). From the 343 instance-based mappings we classify 113 as equivalence mappings and 230 as inclusion mappings. A limited failure analysis is presented.

Conclusion: Our instance-based approach to aligning pharmacologic classes has the prospect of effectively supporting the creation of a mapping of pharmacologic classes between ATC and MeSH. This exploratory investigation needs to be evaluated in order to adapt the thresholds for similarity.

1 INTRODUCTION

The National Library of Medicine (NLM) and the Food and Drug Administration (FDA) Center for Drug Evaluation and Research (CDER) are collaborating on a research project to extract adverse drug reactions from the biomedical literature. More specifically, this investigation leverages the indexing of MEDLINE citations to extract associations between co-occurring drug entities and clinical manifestations in the context of adverse events.

The biomedical literature is indexed with the Medical Subject Headings (MeSH) vocabulary. For data mining purposes, however, adverse drug reactions are usually analyzed in reference to other standard vocabularies, namely the Anatomical Therapeutic Chemical (ATC) drug classification system for drug entities, and the Medical Dictionary for Regulatory Activities (MedDRA) for clinical manifestations. Toward this end, drug entities have to be mapped from MeSH to ATC, and manifestations from MeSH to MedDRA. This paper focuses only on the drug entities.

Drug entities include not only individual drugs (e.g., atorvastatin), but also drug classes (e.g., statins). In previous work, we have mapped individual drugs between RxNorm (which includes MeSH drugs) and ATC (Bodenreider and Taft, 2013; Winnenburg and Bodenreider, 2012). In contrast, no mapping is available between pharmacologic classes in MeSH and in ATC. Moreover, unlike individual drugs, whose names are relatively standardized across vocabularies, pharmacologic classes exhibit greater variability, not only in their names, but also in granularity. For example, the drug lisinopril is classified as Angiotensin-Converting Enzyme Inhibitors in MeSH, but as ACE inhibitors, plain in ATC.

The objective of this study is to investigate various ontology matching techniques for aligning pharmacologic classes between MeSH and ATC. Such methods are expected to facilitate the curation of a mapping by experts. To our knowledge, this work represents the first effort to map pharmacologic classes between MeSH and ATC using a sophisticated instance-based alignment technique.

2 BACKGROUND

The general framework of this study is that of ontology alignment (or ontology matching). Various techniques have been proposed for aligning concepts across ontologies, including lexical techniques (based on the similarity of concept names), structural techniques (based on the similarity of hierarchical relations), semantic techniques (based on semantic similarity between concepts), and instance-based techniques (based on the similarity of the set of instances of two concepts). An overview of ontology alignment is provided in (Euzenat and Shvaiko, 2007).

The main contribution of this paper is not to propose a novel technique, but rather to apply existing techniques to a novel objective, namely aligning pharmacologic classes between MeSH and ATC. To this end, we use lexical and instance-based techniques, because the names of pharmacologic classes and the list of drugs that are members of these classes are the two main features available in these resources.

2.1 Lexical techniques

Lexical techniques for ontology matching compare concept names across ontologies. When synonyms are available, they can be used to identify additional matches. Matching techniques beyond exact match utilize edit distance or normalization to account for minor differences between concept names.

As part of the Unified Medical Language System (UMLS), linguistically-motivated normalization techniques have been developed specifically for biomedical terms (McCray, et al., 1994). UMLS normalization abstracts away from inessential differences, such as inflection, case and hyphen variation, as well as word order variation. The UMLS normalization techniques form the basis for integrating terms into the UMLS Metathesaurus, but can be applied to terms that are not in the UMLS. For example, the ATC class Thiouracils (H03BA) and the MeSH class Thiouracil (D013889) match after normalization (ignoring singular/plural differences).

Lexical techniques typically compare the names of concepts across two ontologies as provided by these ontologies. However, additional synonyms can be used, for example, synonyms from the UMLS Metathesaurus. In other words, we leverage cosynonymy similarity for matching pharmacologic classes. In this case, although the ATC class Anticholinesterases (N06DA) and the MeSH class Cholinesterase Inhibitors (D002800) do not match lexically, both names are cosynonyms, because they are found among the synonyms of the UMLS Metathesaurus concept C0008425.

2.2 Instance-based techniques

Also called extensional techniques, instance-based techniques compare classes based on the sets of individuals (i.e., instances) of each class. Many biomedical ontologies consist of class hierarchies, but do not contain information about instances. Here, however, individual drugs (e.g., atorvastatin) are the members (not subclasses) of pharmacologic classes (e.g., statins). In other words, pharmacologic classes have individual drugs as instances, not subclasses.

Several methods have been proposed to implement instance-based matching. (Isaac, et al., 2007) decompose these methods into three basic elements: (1) A measure is used for evaluating the association between two classes based on the proportion of shared instances. Typical measures include information-based measures (e.g., Jaccard similarity coefficient) and statistical measures (e.g., log likelihood ratio). (2) A threshold is applied to the measures, and pairs of classes for which the measure is above the threshold are deemed closely associated and mapping candidates. (3) Hierarchical relations in the two ontologies to be aligned can also be leveraged by deriving instance-class relations between instances of a given class and the ancestors of this class. In other words, in addition to asserted classes (i.e., the classes of which individual drugs are direct members), we also consider inferred classes (i.e., the classes of which asserted classes are subclasses). For example, the class asserted in MeSH for the drug atorvastatin is Hydroxymethylglutaryl-CoA Reductase Inhibitors (i.e., statins), whose parent concepts include Anticholesteremic Agents. Therefore, the class Anticholesteremic Agents is an inferred pharmacologic class for atorvastatin.

2.3 Related work

As part of the EU-ADR project, (Avillach, et al., 2013) extracted adverse drug reactions from the biomedical literature and mapped MeSH drugs to ATC through the UMLS. However, their mapping was limited to individual drugs and did not include pharmacologic classes.

Lexical techniques are a component of most ontology alignment systems (Euzenat and Shvaiko, 2007). While there have been attempts to map individual drugs from ATC to concepts in the UMLS and MeSH through lexical techniques, (Merabti, et al., 2011) note that these techniques are not appropriate for the mapping of pharmacologic classes.

While instance-based techniques are also available in many systems, the applicability of this technique is limited, because there is often no available information about instances as part of the ontologies to be aligned. For example, most biomedical terminologies and ontologies are simple class hierarchies. The instances of these classes are present in electronic medical record systems and clinical data warehouses, but typically not distributed along with the ontologies. One exception in the biomedical domain is the Gene Ontology (GO) (Ashburner, et al., 2000), for which the gene products annotated to GO terms can be considered instances of the corresponding classes. (Kirsten, et al., 2007) have aligned GO terms across the three hierarchies of GO through the gene products to which they are co-annotated.

To our knowledge, our work is the first attempt to align pharmacologic classes with instance-based techniques (i.e., beyond name matching), and the first application of alignment to pharmacologic classes in ATC and MeSH. Moreover, while most ontology alignment systems mainly consider matches between equivalent classes, we are also interested in identifying those cases where one class is included in another class.

3 MATERIALS

3.1 Anatomical Therapeutic Chemical Drug Classification System (ATC)

The ATC is a clinical drug classification system developed and maintained by the World Health Organization (WHO) as a tool for drug utilization research to improve quality of drug use (ATC, 2013). The system is organized as a hierarchy that classifies clinical drug entities at five different levels: 1st level anatomical (e.g., A: Alimentary tract and metabolism), 2nd level therapeutic (e.g., A10: Drugs used in diabetes), 3rd level pharmacological (e.g., A10B: Blood glucose lowering drugs, excluding insulins), 4th level chemical (e.g., A10BA: Biguanides), and 5th level chemical substance or ingredient (e.g., A10BA02: metformin). The 2013 version of ATC integrates 4,516 5th-level drugs and 1,255 drug groups (levels 1-4).

3.2 MeSH

The Medical Subject Headings (MeSH) is a controlled vocabulary produced and maintained by the NLM (NLM, 2013). It is used for indexing, cataloging, and searching the biomedical literature in the MEDLINE/PubMed database, and other documents. The MeSH thesaurus includes 26,853 descriptors (or "main headings") organized in 16 hierarchies (e.g., Chemicals and Drugs). Additionally, MeSH provides about 210,000 supplementary concept records (SCRs), of which many represent chemicals and drugs. Each SCR is linked to at least one descriptor. While most chemical descriptors provide a structural perspective on drugs, some descriptors play a special role as they can be used to denote pharmacological actions in drug descriptors and SCRs. MeSH 2013 is used in this study.

3.3 RxNorm

RxNorm is a standardized nomenclature for medications produced and maintained by the U.S. National Library of Medicine (NLM) (NLM, 2013). RxNorm concepts are linked by NLM to multiple drug identifiers for commercially available drug databases and standard terminologies, including MeSH. RxNorm serves as a reference terminology for drugs in the US. The March 2013 version of RxNorm used in this study integrates about 10,500 base and salt ingredients. NLM also provides an application programming interface (API) for accessing RxNorm data programmatically (NLM, 2013).

3.4 Unified Medical Language System (UMLS)

The UMLS is a terminology integration system created and maintained by the National Library of Medicine (NLM) (NLM, 2013). The UMLS Metathesaurus integrates over 150 terminologies, including MeSH, but not ATC. Synonymous terms across terminologies are grouped into concepts and assigned the same concept unique identifier. The Metathesaurus provides a comprehensive set of synonyms for biomedical concepts and is often used for integrating terminologies beyond its own. NLM provides an application programming interface (API) for accessing UMLS data programmatically. Version 2012AB of the UMLS is used in this study.

4 METHODS

Our approach to aligning pharmacologic classes between MeSH and ATC based on their instances is depicted in Figure 1 and can be summarized as follows. First, we established a lexical alignment of MeSH and ATC classes based on the class names and their synonyms (Figure 1, right). We then constructed an instance-based alignment of MeSH and ATC classes considering the individual drugs shared by the classes (Figure 1, left). We mapped individual drugs from MeSH and ATC via their ingredients (IN) or precise ingredients (PIN) in RxNorm. We used a similarity measure and thresholds to identify class mappings and compared them with the mappings retrieved by the lexical approach.

Figure 1. Alignment of ATC and MeSH classes, alignment via their instances (left) in comparison to direct lexical mapping of the class names (right).

In our alignment work, we excluded the 14 ATC groups of level 1 (anatomical classification), because they are too broad classes. We also excluded 164 of the 1,241 ATC groups (2nd-4th level) corresponding to drug combinations, because combination drugs are often underspecified in ATC.

Similarly, in MeSH, we excluded the top-level descriptors of the Chemicals and Drugs hierarchy (i.e., D01 - D27), as well as the top-level of the pharmacological action descriptors (Pharmacologic Actions, Molecular Mechanisms of Pharmacological Action, Physiological Effects of Drugs, and Therapeutic Uses).

4.1 Lexical alignment

We mapped all 1,077 eligible ATC classes (2nd-4th level) to MeSH descriptors in the Chemicals and Drugs [D] tree using the UMLS Terminology Services (UTS). More precisely, we used the ExactString and NormalizedString search functions of the UTS API 2.0 to establish mappings from the names of the ATC classes to UMLS concepts. We used normalization only when the exact technique did not result in a mapping. We then mapped the UMLS concepts to MeSH descriptor IDs.

4.2 Instance-based alignment

Mapping ATC drugs to RxNorm ingredients. In previous work we have mapped ATC single-ingredient drugs to Ingredients (IN) and Precise Ingredients (PIN) in RxNorm using a lexical approach with additional normalization steps (Winnenburg and Bodenreider, 2012). We used these mappings to establish the alignment of ATC and RxNorm drugs in this study.

Mapping MeSH drugs to RxNorm ingredients. Since MeSH drugs are integrated in RxNorm, mappings to equivalent drug concepts from MeSH can be obtained via the getProprietaryInformation function from the RxNorm API. We systematically exploited this information for all Ingredients (IN) and Precise Ingredients (PIN) in RxNorm and created a mapping table between RxNorm CUIs and MeSH Main Headings (MH) and Supplementary Concept Records (SCR).

Inferring class membership in ATC. We considered the hierarchical relations from 5th level drugs to their 4th level chemical groups as asserted drug class membership. We inferred membership between 5th level drugs and groups of level 3 and 2 through transitive closure. For example, temafloxacin (J01MA05) is a member of the chemical group Fluoroquinolones (J01MA - asserted), the pharmacological group QUINOLONE ANTIBACTERIALS (J01M - inferred), and the therapeutic group ANTIBACTERIALS FOR SYSTEMIC USE (J01 - inferred).

Inferring class membership in MeSH. We associated each RxNorm ingredient (IN or PIN) with its corresponding MeSH supplementary concept record (SCR) or main heading (MH). In turn, we associated these drugs with their asserted classes. For an SCR, we considered its pharmacological actions, as well as the MeSH heading(s) mapped to. For a MH, we considered its pharmacological actions, as well as its direct ancestors. These constitute the asserted classes. We inferred membership between the drugs and higher-level descriptors in the MeSH hierarchy. For example, as shown in Table 1, the SCR temafloxacin has Anti-Bacterial Agents as pharmacological action and Fluoroquinolones as main heading mapped to. From these asserted classes, we infer membership to Anti-Infective Agents (from Anti-Bacterial Agents) and to Quinolones, Quinolines, and Heterocyclic Compounds, 2-Ring (from Fluoroquinolones).

Table 1. Asserted and inferred MeSH classes for the drug temafloxacin (C054745) with type of relationship to the drug and tree numbers in MeSH.

Type  Asserted Classes                 Inferred Classes
----  -------------------------------  ------------------------------------------
PA    Anti-Bacterial Agents (D000900)  Anti-Infective Agents (D000890)
      [D27.505.954.122.085]            [D27.505.954.122]
MH    Fluoroquinolones (D024841)       Quinolones (D015363)
      [D03.438.810.835.322]            [D03.438.810.835]
                                       Quinolines (D011804)
                                       [D03.438.810]
                                       Heterocyclic Compounds, 2-Ring (D006574)
                                       [D03.438]

Measure for aligning ATC and MeSH classes. Based on the asserted and inferred class membership of drugs in ATC and MeSH we conducted a pairwise comparison of all ATC against all MeSH classes. For each pair of ATC class (A) and MeSH class (M), we computed the Jaccard coefficient. In order to reduce the similarity of pairs of classes with a small number of shared members, we used a modified version of the Jaccard coefficient, JCmod, as suggested in (Isaac, et al., 2007),

\[ JC(A, M) = \frac{|A \cap M|}{|A \cup M|} \]

\[ JC_{mod}(A, M) = \frac{\sqrt{|A \cap M| \times (|A \cap M| - 0.8)}}{|A \cup M|} \]

where |A ∩ M| represents the number of drugs common to A and M, and |A ∪ M| the total number of unique drugs in both classes.

The Jaccard coefficient measures the similarity between the two classes, but does not reflect whether one class is included in the other. Because of the difference in granularity between classes in ATC and MeSH, we introduce a simple metric for detecting whether the drugs that are not shared by both classes are primarily in one of the two classes. This "one-sidedness" coefficient is calculated as follows:

\[ \text{one-sidedness}(A, M) = \begin{cases} 0, & \text{for } a = 0 \text{ and } m = 0 \\ \dfrac{|a - m|}{a + m}, & \text{otherwise} \end{cases} \]

where a and m are the number of drugs specific to the ATC class and the MeSH class, respectively. Thus, a "one-sidedness" coefficient close to 0 indicates that the drugs that are not shared by the two classes are evenly distributed between the ATC and MeSH class. In contrast, a coefficient close to 1 indicates that only one of the classes contains most of the drugs that are not shared by the other.

Thresholds. In order to select the best equivalence or inclusion mappings between ATC and MeSH, we characterize each pair of ATC and MeSH classes with respect to Jaccard similarity and one-sidedness. Low one-sidedness indicates equivalence and high one-sidedness indicates inclusion. High Jaccard similarity indicates strong overlap between the two classes. Based on preliminary analysis, we selected a threshold of 0.5 for the one-sidedness metric. Similarly, we selected thresholds of 0.5 and 0.25 for Jaccard similarity for equivalence (low one-sidedness) and inclusion (high one-sidedness), respectively. The lower threshold for Jaccard similarity for inclusion was determined empirically. As shown in Table 2, each pair of ATC and MeSH classes is characterized as an equivalence mapping (EQ+), an inclusion mapping (IN+), or not a mapping (EQ- and IN-).

5 RESULTS

5.1 Lexical alignment

For the 1,077 eligible ATC groups of level 2-4, we were able to retrieve 221 mappings to descriptors from the Chemicals and Drugs [D] tree in MeSH. We have 18 mappings for therapeutic classes (2nd level), 42 for pharmacological classes (3rd level), and 161 for chemical classes (4th level). We ignored mappings for the broad anatomical classes (1st level). Of the 221 mappings, 96 are to pharmacological actions (functional perspective) in MeSH, whereas 125 are to other descriptors at various levels of the MeSH hierarchy (structural perspective).

5.2 Instance-based alignment

Of the 1,077 eligible ATC groups, 874 (81%) could be associated with at least one descriptor or pharmacological action in MeSH. We identified a total of 933 associations for the 874 ATC groups (multiple associations per ATC group possible). As shown in Table 2, based on the one-sidedness metric, we characterized 323 associations as equivalence and 610 as inclusion. Of the 323 equivalence associations, 113 (35%) exhibit high Jaccard similarity and are selected as equivalence mappings (EQ+). Of the 610 inclusion associations, 230 (38%) exhibit high Jaccard similarity and are selected as inclusion mappings (IN+). The other associations (EQ- and IN-) are not deemed strong enough to denote mappings. In summary, we were able to characterize as a mapping (EQ+ and IN+) 343 (37%) of the associations between ATC and MeSH classes. It should be mentioned that we were not able to obtain mappings to MeSH classes for 203 ATC classes, because they only contain drug instances that could not be mapped to drugs in MeSH.

Table 2. Characterization of the associations between ATC and MeSH classes based on Jaccard similarity and score for one-sidedness. The numbers in grey fields (here marked "grey") indicate the associations that are not strong enough to denote mappings.

Category  One-sidedness  Jaccard similarity  Associations
--------  -------------  ------------------  ------------
EQ+       < .5           ≥ .5                113
IN+       ≥ .5           ≥ .25               230
EQ-       < .5           < .5                210 (grey)
IN-       ≥ .5           < .25               380 (grey)

Totals: one-sidedness ≥ .5: 610; one-sidedness < .5: 323; mappings (EQ+ and IN+): 343; non-mappings (EQ- and IN-): 590; all associations: 933.

5.3 Comparison between lexical and instance-based alignment

As illustrated in Table 3, from the 221 lexical mappings between ATC and MeSH classes, we could confirm 61 with our instance-based approach (30 as equivalence mappings, 31 as inclusion mappings). For 19 of the lexical mappings we found an association with low Jaccard similarity (IN- / EQ-), and for 141 of the lexical mappings we did not find any association through the instance-based alignment (mainly due to the lack of any mapping for the drug instances in these classes). Finally, the instance-based approach produced 282 additional drug class mappings that were not detected by the lexical approach, whereas 633 (571 + 62) ATC classes could be mapped by neither the lexical nor the instance-based approach.

Table 3. Comparison between lexical and instance-based alignment.

                   Instance-based
Lexical    Yes     No      No assoc.   Total
-------    ---     ---     ---------   -----
Yes        61      19      141         221
No         282     571     62          915
Total      343     590     203         1136

6 DISCUSSION

6.1 Examples and failure analysis

True positive for equivalence instance-based mappings. We identify an equivalence mapping between the 4th-level ATC group Fluoroquinolones (J01MA) and the MeSH descriptor Fluoroquinolones (D024841). The two classes share 14 drugs. The ATC group has one extra drug (moxifloxacin), and the MeSH descriptor has 2 (flumequine and besifloxacin). Jaccard similarity is high (0.82) and the one-sidedness score is low (0.33), because the 3 drugs that are not in common are not all on the same side. This mapping is also identified by the lexical technique (exact match).

True positive for inclusion instance-based mappings. We identify an inclusion mapping between the 4th-level ATC group Fluoroquinolones (S01AE) and the MeSH descriptor Fluoroquinolones (D024841). Although the two classes are seemingly identical, our mapping is identified as an inclusion, with 7 drugs in common, 1 drug specific to the ATC class and 9 drugs specific to the MeSH class. In fact, the ATC class is not the same general class for anti-infective agents as in the example above (J01MA), but rather the specific class of fluoroquinolones for ophthalmic use (S01AE). The fluoroquinolones used for eye disorders are a subset of all fluoroquinolones and the ATC class S01AE is appropriately characterized as being included in the MeSH class for fluoroquinolones. This example also illustrates a false positive for the lexical mapping, since it is generally assumed that lexical mappings are equivalence mappings.

False negative for equivalence instance-based mappings. Many ATC and MeSH classes share only one or very few drugs, making it difficult to assess equivalence or inclusion. For example, the 4th-level ATC group Silver compounds (D08AL) and the MeSH descriptor Silver Compounds (D018030) share only one drug (silver). The modified version of the Jaccard coefficient has a score of 0.45 in this case, which is below our threshold of 0.5 for equivalence.

During this failure analysis, we discovered that some MeSH drugs did not have a pharmacological action assigned to them as we expected. For example, while pyrantel is listed as Antinematodal Agents, oxantel is not. We are investigating whether the pharmacological action for this SCR should be inferred from the descriptor to which it is mapped (Pyrantel in this case). Because of these missing pharmacologic actions, the 3rd-level ATC group ANTINEMATODAL AGENTS (P02C) fails to be mapped to the MeSH pharmacological action Antinematodal Agents (D000969), the Jaccard similarity being just below the threshold (0.49).

Discrepancy between lexical and instance-based alignment (missed lexical mapping). Despite the use of UMLS synonymy and normalization, the lexical alignment fails to identify a mapping between the 3rd-level ATC group POTASSIUM-SPARING AGENTS (C03D) and the MeSH pharmacological action Diuretics, Potassium Sparing (D062865). In contrast, the instance-based alignment identifies an equivalence mapping with very high Jaccard similarity (0.99). This finding is consistent with the conclusions of (Merabti, et al., 2011).

Discrepancy between lexical and instance-based alignment (missed instance-based mapping). We have identified several causes for discrepancies between lexical and instance-based alignments. As mentioned earlier, some ATC classes only contain drugs that cannot be mapped to MeSH through RxNorm, which we used to bridge between the two. Sometimes, the best instance-based mapping is to another class than the class found by the lexical technique. Finally, some drug entities and biologicals (e.g., vaccines) are less well standardized than common drugs. For this reason, the instance-based alignment is unable to map these classes, when simple lexical techniques can.

6.2 Limitations and future work

This exploratory investigation has several limitations, which we plan to address in future work.

Evaluation. This exploratory investigation focuses primarily on the methodology and feasibility of the alignment, and does not include a formal evaluation. Since ATC and MeSH pharmacological actions are being integrated into RxNorm, we will use the alignment created by RxNorm experts as the gold standard to evaluate our methods.

Perspective. Our perspective in this investigation is ATC-centric, because we consider the best MeSH mapping for each ATC class, but not the best ATC mapping for each MeSH class. One future goal is to explore both directions using the same methodology.

Bias towards equivalence mappings. Because we restrict our exploration to the MeSH class with the best Jaccard similarity for each ATC class (which we subsequently categorize as equivalence or inclusion), and because of the differential threshold for Jaccard similarity between equivalence (0.5) and inclusion mappings (0.25), we potentially fail to consider a good inclusion mapping (e.g., with a similarity score of 0.39 [> 0.25]), when the best MeSH class is a bad equivalence mapping (e.g., with a similarity score of 0.41 [< 0.5]).

6.3 Significance

To our knowledge, our work is the first attempt to align pharmacologic classes with instance-based techniques, distinguishing between equivalence and inclusion relations, as well as the first application of alignment between pharmacologic classes in ATC and MeSH. Our instance-based approach to aligning pharmacologic classes has yielded 343 mappings, and has the prospect of effectively supporting the creation of a mapping of pharmacologic classes between ATC and MeSH. This exploratory investigation needs to be evaluated in order to adapt the thresholds for similarity.

ACKNOWLEDGEMENTS

This work was supported by the Intramural Research Program of the NIH, National Library of Medicine and by the Center for Drug Evaluation and Research of the Food and Drug Administration. The authors want to thank Rave Harpaz and Anna Ripple for useful discussions.

DISCLAIMER

The findings and conclusions expressed in this report are those of the authors and do not necessarily represent the views of the FDA.

REFERENCES

Ashburner, M., et al. (2000) Gene ontology: tool for the unification of biology. The Gene Ontology Consortium, Nat Genet, 25, 25-29.
Anatomical Therapeutic Chemical (ATC) classification: http://www.whocc.no/atc/
Avillach, P., et al. (2013) Design and validation of an automated method to detect known adverse drug reactions in MEDLINE: a contribution from the EU-ADR project, J Am Med Inform Assoc, 20, 446-452.
Bodenreider, O. and Taft, L.M. (2013) A mapping of RxNorm to the ATC/DDD Index helps analyze US prescription lists, AMIA Annu Symp Proc, (submitted).
Euzenat, J. and Shvaiko, P. (2007) Ontology matching. Springer, New York.
Isaac, A., et al. (2007) An empirical study of instance-based ontology matching. In Aberer, K., et al. (eds), Proceedings of the 6th International Semantic Web Conference and 2nd Asian Semantic Web Conference (ISWC'07/ASWC'07). Springer-Verlag, pp. 253-266.
Kirsten, T., Thor, A. and Rahm, E. (2007) Instance-based matching of large life science ontologies. In Cohen-Boulakia, S. and Tannen, V. (eds), Data Integration in the Life Sciences: 4th International Workshop, DILS 2007, Philadelphia, PA, USA. Springer, pp. 172-187.
McCray, A.T., Srinivasan, S. and Browne, A.C. (1994) Lexical methods for managing variation in biomedical terminologies, Proc Annu Symp Comput Appl Med Care, 235-239.
Merabti, T., et al. (2011) Mapping the ATC classification to the UMLS metathesaurus: some pragmatic applications, Stud Health Technol Inform, 166, 206-213.
Medical Subject Headings (MeSH): http://www.nlm.nih.gov/mesh/
RxNorm: http://www.nlm.nih.gov/research/umls/rxnorm/
RxNorm API: http://rxnavdev.nlm.nih.gov/RxNormAPI.html
Unified Medical Language System (UMLS): https://uts.nlm.nih.gov/
Winnenburg, R. and Bodenreider, O. (2012) Mapping drug entities between the European and American standards, ATC and RxNorm, Poster Proceedings of the Eighth International Conference on Data Integration in the Life Sciences (DILS 2012), 22.
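
Illustrative sketch of the similarity measures. The measures defined in Section 4.2 (Jaccard coefficient, modified Jaccard coefficient, and one-sidedness) and the thresholds applied to them can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used in the study. The function names, the helper make_classes, and the placeholder drug identifiers are hypothetical. Applying the Jaccard thresholds to the modified coefficient is an assumption, suggested by the Silver compounds example in Section 6.1. Treating a one-sidedness score of exactly 0.5 as inclusion follows the column headings of Table 2.

```python
import math


def jaccard(atc: set, mesh: set) -> float:
    """Plain Jaccard coefficient: shared drugs over all unique drugs."""
    union = atc | mesh
    return len(atc & mesh) / len(union) if union else 0.0


def jaccard_mod(atc: set, mesh: set) -> float:
    """Modified Jaccard coefficient, which penalizes small intersections."""
    shared = len(atc & mesh)
    if shared == 0:
        return 0.0
    return math.sqrt(shared * (shared - 0.8)) / len(atc | mesh)


def one_sidedness(atc: set, mesh: set) -> float:
    """0 when unshared drugs are evenly split, 1 when all are on one side."""
    a = len(atc - mesh)  # drugs specific to the ATC class
    m = len(mesh - atc)  # drugs specific to the MeSH class
    return 0.0 if a + m == 0 else abs(a - m) / (a + m)


def classify(atc: set, mesh: set, os_threshold: float = 0.5,
             eq_threshold: float = 0.5, in_threshold: float = 0.25) -> str:
    """Label a class pair EQ+, EQ-, IN+ or IN- (assumption: thresholds on JCmod)."""
    similarity = jaccard_mod(atc, mesh)
    if one_sidedness(atc, mesh) < os_threshold:
        return "EQ+" if similarity >= eq_threshold else "EQ-"
    return "IN+" if similarity >= in_threshold else "IN-"


def make_classes(shared: int, atc_only: int, mesh_only: int):
    """Hypothetical helper: build two drug sets from counts, using placeholder names."""
    common = {f"shared_{i}" for i in range(shared)}
    atc = common | {f"atc_only_{i}" for i in range(atc_only)}
    mesh = common | {f"mesh_only_{i}" for i in range(mesh_only)}
    return atc, mesh


# Counts taken from the examples in Section 6.1 (shared, ATC-specific, MeSH-specific).
examples = {
    "J01MA vs D024841": (14, 1, 2),
    "S01AE vs D024841": (7, 1, 9),
    "D08AL vs D018030": (1, 0, 0),
}
for label, counts in examples.items():
    atc_set, mesh_set = make_classes(*counts)
    print(label,
          round(jaccard(atc_set, mesh_set), 2),
          round(jaccard_mod(atc_set, mesh_set), 2),
          round(one_sidedness(atc_set, mesh_set), 2),
          classify(atc_set, mesh_set))
```

With the counts reported in Section 6.1, the sketch yields a Jaccard coefficient of 0.82 and a one-sidedness of 0.33 for the J01MA pair, and a modified Jaccard coefficient of 0.45 for the Silver compounds pair, in line with the values given in the text.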