Aligning Pharmacologic Classes Between MeSH and ATC
Rainer Winnenburg (1), Laritza Rodriguez (1), Fiona Callaghan (1), Alfred Sorbello (2), Ana Szarfman (2),
and Olivier Bodenreider (1)
(1) Lister Hill National Center for Biomedical Communications, National Library of Medicine, Bethesda, MD, USA
(2) Center for Drug Evaluation and Research, US Food and Drug Administration, Silver Spring, MD, USA


ABSTRACT                                                                          across vocabularies, pharmacologic classes exhibit greater
    Objective: To align pharmacologic classes in ATC and MeSH with lex-           variability, not only in their names, but also in granularity.
ical and instance-based techniques.
    Methods: Lexical alignment: we map the names of ATC classes to                For example, the drug lisinopril is classified as Angiotensin-
MeSH through the UMLS, leveraging normalization and additional synon-             Converting Enzyme Inhibitors in MeSH, but as ACE inhibi-
ymy. Instance-based alignment: we associate ATC and MeSH classes                  tors, plain in ATC.
through the drugs they share, using the Jaccard coefficient to measure
class-class similarity. We use a metric to distinguish between equivalence        The objective of this study is to investigate various ontology
and inclusion mappings.                                                           matching techniques for aligning pharmacologic classes
    Results: We found 221 lexical mappings, as well as 343 instance-based
mappings, with a limited overlap (61). From the 343 instance-based map-           between MeSH and ATC. Such methods are expected to
pings we classify 113 as equivalence mappings and 230 as inclusion map-           facilitate the curation of a mapping by experts. To our
pings. A limited failure analysis is presented.                                   knowledge, this work represents the first effort to map
    Conclusion: Our instance-based approach to aligning pharmacologic
classes has the prospect of effectively supporting the creation of a mapping
                                                                                  pharmacologic classes between MeSH and ATC using a
of pharmacologic classes between ATC and MeSH. This exploratory inves-            sophisticated instance-based alignment technique.
tigation needs to be evaluated in order to adapt the thresholds for similarity.
                                                                                  2     BACKGROUND
1    INTRODUCTION                                                                 The general framework of this study is that of ontology
The National Library of Medicine (NLM) and the Food and                           alignment (or ontology matching). Various techniques have
Drug Administration (FDA) Center for Drug Evaluation and                          been proposed for aligning concepts across ontologies, in-
Research (CDER) are collaborating on a research project to                        cluding lexical techniques (based on the similarity of con-
extract adverse drug reactions from the biomedical litera-                        cept names), structural techniques (based on the similarity
ture. More specifically, this investigation leverages the in-                     of hierarchical relations), semantic techniques (based on
dexing of MEDLINE citations to extract associations be-                           semantic similarity between concepts), and instance-based
tween co-occurring drug entities and clinical manifestations                      techniques (based on the similarity of the set of instances of
in the context of adverse events.                                                 two concepts). An overview of ontology alignment is pro-
The biomedical literature is indexed with the Medical Sub-                        vided in (Euzenat and Shvaiko, 2007).
ject Headings (MeSH) vocabulary. For data mining purpos-                          The main contribution of this paper is not to propose a novel
es, however, adverse drug reactions are usually analyzed in                       technique, but rather to apply existing techniques to a novel
reference to other standard vocabularies, namely the Ana-                         objective, namely aligning pharmacologic classes between
tomical Therapeutic Chemical (ATC) drug classification                            MeSH and ATC. To this end, we use lexical and instance-
system for drug entities, and the Medical Dictionary for                          based techniques, because the names of pharmacologic clas-
Regulatory Activities (MedDRA) for clinical manifesta-                            ses and the list of drugs that are members of these classes
tions. Toward this end, drug entities have to be mapped                           are the main two features available in these resources.
from MeSH to ATC, and manifestations from MeSH to
MedDRA. This paper focuses only on the drug entities.                             2.1    Lexical techniques
Drug entities include not only individual drugs (e.g.,                            Lexical techniques for ontology matching compare concept
atorvastatin), but also drug classes (e.g., statins). In previ-                   names across ontologies. When synonyms are available,
ous work, we have mapped individual drugs between                                 they can be used to identify additional matches. Matching
RxNorm (which includes MeSH drugs) and ATC                                        techniques beyond exact match utilize edit distance or nor-
(Bodenreider and Taft, 2013; Winnenburg and Bodenreider,                          malization to account for minor differences between concept
2012). In contrast, no mapping is available between phar-                         names.
macologic classes in MeSH and in ATC. Moreover, unlike                            As part of the Unified Medical Language System (UMLS),
individual drugs, whose names are relatively standardized                         linguistically-motivated normalization techniques have been
                                                                                  developed specifically for biomedical terms (McCray, et al.,
1994). UMLS normalization abstracts away from inessential
* To whom correspondence should be addressed: obodenreider@mail.nih.gov


Winnenburg et al.


differences, such as inflection, case and hyphen variation, as      ever, their mapping was limited to individual drugs and did
well as word order variation. The UMLS normalization                not include pharmacologic classes.
techniques form the basis for integrating terms into the            Lexical techniques are a component of most ontology
UMLS Metathesaurus, but can be applied to terms that are            alignment systems (Euzenat and Shvaiko, 2007). While
not in the UMLS. For example, the ATC class Thiouracils             there have been attempts to map individual drugs from ATC
(H03BA) and the MeSH class Thiouracil (D013889) match               to concepts in the UMLS and MeSH through lexical tech-
after normalization (ignoring singular/plural differences).         niques, (Merabti, et al., 2011) note that these techniques are
Lexical techniques typically compare the names of concepts          not appropriate for the mapping of pharmacologic classes.
across two ontologies as provided by these ontologies.              While instance-based techniques are also available in many
However, additional synonyms can be used, for example,              systems, the applicability of this technique is limited, be-
synonyms from the UMLS Metathesaurus. In other words,               cause there is often no available information about instances
we leverage cosynonymy similarity for matching pharmaco-            as part of the ontologies to be aligned. For example, most
logic classes. In this case, although the ATC class Anticho-        biomedical terminologies and ontologies are simple class
linesterases (N06DA) and the MeSH class Cholinesterase              hierarchies. The instances of these classes are present in
Inhibitors (D002800) do not match lexically, both names             electronic medical record systems and clinical data ware-
are cosynonyms, because they are found among the syno-              houses, but typically not distributed along with the ontolo-
nyms of the UMLS Metathesaurus concept C0008425.                    gies. One exception in the biomedical domain is the Gene
2.2    Instance-based techniques                                    Ontology (GO) (Ashburner, et al., 2000), for which the gene
                                                                    products annotated to GO terms can be considered instances
Also called extensional techniques, instance-based tech-
                                                                    of the corresponding classes. (Kirsten, et al., 2007) have
niques compare classes based on the sets of individuals (i.e.,
                                                                    aligned GO terms across the three hierarchies of GO
instances) of each class. Many biomedical ontologies con-
                                                                    through the gene products to which they are co-annotated.
sist of class hierarchies, but do not contain information
about instances. Here, however, individual drugs (e.g.,             To our knowledge, our work is the first attempt to align
atorvastatin) are the members – not subclasses – of pharma-         pharmacologic classes with instance-based techniques (i.e.,
cologic classes (e.g., statins). In other words, pharmacologic      beyond name matching), and the first application of aligning
classes have individual drugs as instances, not subclasses.         pharmacologic classes in ATC and MeSH. Moreover, while
                                                                    most ontology alignment systems mainly consider matches
Several methods have been proposed to implement instance-
                                                                    between equivalent classes, we are also interested in identi-
based matching. (Isaac, et al., 2007) decompose these meth-
                                                                    fying those cases where one class is included in another
ods into three basic elements: (1) A measure is used for
                                                                    class.
evaluating the association between two classes based on the
proportion of shared instances. Typical measures include
                                                                    3     MATERIALS
information-based measures (e.g., Jaccard similarity coeffi-
cient) and statistical measures (e.g., log likelihood ratio). (2)   3.1    Anatomical Therapeutic Chemical Drug
A threshold is applied to the measures and pairs of classes                Classification System (ATC)
for which the measure is above the threshold are deemed             The ATC is a clinical drug classification system developed
closely associated and mapping candidates. (3) Hierarchical         and maintained by the World Health Organization (WHO)
relations in the two ontologies to be aligned can also be lev-      as a tool for drug utilization research to improve quality of
eraged by deriving instance-class relations between instanc-        drug use (ATC, 2013). The system is organized as a hierar-
es of a given class and the ancestors of this class. In other       chy that classifies clinical drug entities at five different lev-
words, in addition to asserted classes (i.e., the classes of        els: 1st level anatomical (e.g., A: Alimentary tract and me-
which individual drugs are direct members), we also consid-         tabolism), 2nd level therapeutic (e.g., A10: Drugs used in
er inferred classes (i.e., the classes of which asserted classes    diabetes), 3rd level pharmacological (e.g., A10B: Blood
are subclasses). For example, the class asserted in MeSH for        glucose lowering drugs, excluding insulins), 4th level chem-
the drug atorvastatin is Hydroxymethylglutaryl-CoA Reduc-           ical (e.g., A10BA: Biguanides), and 5th level chemical sub-
tase Inhibitors (i.e., statins), whose parent concepts include      stance or ingredient (e.g., A10BA02: metformin). The 2013
Anticholesteremic Agents. Therefore, the class Anticho-             version of ATC integrates 4,516 5th-level drugs and 1,255
lesteremic Agents is an inferred pharmacologic class for            drug groups (levels 1-4).
atorvastatin.
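
The derivation of inferred classes from asserted classes can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in this study: the names PARENTS, ASSERTED and inferred_classes are hypothetical, and the hierarchy excerpt contains only the single parent relation mentioned in the atorvastatin example.

```python
from collections import deque

# Hypothetical excerpt of a class hierarchy (child -> parents). Only the
# relation mentioned in the text is included; a real hierarchy is far larger.
PARENTS = {
    "Hydroxymethylglutaryl-CoA Reductase Inhibitors": ["Anticholesteremic Agents"],
}

# Asserted classes: the classes of which a drug is a direct member.
ASSERTED = {
    "atorvastatin": {"Hydroxymethylglutaryl-CoA Reductase Inhibitors"},
}


def inferred_classes(asserted, parents):
    """Return every ancestor of the asserted classes (transitive closure),
    excluding the asserted classes themselves."""
    seen = set()
    queue = deque(asserted)
    while queue:
        current = queue.popleft()
        for parent in parents.get(current, []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen - set(asserted)


if __name__ == "__main__":
    # Prints {'Anticholesteremic Agents'}
    print(inferred_classes(ASSERTED["atorvastatin"], PARENTS))
```

Walking the hierarchy breadth-first with a visited set keeps the sketch safe for classes that have several parents, which is common in polyhierarchies such as MeSH.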
                                                                    3.2    MeSH
2.3    Related work
                                                                    The Medical Subject Headings (MeSH) is a controlled vo-
As part of the EU-ADR project, (Avillach, et al., 2013) ex-         cabulary produced and maintained by the NLM (NLM,
tracted adverse drug reactions from the biomedical literature       2013). It is used for indexing, cataloging, and searching the
and mapped MeSH drugs to ATC through the UMLS. How-




biomedical literature in the MEDLINE/PubMed database,
and other documents. The MeSH thesaurus includes 26,853
descriptors (or “main headings”) organized in 16 hierarchies
(e.g., Chemical and Drugs). Additionally, MeSH provides
about 210,000 supplementary concept records (SCRs), of
which many represent chemicals and drugs. Each SCR is
linked to at least one descriptor. While most chemical de-
scriptors provide a structural perspective on drugs, some
descriptors play a special role as they can be used to denote
pharmacological actions in drug descriptors and SCRs.
MeSH 2013 is used in this study.

Figure 1. Alignment of ATC and MeSH classes: alignment via their instances (left) in comparison to direct lexical mapping of the class names (right).

3.3    RxNorm                                                   broad classes. We also excluded 164 of the 1,241 ATC
RxNorm is a standardized nomenclature for medications           groups (2nd – 4th level) corresponding to drug combinations,
produced and maintained by the U.S. National Library of         because combination drugs are often underspecified in
Medicine (NLM) (NLM, 2013). RxNorm concepts are                 ATC.
linked by NLM to multiple drug identifiers for commercial-      Similarly, in MeSH, we excluded the top-level descriptors
ly available drug databases and standard terminologies, in-     of the Chemicals and Drugs hierarchy (i.e., D01 - D27), as
cluding MeSH. RxNorm serves as a reference terminology          well as the top-level of the pharmacological action de-
for drugs in the US. The March 2013 version of RxNorm           scriptors (Pharmacologic Actions, Molecular Mechanisms
used in this study integrates about 10,500 base and salt in-    of Pharmacological Action, Physiological Effects of Drugs,
gredients. NLM also provides an application programming         and Therapeutic Uses).
interface (API) for accessing RxNorm data programmatical-
ly (NLM, 2013).                                                 4.1   Lexical alignment
                                                                We mapped all 1,077 eligible ATC classes (2nd – 4th level)
3.4    Unified Medical Language System (UMLS)
                                                                to MeSH descriptors in the Chemicals and Drugs [D] tree
The UMLS is a terminology integration system created and        using the UMLS Terminology Services (UTS). More pre-
maintained by the National Library of Medicine (NLM)            cisely, we used the ExactString and NormalizedString
(NLM, 2013). The UMLS Metathesaurus integrates over             search function of the UTS API 2.0 to establish mappings
150 terminologies, including MeSH, but not ATC. Synony-         from the names of the ATC classes to UMLS concepts. We
mous terms across terminologies are grouped into concepts       used normalization only when the exact technique did not
and assigned the same concept unique identifier. The Me-        result in a mapping. We then mapped the UMLS concepts to
tathesaurus provides a comprehensive set of synonyms for        MeSH descriptor IDs.
biomedical concepts and is often used for integrating termi-
nologies beyond its own. NLM provides an application pro-       4.2   Instance-based alignment
gramming interface (API) for accessing UMLS data pro-           Mapping ATC drugs to RxNorm ingredients. In previous
grammatically. Version 2012AB of the UMLS is used in            work we have mapped ATC single-ingredient drugs to In-
this study.                                                     gredients (IN) and Precise Ingredients (PIN) in RxNorm
                                                                using a lexical approach with additional normalization steps
4     METHODS                                                   (Winnenburg and Bodenreider, 2012). We used these map-
Our approach to aligning pharmacologic classes between          pings to establish the alignment of ATC and RxNorm drugs
MeSH and ATC based on their instances is depicted in Fig-       in this study.
ure 1 and can be summarized as follows. First, we estab-        Mapping MeSH drugs to RxNorm ingredients. Since
lished a lexical alignment of MeSH and ATC classes based        MeSH drugs are integrated in RxNorm, mappings to equiva-
on the class names and their synonyms (Figure 1, right). We     lent drug concepts from MeSH can be obtained via the
then constructed an instance-based alignment of MeSH and        getProprietaryInformation function from the RxNorm API.
ATC classes considering the individual drugs shared by the      We systematically exploited this information for all Ingredi-
classes (Figure 1, left). We mapped individual drugs from       ents (IN) and Precise Ingredients (PIN) in RxNorm and cre-
MeSH and ATC via their ingredients (IN) or precise ingre-       ated a mapping table between RxNorm CUIs and MeSH
dients (PIN) in RxNorm. We used a similarity measure and        Main Headings (MH) and Supplementary Concept Records
thresholds to identify class mappings and compared them         (SCR).
with the mappings retrieved by the lexical approach.            Inferring class membership in ATC. We considered the
In our alignment work, we excluded the 14 ATC groups of         hierarchical relations from 5th level drugs to their 4th level
level 1 (anatomical classification), because they are too




chemical groups as asserted drug class membership. We inferred membership
between 5th level drugs and groups of level 3 and 2 through transitive
closure. For example, temafloxacin (J01MA05) is a member of the chemical
group Fluoroquinolones (J01MA - asserted), the pharmacological group
QUINOLONE ANTIBACTERIALS (J01M - inferred), and the therapeutic group
ANTIBACTERIALS FOR SYSTEMIC USE (J01 - inferred).

Table 1. Asserted and inferred MeSH classes for the drug temafloxacin
(C054745) with type of relationship to the drug and tree numbers in MeSH.

Type   Asserted Classes          Inferred Classes

PA     Anti-Bacterial Agents     Anti-Infective Agents
       (D000900)                 (D000890)
       [D27.505.954.122.085]     [D27.505.954.122]

MH     Fluoroquinolones          Quinolones
       (D024841)                 (D015363)
       [D03.438.810.835.322]     [D03.438.810.835]

                                 Quinolines
                                 (D011804)
                                 [D03.438.810]

                                 Heterocyclic Compounds, 2-Ring
                                 (D006574)
                                 [D03.438]

Inferring class membership in MeSH. We associated each RxNorm ingredient (IN
or PIN) with its corresponding MeSH supplementary concept record (SCR) or
main heading (MH). In turn, we associated these drugs with their asserted
classes. For an SCR, we considered its pharmacological actions, as well as
the MeSH heading(s) to which it is mapped. For an MH, we considered its
pharmacological actions, as well as its direct ancestors. These constitute
the asserted classes. We inferred membership between the drugs and
higher-level descriptors in the MeSH hierarchy. For example, as shown in
Table 1, the SCR temafloxacin has Anti-Bacterial Agents as pharmacological
action and Fluoroquinolones as the main heading to which it is mapped. From
these asserted classes, we infer membership to Anti-Infective Agents (from
Anti-Bacterial Agents) and to Quinolones, Quinolines, and Heterocyclic
Compounds, 2-Ring (from Fluoroquinolones).
Measure for aligning ATC and MeSH classes. Based on the asserted and inferred
class membership of drugs in ATC and MeSH we conducted a pairwise comparison
of all ATC against all MeSH classes. For each pair of ATC class (A) and MeSH
class (M), we computed the Jaccard coefficient. In order to reduce the
similarity of pairs of classes with a small number of shared members, we used
a modified version of the Jaccard coefficient, JCmod, as suggested in
(Isaac, et al., 2007),

\[ JC(A, M) = \frac{|A \cap M|}{|A \cup M|} \]

\[ JC_{mod}(A, M) = \frac{\sqrt{|A \cap M| \times (|A \cap M| - 0.8)}}{|A \cup M|} \]

where |A ∩ M| represents the number of drugs common to A and M, and |A ∪ M|
the total number of unique drugs in both classes.
The Jaccard coefficient measures the similarity between the two classes, but
does not reflect whether one class is included in the other. Because of the
difference in granularity between classes in ATC and MeSH, we introduce a
simple metric for detecting whether the drugs that are not shared by both
classes are primarily in one of the two classes. This "one-sidedness"
coefficient is calculated as follows:

\[ \text{one-sidedness}(a, m) = \begin{cases} 0, & \text{for } a = 0 \text{ and } m = 0 \\ \dfrac{|a - m|}{a + m}, & \text{otherwise} \end{cases} \]

where a and m are the number of drugs specific to the ATC class and the MeSH
class, respectively. Thus, a "one-sidedness" coefficient close to 0 indicates
that the drugs that are not shared by the two classes are evenly distributed
between the ATC and MeSH class. In contrast, a coefficient close to 1
indicates that only one of the classes contains most of the drugs that are
not shared by the other.
Thresholds. In order to select the best equivalence or inclusion mappings
between ATC and MeSH, we characterize each pair of ATC and MeSH classes with
respect to Jaccard similarity and one-sidedness. Low one-sidedness indicates
equivalence and high one-sidedness indicates inclusion. High Jaccard
similarity indicates strong overlap between the two classes. Based on
preliminary analysis, we selected a threshold of 0.5 for the one-sidedness
metric. Similarly, we selected thresholds of 0.5 and 0.25 for Jaccard
similarity for equivalence (low one-sidedness) and inclusion (high
one-sidedness), respectively. The lower threshold for Jaccard similarity for
inclusion was determined empirically. As shown in Table 2, each pair of ATC
and MeSH classes is characterized as an equivalence mapping (EQ+), an
inclusion mapping (IN+), or not a mapping (EQ- and IN-).

5     RESULTS

5.1    Lexical alignment
For the 1,077 eligible ATC groups of level 2-4, we were able to retrieve 221
mappings to descriptors from the Chemicals and Drugs [D] tree in MeSH. We
have 18 mappings for therapeutic classes (2nd level), 42 for pharmacological
classes (3rd level), and 161 for chemical classes (4th level).
We ignored mappings for the broad anatomical classes (1st
level). Of the 221 mappings, 96 are to pharmacological actions (functional
perspective) in MeSH, whereas 125 are to other descriptors at various levels
of the MeSH hierarchy (structural perspective).

Table 3. Comparison between lexical and instance-based alignment.

                            Instance-based
                    Yes      No     No assoc.    Total
Lexical   Yes        61      19        141         221
          No        282     571         62         915
          Total     343     590        203        1136

5.2   Instance-based alignment
Of the 1,077 eligible ATC groups, 874 (81%) could be associated with at least
one descriptor or pharmacological action in MeSH. We identified a total of 933 associations
for the 874 ATC groups (multiple associations per ATC
group possible). As shown in Table 2, based on the one-        6     DISCUSSION
sidedness metric, we characterized 323 associations as
equivalence and 610 as inclusion. Of the 323 equivalence       6.1     Examples and failure analysis
associations, 113 (35%) exhibit high Jaccard similarity and    True positive for equivalent instance-based mappings. We
are selected as equivalence mappings (EQ+). Of the 610         identify an equivalence mapping between the 4th-level ATC
inclusion associations, 230 (38%) exhibit high Jaccard simi-   group Fluoroquinolones (J01MA) and the MeSH descriptor
larity and are selected as inclusion mappings (IN+). The       Fluoroquinolones (D024841). The two classes share 14
other associations (EQ- and IN-) are not deemed strong         drugs. The ATC group has one extra drug (moxifloxacin),
enough to denote mappings. In summary, we were able to         and the MeSH descriptor has 2 (flumequine and besifloxa-
characterize as a mapping (EQ+ and IN+) 343 (37%) of the       cin). Jaccard similarity is high (0.82) and the one-sidedness
associations between ATC and MeSH classes. It should be        score is low (0.33), because the 3 drugs that are not in
mentioned that we were not able to obtain mappings to          common are not all on the same side. This mapping is also
MeSH classes for 203 ATC classes, because they only con-       identified by the lexical technique (exact match).
tain drug instances that could not be mapped to drugs in       True positive for inclusion instance-based mappings. We
MeSH.                                                          identify an inclusion mapping between the 4th-level ATC
                                                               group Fluoroquinolones (S01AE) and the MeSH descriptor
Fluoroquinolones (D024841). Although the two classes are seemingly identical,
our mapping is identified as an inclusion, with 7 drugs in common, 1 drug
specific to the ATC class and 9 drugs specific to the MeSH class. In fact, the
ATC class is not the same general class for anti-infective agents as in the
example above (J01MA), but rather the specific class of fluoroquinolones for
ophthalmic use (S01AE). The fluoroquinolones used for eye disorders are a
subset of all fluoroquinolones and the ATC class S01AE is appropriately
characterized as being included in the MeSH class for fluoroquinolones. This
example also illustrates a false positive for the lexical mapping, since it is
generally assumed that lexical mappings are equivalence mappings.

Table 2. Characterization of the associations between ATC and MeSH classes
based on Jaccard similarity and score for one-sidedness. The numbers in grey
fields (EQ- and IN-) indicate the associations that are not strong enough to
denote mappings.

                               One-sidedness
Jaccard        ≥ .5                < .5                Total
≥ .5           IN+ (230)           EQ+ (113)           343 (EQ+ and IN+)
[.25-.5[       (IN+, continued)    EQ- (210)
< .25          IN- (380)           (EQ-, continued)    590 (EQ- and IN-)
Total          610                 323                 933
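
The measures and thresholds described in the Methods section can be replayed on the examples discussed in this section. The following Python sketch is illustrative only and is not the code used in this study: the function names and the classify helper are hypothetical, the drug counts are taken from the text, and it is an assumption that the modified Jaccard coefficient (rather than the plain one) is compared against the thresholds. For the silver compounds example, the absence of class-specific drugs is inferred from the reported score of 0.45.

```python
import math


def jaccard(shared, only_atc, only_mesh):
    """Plain Jaccard coefficient |A n M| / |A u M|, computed from counts."""
    union = shared + only_atc + only_mesh
    return shared / union if union else 0.0


def jaccard_mod(shared, only_atc, only_mesh):
    """Modified Jaccard coefficient: sqrt(s * (s - 0.8)) / |A u M|."""
    union = shared + only_atc + only_mesh
    if shared == 0 or union == 0:
        return 0.0
    return math.sqrt(shared * (shared - 0.8)) / union


def one_sidedness(only_atc, only_mesh):
    """|a - m| / (a + m), or 0 when neither class has specific drugs."""
    total = only_atc + only_mesh
    return abs(only_atc - only_mesh) / total if total else 0.0


def classify(shared, only_atc, only_mesh,
             os_threshold=0.5, eq_threshold=0.5, in_threshold=0.25):
    """Label an association as EQ+, EQ-, IN+ or IN- (layout of Table 2).

    Assumption: the modified coefficient is the one that is thresholded.
    """
    score = jaccard_mod(shared, only_atc, only_mesh)
    if one_sidedness(only_atc, only_mesh) < os_threshold:
        return "EQ+" if score >= eq_threshold else "EQ-"
    return "IN+" if score >= in_threshold else "IN-"


if __name__ == "__main__":
    # (label, shared drugs, ATC-specific drugs, MeSH-specific drugs)
    examples = [
        ("J01MA vs D024841", 14, 1, 2),
        ("S01AE vs D024841", 7, 1, 9),
        ("D08AL vs D018030", 1, 0, 0),
    ]
    for label, s, a, m in examples:
        print(label,
              round(jaccard(s, a, m), 2),
              round(jaccard_mod(s, a, m), 2),
              round(one_sidedness(a, m), 2),
              classify(s, a, m))
```

Under these assumptions, the plain coefficient reproduces the 0.82 reported for J01MA and the one-sidedness score reproduces 0.33, while the modified coefficient reproduces the 0.45 reported for silver compounds. The three pairs are labeled EQ+, IN+ and EQ-, respectively, in agreement with the discussion.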
5.3   Comparison between lexical and instance-                 False negative for equivalent instance-based mappings.
      based alignment                                          Many ATC and MeSH classes share only one or very few
As illustrated in Table 3, from the 221 lexical mappings       drugs, making it difficult to assess equivalence or inclusion.
between ATC and MeSH classes, we could confirm 61 with         For example, the 4th-level ATC group Silver compounds
our instance-based approach (30 as equivalence mappings,       (D08AL) and the MeSH descriptor Silver Compounds
31 as inclusion mappings). For 19 of the lexical mappings      (D018030) share only one drug (silver). The modified ver-
we found an association with low Jaccard similarity (IN- /     sion of the Jaccard coefficient has a score of 0.45 in this
EQ-), and for 141 of the lexical mappings we did not find      case, which is below our threshold of 0.5 for equivalence.
any association through the instance-based alignment (main-    During this failure analysis, we discovered that some MeSH
ly due to the lack of any mapping for the drug instances in    drugs did not have a pharmacological action assigned to
these classes). Finally, the instance-based approach pro-      them as we expected. For example, while pyrantel is listed
duced 282 additional drug class mappings that were not         as Antinematodal Agents, oxantel is not. We are investigat-
detected by the lexical approach, whereas 633 (571 + 62)       ing whether the pharmacological action for this SCR should
ATC classes could neither be mapped by the lexical nor the     be inferred from the descriptor to which it is mapped (Py-
instance-based approach.                                       rantel in this case). Because of these missing pharmacologic




actions, the 3rd-level ATC group ANTINEMATODAL AGENTS (P02C) fails to be mapped to the MeSH pharmacological action Antinematodal Agents (D000969), the Jaccard similarity being just below the threshold (0.49).

Discrepancy between lexical and instance-based alignment (missed lexical mapping). Despite the use of UMLS synonymy and normalization, the lexical alignment fails to identify a mapping between the 3rd-level ATC group POTASSIUM-SPARING AGENTS (C03D) and the MeSH pharmacological action Diuretics, Potassium Sparing (D062865). In contrast, the instance-based alignment identifies an equivalence mapping with very high Jaccard similarity (0.99). This finding is consistent with the conclusions of (Merabti, et al., 2011).

Discrepancy between lexical and instance-based alignment (missed instance-based mapping). We have identified several causes for discrepancies between lexical and instance-based alignments. As mentioned earlier, some ATC classes only contain drugs that cannot be mapped to MeSH through RxNorm, which we used to bridge between the two. Sometimes, the best instance-based mapping is to a class other than the one found by the lexical technique. Finally, some drug entities and biologicals (e.g., vaccines) are less well standardized than common drugs. For this reason, the instance-based alignment is unable to map these classes, whereas simple lexical techniques can.

6.2   Limitations and future work

This exploratory investigation has several limitations, which we plan to address in future work.

Evaluation. This exploratory investigation focuses primarily on the methodology and feasibility of the alignment, and does not include a formal evaluation. Since ATC and MeSH pharmacological actions are being integrated into RxNorm, we will use the alignment created by RxNorm experts as the gold standard to evaluate our methods.

Perspective. Our perspective in this investigation is ATC-centric, because we consider the best MeSH mapping for each ATC class, but not the best ATC mapping for each MeSH class. One future goal is to explore both directions using the same methodology.

Bias towards equivalence mappings. Because we restrict our exploration to the MeSH class with the best Jaccard similarity for each ATC class (which we subsequently categorize as equivalence or inclusion), and because of the differential threshold for Jaccard similarity between equivalence (0.5) and inclusion mappings (0.25), we potentially fail to consider a good inclusion mapping (e.g., with a similarity score of 0.39 [> 0.25]), when the best MeSH class is a bad equivalent mapping (e.g., with a similarity score of 0.41 [< 0.5]).

6.3   Significance

To our knowledge, our work is the first attempt to align pharmacologic classes with instance-based techniques, distinguishing between equivalence and inclusion relations, as well as the first application of alignment between pharmacologic classes in ATC and MeSH. Our instance-based approach to aligning pharmacologic classes has yielded 343 mappings, and has the prospect of effectively supporting the creation of a mapping of pharmacologic classes between ATC and MeSH. This exploratory investigation needs to be evaluated in order to adapt the thresholds for similarity.

ACKNOWLEDGEMENTS

This work was supported by the Intramural Research Program of the NIH, National Library of Medicine and by the Center for Drug Evaluation and Research of the Food and Drug Administration. The authors want to thank Rave Harpaz and Anna Ripple for useful discussions.

DISCLAIMER

The findings and conclusions expressed in this report are those of the authors and do not necessarily represent the views of the FDA.

REFERENCES

Ashburner, M., et al. (2000) Gene ontology: tool for the unification of biology. The Gene Ontology Consortium, Nat Genet, 25, 25-29.
Anatomical Therapeutic Chemical (ATC) classification: http://www.whocc.no/atc/
Avillach, P., et al. (2013) Design and validation of an automated method to detect known adverse drug reactions in MEDLINE: a contribution from the EU-ADR project, J Am Med Inform Assoc, 20, 446-452.
Bodenreider, O. and Taft, L.M. (2013) A mapping of RxNorm to the ATC/DDD Index helps analyze US prescription lists, AMIA Annu Symp Proc, (submitted).
Euzenat, J. and Shvaiko, P. (2007) Ontology matching. Springer, New York.
Isaac, A., et al. (2007) An empirical study of instance-based ontology matching. In Aberer, K., et al. (eds), Proceedings of the 6th international The semantic web and 2nd Asian conference on Asian semantic web conference (ISWC'07/ASWC'07). Springer-Verlag, pp. 253-266.
Kirsten, T., Thor, A. and Rahm, E. (2007) Instance-based matching of large life science ontologies. In Cohen-Boulakia, S. and Tannen, V. (eds), Data Integration in the Life Sciences: 4th International Workshop, DILS 2007, Philadelphia, PA, USA. Springer, pp. 172-187.
McCray, A.T., Srinivasan, S. and Browne, A.C. (1994) Lexical methods for managing variation in biomedical terminologies, Proc Annu Symp Comput Appl Med Care, 235-239.
Merabti, T., et al. (2011) Mapping the ATC classification to the UMLS metathesaurus: some pragmatic applications, Stud Health Technol Inform, 166, 206-213.
Medical Subject Headings (MeSH): http://www.nlm.nih.gov/mesh/
RxNorm: http://www.nlm.nih.gov/research/umls/rxnorm/
RxNorm API: http://rxnavdev.nlm.nih.gov/RxNormAPI.html
Unified Medical Language System (UMLS): https://uts.nlm.nih.gov/
Winnenburg, R. and Bodenreider, O. (2012) Mapping drug entities between the European and American standards, ATC and RxNorm, Poster Proceedings of the Eighth International Conference on Data Integration in the Life Sciences (DILS 2012), 22.
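
The bias towards equivalence mappings discussed under Limitations and future work can be illustrated with a minimal Python sketch. It reuses only the two thresholds (0.5 and 0.25) and the example scores (0.41 and 0.39) quoted in the text. Everything else is an assumption made for illustration: the class labels are placeholders, each candidate is assumed to already carry its category (the categorization metric is not reproduced here), the comparison is assumed to be non-strict, and `best_passing` is a hypothetical alternative selection rule, not a method used in this study.

```python
from dataclasses import dataclass
from typing import List, Optional

# Thresholds quoted in the text. Treating them as inclusive (>=) is an assumption.
THRESHOLDS = {"equivalence": 0.5, "inclusion": 0.25}


@dataclass
class Candidate:
    mesh_class: str  # placeholder label, not a real MeSH descriptor
    kind: str        # "equivalence" or "inclusion", assumed to be given
    score: float     # similarity between the ATC class and this MeSH class


def passes(c: Candidate) -> bool:
    """True if the candidate reaches the threshold for its own category."""
    return c.score >= THRESHOLDS[c.kind]


def best_only(cands: List[Candidate]) -> Optional[Candidate]:
    """Keep only the top-scoring candidate, then apply its threshold."""
    if not cands:
        return None
    best = max(cands, key=lambda c: c.score)
    return best if passes(best) else None


def best_passing(cands: List[Candidate]) -> Optional[Candidate]:
    """Hypothetical alternative: top-scoring candidate among those that pass."""
    ok = [c for c in cands if passes(c)]
    return max(ok, key=lambda c: c.score) if ok else None


candidates = [
    Candidate("MeSH class A", "equivalence", 0.41),  # best score, but below 0.5
    Candidate("MeSH class B", "inclusion", 0.39),    # lower score, but above 0.25
]

print(best_only(candidates))     # no mapping: the best candidate fails its threshold
print(best_passing(candidates))  # the inclusion candidate for MeSH class B
```

Under these assumptions, restricting the search to the single best-scoring MeSH class yields no mapping for the ATC class, while a rule that considers every candidate retains the inclusion mapping.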

