=Paper=
{{Paper
|id=None
|storemode=property
|title=Aligning Pharmacologic Classes Between MeSH and ATC
|pdfUrl=https://ceur-ws.org/Vol-1061/Paper5_vdos2013.pdf
|volume=Vol-1061
|dblpUrl=https://dblp.org/rec/conf/icbo/WinnenburgRCSSB13
}}
==Aligning Pharmacologic Classes Between MeSH and ATC==
Rainer Winnenburg¹, Laritza Rodriguez¹, Fiona Callaghan¹, Alfred Sorbello², Ana Szarfman², and Olivier Bodenreider¹*

¹ Lister Hill National Center for Biomedical Communications, National Library of Medicine, Bethesda, MD, USA

² Center for Drug Evaluation and Research, US Food and Drug Administration, Silver Spring, MD, USA

To whom correspondence should be addressed (*): obodenreider@mail.nih.gov

ABSTRACT

Objective: To align pharmacologic classes in ATC and MeSH with lexical and instance-based techniques.

Methods: Lexical alignment: we map the names of ATC classes to MeSH through the UMLS, leveraging normalization and additional synonymy. Instance-based alignment: we associate ATC and MeSH classes through the drugs they share, using the Jaccard coefficient to measure class-class similarity. We use a metric to distinguish between equivalence and inclusion mappings.

Results: We found 221 lexical mappings and 343 instance-based mappings, with a limited overlap (61). Of the 343 instance-based mappings, we classify 113 as equivalence mappings and 230 as inclusion mappings. A limited failure analysis is presented.

Conclusion: Our instance-based approach to aligning pharmacologic classes has the prospect of effectively supporting the creation of a mapping of pharmacologic classes between ATC and MeSH. This exploratory investigation needs to be evaluated in order to adapt the thresholds for similarity.

1 INTRODUCTION

The National Library of Medicine (NLM) and the Food and Drug Administration (FDA) Center for Drug Evaluation and Research (CDER) are collaborating on a research project to extract adverse drug reactions from the biomedical literature. More specifically, this investigation leverages the indexing of MEDLINE citations to extract associations between co-occurring drug entities and clinical manifestations in the context of adverse events.

The biomedical literature is indexed with the Medical Subject Headings (MeSH) vocabulary. For data mining purposes, however, adverse drug reactions are usually analyzed in reference to other standard vocabularies, namely the Anatomical Therapeutic Chemical (ATC) drug classification system for drug entities, and the Medical Dictionary for Regulatory Activities (MedDRA) for clinical manifestations. Toward this end, drug entities have to be mapped from MeSH to ATC, and manifestations from MeSH to MedDRA. This paper focuses only on the drug entities.

Drug entities include not only individual drugs (e.g., atorvastatin), but also drug classes (e.g., statins). In previous work, we have mapped individual drugs between RxNorm (which includes MeSH drugs) and ATC (Bodenreider and Taft, 2013; Winnenburg and Bodenreider, 2012). In contrast, no mapping is available between pharmacologic classes in MeSH and in ATC. Moreover, unlike individual drugs, whose names are relatively standardized across vocabularies, pharmacologic classes exhibit greater variability, not only in their names, but also in granularity. For example, the drug lisinopril is classified as Angiotensin-Converting Enzyme Inhibitors in MeSH, but as ACE inhibitors, plain in ATC.

The objective of this study is to investigate various ontology matching techniques for aligning pharmacologic classes between MeSH and ATC. Such methods are expected to facilitate the curation of a mapping by experts. To our knowledge, this work represents the first effort to map pharmacologic classes between MeSH and ATC using a sophisticated instance-based alignment technique.

2 BACKGROUND

The general framework of this study is that of ontology alignment (or ontology matching). Various techniques have been proposed for aligning concepts across ontologies. These include lexical techniques (based on the similarity of concept names), structural techniques (based on the similarity of hierarchical relations), semantic techniques (based on semantic similarity between concepts), and instance-based techniques (based on the similarity of the sets of instances of two concepts). An overview of ontology alignment is provided in (Euzenat and Shvaiko, 2007).

The main contribution of this paper is not to propose a novel technique, but rather to apply existing techniques to a novel objective, namely aligning pharmacologic classes between MeSH and ATC. To this end, we use lexical and instance-based techniques, because the names of pharmacologic classes and the lists of drugs that are members of these classes are the two main features available in these resources.

2.1 Lexical techniques

Lexical techniques for ontology matching compare concept names across ontologies. When synonyms are available, they can be used to identify additional matches. Matching techniques beyond exact match use edit distance or normalization to account for minor differences between concept names.

As part of the Unified Medical Language System (UMLS), linguistically motivated normalization techniques have been developed specifically for biomedical terms (McCray, et al., 1994). UMLS normalization abstracts away from inessential
Winnenburg et al.
differences, such as inflection, case and hyphen variation, as well as word order variation. The UMLS normalization techniques form the basis for integrating terms into the UMLS Metathesaurus, but they can also be applied to terms that are not in the UMLS. For example, the ATC class Thiouracils (H03BA) and the MeSH class Thiouracil (D013889) match after normalization (ignoring singular/plural differences).

Lexical techniques typically compare the names of concepts across two ontologies as provided by these ontologies. However, additional synonyms can be used, for example, synonyms from the UMLS Metathesaurus. Here, we leverage cosynonymy for matching pharmacologic classes: two names match if they are synonyms of the same UMLS concept. For example, the ATC class Anticholinesterases (N06DA) and the MeSH class Cholinesterase Inhibitors (D002800) do not match lexically. However, both names are cosynonyms, because both are found among the synonyms of the UMLS Metathesaurus concept C0008425.

2.2 Instance-based techniques

Instance-based techniques, also called extensional techniques, compare classes based on the sets of individuals (i.e., instances) of each class. Many biomedical ontologies consist of class hierarchies but do not contain information about instances. Here, however, individual drugs (e.g., atorvastatin) are the members, not subclasses, of pharmacologic classes (e.g., statins). In other words, pharmacologic classes have individual drugs as instances, not as subclasses.

Several methods have been proposed to implement instance-based matching. (Isaac, et al., 2007) decompose these methods into three basic elements.
(1) A measure is used to evaluate the association between two classes based on the proportion of shared instances. Typical measures include information-based measures (e.g., Jaccard similarity coefficient) and statistical measures (e.g., log likelihood ratio).
(2) A threshold is applied to the measure. Pairs of classes for which the measure is above the threshold are deemed closely associated and become mapping candidates.
(3) Hierarchical relations in the two ontologies can also be leveraged by deriving instance-class relations between the instances of a given class and the ancestors of this class. In other words, in addition to asserted classes (i.e., the classes of which individual drugs are direct members), we also consider inferred classes (i.e., the classes of which asserted classes are subclasses). For example, the class asserted in MeSH for the drug atorvastatin is Hydroxymethylglutaryl-CoA Reductase Inhibitors (i.e., statins), whose parent concepts include Anticholesteremic Agents. Therefore, the class Anticholesteremic Agents is an inferred pharmacologic class for atorvastatin.

2.3 Related work

As part of the EU-ADR project, (Avillach, et al., 2013) extracted adverse drug reactions from the biomedical literature and mapped MeSH drugs to ATC through the UMLS. However, their mapping was limited to individual drugs and did not include pharmacologic classes.

Lexical techniques are a component of most ontology alignment systems (Euzenat and Shvaiko, 2007). There have been attempts to map individual drugs from ATC to concepts in the UMLS and MeSH through lexical techniques. However, (Merabti, et al., 2011) note that these techniques are not appropriate for mapping pharmacologic classes.

Instance-based techniques are also available in many systems, but their applicability is limited. This is because information about instances is often not available as part of the ontologies to be aligned. For example, most biomedical terminologies and ontologies are simple class hierarchies. The instances of these classes are present in electronic medical record systems and clinical data warehouses, but they are typically not distributed along with the ontologies. One exception in the biomedical domain is the Gene Ontology (GO) (Ashburner, et al., 2000), for which the gene products annotated to GO terms can be considered instances of the corresponding classes. (Kirsten, et al., 2007) have aligned GO terms across the three hierarchies of GO through the gene products to which they are co-annotated.

To our knowledge, our work is the first attempt to align pharmacologic classes with instance-based techniques (i.e., beyond name matching). It is also the first application of aligning pharmacologic classes in ATC and MeSH. Moreover, most ontology alignment systems mainly consider matches between equivalent classes. We are also interested in identifying those cases where one class is included in another class.

3 MATERIALS

3.1 Anatomical Therapeutic Chemical Drug Classification System (ATC)

The ATC is a clinical drug classification system developed and maintained by the World Health Organization (WHO) as a tool for drug utilization research, to improve the quality of drug use (ATC, 2013). The system is organized as a hierarchy that classifies clinical drug entities at five different levels:
* 1st level, anatomical (e.g., A: Alimentary tract and metabolism)
* 2nd level, therapeutic (e.g., A10: Drugs used in diabetes)
* 3rd level, pharmacological (e.g., A10B: Blood glucose lowering drugs, excluding insulins)
* 4th level, chemical (e.g., A10BA: Biguanides)
* 5th level, chemical substance or ingredient (e.g., A10BA02: metformin)

The 2013 version of ATC integrates 4,516 5th-level drugs and 1,255 drug groups (levels 1-4).

3.2 MeSH

The Medical Subject Headings (MeSH) is a controlled vocabulary produced and maintained by the NLM (NLM, 2013). It is used for indexing, cataloging, and searching the biomedical literature in the MEDLINE/PubMed database, as well as other documents. The MeSH thesaurus includes 26,853 descriptors (or “main headings”) organized in 16 hierarchies (e.g., Chemicals and Drugs). Additionally, MeSH provides about 210,000 supplementary concept records (SCRs), many of which represent chemicals and drugs. Each SCR is linked to at least one descriptor. Most chemical descriptors provide a structural perspective on drugs. Some descriptors, however, play a special role, because they can be used to denote pharmacological actions in drug descriptors and SCRs. MeSH 2013 is used in this study.

Figure 1. Alignment of ATC and MeSH classes: alignment via their instances (left) in comparison to direct lexical mapping of the class names (right).
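The lexical matching of Section 2.1 can be sketched in Python. It applies normalization first, then cosynonymy through a shared UMLS concept, as on the right of Figure 1. This is a simplified illustration under stated assumptions, not the UMLS tooling used in the study (the study used the UTS ExactString and NormalizedString searches). The function `normalize` is a hypothetical, crude stand-in for UMLS normalization. `SYNONYMS` is a hypothetical in-memory table that stands in for Metathesaurus synonymy; only concept C0008425 and its two names come from the text.

```python
from __future__ import annotations

import re

# Hypothetical stand-in for a UMLS Metathesaurus lookup (concept ID -> names).
# Only the concept C0008425 and these two names come from the example in the text.
SYNONYMS = {
    "C0008425": {"Cholinesterase Inhibitors", "Anticholinesterases"},
}


def normalize(name: str) -> str:
    """Crude stand-in for UMLS normalization (NOT the UMLS normalizer).

    Lowercases, treats hyphens and other punctuation as word separators,
    strips a final "s" from words longer than three letters (a naive
    singular/plural rule), and sorts the words so word order does not matter.
    """
    words = [w for w in re.split(r"[^a-z0-9]+", name.lower()) if w]
    words = [w[:-1] if len(w) > 3 and w.endswith("s") else w for w in words]
    return " ".join(sorted(words))


def concepts_for(name: str) -> set[str]:
    """Concept IDs in SYNONYMS that have a synonym matching `name` after normalization."""
    key = normalize(name)
    return {cui for cui, names in SYNONYMS.items()
            if key in {normalize(n) for n in names}}


def lexical_match(atc_name: str, mesh_name: str) -> str | None:
    """Return 'exact', 'normalized', 'cosynonymy', or None for a pair of class names."""
    if atc_name == mesh_name:
        return "exact"
    if normalize(atc_name) == normalize(mesh_name):
        return "normalized"
    if concepts_for(atc_name) & concepts_for(mesh_name):
        return "cosynonymy"
    return None


if __name__ == "__main__":
    print(lexical_match("Thiouracils", "Thiouracil"))  # normalized
    print(lexical_match("Anticholinesterases", "Cholinesterase Inhibitors"))  # cosynonymy
```

The checks run from least to most permissive. This mirrors the study, which used normalization only when the exact technique did not yield a mapping.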
3.3 RxNorm

RxNorm is a standardized nomenclature for medications produced and maintained by the U.S. National Library of Medicine (NLM) (NLM, 2013). RxNorm concepts are linked by NLM to multiple drug identifiers for commercially available drug databases and standard terminologies, including MeSH. RxNorm serves as a reference terminology for drugs in the US. The March 2013 version of RxNorm used in this study integrates about 10,500 base and salt ingredients. NLM also provides an application programming interface (API) for accessing RxNorm data programmatically (NLM, 2013).

3.4 Unified Medical Language System (UMLS)

The UMLS is a terminology integration system created and maintained by the National Library of Medicine (NLM) (NLM, 2013). The UMLS Metathesaurus integrates over 150 terminologies, including MeSH but not ATC. Synonymous terms across terminologies are grouped into concepts and assigned the same concept unique identifier. The Metathesaurus provides a comprehensive set of synonyms for biomedical concepts and is often used for integrating terminologies beyond its own. NLM provides an application programming interface (API) for accessing UMLS data programmatically. Version 2012AB of the UMLS is used in this study.

4 METHODS

Our approach to aligning pharmacologic classes between MeSH and ATC based on their instances is depicted in Figure 1 and can be summarized as follows. First, we established a lexical alignment of MeSH and ATC classes based on the class names and their synonyms (Figure 1, right). We then constructed an instance-based alignment of MeSH and ATC classes considering the individual drugs shared by the classes (Figure 1, left). We mapped individual drugs from MeSH and ATC via their ingredients (IN) or precise ingredients (PIN) in RxNorm. We used a similarity measure and thresholds to identify class mappings, and we compared them with the mappings retrieved by the lexical approach.

In our alignment work, we excluded the 14 ATC groups of level 1 (anatomical classification), because these classes are too broad. We also excluded 164 of the 1,241 ATC groups (2nd to 4th level) corresponding to drug combinations, because combination drugs are often underspecified in ATC.

Similarly, in MeSH, we excluded the top-level descriptors of the Chemicals and Drugs hierarchy (i.e., D01-D27). We also excluded the top level of the pharmacological action descriptors (Pharmacologic Actions, Molecular Mechanisms of Pharmacological Action, Physiological Effects of Drugs, and Therapeutic Uses).

4.1 Lexical alignment

We mapped all 1,077 eligible ATC classes (2nd to 4th level) to MeSH descriptors in the Chemicals and Drugs [D] tree using the UMLS Terminology Services (UTS). More precisely, we used the ExactString and NormalizedString search functions of the UTS API 2.0 to establish mappings from the names of the ATC classes to UMLS concepts. We used normalization only when the exact technique did not result in a mapping. We then mapped the UMLS concepts to MeSH descriptor IDs.

4.2 Instance-based alignment

Mapping ATC drugs to RxNorm ingredients. In previous work, we mapped ATC single-ingredient drugs to Ingredients (IN) and Precise Ingredients (PIN) in RxNorm using a lexical approach with additional normalization steps (Winnenburg and Bodenreider, 2012). We used these mappings to establish the alignment of ATC and RxNorm drugs in this study.

Mapping MeSH drugs to RxNorm ingredients. Since MeSH drugs are integrated in RxNorm, mappings to equivalent drug concepts from MeSH can be obtained via the getProprietaryInformation function of the RxNorm API. We systematically exploited this information for all Ingredients (IN) and Precise Ingredients (PIN) in RxNorm. From it, we created a mapping table between RxNorm CUIs and MeSH Main Headings (MH) and Supplementary Concept Records (SCR).

Inferring class membership in ATC. We considered the hierarchical relations from 5th level drugs to their 4th level
chemical groups as asserted drug class membership. We inferred membership between 5th level drugs and groups of levels 3 and 2 through transitive closure. For example, temafloxacin (J01MA05) is a member of three ATC groups:
* the chemical group Fluoroquinolones (J01MA, asserted);
* the pharmacological group QUINOLONE ANTIBACTERIALS (J01M, inferred);
* the therapeutic group ANTIBACTERIALS FOR SYSTEMIC USE (J01, inferred).

Table 1. Asserted and inferred MeSH classes for the drug temafloxacin (C054745), with the type of relationship to the drug (PA: pharmacological action; MH: main heading) and the tree numbers in MeSH.
{| class="wikitable"
! Type !! Asserted classes !! Inferred classes
|-
| PA
| Anti-Bacterial Agents (D000900) [D27.505.954.122.085]
| Anti-Infective Agents (D000890) [D27.505.954.122]
|-
| MH
| Fluoroquinolones (D024841) [D03.438.810.835.322]
| Quinolones (D015363) [D03.438.810.835]<br />Quinolines (D011804) [D03.438.810]<br />Heterocyclic Compounds, 2-Ring (D006574) [D03.438]
|}

Inferring class membership in MeSH. We associated each RxNorm ingredient (IN or PIN) with its corresponding MeSH supplementary concept record (SCR) or main heading (MH). In turn, we associated these drugs with their asserted classes. For an SCR, we considered its pharmacological actions, as well as the MeSH heading(s) it is mapped to. For an MH, we considered its pharmacological actions, as well as its direct ancestors. These constitute the asserted classes. We inferred membership between the drugs and higher-level descriptors in the MeSH hierarchy. For example, as shown in Table 1, the SCR temafloxacin has Anti-Bacterial Agents as its pharmacological action and Fluoroquinolones as the main heading it is mapped to. From these asserted classes, we infer membership in Anti-Infective Agents (from Anti-Bacterial Agents). We also infer membership in Quinolones, Quinolines, and Heterocyclic Compounds, 2-Ring (from Fluoroquinolones).

Measure for aligning ATC and MeSH classes. Based on the asserted and inferred class membership of drugs in ATC and MeSH, we conducted a pairwise comparison of all ATC classes against all MeSH classes. For each pair of ATC class (A) and MeSH class (M), we computed the Jaccard coefficient, JC. Some pairs of classes share only a small number of members. To reduce the similarity of such pairs, we used a modified version of the Jaccard coefficient, JCmod, as suggested in (Isaac, et al., 2007):

<math>JC(A, M) = \frac{|A \cap M|}{|A \cup M|}</math>

<math>JC_{mod}(A, M) = \frac{\sqrt{|A \cap M| \times (|A \cap M| - 0.8)}}{|A \cup M|}</math>

where |A ∩ M| is the number of drugs common to A and M, and |A ∪ M| is the total number of unique drugs in both classes.

The Jaccard coefficient measures the similarity between the two classes, but it does not reflect whether one class is included in the other. Classes in ATC and MeSH differ in granularity. We therefore introduce a simple metric that detects whether the drugs not shared by both classes are primarily in one of the two classes. This “one-sidedness” coefficient is calculated as follows:

<math>\text{one-sidedness}(A, M) = \begin{cases} 0, & \text{if } a = 0 \text{ and } m = 0 \\ \dfrac{|a - m|}{a + m}, & \text{otherwise} \end{cases}</math>

where a and m are the numbers of drugs specific to the ATC class and to the MeSH class, respectively. A one-sidedness coefficient close to 0 indicates that the drugs not shared by the two classes are evenly distributed between the ATC and MeSH classes. In contrast, a coefficient close to 1 indicates that most of the drugs not shared by the two classes belong to only one of them.

Thresholds. To select the best equivalence or inclusion mappings between ATC and MeSH, we characterize each pair of ATC and MeSH classes with respect to Jaccard similarity and one-sidedness. Low one-sidedness indicates equivalence, and high one-sidedness indicates inclusion. High Jaccard similarity indicates strong overlap between the two classes. Based on preliminary analysis, we selected a threshold of 0.5 for the one-sidedness metric. Similarly, we selected thresholds for Jaccard similarity of 0.5 for equivalence (low one-sidedness) and 0.25 for inclusion (high one-sidedness). The lower Jaccard threshold for inclusion was determined empirically. As shown in Table 2, each pair of ATC and MeSH classes is characterized as an equivalence mapping (EQ+), an inclusion mapping (IN+), or not a mapping (EQ- and IN-).

5 RESULTS

5.1 Lexical alignment

For the 1,077 eligible ATC groups of levels 2-4, we were able to retrieve 226 mappings to descriptors from the Chemicals and Drugs [D] tree in MeSH. We have 18 mappings for therapeutic classes (2nd level), 42 for pharmacological classes (3rd level), and 161 for chemical classes (4th level). We ignored mappings for the broad anatomical classes (1st level). Of the 221 mappings, 96 are to pharmacological actions (functional perspective) in MeSH, whereas 125 are to other descriptors at various levels of the MeSH hierarchy (structural perspective).

5.2 Instance-based alignment

Of the 1,077 eligible ATC groups, 874 (81%) could be associated with at least one descriptor or pharmacological action in MeSH. We identified a total of 933 associations for the 874 ATC groups; an ATC group can have several associations.

As shown in Table 2, based on the one-sidedness metric, we characterized 323 associations as equivalence and 610 as inclusion:
* Of the 323 equivalence associations, 113 (35%) exhibit high Jaccard similarity and are selected as equivalence mappings (EQ+).
* Of the 610 inclusion associations, 230 (38%) exhibit high Jaccard similarity and are selected as inclusion mappings (IN+).
* The other associations (EQ- and IN-) are not deemed strong enough to denote mappings.

In summary, we were able to characterize 343 (37%) of the associations between ATC and MeSH classes as mappings (EQ+ and IN+). Note that we were not able to obtain mappings to MeSH classes for 203 ATC classes. These classes only contain drug instances that could not be mapped to drugs in MeSH.

Table 2. Characterization of the associations between ATC and MeSH classes based on Jaccard similarity and one-sidedness score. Of the 933 associations, 343 are mappings (EQ+ and IN+). The remaining 590 (EQ- and IN-) are not strong enough to denote mappings.
{| class="wikitable"
! rowspan="2" | Jaccard
! colspan="2" | One-sidedness
|-
! ≥ .5 !! < .5
|-
! ≥ .5
| rowspan="2" | IN+ (230)
| EQ+ (113)
|-
! [.25, .5[
| rowspan="2" | EQ- (210)
|-
! < .25
| IN- (380)
|-
! Total
| 610 || 323
|}

5.3 Comparison between lexical and instance-based alignment

As shown in Table 3, we could confirm 61 of the 221 lexical mappings between ATC and MeSH classes with our instance-based approach (30 as equivalence mappings, 31 as inclusion mappings). For 19 of the lexical mappings, we found an association with low Jaccard similarity (IN- or EQ-). For 141 of the lexical mappings, we did not find any association through the instance-based alignment. This is mainly because none of the drug instances in these classes could be mapped. Finally, the instance-based approach produced 282 additional drug class mappings that were not detected by the lexical approach. In contrast, 633 (571 + 62) ATC classes could be mapped by neither the lexical nor the instance-based approach.

Table 3. Comparison between lexical and instance-based alignment.
{| class="wikitable"
! rowspan="2" colspan="2" |
! colspan="3" | Instance-based
! rowspan="2" | Total
|-
! Yes !! No !! No assoc.
|-
! rowspan="2" | Lexical
! Yes
| 61 || 19 || 141 || 221
|-
! No
| 282 || 571 || 62 || 915
|-
! colspan="2" | Total
| 343 || 590 || 203 || 1136
|}

6 DISCUSSION

6.1 Examples and failure analysis

True positive for equivalence instance-based mappings. We identify an equivalence mapping between the 4th-level ATC group Fluoroquinolones (J01MA) and the MeSH descriptor Fluoroquinolones (D024841). The two classes share 14 drugs. The ATC group has one extra drug (moxifloxacin), and the MeSH descriptor has two (flumequine and besifloxacin). Jaccard similarity is high (14/17 = 0.82). The one-sidedness score is low (|1 - 2| / (1 + 2) = 0.33), because the 3 drugs that are not in common are not all on the same side. This mapping is also identified by the lexical technique (exact match).

True positive for inclusion instance-based mappings. We identify an inclusion mapping between the 4th-level ATC group Fluoroquinolones (S01AE) and the MeSH descriptor Fluoroquinolones (D024841). The two classes seem identical, yet our mapping is identified as an inclusion. They have 7 drugs in common, 1 drug specific to the ATC class and 9 drugs specific to the MeSH class, giving a one-sidedness of |1 - 9| / (1 + 9) = 0.8. In fact, the ATC class is not the general class for anti-infective agents from the example above (J01MA). It is the specific class of fluoroquinolones for ophthalmic use (S01AE). The fluoroquinolones used for eye disorders are a subset of all fluoroquinolones. The ATC class S01AE is therefore appropriately characterized as included in the MeSH class for fluoroquinolones. This example also illustrates a false positive for the lexical mapping, since lexical mappings are generally assumed to be equivalence mappings.

False negative for equivalence instance-based mappings. Many ATC and MeSH classes share only one or very few drugs, making it difficult to assess equivalence or inclusion. For example, the 4th-level ATC group Silver compounds (D08AL) and the MeSH descriptor Silver Compounds (D018030) share only one drug (silver). The modified Jaccard coefficient scores 0.45 in this case, below our equivalence threshold of 0.5.

During this failure analysis, we discovered that some MeSH drugs lacked the pharmacological action we expected them to have. For example, pyrantel is listed as Antinematodal Agents, but oxantel is not. We are investigating whether the pharmacological action for this SCR should be inferred from the descriptor to which it is mapped (Pyrantel in this case). Because of these missing pharmacologic
actions, the 3rd-level ATC group ANTINEMATODAL AGENTS (P02C) fails to be mapped to the MeSH pharmacological action Antinematodal Agents (D000969). Its Jaccard similarity (0.49) is just below the threshold.

Discrepancy between lexical and instance-based alignment (missed lexical mapping). Despite the use of UMLS synonymy and normalization, the lexical alignment fails to identify a mapping between the 3rd-level ATC group POTASSIUM-SPARING AGENTS (C03D) and the MeSH pharmacological action Diuretics, Potassium Sparing (D062865). In contrast, the instance-based alignment identifies an equivalence mapping with very high Jaccard similarity (0.99). This finding is consistent with the conclusions of (Merabti, et al., 2011).

Discrepancy between lexical and instance-based alignment (missed instance-based mapping). We have identified several causes for discrepancies between lexical and instance-based alignments. As mentioned earlier, some ATC classes only contain drugs that cannot be mapped to MeSH through RxNorm, which we used to bridge between the two. Sometimes, the best instance-based mapping is to a different class than the one found by the lexical technique. Finally, some drug entities and biologicals (e.g., vaccines) are less well standardized than common drugs. For this reason, the instance-based alignment is unable to map these classes, whereas simple lexical techniques can.

6.2 Limitations and future work

This exploratory investigation has several limitations, which we plan to address in future work.

Evaluation. This exploratory investigation focuses primarily on the methodology and feasibility of the alignment, and it does not include a formal evaluation. ATC and MeSH pharmacological actions are being integrated into RxNorm. We will use the alignment created by RxNorm experts as the gold standard to evaluate our methods.

Perspective. Our perspective in this investigation is ATC-centric: we consider the best MeSH mapping for each ATC class, but not the best ATC mapping for each MeSH class. One future goal is to explore both directions using the same methodology.

Bias towards equivalence mappings. Two design choices create this bias. First, we restrict our exploration to the MeSH class with the best Jaccard similarity for each ATC class, which we subsequently categorize as equivalence or inclusion. Second, the Jaccard threshold differs between equivalence (0.5) and inclusion (0.25) mappings. As a result, we may miss a good inclusion mapping (e.g., with a similarity score of 0.39, above the 0.25 inclusion threshold). This happens when the best MeSH class is a poor equivalence mapping (e.g., with a similarity score of 0.41, below the 0.5 equivalence threshold).

6.3 Significance

To our knowledge, our work is the first attempt to align pharmacologic classes with instance-based techniques that distinguish between equivalence and inclusion relations. It is also the first application of alignment between pharmacologic classes in ATC and MeSH. Our instance-based approach to aligning pharmacologic classes has yielded 343 mappings. It has the prospect of effectively supporting the creation of a mapping of pharmacologic classes between ATC and MeSH. This exploratory investigation needs to be evaluated in order to adapt the thresholds for similarity.

ACKNOWLEDGEMENTS

This work was supported by the Intramural Research Program of the NIH, National Library of Medicine, and by the Center for Drug Evaluation and Research of the Food and Drug Administration. The authors want to thank Rave Harpaz and Anna Ripple for useful discussions.

DISCLAIMER

The findings and conclusions expressed in this report are those of the authors and do not necessarily represent the views of the FDA.

REFERENCES

Anatomical Therapeutic Chemical (ATC) classification: http://www.whocc.no/atc/

Ashburner, M., et al. (2000) Gene ontology: tool for the unification of biology. The Gene Ontology Consortium, Nat Genet, 25, 25-29.

Avillach, P., et al. (2013) Design and validation of an automated method to detect known adverse drug reactions in MEDLINE: a contribution from the EU-ADR project, J Am Med Inform Assoc, 20, 446-452.

Bodenreider, O. and Taft, L.M. (2013) A mapping of RxNorm to the ATC/DDD Index helps analyze US prescription lists, AMIA Annu Symp Proc, (submitted).

Euzenat, J. and Shvaiko, P. (2007) Ontology matching. Springer, New York.

Isaac, A., et al. (2007) An empirical study of instance-based ontology matching. In Aberer, K., et al. (eds), Proceedings of the 6th International Semantic Web Conference and 2nd Asian Semantic Web Conference (ISWC'07/ASWC'07). Springer-Verlag, pp. 253-266.

Kirsten, T., Thor, A. and Rahm, E. (2007) Instance-based matching of large life science ontologies. In Cohen-Boulakia, S. and Tannen, V. (eds), Data Integration in the Life Sciences: 4th International Workshop, DILS 2007, Philadelphia, PA, USA. Springer, pp. 172-187.

McCray, A.T., Srinivasan, S. and Browne, A.C. (1994) Lexical methods for managing variation in biomedical terminologies, Proc Annu Symp Comput Appl Med Care, 235-239.

Medical Subject Headings (MeSH): http://www.nlm.nih.gov/mesh/

Merabti, T., et al. (2011) Mapping the ATC classification to the UMLS metathesaurus: some pragmatic applications, Stud Health Technol Inform, 166, 206-213.

RxNorm: http://www.nlm.nih.gov/research/umls/rxnorm/

RxNorm API: http://rxnavdev.nlm.nih.gov/RxNormAPI.html

Unified Medical Language System (UMLS): https://uts.nlm.nih.gov/

Winnenburg, R. and Bodenreider, O. (2012) Mapping drug entities between the European and American standards, ATC and RxNorm, Poster Proceedings of the Eighth International Conference on Data Integration in the Life Sciences (DILS 2012), 22.
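The instance-based measures and decision rule of Section 4.2 can be illustrated with a minimal Python sketch. The sketch rests on simplifying assumptions:
* Drugs and classes are plain string identifiers.
* Each hierarchy is given as a hypothetical child-to-parents dictionary (`parents`).
* All function names are illustrative, not the authors' code.

The thresholds (0.5 for one-sidedness; 0.5 and 0.25 for similarity) are those stated in Section 4.2. Applying them to JCmod rather than to the plain Jaccard coefficient is our reading of the Silver compounds example in Section 6.1. `best_mesh_match` reflects the ATC-centric choice of a single best MeSH class per ATC class described in Section 6.2.

```python
import math


def infer_classes(asserted, parents):
    """Asserted classes plus all their ancestors (transitive closure over `parents`)."""
    closure, stack = set(), list(asserted)
    while stack:
        cls = stack.pop()
        if cls not in closure:
            closure.add(cls)
            stack.extend(parents.get(cls, ()))
    return closure


def members_by_class(drug_classes, parents):
    """Invert drug -> asserted classes into class -> drugs, including inferred classes."""
    index = {}
    for drug, asserted in drug_classes.items():
        for cls in infer_classes(asserted, parents):
            index.setdefault(cls, set()).add(drug)
    return index


def jaccard(a, m):
    """JC(A, M) = |A & M| / |A | M| (0 for two empty sets)."""
    union = a | m
    return len(a & m) / len(union) if union else 0.0


def jaccard_mod(a, m):
    """JCmod = sqrt(|A & M| * (|A & M| - 0.8)) / |A | M|; 0 when no drug is shared."""
    shared = len(a & m)
    if shared == 0:
        return 0.0
    return math.sqrt(shared * (shared - 0.8)) / len(a | m)


def one_sidedness(a, m):
    """|a - m| / (a + m), where a and m count drugs found only in A or only in M."""
    only_a, only_m = len(a - m), len(m - a)
    if only_a == 0 and only_m == 0:
        return 0.0
    return abs(only_a - only_m) / (only_a + only_m)


def classify(a, m, os_threshold=0.5, eq_threshold=0.5, in_threshold=0.25):
    """Label an association as EQ+, EQ-, IN+ or IN- (thresholds from Section 4.2)."""
    similarity = jaccard_mod(a, m)  # assumption: thresholds apply to JCmod
    if one_sidedness(a, m) < os_threshold:
        return "EQ+" if similarity >= eq_threshold else "EQ-"
    return "IN+" if similarity >= in_threshold else "IN-"


def best_mesh_match(atc_drugs, mesh_index):
    """ATC-centric step: the MeSH class with the highest JCmod, its score and its label."""
    candidates = [(jaccard_mod(atc_drugs, drugs), cls)
                  for cls, drugs in mesh_index.items() if atc_drugs & drugs]
    if not candidates:
        return None
    score, cls = max(candidates)
    return cls, score, classify(atc_drugs, mesh_index[cls])


if __name__ == "__main__":
    # Fluoroquinolones (J01MA) vs. Fluoroquinolones (D024841): 14 shared drugs,
    # 1 ATC-only, 2 MeSH-only. The shared drug names are placeholders.
    shared = {f"drug{i}" for i in range(14)}
    atc = shared | {"moxifloxacin"}
    mesh = shared | {"flumequine", "besifloxacin"}
    print(round(jaccard(atc, mesh), 2), round(one_sidedness(atc, mesh), 2), classify(atc, mesh))
    # 0.82 0.33 EQ+
    print(round(jaccard_mod({"silver"}, {"silver"}), 2), classify({"silver"}, {"silver"}))
    # 0.45 EQ-
```

For the Fluoroquinolones counts reported in Section 6.1, the sketch reproduces the reported Jaccard similarity (0.82) and one-sidedness (0.33). For the Silver compounds pair, it assumes both classes contain only silver. Under that assumption, JCmod is about 0.45 and the pair falls just short of EQ+, matching the false negative discussed there.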