=Paper=
{{Paper
|id=None
|storemode=property
|title=Standardized Drug and Pharmacological Class Network Construction
|pdfUrl=https://ceur-ws.org/Vol-1061/Paper1_vdos2013.pdf
|volume=Vol-1061
|dblpUrl=https://dblp.org/rec/conf/icbo/ZhuJWC13
}}
==Standardized Drug and Pharmacological Class Network Construction==
Qian Zhu1*, Guoqian Jiang1, Liwei Wang2, Christopher G. Chute1

1 Department of Health Sciences Research, Division of Biomedical Statistics and Informatics, Mayo Clinic, Rochester, MN, USA

2 School of Public Health, Jilin University, Changchun, Jilin, China
* To whom correspondence should be addressed: zhu.qian@mayo.edu

ABSTRACT

Dozens of drug terminologies and resources capture drug and/or drug class information, and they vary in coverage and adequacy of representation. No transformative way is available to link them in a standard manner, which hinders data integration and data representation for drug-related clinical and translational studies. In this paper, we introduce our preliminary work on building a standardized drug and drug class network that integrates multiple drug terminological resources, using the Anatomical Therapeutic Chemical (ATC) classification and the National Drug File Reference Terminology (NDF-RT) as the network backbone and expanding it with RxNorm and Structured Product Labeling (SPL). In total, the network consists of 39,728 drugs and drug classes. We also calculated and compared the structure similarity of each drug / drug class pair from ATC and NDF-RT, and analysed the constructed drug class network from a chemical structure perspective.

1 INTRODUCTION

Drug classes are group names for drugs that have similar activities or are used for the same type of disease or disorder. There are different ways to classify drugs. One way is to group drugs by their therapeutic use or class (e.g., antiarrhythmic or diuretic drugs), as done by the Anatomical Therapeutic Chemical (ATC) classification [1]. Another way is to group drugs by their dominant mechanism of action, as done by the National Drug File Reference Terminology (NDF-RT) [2]. However, drug classes defined by different systems are not compatible. It is worth comparing and integrating them in a universal fashion in order to better support clinical studies. For example, Mougin et al. [3] compared drug classes between ATC and NDF-RT, focusing on the relations between drugs and pharmacological classes (i.e., drug-class membership relations), which will facilitate the integration of these two resources.

Drug terminologies define drug entities as well as relevant properties and relationships with pharmacological classes. Drug terminologies are usually developed and maintained by different institutions using site-specific drug coding systems. Heterogeneous drug representations across different systems make it difficult to navigate diverse drug resources. The lack of a transformative way to link heterogeneous drug resources hinders data integration and data representation for drug-related clinical and translational studies. To overcome this obstacle, we proposed to represent drug information from diverse resources in a standard and integrated manner.

ATC and NDF-RT are the proposed sources of drug classification information. In the present study, we developed an approach to map drug and drug class entities from ATC and NDF-RT to the UMLS (Unified Medical Language System) [4] and used these mappings as the drug network backbone. Furthermore, we extended the network by integrating RxNorm [5] and Structured Product Labeling (SPL) [6], benefiting from the broad drug-related knowledge these two resources provide. RxNorm provides links among different vocabularies, e.g., NDF-RT. SPL contains comprehensive drug information, such as drug-drug interactions and adverse drug events, which investigators have explored in applications such as LinkedSPLs [7] and ADEpedia [8]. Additionally, to extend and compare the drug classes defined by ATC and NDF-RT from a chemical structure point of view, we introduced chemical structure similarity under the assumption that similar molecules have similar activities.

The remainder of the paper is organized as follows. The Materials and Methods section introduces the resources and tools used and the workflow for network construction; the Results, Discussion and Conclusion sections follow.

2 MATERIALS AND METHODS

NDF-RT is a well-known drug terminological resource; a snapshot of NDF-RT was downloaded on Nov. 8, 2012. In the ATC classification system, drugs are categorized into groups at five different levels according to the organ or system on which they act and/or their therapeutic and chemical characteristics [9]. The January 2012 release of ATC was used in this study. RxNorm provides normalized names for clinical drugs and links them to several drug vocabularies, distinguished by the "SAB" (source abbreviation) label. For example, "SAB=MTHSPL" indicates that the source is SPL and "SAB=NDFRT" that it is NDF-RT. Two RxNorm files are used in this study: 1) RXNCONSO.RRF, which includes all connections with source vocabularies; and 2) RXNREL.RRF, which includes relationships among concepts. The Oct. 2012 version of RxNorm was used in this study. SPL contains the structured content of labeling (all text, tables and figures), along with additional machine-readable information. The mappings between SPL and RxNorm used in this study were extracted from the RxNorm RXNCONSO file with SAB=MTHSPL.

In this paper, we introduce a drug and drug class network built from multiple drug terminological resources: ATC, NDF-RT, RxNorm, and SPL. ATC and NDF-RT are used as the network backbone, and RxNorm and SPL are integrated as extensions. We also calculated structure similarity for drug pairs from ATC and NDF-RT and clustered them by structural similarity. The details of each step are described in the following sections.

2.1 Mapping NDF-RT with ATC

To map NDF-RT with ATC via the UMLS, we translated NDF-RT NUIs (Numerical Unique Identifiers) and ATC names to UMLS CUIs (Concept Unique Identifiers).

2.1.1 ATC mapping to UMLS

ATC is not well integrated with other drug terminologies (e.g., NDF-RT), as it uses its own coding system for drug entities. To map ATC with NDF-RT and represent the drug network in a standard form, the UMLS, we employed the NCBO Annotator [10] to semantically annotate each ATC name. Among the more than 200 ontologies from the UMLS Metathesaurus and NCBO BioPortal [11], RxNorm and NDF-RT were given higher priority in this study. To avoid unnecessary annotations from non-drug ontologies, we restricted UMLS semantic types [12] to the "Chemicals & Drugs" semantic group [13]. From the annotation results, we extracted the ontology id and concept id, the two mandatory input parameters of the NCBO BioPortal REST API [14], and used them to search for UMLS CUIs.

2.1.2 NDF-RT mapping to RxNorm and UMLS

NDF-RT concepts are organized into different categories with corresponding category labels. For example, "N0000179008, 1,1,1-trichloroethane, [Chemical/Ingredient]" and "N0000175641, Autonomic Ganglionic Blocker, [EPC]" are a chemical ingredient and an EPC (Established Pharmacologic Class) class, respectively. In this study, we retrieved the concepts labeled as VA Class, VA Product, EPC, Chemical/Ingredient and Generic Ingredient Combination. An SQL query was executed against the RxNorm RXNCONSO table, pre-loaded into our local MySQL database, to retrieve the RxCUIs (RxNorm Concept Unique Identifiers) for the NUIs. We retrieved UMLS CUIs by invoking the NLM RxNav RESTful API [15] with each NUI as the input parameter.

2.2 Calculating structure similarity

To analyze and expand the drug and drug class network from a chemical structure perspective, we calculated the structure similarity of drug pairs from ATC and NDF-RT and grouped them by their structure similarity score, measured as the Tanimoto coefficient between the pairs' descriptors [16]. The cutoff was set to a score greater than 0.85, as molecules above this score tend to exhibit similar biological activity. We first converted NDF-RT drug names and ATC names to SMILES (Simplified Molecular-Input Line-Entry System) [17] as the chemical representation by invoking the PubChem Entrez web service [18] and the NCI resolver [19] REST API. We then translated SMILES to chemical fingerprints and calculated Tanimoto similarity using CDK (Chemistry Development Kit) functions.

2.3 Integrating RxNorm and SPL mappings

Mappings among RxNorm, SPL and NDF-RT are provided by RxNorm in the RXNCONSO table. Two steps were performed to retrieve them. First, we obtained concepts labeled "SAB=NDFRT" and "SAB=RXNORM", denoted as the RxNorm and NDF-RT mappings. Then, among the concepts identified in the first step, we searched for those with the "SAB=MTHSPL" label. The final list consists of the concepts common to all three resources.

The network was expanded from the NDF-RT nodes that have mappings with RxNorm and SPL. We extracted SPL identifiers (setIds) from the RXNREL table and saved them for future integration of SPL-related information, e.g., LinkedSPLs.

In addition, we performed a case study to demonstrate the usefulness of the drug and drug class network.

3 RESULTS

There are 5,717 individual ATC entities in total, corresponding to 4,483 distinct ATC names, i.e., one drug can be categorized into multiple therapeutic classes (described in more detail in the Discussion section).

Of 48,266 NDF-RT concepts, 34,011 were used in this study, consisting of 15,857 VA Products, 486 VA Classes, 9,960 Chemical/Ingredients, 7,184 Generic Ingredient Combinations, and 524 EPCs. The child and parent relationships among these NDF-RT concepts were retrieved from the RxNorm RXNREL table via the "CHD" (concept 1 is a child of concept 2) and "PAR" (concept 1 is a parent of concept 2) labels and stored.

RxNorm, SPL and NDF-RT mappings were extracted from two RxNorm files, RXNCONSO and RXNREL, which were loaded into a MySQL database.

3.1 Results for ATC and NDF-RT mappings

To build the drug and drug class network with ATC and NDF-RT as the backbone, we first mapped ATC entities to NDF-RT concepts via the UMLS; four steps were involved.

3.1.1 ATC annotated by NCBO
3,607 ATC entities, including 3,152 drugs and 455 drug classes, were mapped to UMLS CUIs through two ontologies from NCBO BioPortal, RxNorm and NDF-RT. Of these 3,607 ATC mappings, 2,180 ATC entities exactly matched the preferred names from RxNorm and NDF-RT. 866 ATC entities, including 211 drug classes and 655 drugs, were mapped to other ontologies available from NCBO. 1,244 ATC entities (21.8%), including 657 drugs and 587 drug classes, failed to map to UMLS because no annotations were generated for them. We attempted to map these failed ATC names to RxNorm directly by invoking the NLM RxNav RESTful API [20] with the ATC names as input, but none of them returned mapping results. The reasons for failure are discussed further in the Discussion section.

3.1.2 NCBO annotation evaluation

The annotations were generated programmatically using the NCBO Annotator Web Services API, and we manually evaluated the results. Of the 4,473 annotations with NDF-RT and RxNorm, the 2,401 exact mappings were not further evaluated. The authors (QZ, LW) manually reviewed the rest of the annotations (2,072 in total). Of these, 88.7% were correct, 10.3% were partial mappings, and 1.0% were incorrect. Precision was calculated as 99.5%, recall as 78.2% and F-measure as 87.4%, counting exact, partial and correct mappings (4,453 in total) as true positives, the 1,244 failed mappings as false negatives and the 20 incorrect mappings as false positives.

3.1.3 Mapping NDF-RT to RxNorm and UMLS

NDF-RT and RxNorm mappings exist in the RXNCONSO table with the "SAB=NDFRT" label. Consequently, the RxCUI corresponding to each NDF-RT concept can be retrieved directly from these mappings. NDF-RT also provides UMLS mappings; hence, to retrieve the UMLS CUI for each NDF-RT concept, we called the NLM NDF-RT RESTful API [15]. The search results are shown in Table 1; 99.2% of the NDF-RT concepts have been mapped to UMLS.

{| class="wikitable"
|+ Table 1. UMLS CUI retrieval by RxNav NDF-RT API
! NDF-RT concepts (total) !! NUI !! UMLS CUI
|-
| Chemical/Ingredient (9,960) || 9,934 || 9,932
|-
| VA Class (486) || 486 || 483
|-
| VA Product (15,857) || 15,695 || 13,263
|-
| EPC (524) || 480 || 478
|-
| Generic Ingredient Combination (7,184) || 7,139 || 6,801
|-
| Total (34,011) || 33,734 || 30,957
|}

3.1.4 ATC and NDF-RT mapping

In total, 3,850 distinct mappings between ATC and NDF-RT were generated, including 2,015 Chemical/Ingredients, 1,826 Generic Ingredient Combinations and 1 VA Class. They cover 2,226 distinct ATC entities: 99 drug classes and 2,127 individual drugs.

3.2 Results for structural similarity calculation

SMILES were queried for all drugs from ATC and NDF-RT via the PubChem Entrez web API and the NCI Resolver web API. 2,618 ATC entities obtained SMILES from NCI and 3,471 from PubChem. Combining the NCI and PubChem results, 3,487 ATC entries in total obtained SMILES, as did 9,132 unique NDF-RT concepts.

We calculated the Tanimoto coefficient as the structure similarity for each pair of concepts within ATC and within NDF-RT separately, by converting SMILES to fingerprints. This yielded 8,513 pairs from ATC and 69,882 pairs from NDF-RT with a Tanimoto coefficient greater than 0.85, which were integrated into the drug and drug class network.

3.3 Results for NDF-RT, RxNorm and SPL mapping

We integrated RxNorm and SPL mappings with NDF-RT. The mappings among RxNorm, NDF-RT and SPL resulted in 5,838 unique RxNorm concepts with 36,408 NDF-RT concepts and 41,188 SPL labels. According to the term types defined by RxNorm, the mappings mostly fall into two main categories: 3,056 are Semantic Clinical Drugs and 1,543 are Ingredients.

It is worth noting that one RxNorm concept may be mapped to multiple NDF-RT and/or SPL concepts. For example, RxCUI "74" is mapped to 3 NUIs in NDF-RT, including N0000006481, N0000147349 and N0000006481, and to 11 setIds in MTHSPL, such as 0d65128b-8eb7-440b-870a-7e3be18152b3 and 1e6d6cd5-ab14-4258-a0fe-5f6a3cae437f.

4 DISCUSSION

In this study, we successfully built a drug and drug class network with 39,728 concepts from ATC and NDF-RT. Nearly all (99.2%) of the NDF-RT concepts used were mapped to UMLS and labeled with UMLS CUIs accordingly. We also integrated RxNorm and SPL mappings and extended the network with structure similarity calculation.

4.1 ATC to UMLS mapping

In total, 77.9% of ATC terms were mapped to UMLS. Compared with the 68.7% mapping rate reported by Merabti et al. [21], our study shows an improvement in mapping from ATC to UMLS by leveraging the NCBO Annotator. However, 22.1% of ATC terms failed to be mapped, for the following reasons: 1) many ATC terms are combinations of multiple concepts, such as "calcium acetate and magnesium carbonate" and "combinations of sulfonamides and trimethoprim, including derivatives"; 2) exclusions are embedded in ATC names, such as "platelet aggregation inhibitors excluding heparin" and "nutrients without phenylalanine"; 3) ATC uses non-standard representations, although we corrected and expanded some abbreviations occurring in ATC names (for example, "DIGESTIVES, INCL. ENZYMES" was corrected to "DIGESTIVES, INCLUDING ENZYMES"); 4) non-drug terms are used, especially for drug classes in ATC, such as "VARIOUS" and "SENSORY ORGANS". These obstacles were the main reasons for mapping failure. In future work, we will explore the MMTx program reported by Mougin et al. [3] and additional NLP (Natural Language Processing) algorithms to parse ATC names and improve mapping performance between ATC and the UMLS.

4.2 Benefits from structure similarity integration

The structure similarity calculation applied in this study connects drug nodes that share similar chemical substructures. Besides the benefit shown in the case study, this integration also provides relevant clues for guiding clinical decision support systems from a structural perspective, as it offers a full profile of therapeutics for individual drugs. The ATC classification system categorizes drugs according to their therapeutic classes; hence, one ATC drug can be grouped into multiple categories owing to its diverse therapeutic functions. For instance, "Thonzylamine" is an antihistamine and anticholinergic used as an antipruritic, and it is grouped into two categories within the ATC hierarchy: "antiallergic agents" and "antihistamines for topical use". The two corresponding ATC entities (R01AC06 and D04AA01) for "Thonzylamine" in two separate classes ("R" and "D") are connected by a similarity score of 1. Thus, the entities within these two categories are connected, and physicians could use this knowledge about Thonzylamine in clinical decisions from both therapeutic and structural points of view.

4.3 Future work

We will modify the drug entity mapping algorithm to detect more connections, and more human review is expected to improve the accuracy of the mappings. Meanwhile, we will seek possible collaborations with external sites, such as the NLM, to improve the development of the mapping algorithm. We will integrate more drug-related resources, such as DrugBank and PharmGKB, as well as drug interaction data and drug-adverse event data, as shown in Figure 1. The entire data set generated in this project will be released to the public once the proposed action items are accomplished.

5 CONCLUSION

We successfully integrated NDF-RT, ATC, RxNorm and SPL and built a drug and drug class network using standardized identifiers to represent drug and drug class entities. In addition, the network was expanded from a chemical structure perspective by similarity calculation. Additional drug terminological resources and drug interaction information will be integrated in future work.

ACKNOWLEDGMENTS

This work was supported by the Pharmacogenomic Research Network (NIH/NIGMS-U19 GM61388) and the SHARP Area 4: Secondary Use of EHR Data (90TR000201).

REFERENCES

[1] ATC: http://www.who.int/classifications/atcddd/en/. Accessed Apr. 11, 2013.
[2] NDF-RT: http://www.nlm.nih.gov/research/umls/sourcereleasedocs/current/NDFRT/. Accessed Apr. 11, 2013.
[3] Mougin F, Burgun A, Bodenreider O. Comparing Drug-Class Membership in ATC and NDF-RT. Proceedings of the 2nd ACM SIGHIT International Health Informatics Symposium, 2012:437-443.
[4] Bodenreider O. The Unified Medical Language System (UMLS): integrating biomedical terminology. Nucleic Acids Res. 2004;32:267-270.
[5] RxNorm: www.nlm.nih.gov/research/umls/rxnorm. Accessed Apr. 11, 2013.
[6] Structured Product Labeling: http://www.fda.gov/ForIndustry/DataStandards/StructuredProductLabeling/default.htm. Accessed Apr. 11, 2013.
[7] Hassanzadeh O, Zhu Q, Freimuth R, Boyce R. Extending the "Web of Drug Identity" with Knowledge Extracted from United States Product Labels. Submitted to AMIA Summit on Clinical Research Informatics, 2013.
[8] Jiang G, Solbrig HR, Chute CG. ADEpedia: a scalable and standardized knowledge base of Adverse Drug Events using semantic web technology. AMIA Annu Symp Proc. 2011:607-16.
[9] Anatomical Therapeutic Chemical Classification System: http://en.wikipedia.org/wiki/Anatomical_Therapeutic_Chemical_Classification_System. Accessed Apr. 11, 2013.
[10] Jonquet C, Shah N, Musen M. The Open Biomedical Annotator. AMIA Summit on Translational Bioinformatics; 2009:56-60. The NCBO Annotator web service: http://www.bioontology.org/annotator-service. Accessed Apr. 11, 2013.
[11] Noy N, Shah N, Dai B, Dorf M, Griffith N, Jonquet C, Montegut M, Rubin D, Youn C, Musen M. BioPortal: A web repository for biomedical ontologies and data resources. Demo session at the 7th International Semantic Web Conference (ISWC 2008).
[12] Semantic Type: http://www.nlm.nih.gov/research/umls/META3_current_semantic_types.html. Accessed Apr. 11, 2013.
[13] Bodenreider O, McCray AT. Exploring semantic groups through visual approaches. Journal of Biomedical Informatics 2003;36(6):414-432.
[14] BioPortal REST services: http://www.bioontology.org/wiki/index.php/NCBO_REST_services. Accessed Apr. 11, 2013.
[15] NDF-RT RESTful API: http://rxnav.nlm.nih.gov/NdfrtRestAPI.html#label:r24. Accessed Apr. 11, 2013.
[16] Holliday JD, Hu CY, Willett P. Grouping of coefficients for the calculation of inter-molecular similarity and dissimilarity using 2D fragment bit-strings. Comb Chem High Throughput Screen. 2002;5(2):155-66.
[17] SMILES: http://en.wikipedia.org/wiki/Simplified_molecular-input_line-entry_system. Accessed Apr. 11, 2013.
[18] PubChem Entrez: http://www.ncbi.nlm.nih.gov/books/NBK25500/. Accessed Apr. 11, 2013.
[19] NCI resolver: http://cactus.nci.nih.gov/chemical/structure. Accessed Apr. 11, 2013.
[20] RxNorm RESTful API: http://rxnav.nlm.nih.gov/RxNormRestAPI.html. Accessed Apr. 11, 2013.
[21] Merabti et al. Stud Health Technol Inform. 2011;166:206-13.
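Section 2.2 (Calculating structure similarity) scores drug pairs with the Tanimoto coefficient and keeps pairs scoring above 0.85. The study computed fingerprints from SMILES with CDK; the minimal Python sketch below assumes the fingerprints have already been computed and represents each one, hypothetically, as the set of its on-bit positions, so only the comparison and cutoff steps are shown. The function names and the toy fingerprints are illustrative assumptions, not the authors' code or data.

```python
from itertools import combinations

# Hypothetical representation: a fingerprint is the set of indices of its "on" bits.
Fingerprint = frozenset[int]


def tanimoto(fp_a: Fingerprint, fp_b: Fingerprint) -> float:
    """Tanimoto coefficient |A & B| / |A | B| of two binary fingerprints."""
    union = len(fp_a | fp_b)
    if union == 0:
        return 0.0  # two empty fingerprints carry no structural evidence
    return len(fp_a & fp_b) / union


def similar_pairs(fingerprints: dict[str, Fingerprint], cutoff: float = 0.85):
    """Return (id_a, id_b, score) for every pair scoring strictly above `cutoff`."""
    pairs = []
    # Sorting the items makes the output order deterministic.
    for (id_a, fp_a), (id_b, fp_b) in combinations(sorted(fingerprints.items()), 2):
        score = tanimoto(fp_a, fp_b)
        if score > cutoff:
            pairs.append((id_a, id_b, score))
    return pairs


# Toy fingerprints (made-up bits, not CDK output). The two ATC codes for
# Thonzylamine describe one compound, so their fingerprints are identical.
TOY_FINGERPRINTS = {
    "R01AC06": frozenset({1, 4, 9, 16, 25}),
    "D04AA01": frozenset({1, 4, 9, 16, 25}),
    "TOY_OTHER": frozenset({1, 4, 7}),  # hypothetical unrelated entry
}

if __name__ == "__main__":
    print(similar_pairs(TOY_FINGERPRINTS))  # [('D04AA01', 'R01AC06', 1.0)]
```

The all-pairs loop is quadratic: for the 9,132 NDF-RT concepts with SMILES it means roughly 41.7 million comparisons, which is where packed bit vectors and a cheminformatics toolkit would be preferable to Python sets.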
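The NCBO annotation evaluation reports precision, recall and F-measure from counts of true positives (exact, partial and correct mappings), false positives (incorrect mappings) and false negatives (failed mappings). A minimal Python check using the standard definitions P = TP/(TP+FP) and R = TP/(TP+FN), and assuming the balanced F-measure F = 2PR/(P+R), is sketched below; the function name is illustrative, not the authors'.

```python
def precision_recall_f(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Precision, recall and balanced F-measure (F1) from raw counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Counts stated in the NCBO annotation evaluation:
# TP = exact + partial + correct mappings, FP = incorrect, FN = failed.
p, r, f = precision_recall_f(tp=4453, fp=20, fn=1244)
print(f"precision={p:.4f} recall={r:.4f} F={f:.4f}")
```

With these counts the formulas give about 99.55% precision, 78.16% recall and 87.57% F-measure. The recall matches the reported 78.2%; the reported 99.5% precision and 87.4% F-measure are slightly lower, which may reflect rounding or counting details not stated in the paper.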