Scaling and Querying a Semantically Rich, Electronic Healthcare Graph

Hayden Freedman (a), Mark A. Miller (a), Heather Williams (a), Christian J. Stoeckert Jr. (a,b)

(a) Institute for Biomedical Informatics, Perelman School of Medicine, University of Pennsylvania, 3700 Hamilton Walk, Philadelphia, PA, 19104, United States
(b) Department of Genetics, Perelman School of Medicine, University of Pennsylvania, 415 Curie Boulevard, Philadelphia, PA, 19104, United States

Abstract. The Open Biomedical Ontologies (OBO) Foundry provides a set of ontologies that can be used to create a semantically rich clinical data model. However, concerns exist regarding the scalability of a pipeline that loads clinical data into such a model using the Resource Description Framework (RDF). To investigate this further, we describe our development of a pipeline for transforming clinical patient data to conform to a model designed using OBO Foundry ontologies, and discuss its runtime and disk space requirements. To determine the efficacy of our approach at various throughput levels, we used the synthetic patient data generation service Synthea to generate four patient cohort models of different sizes, the largest containing information about one million patients, and ran each through the pipeline. We also discuss the development of an exemplar question to simulate the type of request we might receive from researchers, and its implementation in the SPARQL query language. We report the results of executing the exemplar question query against each patient cohort model, and conclude that our approach produces a knowledge base that can be generated and queried in roughly linear time and can handle requests against large clinical data sets in reasonable time.

Keywords. clinical data, biomedical ontologies, OBO Foundry, semantic richness, common data model, Resource Description Framework

Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

1. Introduction

In recent years, ontologies have become popular for enriching and standardizing clinical data.[1] One particular advantage of this approach is that, while the relationships between data elements may be unclear to users of semantically shallow schemas, they become explicit when the same knowledge is manifest as instances of a well-designed ontology.[2] Specifically, members of the Open Biological and Biomedical Ontologies (OBO) Foundry community[3] develop ontologies following guidelines that foster consistency in the ways that biomedical data from heterogeneous sources can be represented and queried.[4]

Additional benefits can be reaped by using well-designed ontologies represented with the Resource Description Framework (RDF) in a triplestore, a type of graph database. This technique facilitates searches involving relationships between classes in ways that may be harder to implement in relational systems. Specifically, searches that involve traversing a hierarchy of concepts some variable number of steps away from a starting node can take advantage of the convenient syntax of the Property Paths feature in SPARQL, the query language for RDF.
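For example, a query of the following form (a minimal, illustrative sketch; the class IRI and graph contents are hypothetical) uses a property path to retrieve every direct or indirect subclass of a starting class, regardless of how many levels deep it sits in the hierarchy:

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX ex:   <http://example.org/>

# Retrieve every class at or below ex:StartingClass in the hierarchy,
# no matter how many rdfs:subClassOf steps away it is.
SELECT ?descendant
WHERE {
  ?descendant rdfs:subClassOf* ex:StartingClass .
}

The * operator instructs the query engine to follow the rdfs:subClassOf predicate any number of times, a traversal that would otherwise require recursive logic in SQL.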
It has been shown that the equivalent of the Property Paths feature can be implemented for relational databases using recursive SQL; however, the syntax is more complex and may involve highly nested clauses that pose a challenge for relational optimizers.[5]

In their paper submitted to the ICBO 2019 conference proceedings, Miller and Stoeckert discuss their pipeline for populating an RDF repository with synthetic Electronic Health Record (EHR) data, using a clinical data model constructed with classes from OBO Foundry ontologies. They incorporated information about roughly 1,000 synthetic patients, and implemented an exemplar question in order to compare the experience of querying both the original relational database and the RDF repository for the same information. Their results demonstrated the viability and advantages of modeling clinical data in a triplestore using a semantically rich model.[6] However, there are potential challenges involved with scaling such a pipeline to hundreds of thousands or millions of patients. In an effort to determine how well a similar pipeline would scale in terms of time and space requirements for creating, loading, transforming, and querying large quantities of clinical data, we developed synthetic patient cohort models of 1,000, 10,000, 100,000, and 1,000,000 patients, and ran each of them through a pipeline similar to the one described in the original paper.

In this paper, we will first describe our methods for building the synthetic data pipeline with the goal of enabling reproducibility by others, including an analysis of the time and space requirements for each step. Then, we will discuss the development of an exemplar question that incorporates traversals of ontological hierarchies to retrieve information about diagnoses and medications. Finally, we will present the results and time performance of our query against each of the four patient cohort models of various sizes.

2. Methods

Our pipeline utilizes Synthea[7] to generate the synthetic clinical data, the ETL-Synthea service[8] to transform the data into Observational Medical Outcomes Partnership (OMOP)[9] standards, the Carnival application[10] to generate concise, directly-mapped triples[11] from the OMOP database, and the Semantic Engine[12] to transform those triples into the semantically rich model. Each of these steps is described in more detail below.

Step 1: Generating a 1,000,000 patient synthetic dataset with Synthea

Synthea is an open source, synthetic patient generation service. One of its goals is to enable research and development with clinical data that would otherwise be impossible if working with real patient data. The service provides realistic data on patients and their associated health records. The Apache-licensed source code can be downloaded from the Synthea GitHub repository and run using the command line.

After cloning the Synthea codebase from GitHub, we set the `exporter.csv.export` parameter in the Synthea properties file to `true` so that the program would output the synthetic patient data in CSV format. We then ran the service with the following command:

./run_synthea -p 1000000 -o false Pennsylvania Philadelphia

The "-p" flag designates the requested population size for the dataset to be generated. The "-o" flag is not listed on the main Synthea documentation page, but setting it to "false" ensures that dead patients are counted towards the total patient count.
If this flag were set to "true" or not explicitly set, the resulting dataset would include one million living patients, plus some additional number of dead ones. Specifying a city, in this case Philadelphia, instructs Synthea to generate data representative of people from that city. Synthea formats the output as a set of fourteen CSV files, each representing a component of a typical EHR.

Step 2: Loading native Synthea data and converting to OMOP Common Data Model

Once Synthea had generated the set of fourteen CSV files, we configured a PostgreSQL instance and installed Observational Health Data Sciences and Informatics (OHDSI)'s open source tool ETL-Synthea as an R package to load the data into PostgreSQL. After we connected our PostgreSQL database to the R environment, the script loaded each CSV file into its own eponymously named table in the PostgreSQL database. The next step in the ETL-Synthea script was to populate OMOP Common Data Model (CDM) tables based on the native Synthea tables within the PostgreSQL database. OMOP is an OHDSI initiative to harmonize heterogeneous clinical coding formats and allow for the use of standardized analytics tools. Our use of the OMOP format in this case was mainly as a staging area for the generation of RDF triples. However, we anticipate that maintaining a large synthetic dataset in OMOP format will provide utility for future performance testing of our applications. The ETL-Synthea documentation includes a detailed diagram of how Synthea tables are mapped to OMOP tables.[13] The final result of this step is a PostgreSQL database with two schemas: schema "native" contains the data in Synthea format, and schema "cdm_synthea10" contains the data in OMOP format.

Step 3: Creation of concise RDF triples from OMOP database

The Carnival application was developed by our collaborators at the University of Pennsylvania as a clinical data aggregation service. It utilizes an in-memory Neo4j property graph database to store data from heterogeneous sources in a core model. Although its use of an in-memory database prevents it from storing large quantities of data, we chose to use this service because it already included facilities to transform data from an OMOP-formatted database into concise RDF triples. In order to overcome memory issues, we used a batched approach to process chunks of data from the PostgreSQL OMOP database and export them to an instance of Ontotext GraphDB[14] as RDF triples. After each batch was processed, the Carnival graph was cleared. For this experiment, we ran Carnival four times, creating concise RDF models of patient cohorts of 1,000, 10,000, 100,000, and 1,000,000 patients, each stored in its own GraphDB repository. Carnival automatically partitions the data into named graphs containing a designated number of entities, so that future processing of the data can also occur in manageable chunks. For this instantiation we set the maximum number of entities per named graph to 100,000.

Step 4: Transformation of concise RDF triples to semantically rich model

Once the four concise cohort models were loaded as Carnival output into GraphDB repositories, our final step was running the Semantic Engine against each of those repositories. As previously reported[12], we developed the Semantic Engine in order to enrich a concise RDF dataset by applying a semantically rich ontological model.
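As a purely illustrative sketch of what such an enrichment step could look like (this is not the Semantic Engine's actual output model; the concise predicates under the ex: namespace, the enriched predicates and classes under the rich: namespace, and the graph name are all hypothetical), an enrichment update might expand a concise diagnosis record into several instance-level statements:

PREFIX rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX ex:   <http://example.org/concise/>
PREFIX rich: <http://example.org/rich/>

# Hypothetical enrichment: for every concise diagnosis record, mint an
# instance of a diagnosis class, link it to the patient instance, and
# attach the source terminology code as a separate identifier node.
INSERT {
  ?diagnosis rdf:type       rich:Diagnosis ;
             rich:isAbout   ?patient ;
             rich:hasCode   ?codeNode .
  ?codeNode  rdf:type       rich:DiagnosisCode ;
             rich:codeValue ?code .
}
WHERE {
  GRAPH ex:diagnosesNamedGraph {
    ?record ex:diagnosisAbout ?patient ;
            ex:diagnosisCode  ?code .
    BIND ( IRI(CONCAT(STR(?record), "/diagnosis")) AS ?diagnosis )
    BIND ( IRI(CONCAT(STR(?record), "/code"))      AS ?codeNode )
  }
}

In the actual pipeline, the target classes and relations come from OBO Foundry ontologies, the named graphs come from Carnival's partitioning, and both are specified in the Semantic Engine configuration described below.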
A user of the Semantic Engine who wishes to apply it to a new data source must design a new custom configuration that specifies the shape of their incoming RDF data as well as the relevant relationships in the semantically enriched target model. In our case, we had previously created a Semantic Engine configuration to read concise RDF output from Carnival's triples generation service. The Semantic Engine executes transformations by running dynamically generated SPARQL statements against a set of named graphs. In our case, each of these named graphs contained information about one of the following: patient demographics (gender identity, race, centrally registered identifier, etc.), measurements (height, weight, BMI, blood pressure, etc.), diagnoses (disease or disorder code, versioned source terminology, description string), or medications (medication code, versioned source terminology, description string). The Semantic Engine is capable of launching queries that run in parallel and process data in multiple named graphs simultaneously. To understand the performance advantages of using the Semantic Engine's parallel processing capabilities, we ran the Semantic Engine twice against each cohort model, once with parallel processing disabled and once with parallel processing enabled (see end of Section 3.1).

Step 5: Import and Process Ontologies

The final step in preparing our four patient cohort models was importing additional ontologies from outside sources and processing them for ease of querying. Table 1 shows the ontologies we included in each cohort model repository, the domains they describe, and how we obtained them in RDF. After the ontologies were imported, we used SPARQL queries implementing the Property Paths feature to execute a transitive subclass materialization update in each ontology, explicitly stating all subclass relationships regardless of depth in the subclass hierarchy (a sketch of such an update appears at the end of this section). For example, if someOntology:classC is a subclass of someOntology:classB, which is itself a subclass of someOntology:classA, after running our update we will see explicitly that someOntology:classC is a subclass of someOntology:classA. The motivation behind materializing these relationships was to pre-process some of the work involved in answering the exemplar question, rather than having to do the same work repeatedly at query time. GraphDB's RDFS+ reasoning service would also have been a reasonable choice for applying these transitive relationships.

Figure 1 shows a visual representation of each of the steps described in this section, demonstrating the flow of data through the pipeline from Synthea CSV files to semantically rich RDF triples.

Table 1: Terminologies included in patient cohort model repositories

Terminology Name | Domain Described | Method of Obtainment in RDF
Mondo Disease Ontology (Mondo)[15] | diseases | downloaded from Monarch Initiative GitHub
Snomed Clinical Terms (SNOMED)[16] | clinical terms | built from UMLS terminologies download
Chemical Entities of Biological Interest (ChEBI)[17] | molecular entities of biological interest and their roles | downloaded from European Bioinformatics Institute
Drug Ontology (DrOn)[18] | clinical drugs | downloaded from BioPortal
RxNorm[19] | clinical drugs | downloaded from BioPortal

Figure 1: Diagram showing each step of the pipeline for creating our synthetic patient graph, referencing the steps declared in the Methods section of this paper
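As referenced above, a minimal sketch of the transitive subclass materialization update (illustrative only; the statements we actually ran may differ in detail, for example in how they target specific named graphs):

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

# Materialize every indirect subclass relationship as an explicit triple,
# so that queries no longer need to traverse the hierarchy at query time.
INSERT { ?sub rdfs:subClassOf ?super }
WHERE  { ?sub rdfs:subClassOf+ ?super }

After this update, a query for subclasses of someOntology:classA returns someOntology:classC directly, without needing a property path.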
3. Performance

3.1 Creating, Loading, and Transforming

Table 2 shows a comparison of the concise RDF data to the semantically rich RDF data in terms of time and space requirements and the number of triples produced. The concise RDF dataset grows roughly fourfold on disk when it is transformed to conform with the semantically rich model.

Table 2: Comparison of time and space required for the first four steps from the Methods section for the 1,000,000 synthetic patient cohort. The disk space metric measures the data in an uncompressed format and does not include indexes or imported ontologies.

Step | Time Consumed by Step | Disk Space Consumed by Data in Dataset | Number of Triples Outputted
Carnival creation of concise RDF triples | 45 hours 29 minutes 28 seconds | 89.68 GB | 903,373,439
Transformation to semantically rich model by Semantic Engine (parallel processing enabled) | 20 hours 33 minutes 0 seconds | 348.61 GB | 3,522,298,786

The large growth from concise to semantically rich RDF triples is expected and is a necessary component of representing a semantically rich model. In order to accurately represent the semantic context of the data, we instantiate classes even if they are not directly mapped to data elements, rather than only creating one-to-one mappings between data elements and ontology terms. This provides advantages in terms of comprehending the context of what each data element is about, but also takes up more disk space than a concise model. The semantic enrichment is implemented in a uniform and standardized way based on the configurations provided to the Semantic Engine. The expansion ratio of roughly 4:1 between concise and semantically rich RDF datasets should therefore remain consistent even as the size of the dataset changes, as long as the same Semantic Engine configuration is used. Figure 2 provides a visual reference for how an example concise RDF dataset might be semantically enriched by the Semantic Engine.

Figure 2: Visual examples of a concise RDF dataset that is input to the Semantic Engine, and a semantically rich RDF dataset that is the output of the Semantic Engine based on the input.

We compared Semantic Engine completion times for the various cohort model sizes with and without query parallel processing enabled. When this feature is enabled, up to four RDF named graphs are pre-processed simultaneously. Although GraphDB allows only a single transaction to be committed at a time, the simultaneous pre-processing saves time, and the savings become more significant as the size of the dataset increases. For the million patient dataset, parallel processing roughly halved the completion time.

The imported ontologies shown in Table 1, along with the transitive subclass materializations, require about 5 gigabytes of additional disk space per patient repository, on top of the space requirements for the patient data shown in Table 2. As these ontologies are static resources, their storage footprint does not depend on the size of the patient cohort. These ontologies are included in each patient repository, which leads to some duplication of the same data across multiple repositories. However, storing these ontologies in a single, shared location and using a federated SPARQL query to access them from the patient repositories caused a steep performance degradation of our exemplar question query (see Section 4.1).

3.2 Querying for the Exemplar Question

For this project, we added two additional clauses about patient diagnosis and medication history to the original exemplar question.
These clauses both require hierarchical traversals of imported ontologies. Specifically, we wrote a SPARQL query to count the number of patients who:

● are African-American males born between 1960 and 1980
● have an average systolic blood pressure within the normal range of 110 to 130
● have been diagnosed with a form of hypertensive disorder
● have been prescribed a hypoglycemic agent

Our goal in creating the above exemplar question was to include fields of interest to researchers (demographics, assays, medications, diagnoses) without prioritizing that the actual ranges or classes make clinical sense or will be useful for any particular research study. Additionally, once a query template is created, modifications to the ranges and classes can easily be made. The representativeness of this query for the purpose of research information retrieval was assessed to be adequate based on the fields included in past requests from researchers.

3.2.1 Diagnosis Traversal

Hypertensive disorder is a disease entity represented in the Mondo Disease Ontology by code MONDO:0005044. It would be trivial to search for all subclasses of hypertensive disorder within Mondo. However, Synthea diagnoses mention diseases and disorders within the SNOMED ontology. Mondo includes links from its own classes to SNOMED classes, each of which is described using one or more of these predicates:

● http://www.w3.org/2002/07/owl#equivalentClass
● http://www.w3.org/2004/02/skos/core#exactMatch
● http://www.w3.org/2004/02/skos/core#closeMatch
● http://www.w3.org/2004/02/skos/core#broadMatch
● http://www.w3.org/2004/02/skos/core#narrowMatch
● http://www.geneontology.org/formats/oboInOwl#hasDbXref

It is not safe to assume that diagnoses from other data sources will always contain references to SNOMED, and therefore some alterations may need to be made to the query in order to traverse to other terminologies. For example, it is common for hospital diagnosis data to reference ICD codes, which are helpful for billing. We have established a pattern to discover ICD codes from a given Mondo term by looking for direct mappings from Mondo to ICD as well as paths from Mondo through SNOMED to ICD. Data sources referencing other terminologies would require additional labor for pathway discovery before this type of search could be implemented.

3.2.2 Medication Traversal

A hypoglycemic agent is a drug role represented in the Chemical Entities of Biological Interest (ChEBI) ontology by code CHEBI:35526. However, Synthea prescriptions mention specific drugs and compounds from the RxNorm terminology, not from ChEBI. RxNorm does not include drug roles, and there are no direct references in ChEBI to RxNorm or vice versa. The DrOn ontology does include references to ChEBI and to RxNorm, although we found that a significant number of the DrOn mappings to RxNorm point to inactive terms. BioPortal[20] includes mappings from DrOn to RxNorm which we found to be useful and mostly accurate. In order to map DrOn classes to RxNorm classes, we used the BioPortal API to discover mappings between the two and materialized the relationships in our graph. We could then start the medications component of our exemplar question query by finding DrOn terms related to a given ChEBI term or any of its direct or indirect subclasses. We then traverse all direct and indirect subclasses of each DrOn term discovered, and search for BioPortal mappings to RxNorm from each of those DrOn terms. This yields a set of RxNorm terms to be matched against the synthetic patient instance data. A sketch of the diagnosis and medication traversals in SPARQL is shown below.
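The following sketch shows the general shape of the two traversals in SPARQL. The ontology codes and the Mondo-to-SNOMED mapping predicates come from the description above; the ex:referencesChebi and ex:mappedToRxNorm predicates are placeholders standing in for however the DrOn references and our materialized BioPortal mappings are actually expressed in the repositories, so this should be read as an illustration rather than the exact query we ran.

PREFIX rdfs:     <http://www.w3.org/2000/01/rdf-schema#>
PREFIX owl:      <http://www.w3.org/2002/07/owl#>
PREFIX skos:     <http://www.w3.org/2004/02/skos/core#>
PREFIX oboInOwl: <http://www.geneontology.org/formats/oboInOwl#>
PREFIX obo:      <http://purl.obolibrary.org/obo/>
PREFIX ex:       <http://example.org/>

# Diagnosis traversal: collect SNOMED terms linked to hypertensive
# disorder (MONDO:0005044) or any of its subclasses.
SELECT DISTINCT ?snomedTerm
WHERE {
  ?mondoClass rdfs:subClassOf* obo:MONDO_0005044 .
  ?mondoClass ( owl:equivalentClass | skos:exactMatch | skos:closeMatch |
                skos:broadMatch | skos:narrowMatch | oboInOwl:hasDbXref )
              ?snomedTerm .
}

# Medication traversal: collect RxNorm terms reachable from
# hypoglycemic agent (CHEBI:35526) through DrOn.
SELECT DISTINCT ?rxnormTerm
WHERE {
  ?chebiClass rdfs:subClassOf*   obo:CHEBI_35526 .
  ?dronClass  ex:referencesChebi ?chebiClass .      # placeholder for the DrOn-to-ChEBI reference
  ?dronSub    rdfs:subClassOf*   ?dronClass .
  ?dronSub    ex:mappedToRxNorm  ?rxnormTerm .      # placeholder for our materialized BioPortal mappings
}

In the full exemplar question, the resulting code sets are joined with the demographic and blood pressure criteria against the patient instance data to produce the final patient count.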
3.2.3 Results

Table 3 shows the results of running our exemplar question against each of the four synthetic patient repositories after transformation to the semantically rich model. The query was implemented in SPARQL and run using the graphical web-based interface of GraphDB. These trials were executed on an instance of GraphDB Standard Edition 9.1.1, running on a server with the CentOS 6.9 operating system and 64 GB of RAM. We ran the query against each patient repository three times and took an average of the results. The repositories were not repopulated between trials, so the query ran against the same synthetic data each time. The "Time for Query Completion" column was populated using the value reported by GraphDB after rendering the results. We can see that the query completion times generally scale linearly with respect to the cohort model size, with a slight performance degradation when run on the largest repository.

Table 3: Number of patients found and query runtimes for running the exemplar question with SPARQL against each of the four synthetic patient repositories after transformation to the semantically rich model. We observed that the query completion time is roughly linear with respect to the cohort model size.

Patient Cohort Model Size | Qualifying Patients Found | Time for Query Completion
1,000 | 2 | Average: 1.1 seconds
10,000 | 11 | Average: 9.1 seconds
100,000 | 153 | Average: 95 seconds
1,000,000 | 1,760 | Average: 1,360 seconds

4. Discussion

4.1 Limitations

Additional optimizations could improve the time or space performance of some steps of the pipeline. One limitation was Carnival's use of an embedded graph database with access to limited amounts of memory. Carnival could generate the concise RDF triples more quickly if the application were connected to a dedicated Neo4j property graph database server with significant allocated memory, which would allow a greater number of patients to be processed at a time and reduce the number of Neo4j cleanup operations required.

Another improvement could reduce redundancy between the four patient repositories and save storage space: the imported ontologies, with transitive subclass relationships materialized, could be stored in a separate repository and accessed via federated queries launched from the patient repositories. Federated queries are a type of SPARQL query that traverses multiple RDF repositories, with some performance degradation compared to traversing a single repository. We attempted to answer our exemplar question using federated queries but found that the query time to completion was not tolerable even on the smaller repositories. Finding a way to launch efficient federated queries in GraphDB would avoid duplicating the ontology data in each of the four repositories, and would mean that transitive subclass materializations, mappings between ontologies, and any other changes would only have to be implemented once.
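For reference, a federated version of the diagnosis traversal might take roughly the following form (a sketch only; the repository URL and the patient-side predicates are placeholders, and our actual federated attempts differed in detail). The SERVICE clause delegates the subclass and mapping traversal to the shared ontology repository, while the rest of the query runs against the patient repository:

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX obo:  <http://purl.obolibrary.org/obo/>
PREFIX ex:   <http://example.org/>

# Resolve the hypertensive disorder hierarchy in a shared ontology
# repository, then match the resulting SNOMED terms against diagnoses
# in the local patient repository.
SELECT (COUNT(DISTINCT ?patient) AS ?patientCount)
WHERE {
  SERVICE <http://example.org/repositories/shared-ontologies> {
    ?mondoClass rdfs:subClassOf* obo:MONDO_0005044 .
    ?mondoClass skos:exactMatch  ?snomedClass .
  }
  ?diagnosis ex:denotesDisease ?snomedClass ;   # placeholder predicates for the
             ex:aboutPatient   ?patient .       # patient-side diagnosis model
}

In our trials, queries of this shape performed markedly worse than the same traversal run entirely within a single repository, which is why we duplicated the ontologies across the patient repositories instead.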
4.2 Using Real Data

Although we used synthetic data for the purposes of this paper, we anticipate that our system will be useful for the storage and retrieval of real patient data as well. We have previously generated a repository containing data about 51,031 real patients in the Penn Medicine Biobank, including information about loss of function mutation predictions for specific genes. This data originated in Penn Data Store, a large clinical data warehouse that is part of the University of Pennsylvania Health System. Similarly to the pipeline described in this paper, we were able to use Carnival to generate concise RDF triples about these patients and the Semantic Engine to transform the data into the semantically rich model. In this case we did not import external ontologies or execute medication and diagnosis related searches, but this exercise did provide a proof-of-concept that our pipeline can be modified to work with various data sources and types of data.

A potential complication of using real data is that it may be changed frequently at the source. For example, the result of a laboratory test assay could be updated in the clinical data warehouse from which our system pulls data. Since our system captures a snapshot of the relevant data at a given point in time, such a change would not be immediately reflected in the RDF output. If the output is expected to include recent updates to the source data, setting up software infrastructure to perform automatic rebuilds of the semantically rich electronic healthcare graph could be a reasonable option.

5. Conclusion

Previous work has shown low-throughput semantic instantiation and querying of clinical data from relational sources using a toy dataset of roughly 1,000 patients and their associated data. In addition to improving the representation of clinical data relative to relational models, we now show that generating and querying these models scales in a roughly linear fashion. Based on the previous work, we created a new pipeline and used it to instantiate a more practically sized dataset of one million synthetic patients and their associated health record data. We report performance for each of the necessary steps so that others wishing to use these methods can anticipate the time and space requirements.

We discuss the development of an exemplar question that includes hierarchy traversals of biomedically-oriented ontologies. Asking questions about patients who have taken any of a given class of drugs or received any of a given class of diagnoses can be cumbersome using traditional relational database systems. However, the hierarchy-based nature of ontologies allows these questions to be answered easily using SPARQL against the semantically rich electronic healthcare graph. We tracked the time to completion of our exemplar question query against each of our generated repositories, and present evidence that our exemplar question can be answered when run against clinical datasets of a practical size. Because the query completion time scales linearly with the size of the data, this pipeline may not be performant against significantly larger datasets.

Although we would prefer to make every component of our pipeline available as an open source project, one relevant Carnival module, which contains sensitive information about internal data structures, could not be made publicly available. However, there are many other open-source tools available to convert data from relational to concise RDF format. We encourage those interested in our methods to provide feedback as we continue to develop and improve the pipeline.

References

[1] M. Ivanović, and Z. Budimac, An overview of ontologies and data resources in medical domains, Expert Syst. Appl. 41 (2014) 5158–5166.
[2] K. Munir, and M. Sheraz Anjum, The use of ontologies for effective knowledge modelling and information retrieval, Applied Computing and Informatics. 14 (2018) 116–126.
[3] B. Smith, M. Ashburner, C. Rosse, J. Bard, W. Bug, W. Ceusters, L.J. Goldberg, K. Eilbeck, A. Ireland, C.J. Mungall, OBI Consortium, N. Leontis, P. Rocca-Serra, A. Ruttenberg, S.-A. Sansone, R.H. Scheuermann, N. Shah, P.L. Whetzel, and S. Lewis, The OBO Foundry: coordinated evolution of ontologies to support biomedical data integration, Nat. Biotechnol. 25 (2007) 1251–1255.
[4] B. Smith, and W. Ceusters, Ontological realism: A methodology for coordinated evolution of scientific ontologies, Appl. Ontol. 5 (2010) 139–188.
[5] N. Yakovets, P. Godfrey, and J. Gryz, Evaluation of SPARQL Property Paths via Recursive SQL, AMW. 1087 (2013). http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.403.2132&rep=rep1&type=pdf.
[6] M.A. Miller, and C.J. Stoeckert, A Collaborative, Realism-Based, Electronic Healthcare Graph: Public Data, Common Data Models, and Practical Instantiation, ICBO 2019 Conference Proceedings. https://drive.google.com/file/d/1eYXTBl75Wx3XPMmCIOZba-8Cv0DIhlRq/view.
[7] J. Walonoski, M. Kramer, J. Nichols, A. Quina, C. Moesel, D. Hall, C. Duffett, K. Dube, T. Gallagher, and S. McLachlan, Synthea: An approach, method, and software mechanism for generating synthetic patients and the synthetic electronic health care record, J. Am. Med. Inform. Assoc. 25 (2018) 230–238.
[8] Simulacra and Simulation: How simulated data can enable OHDSI application development, methods research, and user adoption – OHDSI, (n.d.). https://www.ohdsi.org/2019-us-symposium-showcase-9/ (accessed March 4, 2020).
[9] G. Hripcsak, J.D. Duke, N.H. Shah, C.G. Reich, V. Huser, M.J. Schuemie, M.A. Suchard, R.W. Park, I.C.K. Wong, P.R. Rijnbeek, J. van der Lei, N. Pratt, G.N. Norén, Y.-C. Li, P.E. Stang, D. Madigan, and P.B. Ryan, Observational Health Data Sciences and Informatics (OHDSI): Opportunities for Observational Researchers, Stud. Health Technol. Inform. 216 (2015) 574–578.
[10] D. Birtwell, H. Williams, R. Pyeritz, S. Damrauer, and D.L. Mowery, Carnival: A Graph-Based Data Integration and Query Tool to Support Patient Cohort Generation for Clinical Research, Stud. Health Technol. Inform. 264 (2019) 35–39.
[11] J. Sequeda, F. Priyatna, and B. Villazón-Terrazas, Relational database to RDF mapping patterns, in: Proceedings of the 3rd International Conference on Ontology Patterns - Volume 929, CEUR-WS.org, 2012: pp. 97–108.
[12] H.G. Freedman, H. Williams, M.A. Miller, D. Birtwell, D.L. Mowery, and C.J. Stoeckert, A novel tool for standardizing clinical data in a realism-based common data model, bioRxiv. (2020) 2020.05.12.091223. doi:10.1101/2020.05.12.091223.
[13] ETL-Synthea, (n.d.). https://ohdsi.github.io/ETL-Synthea/ (accessed March 30, 2020).
[14] GraphDB - Semantic Web Standards, (n.d.). https://www.w3.org/2001/sw/wiki/GraphDB (accessed May 21, 2020).
[15] O.T. Wg, Mondo Disease Ontology, (n.d.). http://www.obofoundry.org/ontology/mondo.html (accessed May 18, 2020).
[16] S. El-Sappagh, F. Franda, F. Ali, and K.-S. Kwak, SNOMED CT standard ontology based on the ontology for general medical science, BMC Med. Inform. Decis. Mak. 18 (2018) 76.
[17] K. Degtyarenko, P. de Matos, M. Ennis, J. Hastings, M. Zbinden, A. McNaught, R. Alcántara, M. Darsow, M. Guedj, and M. Ashburner, ChEBI: a database and ontology for chemical entities of biological interest, Nucleic Acids Res. 36 (2008) D344–50.
[18] J. Hanna, E. Joseph, M. Brochhausen, and W.R. Hogan, Building a drug ontology based on RxNorm and other sources, J. Biomed. Semantics. 4 (2013) 44.
[19] S.J. Nelson, K. Zeng, J. Kilbourne, T. Powell, and R. Moore, Normalized names for clinical drugs: RxNorm at 6 years, J. Am. Med. Inform. Assoc. 18 (2011) 441–448.
[20] P.L. Whetzel, N.F. Noy, N.H. Shah, P.R. Alexander, C. Nyulas, T. Tudorache, and M.A. Musen, BioPortal: enhanced functionality via new Web services from the National Center for Biomedical Ontology to access and use ontologies in software applications, Nucleic Acids Res. 39 (2011) W541–5.