<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Open Drug Knowledge Graph</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Mark Mann</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Filip Ilievski</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Mohammad Rostami</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Aastha</string-name>
          <email>aastha@usc.edu</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Basel Shbita</string-name>
          <email>shbitag@isi.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Information Sciences Institute</institution>
          ,
          <addr-line>Marina del Rey, CA 90292</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>University of Southern California</institution>
          ,
          <addr-line>Los Angeles CA 90007</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Automatic knowledge-based systems can assist medical professionals in making more informed recommendations and decisions. Unfortunately, as no comprehensive knowledge base (with both medical and non-medical knowledge) exists today, much manual effort is required to consolidate knowledge across sources that are heterogeneous in content and format. This paper proposes a knowledge-based method that aims to harmonize four such heterogeneous sources into a single drug-centric knowledge graph. The graph is based on the drugs found in Wikidata and extended with specialized sources through an extraction and transformation pipeline, including data acquisition, entity resolution, and semantic modeling. Our analyses show that the resulting graph and its embeddings can capture drug similarity through their associated symptoms and address common, knowledge-intensive medical search scenarios. As such, it holds the promise to be adapted for drug recommendation in the future. Given the modular setup of our method, new sources can be included to accommodate healthcare object use cases relating to diagnoses and claims. We make the resulting knowledge source available in both relational database and property graph format.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>Healthcare systems rely heavily on the knowledge and experience of
physicians, who prescribe drugs based on the diagnosed symptoms of patients.
Although dominant, this traditional process is limited to the knowledge of
one person and faces several challenges.</p>
      <p>1. Several different types of drugs may be appropriate to treat the same disease.
Other (non-medical) factors such as price, accessibility, and insurance policy
may help healthcare professionals reach optimal decisions in such situations.</p>
    </sec>
    <sec id="sec-2">
      <title>Copyright © 2021 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).</title>
      <p>
2. Many healthcare professionals who are not physicians are not supposed to
prescribe drugs in normal situations. Yet, in emergencies, they may need to act
upon symptoms that they can diagnose to initiate treatment before an
accurate examination can be performed by a doctor.
3. When a novel disease emerges, clinical data and standard treatment
protocols are limited in the beginning, as with COVID-19. Physicians may want to
search for all existing drugs that may have a positive effect given
the observed symptoms of the disease and then repurpose them as potential
early-stage treatment options [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ].
4. Patients may desire to be more involved in the prescription process, e.g.,
by knowing more about particular drugs and their side effects.
Patients may also need to find the right drug at a
reasonable price, particularly in the case of over-the-counter
drugs.
      </p>
      <p>
        Automated knowledge-based systems could assist with such tasks, which
involve intelligent searching of a database to arrive at a valid conclusion. We
observe that, while a set of very valuable sources is publicly available, no
comprehensive database exists that can accommodate the listed challenges. Existing
medicinal drug databases, e.g., DrugBank<sup>3</sup>, are helpful, but they are more
similar to specialized encyclopedias. These databases are mostly unstructured, with
an abundance of thorough information about each entity, scattered across
documents and not tailored to particular use cases. As a result, these sources are
suboptimal for practicing medicine efficiently [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. Consequently, the user must
spend a considerable amount of time searching across disjointed databases and
narrowing down the results to find the right treatment, while considering non-medical
constraints such as price and avoiding adverse interactions with current medications. For
example, GoodRx<sup>4</sup> has structured drug prices and store availability for each
medicine, but it can only be used for shopping after prescription, as it lacks a
mapping of symptoms to drugs. WebMD<sup>5</sup> contains structured treatment data
for each symptom but does not indicate which over-the-counter drugs could help.
DrugBank is an open-source database that can help determine which drugs are
safe to consume with the current medications the patient is taking. However, the
average person, or even physician, is not a computer scientist and cannot query
this rich resource. Structured knowledge bases that integrate such
existing distributed knowledge would help healthcare professionals overcome the
above challenges and obtain accurate answers to their queries quickly. They can
also assist patients in buying drugs at better prices and improve their shopping
experience.
      </p>
      <p>
        In this paper, we develop a structured database for drugs in terms of a
knowledge graph (KG) [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. KGs have been found helpful in AI-aided medicine,
particularly for clinical decision support systems for diagnosis and treatment [3, 14].
      </p>
    </sec>
    <sec id="sec-3">
      <title>3 https://go.drugbank.com/releases/latest</title>
    </sec>
    <sec id="sec-4">
      <title>4 https://www.goodrx.com/</title>
    </sec>
    <sec id="sec-5">
      <title>5 https://www.webmd.com/drugs/2/conditions/index</title>
      <p>
        Building KGs from unstructured medical data helps perform more
complex tasks using AI, including adverse drug reactions [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], drug discovery [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ],
drug repurposing [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], and predicting drug-drug interaction [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. Our goal is to construct
a KG that helps the user find drugs that can serve as potential
treatments given a list of symptoms or a disease. Additionally, information about
the availability of drugs at nearby stores is provided to the user. Building upon
the existing healthcare literature [
        <xref ref-type="bibr" rid="ref16 ref17">17, 16</xref>
        ], we integrate existing sources
to create a comprehensive, fast search experience for users who manage
conditions and budgets and control adverse drug interactions for patients. By integrating
multiple knowledge sources, we enable users to obtain more expressive search
results quickly. Our knowledge graph builds on a mapping from symptoms to
diseases, which helps to find possible drugs that can be used to treat a
symptom. It also incorporates information on prices and drug availability, which helps
the user narrow down and research the drugs that are affordable and available.
      </p>
      <p>We list the contributions of this paper as follows:
1. We present a pipeline for the extraction and consolidation of relevant
knowledge about symptoms, drugs, and their interactions, as well as non-medical
information, such as drug prices. We apply our pipeline to four
relevant and complementary sources, resulting in an integrated knowledge base.
(Section 2)
2. We make the resulting data publicly available, both in the form of a relational
database and a knowledge graph.<sup>6</sup> The two formats support complementary
use cases.
3. We analyze the contents of the resulting database. We provide statistics of
its constituent nodes and relations and run graph embedding-based queries
to find similar products or drugs. (Section 3)
4. We assess the applicability of our integrated KG by designing a user-friendly
web interface and showing its utility in two representative scenarios.
(Section 4)</p>
      <sec id="sec-5-1">
        <title>Approach</title>
        <p>
          The overall architecture of our approach is shown in Figure 1. We start by
describing the data acquisition from the four sources that we will use in this paper:
Wikidata [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ], DrugBank, WebMD, and GoodRx (Section 2.1). We next describe
their consolidation through entity linking and resolution between pairs of sources
(Section 2.2). The resulting ontology of our data is described in Section 2.3.
        </p>
        <sec id="sec-5-1-1">
          <title>Sources and data acquisition</title>
          <p>We sought to construct a knowledge graph from several drug-centric data sources.
Each source contributes a particular set of information about drugs, prices,
and relations to conditions, which can be complementary. To ease the effort</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>6 https://www.kaggle.com/mannbrinson/open-drug-knowledge-graph</title>
      <p>
        required in entity linkage, we chose a well-adopted, drug-centric external ID
(DrugBank ID) as the primary key of our drug entity. For each data source,
we identified the target features needed and devised methods for extracting
the data:
1. Wikidata [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] is one of the largest publicly available knowledge graphs,
describing over 90 million entities with more than a billion statements. To
retrieve relevant data from Wikidata, we query it for medication (Q12140)
entities with any DrugBank ID (P715) that treat any condition (P2175). We also
retrieve additional, optional features: the medication's active ingredient in
(P3780), significant drug interaction (P769), and ATC code (P267). The
query returned a total of 1,560 rows.
2. DrugBank is a drug-centric database focused on drug-drug interactions and
bioinformatics-related features. Its knowledge is provided as a data dump in
XML format. We extracted all 2,166 drugs from DrugBank's XML dump,
each with a maximum of 20 products and 100 interactions.
3. WebMD is a site focused on helping users search for treatments for a given
condition. The site displays an index of all possible conditions, sorted
alphabetically. For each condition, a list of drug treatments is provided. As the
website provides no public API, we scraped its content programmatically.
The crawler obtained a total of 58,921 condition-drug relations and 12,857
unique drugs. Features extracted include condition, product, user reviews,
and prescription type.
4. GoodRx is a healthcare company that tracks prescription drug prices in the
United States and provides free drug coupons for discounts on medications.
As GoodRx does not provide a public API service, we extracted knowledge
on GoodRx's drug products directly from their website, starting from a
Wikidata-based seed list. The features of drug products that we extracted
were: zipcode, store, price type, price, and price link. A total of 20,688 prices
and 23 stores were extracted for the 997 matched drug products.
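      </p>
      <table-wrap>
        <label>Table 1</label>
        <table>
          <thead>
            <tr><th>Metric</th><th>Value</th></tr>
          </thead>
          <tbody>
            <tr><td>All pairs</td><td>178,519</td></tr>
            <tr><td>Pairs matched</td><td>1,701</td></tr>
            <tr><td>Pairs found</td><td>38</td></tr>
            <tr><td>Recall (%)</td><td>97.36</td></tr>
            <tr><td>Precision (%)</td><td>100</td></tr>
            <tr><td>True positives</td><td>0.973</td></tr>
            <tr><td>False positives</td><td>0</td></tr>
            <tr><td>True negatives</td><td>1</td></tr>
            <tr><td>False negatives</td><td>0.026</td></tr>
          </tbody>
        </table>
      </table-wrap>
      <p>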
An entity resolution step follows the data extraction step. As the entities across
sources are originally disjoint, linking them is essential for constructing a
well-connected drug knowledge graph. To avoid introducing false positives, we first
perform entity resolution across sources based on their external drug
identifier (DrugBank ID). In this way, the DrugBank ID allowed us to link all data
sources to Wikidata in a `hub-and-spoke' manner. This design choice enriches
the information about the entities found in Wikidata but excludes the remaining
entities in the other three sources, which are not mapped to Wikidata through
the DrugBank ID. For this reason, we consider further linking of these entities.
Specifically:</p>
      <p>Wikidata to DrugBank: Linkage occurred only between Wikidata drugs
(containing a DrugBank ID) and the subset of DrugBank drugs with a matching
DrugBank ID. In this case, we did not perform fuzzy matching, as we found it to
decrease the overall quality of matching. DrugBank IDs were found on 787
Wikidata medications.</p>
      <p>Wikidata to GoodRx: We matched Wikidata and GoodRx based on an
exact matching query on the GoodRx website. URL requests to GoodRx return a
result if the drug name is matched exactly and otherwise give a 404 error. Of
all 1,560 Wikidata drug products, we found 997 matches in GoodRx (recall of
63.9%).</p>
      <p>Wikidata to WebMD: Due to the absence of a shared identifier between
Wikidata and WebMD, we resorted to fuzzy matching between their drug products.
A scoring function matched a pair if its
Jaccard similarity was greater than 0.7. For each search term, a set of bi-grams was
generated before the Jaccard similarity was calculated. A development set of 50 true
pairs was manually compiled to evaluate this matching approach.
Hash-based blocking on the entity's first two characters was used to reduce
candidate pairs from 20M to 178k. Our scoring function obtained 97.3% recall
and 100% precision on these development pairs. Detailed results are shown in
Table 1. We judge this level of error to be acceptable; thus, we proceed with this
linkage strategy.</p>
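      <p>The fuzzy matching step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (bigrams, jaccard, block_key, match), the use of character-level bi-grams, and the lower-casing of names are choices made for this example, while the 0.7 Jaccard threshold and the blocking on the first two characters come from the text.</p>
      <preformat>
```python
from collections import defaultdict


def bigrams(term):
    """Return the set of character bi-grams of a lower-cased term (assumed tokenization)."""
    text = term.lower().strip()
    return {text[i:i + 2] for i in range(len(text) - 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 when both sets are empty."""
    union = a.union(b)
    return len(a.intersection(b)) / len(union) if union else 0.0


def block_key(term):
    """Blocking key: the first two characters of the lower-cased term."""
    return term.lower().strip()[:2]


def match(left_terms, right_terms, threshold=0.7):
    """Return (left, right, score) for pairs in the same block scoring above threshold."""
    blocks = defaultdict(list)
    for term in right_terms:
        blocks[block_key(term)].append(term)

    matches = []
    for left in left_terms:
        left_grams = bigrams(left)
        # Only terms sharing the blocking key are compared, which is what
        # shrinks the candidate set.
        for right in blocks.get(block_key(left), []):
            score = jaccard(left_grams, bigrams(right))
            if score > threshold:
                matches.append((left, right, round(score, 3)))
    return matches


if __name__ == "__main__":
    # Hypothetical product names, used only to exercise the functions.
    print(match(["metformin"], ["metformin er", "metoprolol", "aspirin"]))
```
      </preformat>
      <p>Under these assumptions, two names are compared only when they share their first two characters. This keeps the number of candidate pairs small, but it cannot recover matches whose names differ at the very start.</p>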
      <p>An overview of the number of entities mapped between Wikidata and each of
the three other sources is shown in Figure 2. We allowed only one-to-one matches for
the Wiki-DrugBank and Wiki-GoodRx matching tasks. However, we allowed
one-to-many matches for Wiki-WebMD drug product matching. This is because WebMD
displayed many suffix variations for a product (e.g., Adriamycin vial,
Adriamycin Pfs Solution) that we wanted to include in our graph. This design choice allowed
for more matches (1,701) than products existing in Wikidata (1,560).</p>
      <p>The ontology was designed in a top-down manner to fit our ultimate goal of
enabling queries that connect patients with treatments based on their search
parameters. We preserved all binary relations (treatment, interaction, active ingredient in,
and drug price) and used them to model information in all their suitable sources.
To contain the scope of our proof of concept, we selected only these relations from
Wikidata and DrugBank while using all extracted relations from WebMD and GoodRx.
We decided to categorize drug-like entities into two nodes, drug and product,
to represent the active ingredient and its name in the market. Our ontology
map is described in Figures 2 and 3. It is a simple yet powerful ontology, which
allows us to achieve the project goals, including: (a) storing the mapping from symptoms
to drugs, (b) capturing drug interactions, and (c) capturing drug prices and their variation
across stores and zipcodes.</p>
      <p>Figure 3 is an entity-relationship diagram of the entities in our relational data
model, created after entity linkage was completed. The relational data model was
stored in a MySQL instance and used as a back end for our front-end application
(Section 4.1). Figure 4 is composed of the same data but expressed as a property
graph and stored in Neo4j. The property graph format opened opportunities for
us to leverage Neo4j's robust graph-centric libraries for path-finding, centrality,
and computation of embeddings.
After the extraction and entity linkage steps, the resulting data model was
loaded into a MySQL instance using another Python script. The relational schema
is displayed in Figure 3. The resulting relational database was then exported in
.csv format and loaded into Neo4j. Neo4j import commands were used to load
the data and create nodes and edges corresponding to the data model in Figure 4.</p>
      <sec id="sec-6-1">
        <title>Analysis</title>
        <sec id="sec-6-1-1">
          <title>Statistics</title>
          <p>In this section, we analyze the contents of our knowledge base. First, we provide
basic statistics (Section 3.1). Then, we compute drug embeddings and cluster
them to investigate possible emerging patterns in the graph (Section 3.2).</p>
          <p>After we loaded the Open Drug Knowledge Graph into MySQL, we computed
statistics of the coverage of each class across different sources. The results are
shown in Table 2. Each source contributed a different profile of features, and
some sources contributed unique classes. Specifically, DrugBank distinctly
contributed the `Manufacturer' and `Interaction' classes, while GoodRx contributed
the `Store' and `Price' classes. Each source was linked back to Wikidata as a
centralized source, based on the linkage methods described above. This shows
the benefit of integrating sources with complementary foci in a single knowledge
source, which is ultimately more than the sum of its parts.</p>
        </sec>
        <sec id="sec-6-1-2">
          <title>Graph Embedding Analysis</title>
          <p>
            We sought to further explore the higher-level structure of the extracted
knowledge graph via graph embeddings. Our goal was to embed the relation
treatment(drug, condition) to confirm whether drugs that treat similar
conditions are clustered together. If drugs are clustered in this fashion, the graph
embeddings could, in the future, enable drug recommendations for providers given a source drug.
Our embedding is built from all 3,654 instances in our treatment
table, sourced from Wikidata. We then utilized the AmpliGraph [
            <xref ref-type="bibr" rid="ref6">6</xref>
            ] and
TensorBoard [
            <xref ref-type="bibr" rid="ref1">1</xref>
            ] libraries with the TransE [
            <xref ref-type="bibr" rid="ref4">4</xref>
            ] and ComplEx [
            <xref ref-type="bibr" rid="ref12">12</xref>
            ] models to project our
data into a 150-dimensional embedding space. Training ran for 200 epochs with
the Adam optimizer. A training set was generated from 90% of the data, with
the remaining 10% set aside as test data.
          </p>
          <p>
            In Table 3, our embedding models are evaluated using the following entity
ranking metrics described by Wang et al. [
            <xref ref-type="bibr" rid="ref15">15</xref>
            ]: 1) mean reciprocal rank (MRR),
and 2) Hits@K. For MRR, the embedding model ranks unseen test triples, and the
reciprocal ranks of the true triples are averaged. A
model that ranks known true triples (i.e., test triples) closer to the top is
considered superior at predicting missing links. The Hits@K metric computes how
many elements of a vector of rankings make it to the top K positions. When
visualizing the embedding vectors, we used embeddings from the ComplEx model,
which performed best on our entity ranking tasks. To visualize the embedding, we
reduced our embeddings to three dimensions using t-SNE [
            <xref ref-type="bibr" rid="ref7">7</xref>
            ] as our dimensionality
reduction method. We then inspected the result for nearest neighbors based on cosine
similarity in the initial embedding space. Figures 5 and 6 show selected results
from our embedding visualization. The visualizations are from TensorBoard and
are displayed using the aforementioned model parameters and visualization settings.
We found that drugs that treat similar conditions are somewhat clustered in this
embedding space, while similar conditions are grouped together. In Figure 5a,
we find the 10 nearest cosine neighbors of the source drug "insulin aspart" among drugs.
Two of the neighbors are also insulin variants. However, more domain expertise
is required to judge whether this clustering is a meaningful representation of
drugs that treat similar conditions. For conditions, in Figure 5b, the 10 nearest
cosine neighbors are located for "bipolar disorder". Many of the neighbors
logically represent similar conditions, such as "mood disorder", "schizophrenia", and
"anxiety". Further experiments are required to confirm how meaningful these
initial embeddings can be for recommending drug products. Other relations that
may be helpful to include for a drug similarity embedding are the ICD-10
codes of the treatment's condition or the products of the treatment's drug.
Nevertheless, these embeddings show early signs of progress toward
drug recommendation via nearest neighbor search within a graph embedding
space.
          </p>
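          <p>To make the evaluation metrics and the neighbor search concrete, the sketch below computes MRR and Hits@K from a list of 1-based ranks and retrieves nearest neighbors by cosine similarity. It is an illustration only: the function names and the toy inputs are assumptions for this example, Hits@K is computed as a fraction of test triples (a common convention), and the code does not reproduce the AmpliGraph calls used to obtain the reported results.</p>
          <preformat>
```python
import math


def mean_reciprocal_rank(ranks):
    """Average of 1/rank over all test triples; ranks are 1-based."""
    return sum(1.0 / rank for rank in ranks) / len(ranks)


def hits_at_k(ranks, k):
    """Fraction of test triples whose true entity is ranked in the top k."""
    return sum(1 for rank in ranks if k >= rank) / len(ranks)


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 for a zero vector."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def nearest_neighbors(source, embeddings, k=10):
    """Names of the k entities most cosine-similar to `source`, excluding itself."""
    query = embeddings[source]
    scored = [
        (cosine_similarity(query, vector), name)
        for name, vector in embeddings.items()
        if name != source
    ]
    scored.sort(reverse=True)
    return [name for _, name in scored[:k]]


if __name__ == "__main__":
    # Hypothetical ranks and 2-d vectors, used only to exercise the functions.
    ranks = [1, 2, 4, 10]
    print(mean_reciprocal_rank(ranks), hits_at_k(ranks, 3))
    toy = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.0, 1.0]}
    print(nearest_neighbors("a", toy, k=2))
```
          </preformat>
          <p>In this sketch, the neighbor search runs in the original embedding space rather than in the reduced t-SNE projection, which matches the inspection procedure described above.</p>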
          <p>Fig. 5: Graph Embedding Visualization. Visualization of all entities within the
reduced embedding created by the ComplEx embedding model. In Figure 5a,
the source entity `insulin aspart' is selected. We observe clustering for this
entity in the embedding space. In Figure 5b, `bipolar disorder' is selected,
which also exists within an observable cluster of similar entities.</p>
          <p>In this section, we present our web interface, which allows users to explore the
relational data model. We also explore the associated property graph to gain
motivation for future hypotheses and functionality.</p>
          <p>We prepared a web interface for our Intelligent Drug Shopper, shown in Figures 8
and 9. The web interface was developed using the Python Django framework. The
user can input search parameters for a patient's condition, current medications,
and price range. These parameters are inserted into a SQL query template that
checks our data model for any matching results.</p>
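          <p>A query template of this kind could look like the sketch below. It is a hypothetical illustration, not the application's actual code: the table and column names (treatment, price, interaction, interacts_with), the placeholder data, and the helper names are assumptions, and SQLite from the Python standard library stands in for MySQL so that the example is self-contained.</p>
          <preformat>
```python
import sqlite3

# Hypothetical query template: treatments for a condition, within budget,
# that do not interact with the patient's current medication.
QUERY = """
SELECT t.product, t.drug, AVG(p.price) AS avg_price
FROM treatment AS t
JOIN price AS p ON p.product = t.product
WHERE t.condition = :condition
  AND t.drug NOT IN (
      SELECT i.drug FROM interaction AS i
      WHERE i.interacts_with = :current_medication
  )
GROUP BY t.product, t.drug
HAVING :max_price >= AVG(p.price)
ORDER BY avg_price
"""


def build_demo_db():
    """Create an in-memory database with placeholder rows (assumed schema)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE treatment (condition TEXT, drug TEXT, product TEXT);
        CREATE TABLE price (product TEXT, store TEXT, price REAL);
        CREATE TABLE interaction (drug TEXT, interacts_with TEXT);
    """)
    conn.executemany(
        "INSERT INTO treatment VALUES (?, ?, ?)",
        [("condition_x", "drug_a", "product_a"),
         ("condition_x", "drug_b", "product_b")],
    )
    conn.executemany(
        "INSERT INTO price VALUES (?, ?, ?)",
        [("product_a", "store_1", 10.0),
         ("product_a", "store_2", 14.0),
         ("product_b", "store_1", 30.0)],
    )
    conn.execute("INSERT INTO interaction VALUES (?, ?)", ("drug_a", "drug_c"))
    return conn


def search(conn, condition, max_price, current_medication=None):
    """Fill the template with bound parameters and return matching rows."""
    params = {
        "condition": condition,
        "max_price": max_price,
        "current_medication": current_medication,
    }
    return conn.execute(QUERY, params).fetchall()


if __name__ == "__main__":
    demo = build_demo_db()
    print(search(demo, "condition_x", 20))
    print(search(demo, "condition_x", 20, current_medication="drug_c"))
```
          </preformat>
          <p>Binding the user's input as named parameters, rather than formatting it into the SQL string, is a deliberate choice in this sketch because it avoids SQL injection from free-text search fields.</p>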
          <p>For example, in Figure 8, a patient presents with osteoarthritis and
has a budget of 20 dollars to spend on medicine. These parameters are entered,
and the query retrieves matching treatments, their active ingredients, and their average
prices. The user can then navigate to different views of the Active Ingredient or
Product entity via hyperlinks.</p>
          <p>To demonstrate the further search capabilities that our data model provides,
consider Figure 9. Extending the search from Figure 8, a patient may also
be taking a current medication such as Zyvox, an antibiotic. This parameter is
added to the search, and we find that many of the previously recommended treatments
from Figure 8 are removed, as they interact with this antibiotic. This feature will
enable users to find treatments that avoid adverse drug interactions, while still
treating a condition and adhering to the patient's budget.</p>
          <p>In addition to exploring the relational database via the web application, we also
directed queries to the equivalent property graph stored in Neo4j. In Figure 9,
we consider a user search for treatments of "medullary thyroid carcinoma" and
all possible drug interactions with these treatments. The resulting visualization
shows two possible treatments (cluster centers), with drug interactions branching
outward. We can see an intersection of six drugs that interact with
both treatments. Therefore, if a patient is currently prescribed a drug in this
intersection, they cannot safely be prescribed either of the two treatments. Neo4j
was used to produce this visualization. As the data model is loaded as a
property graph in Neo4j, we can leverage Neo4j's wide set of graph analytics
tools to compute such paths automatically. We also plan to use Neo4j to compute
centrality metrics over our graph in the future.</p>
        </sec>
      </sec>
      <sec id="sec-6-2">
        <title>Discussion and future work</title>
        <p>Hub graph: While we were able to link the data sources with Wikidata, the
chosen design has both benefits and drawbacks. In our design,
we link all sources back to Wikidata in a `hub-and-spoke' fashion. No other
sources are permitted to link to each other. This design extends the
Wikidata knowledge graph, enabling new drug features (e.g., drug price) to be
analyzed together with all other nodes connected to medication (Q12140). The drawback
of this approach is that Wikidata does not contain nearly as many drug or drug
product entities as DrugBank, which limits the number of possible entity
links with other data sources. Depending on the application, the extension
of Wikidata with drug-centric data may be less important. In this case, we would
suggest using DrugBank as the centralized source for entity linkage to maximize
the number of links on drug and drug product entities with other data sources.</p>
        <sec id="sec-6-2-1">
          <title>Integration of more data sources</title>
          <p>To answer even more healthcare-centric questions, we propose to extend the
knowledge graph with additional
healthcare datasets. These datasets could relate to healthcare objects, such as
prescriptions, procedures, diagnoses, claims, providers, payers, and healthcare
facilities. Many of these datasets are made publicly available by government-run
healthcare agencies, such as the Food and Drug Administration (FDA<sup>7</sup>), the National
Institutes of Health (NIH<sup>8</sup>), and the Centers for Medicare &amp; Medicaid Services (CMS<sup>9</sup>). Standard
identifiers for each healthcare object are very common and, therefore, reduce the
amount of fuzzy matching required to extend the knowledge graph. For example,
the FDA gives drugs a National Drug Code (NDC), representing labeler,
product, and package size. CMS gives each healthcare provider a National Provider
Identifier (NPI). ICD-10 codes can be used to label medical conditions.</p>
          <p>Drug product similarity in embedding space: A future hypothesis to
check is whether our knowledge graph can enable drug product similarity
search via kNN search within graph embeddings. This application would enable
healthcare workers to find similar products for a source product. The
embedding should cluster products together in embedding space if the
products treat similar conditions (e.g., mental disorders, nervous, ocular).</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-7">
      <title>7 https://www.fda.gov/home</title>
    </sec>
    <sec id="sec-8">
      <title>8 https://www.nih.gov/</title>
    </sec>
    <sec id="sec-9">
      <title>9 https://www.cms.gov/</title>
      <p>Additional sources must be integrated to enable this work, such as ICD(condition,
ICD code), to provide ground truth for checking clusters.</p>
      <p>Application features: Currently, our application does not support
searching based on multiple conditions or multiple current medications. To support this,
our query templates must be updated to allow for these additional search
parameters. Another improvement would be to enable eager fuzzy n-gram searching,
triggered by characters entered in real time, to find a matching indexed search
term. This can be enabled by indexing keywords and running real-time searches against
the index. This feature would make user searches more likely to succeed compared
to the current functionality.</p>
      <sec id="sec-9-1">
        <title>Conclusion</title>
        <p>In this paper, we proposed the Open Drug Knowledge Graph: an integrated
drug-centric data model that enables customers to make well-informed purchasing
decisions by including prices, availability, and drug interactions in a single view,
without referencing fine print about drug interactions. This data model
leverages healthcare objects stored in pre-existing knowledge bases and integrates
knowledge from previously disjoint systems. Our acquisition pipeline consists
of three key steps: source data acquisition, entity resolution, and ontology
mapping. When performing entity linkage, external drug identifiers such as the DrugBank
ID were heavily utilized to reduce the need for fuzzy matching. We created a
web application to visualize the relational data model (MySQL) and showed
its potential to be used by healthcare workers and patients to inform treatment
decisions. The model was also loaded into a property graph (Neo4j), which was
anecdotally shown to enable visualization and network analytics on the graph.
We computed graph embeddings on the treatment class using the TransE and
ComplEx models. Nearest neighbor search based on cosine distance over these
embeddings showed their potential to aid product and condition searches. We
expect that such a single integrated source can help users make medically safe
and financially smart decisions. Future work should investigate the graph's
usefulness for customers, integrate additional sources, and explore novel ways to
leverage the data through graph centrality and path-finding methods. To
facilitate further exploration and development of drug knowledge graphs, the data
model<sup>10</sup> is made publicly available to the research community.</p>
        <p><sup>10</sup> https://www.kaggle.com/mannbrinson/open-drug-knowledge-graph</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Abadi</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Agarwal</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Barham</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Brevdo</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Citro</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Corrado</surname>
            ,
            <given-names>G.S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Davis</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dean</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Devin</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ghemawat</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Goodfellow</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Harp</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Irving</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Isard</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jia</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jozefowicz</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kaiser</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kudlur</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Levenberg</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mane</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Monga</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Moore</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Murray</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Olah</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schuster</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shlens</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Steiner</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sutskever</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Talwar</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tucker</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vanhoucke</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vasudevan</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Viegas</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vinyals</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Warden</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wattenberg</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wicke</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yu</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zheng</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          :
          <article-title>TensorFlow: Large-scale machine learning on heterogeneous systems</article-title>
          (
          <year>2015</year>
          ), http://tensorflow.org/, software available from tensorflow.org
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Bean</surname>
            ,
            <given-names>D.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wu</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Iqbal</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dzahini</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ibrahim</surname>
            ,
            <given-names>Z.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Broadbent</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stewart</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dobson</surname>
            ,
            <given-names>R.J.</given-names>
          </string-name>
          :
          <article-title>Knowledge graph prediction of unknown adverse drug reactions and validation in electronic health records</article-title>
          .
          <source>Scientific reports</source>
          <volume>7</volume>
          (
          <issue>1</issue>
          ),
          <fpage>1</fpage>
          –
          <lpage>11</lpage>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Bisson</surname>
            ,
            <given-names>L.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Komm</surname>
            ,
            <given-names>J.T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bernas</surname>
            ,
            <given-names>G.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fineberg</surname>
            ,
            <given-names>M.S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Marzo</surname>
            ,
            <given-names>J.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rauh</surname>
            ,
            <given-names>M.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Smolinski</surname>
            ,
            <given-names>R.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wind</surname>
            ,
            <given-names>W.M.</given-names>
          </string-name>
          :
          <article-title>Accuracy of a computer-based diagnostic program for ambulatory patients with knee pain</article-title>
          .
          <source>The American journal of sports medicine</source>
          <volume>42</volume>
          (
          <issue>10</issue>
          ),
          <fpage>2371</fpage>
          –
          <lpage>2376</lpage>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Bordes</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Usunier</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Garcia-Duran</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Weston</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yakhnenko</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          :
          <article-title>Translating embeddings for modeling multi-relational data</article-title>
          . In:
          <string-name>
            <surname>Burges</surname>
            ,
            <given-names>C.J.C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bottou</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Welling</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ghahramani</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Weinberger</surname>
            ,
            <given-names>K.Q.</given-names>
          </string-name>
          (eds.)
          <source>Advances in Neural Information Processing Systems</source>
          . vol.
          <volume>26</volume>
          , pp.
          <fpage>2787</fpage>
          –
          <lpage>2795</lpage>
          . Curran Associates, Inc. (
          <year>2013</year>
          ), https://proceedings.neurips.cc/paper/2013/file/1cecc7a77928ca8133fa24680a88d2f9-Paper.pdf
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Celebi</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Uyar</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yasar</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gumus</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dikenelli</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dumontier</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Evaluation of knowledge graph embedding approaches for drug-drug interaction prediction in realistic settings</article-title>
          .
          <source>BMC bioinformatics</source>
          <volume>20</volume>
          (
          <issue>1</issue>
          ),
          <fpage>1</fpage>
          –
          <lpage>14</lpage>
          (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Costabello</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pai</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Van</surname>
            ,
            <given-names>C.L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>McGrath</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>McCarthy</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tabacof</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>AmpliGraph: a Library for Representation Learning on Knowledge Graphs</article-title>
          (Mar
          <year>2019</year>
          ). https://doi.org/10.5281/zenodo.2595043
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Hinton</surname>
            ,
            <given-names>G.E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Roweis</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Stochastic neighbor embedding</article-title>
          . In:
          <string-name>
            <surname>Becker</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Thrun</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Obermayer</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          (eds.)
          <source>Advances in Neural Information Processing Systems</source>
          . vol.
          <volume>15</volume>
          , pp.
          <fpage>857</fpage>
          –
          <lpage>864</lpage>
          . MIT Press (
          <year>2003</year>
          ), https://proceedings.neurips.cc/paper/2002/file/6150ccc6069bea6b5716254057a194ef-Paper.pdf
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yan</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jiang</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sun</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tang</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chang</surname>
            ,
            <given-names>T.H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          , et al.:
          <article-title>Real-world data medical knowledge graph: construction and applications</article-title>
          .
          <source>Artificial intelligence in medicine</source>
          <volume>103</volume>
          ,
          <elocation-id>101817</elocation-id>
          (
          <year>2020</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Mohanty</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rashid</surname>
            ,
            <given-names>M.H.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mridul</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mohanty</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Swayamsiddha</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Application of artificial intelligence in COVID-19 drug repurposing</article-title>
          .
          <source>Diabetes &amp; Metabolic Syndrome: Clinical Research &amp; Reviews</source>
          (
          <year>2020</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Sang</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yang</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lin</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>SemaTyP: a knowledge graph based literature mining method for drug discovery</article-title>
          .
          <source>BMC bioinformatics</source>
          <volume>19</volume>
          (
          <issue>1</issue>
          ),
          <fpage>1</fpage>
          –
          <lpage>11</lpage>
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Shen</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Colloc</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jacquet-Andrieu</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lei</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>Emerging medical informatics with case-based reasoning for aiding clinical decision in multi-agent system</article-title>
          .
          <source>Journal of biomedical informatics</source>
          <volume>56</volume>
          ,
          <fpage>307</fpage>
          –
          <lpage>317</lpage>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Trouillon</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Welbl</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Riedel</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gaussier</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bouchard</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          :
          <article-title>Complex embeddings for simple link prediction</article-title>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Vrandečić</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Krötzsch</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Wikidata: a free collaborative knowledgebase</article-title>
          .
          <source>Communications of the ACM</source>
          <volume>57</volume>
          (
          <issue>10</issue>
          ),
          <fpage>78</fpage>
          –
          <lpage>85</lpage>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Long</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Qian</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>Safe medicine recommendation via medical knowledge graph embedding</article-title>
          .
          <source>arXiv preprint arXiv:1710.05980</source>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ruffinelli</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gemulla</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Broscheit</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Meilicke</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>On evaluating embedding models for knowledge base completion</article-title>
          (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Zamborlini</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hoekstra</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Da Silveira</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pruski</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>ten Teije</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>van Harmelen</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          :
          <article-title>Generalizing the detection of clinical guideline interactions enhanced with LOD</article-title>
          . In:
          <source>International Joint Conference on Biomedical Engineering Systems and Technologies</source>
          . pp.
          <fpage>360</fpage>
          –
          <lpage>386</lpage>
          . Springer (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Zamborlini</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hoekstra</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Da Silveira</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pruski</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>ten Teije</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>van Harmelen</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          :
          <article-title>Inferring recommendation interactions in clinical guidelines</article-title>
          .
          <source>Semantic Web</source>
          <volume>7</volume>
          (
          <issue>4</issue>
          ),
          <fpage>421</fpage>
          –
          <lpage>446</lpage>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>