A Healthcare Knowledge Graph-based Approach
      to Enable Focused Clinical Search

Maulik R. Kamdar, Will Dowling, Michael Carroll, Cailey Fitzgerald, Sujit Pal,
      Steve Ross, Katie Scranton, Dru Henke, and Mevan Samarasinghe

             Elsevier Health and Commercial Markets, Philadelphia, PA
     {m.kamdar, w.dowling, m.carroll, c.fitzgerald, sujit.pal, s.ross.1,
             k.scranton, dru.henke, m.samarasinghe}@elsevier.com


1     Introduction
For the diagnosis, prognosis, and treatment of their patients, clinicians need
access to accurate, succinct, updated, and trustworthy information, provided by
renowned medical organizations and disseminated through patient guidelines,
medical textbooks, journals, and synoptic overviews. Medical specialists must
search and synthesize information on focused, yet esoteric, questions from a
broad set of literature sources (textbooks, guidelines, journal articles) in the
course of a busy practice using search engines (e.g., ClinicalKey,
https://www.clinicalkey.com/). Several barriers keep clinicians from running
focused clinical search queries at the point of care (e.g., “Drug for condition X?”,
“Cause of symptom Y?”): the growth and evolution of medical knowledge, insufficient
time, unanswered questions, searches involving patient comorbidities, lack of
awareness of which resource to search, and lack of trust in the quality of search
results [2].
    Advanced technological solutions that automate the search and retrieval of
the right excerpts from a corpus of trusted medical literature sources in the
right context for such questions are critical for the practice of medicine and
patient care. Knowledge graphs can be effective tools to address diverse search
and knowledge inference problems in several domains, including healthcare and
biomedicine [3, 5]. We present our research and development on a Focused
Clinical Search Service (HGFCSS), powered by the Elsevier Healthcare Knowledge
Graph (termed HG henceforth), that interprets the intent behind focused
clinical search queries and retrieves relevant, updated, and trusted medical
content from a diverse corpus of medical literature sources.


2     Methods
HG is a knowledge platform built to power diverse content discovery and clin-
ical decision support applications [4, 1]. HG includes knowledge and data from
heterogeneous healthcare sources about diseases, drugs, findings, guidelines,
cohorts, journals, and books. As of August 2021, HG consists of more than 400,000
medical concepts, 1.5 million medical term labels for these concepts, and more
than 8 million hierarchical and associative relations of specific relation types.
Subject matter experts (SMEs) regularly update medical knowledge in HG using
novel exploration interfaces. Excerpts from medical literature are tagged with
HG concepts and relations by SMEs or by automated NLP models [4], and are
ingested into HG daily through automated pipelines.

    Copyright © 2021 for this paper by its authors. Use permitted under Creative
    Commons License Attribution 4.0 International (CC BY 4.0).

    Given a clinical search query (e.g., “ketamine for pain dosage” or “pancreatic
pseudocyst management”), the HGFCSS parses the query and identifies the medical
concepts it contains and their semantic types from HG, as well as relevant medical
phrases (e.g., dosage) or cohorts (e.g., pregnancy). The parser uses HG concept
embedding vectors (i.e., representations of concepts in a geometric space) together
with the HG hierarchy and labels to identify similar concepts and to correct
misspelled words for query expansion. Medical phrases are mapped to structural
elements in literature sources and to HG relation types. The natural language
query is rewritten as a structured query using the identified concepts and
relation types, and this structured query is then executed against multiple HG
indexes through a federated querying infrastructure to retrieve the right excerpt
in a medical document. Search performance is measured over multiple sets of
real-world clinical queries.
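
The query-processing steps described above can be illustrated with a minimal sketch. It is not the HGFCSS implementation: the concept identifiers, semantic types, relation names (`has_dosage`, `has_treatment`), embedding vectors, similarity threshold, and index callables are all hypothetical stand-ins, and the spelling correction uses simple string similarity from the Python standard library rather than HG embeddings.

```python
import difflib
import math
from dataclasses import dataclass, field

# Toy stand-ins for HG content. Every identifier and vector here is made up.
LABELS = {  # term label -> (concept id, semantic type)
    "ketamine": ("C_KETAMINE", "Drug"),
    "esketamine": ("C_ESKETAMINE", "Drug"),
    "pain": ("C_PAIN", "Finding"),
    "pancreatic pseudocyst": ("C_PSEUDOCYST", "Disease"),
}
PHRASES = {"dosage": "has_dosage", "management": "has_treatment"}  # hypothetical
COHORTS = {"pregnancy"}
EMBEDDINGS = {  # concept id -> toy concept embedding vector
    "C_KETAMINE": (0.9, 0.1, 0.0),
    "C_ESKETAMINE": (0.85, 0.15, 0.05),
    "C_PAIN": (0.0, 0.2, 0.9),
    "C_PSEUDOCYST": (0.1, 0.9, 0.2),
}
STOPWORDS = {"for", "of", "in", "the"}
VOCAB = sorted({w for label in LABELS for w in label.split()} | set(PHRASES) | COHORTS)


@dataclass
class StructuredQuery:
    concepts: list = field(default_factory=list)    # (concept id, semantic type)
    expansions: list = field(default_factory=list)  # ids of similar concepts
    relations: list = field(default_factory=list)   # relation types from phrases
    cohorts: list = field(default_factory=list)


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def similar_concepts(cid, threshold=0.95):
    """Concepts whose embedding is close to cid (the threshold is an assumption)."""
    return sorted(
        other for other in EMBEDDINGS
        if other != cid and cosine(EMBEDDINGS[cid], EMBEDDINGS[other]) >= threshold
    )


def correct(token):
    """Replace a token with its closest vocabulary word, if one is close enough."""
    if token in STOPWORDS:
        return token
    match = difflib.get_close_matches(token, VOCAB, n=1, cutoff=0.8)
    return match[0] if match else token


def parse(query, max_ngram=3):
    """Rewrite a free-text query into a StructuredQuery via longest-match lookup."""
    tokens = [correct(t) for t in query.lower().split()]
    out = StructuredQuery()
    i = 0
    while i < len(tokens):
        # Try the longest span first; n ends at 1 when nothing matches.
        for n in range(min(max_ngram, len(tokens) - i), 0, -1):
            span = " ".join(tokens[i:i + n])
            if span in LABELS:
                cid, sem_type = LABELS[span]
                out.concepts.append((cid, sem_type))
                out.expansions.extend(similar_concepts(cid))
                break
            if span in PHRASES:
                out.relations.append(PHRASES[span])
                break
            if span in COHORTS:
                out.cohorts.append(span)
                break
        i += n
    return out


def federated_search(sq, indexes):
    """Send one structured query to every index and merge hits, best score first."""
    hits = []
    for name, search in indexes.items():
        hits.extend((score, name, excerpt) for score, excerpt in search(sq))
    return sorted(hits, reverse=True)


if __name__ == "__main__":
    sq = parse("ketamin for pain dosage")  # note the misspelled drug name
    print(sq)
    toy_indexes = {  # hypothetical index back ends returning (score, excerpt)
        "books": lambda q: [(0.7, "textbook excerpt")],
        "guidelines": lambda q: [(0.9, "guideline excerpt")],
    }
    print(federated_search(sq, toy_indexes))
```

In this sketch the misspelled token "ketamin" is corrected to "ketamine" before lookup, the multi-word label "pancreatic pseudocyst" is matched as a single concept because longer spans are tried first, and the phrase "dosage" becomes a relation type in the structured query. A production parser would draw labels, hierarchy, and embeddings from the knowledge graph itself rather than from in-memory dictionaries.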

3    Conclusion and Lessons Learned
Structured representation of medical content in HG improved performance in
focused clinical search over conventional text-based search methods. Moreover,
using concept embedding vectors in conjunction with HG semantics further
improved search performance significantly, since we were able to parse
clinical queries more intelligently. By using automated pipelines to regularly
update concepts, labels, and relations within HG from several sources and to tag
medical excerpts, we can ensure that clinicians are able to retrieve accurate,
recent, succinct, and trustworthy medical content. This research underscores the
importance of a scalable healthcare knowledge graph ecosystem and of machine
learning and knowledge representation methods for focused clinical search.

References
1. DeJong, A., et al.: Elsevier’s healthcare knowledge graph and the case for enterprise
   level linked data standards. In: Proceedings of the ISWC 2018 Posters &
   Demonstrations, Industry and Blue Sky Ideas Tracks 2018. (2018)
2. Del Fiol, G., et al.: Clinical questions raised by clinicians at the point of care: a
   systematic review. JAMA Internal Medicine 174(5), 710–718 (2014)
3. Kamdar, M.R., et al.: Enabling web-scale data integration in biomedicine through
   linked open data. NPJ Digital Medicine 2(1), 1–14 (2019)
4. Kamdar, M.R., et al.: Text snippets to corroborate medical relations: An
   unsupervised approach using a knowledge graph and embeddings. AMIA Summits on
   Translational Science Proceedings p. 288 (2020)
5. Noy, N., et al.: Industry-scale knowledge graphs: lessons and challenges.
   Communications of the ACM 62(8), 36–43 (2019)