<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>A Healthcare Knowledge Graph-based Approach to Enable Focused Clinical Search</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Maulik R. Kamdar</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Will Dowling</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Michael Carroll</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Cailey Fitzgerald</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sujit Pal</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Steve Ross</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Katie Scranton</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Dru Henke</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Mevan Samarasinghe</string-name>
          <email>m.samarasinghe@elsevier.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Elsevier Health and Commercial Markets</institution>
          ,
          <addr-line>Philadelphia, PA</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        For the diagnosis, prognosis, and treatment of their patients, clinicians need
access to accurate, succinct, up-to-date, and trustworthy information, provided by
renowned medical organizations and disseminated through patient guidelines,
medical textbooks, journals, and synoptic overviews. Medical specialists must
search and synthesize information on focused, yet esoteric, questions from a
broad set of literature sources (textbooks, guidelines, journal articles) in the
course of a busy practice using search engines (e.g., ClinicalKey,
https://www.clinicalkey.com/). Clinicians face several barriers when issuing focused
clinical search queries at the point of care (e.g., “Drug for condition X?”, “Cause
of symptom Y?”): growth and evolution of medical knowledge, insufficient time,
unanswered questions, searches involving patient comorbidities, lack of awareness of
which resource to search, and lack of trust in the quality of search results [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
      </p>
      <p>
        Advanced technological solutions that automate the search and retrieval of
the right excerpts from a corpus of trusted medical literature sources in the
right context for such questions are critical for the practice of medicine and
patient care. Knowledge graphs can be effective tools to address diverse search
and knowledge inference problems in several domains, including healthcare and
biomedicine [
        <xref ref-type="bibr" rid="ref3 ref5">3, 5</xref>
        ]. We present our research and development on a Focused
Clinical Search Service (HGFCSS), powered by the Elsevier Healthcare Knowledge
Graph (termed HG henceforth), that interprets the intent behind focused
clinical search queries and retrieves relevant, updated, and trusted medical content
from a diverse corpus of medical literature sources.
HG is a knowledge platform built to power diverse content discovery and
clinical decision support applications [
        <xref ref-type="bibr" rid="ref1 ref4">1, 4</xref>
        ]. HG includes knowledge and data from
heterogeneous healthcare sources about diseases, drugs, findings, guidelines,
cohorts, journals, and books. As of August 2021, HG consists of more than 400,000
medical concepts, 1.5 million medical term labels for these concepts, and more
than 8 million hierarchical and associative relations of specific relation types.
Subject matter experts (SMEs) regularly update medical knowledge in HG
using novel exploration interfaces. Excerpts from medical literature are tagged with
HG concepts and relations by SMEs or by automated NLP models [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], and are
ingested into HG daily through automated pipelines.
      </p>
      <p>Copyright © 2021 for this paper by its authors. Use permitted under Creative
Commons License Attribution 4.0 International (CC BY 4.0).</p>
      <p>Given a clinical search query (e.g., “ketamine for pain dosage”, “pancreatic
pseudocyst management”), the HGFCSS parses the query and identifies the set of medical
concepts and their semantic types from HG, as well as relevant medical phrases (e.g.,
dosage) or cohorts (e.g., pregnancy). The parser uses HG concept
embedding vectors (i.e., representations of concepts in a geometric space) and the
HG hierarchy and labels to identify similar concepts and to correct misspelled
words for query expansion. Medical phrases are mapped to structural elements in
literature sources and HG relation types. The natural language query is rewritten
into a structured query using the identified concepts and relation types, which is
then executed against multiple HG indexes through a federated querying
infrastructure to retrieve the right excerpt in a medical document. Search performance
is measured over multiple sets of real-world clinical queries.</p>
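      <p>The query-rewriting steps described above can be illustrated with a minimal
sketch. It is not the HGFCSS implementation: the concept identifiers, semantic types,
relation names, and function names below are all hypothetical, and spelling correction
uses plain string similarity from the Python standard library as a stand-in for the
embedding-based and hierarchy-based matching that the service relies on.</p>
      <preformat>
```python
from difflib import get_close_matches

# Hypothetical toy vocabulary: label -> (concept id, semantic type).
CONCEPT_LABELS = {
    "ketamine": ("C001", "Drug"),
    "pain": ("C002", "Finding"),
    "pancreatic pseudocyst": ("C003", "Disease"),
}

# Hypothetical mapping from medical phrases to relation types.
PHRASE_TO_RELATION = {
    "dosage": "has_dosage",
    "management": "has_treatment",
}


def build_vocabulary():
    """Collect every single word that appears in a label or a phrase."""
    words = {w for label in CONCEPT_LABELS for w in label.split()}
    words |= set(PHRASE_TO_RELATION)
    return sorted(words)


def correct_token(token, vocabulary):
    """Return the closest vocabulary word, or the token itself if none is close."""
    matches = get_close_matches(token, vocabulary, n=1, cutoff=0.8)
    return matches[0] if matches else token


def parse_query(query):
    """Rewrite a free-text query into a structured query (toy version)."""
    vocabulary = build_vocabulary()
    tokens = [correct_token(t, vocabulary) for t in query.lower().split()]

    concepts, relations = [], []
    i = 0
    while i != len(tokens):
        # Greedy longest match: try the longest span starting at token i first.
        for j in range(len(tokens), i, -1):
            span = " ".join(tokens[i:j])
            if span in CONCEPT_LABELS:
                concept_id, semantic_type = CONCEPT_LABELS[span]
                concepts.append(
                    {"id": concept_id, "type": semantic_type, "label": span}
                )
                i = j
                break
            if span in PHRASE_TO_RELATION:
                relations.append(PHRASE_TO_RELATION[span])
                i = j
                break
        else:
            i += 1  # no label or phrase starts here; skip the token
    return {"concepts": concepts, "relations": relations}
```
      </preformat>
      <p>In this sketch, calling parse_query on the misspelled query “ketamin for pain
dosage” resolves the two concepts for ketamine and pain together with the has_dosage
relation. A production system would then execute the resulting structure against its
indexes; that step is omitted here because the source does not describe the index
interface.</p>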
    </sec>
    <sec id="sec-2">
      <title>Conclusion and Lessons Learned</title>
      <p>Structured representation of medical content in HG improves performance in
focused clinical search over conventional text-based search methods. Moreover,
using concept embedding vectors in conjunction with HG semantics
further improved search performance significantly, since we were able to parse
clinical queries more intelligently. By using automated pipelines to regularly update
concepts, labels, and relations within HG from several sources and to tag medical
excerpts, we can ensure that clinicians are able to retrieve accurate, recent,
succinct, and trustworthy medical content. This research underscores the importance of
a scalable healthcare knowledge graph ecosystem and the use of machine learning
and knowledge representation methods for focused clinical search.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>DeJong</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          , et al.:
          <article-title>Elsevier's healthcare knowledge graph and the case for enterprise level linked data standards</article-title>
          .
          <source>In: Proceedings of the ISWC 2018 Posters &amp; Demonstrations, Industry and Blue Sky Ideas Tracks</source>
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Del Fiol</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          , et al.:
          <article-title>Clinical questions raised by clinicians at the point of care: a systematic review</article-title>
          .
          <source>JAMA internal medicine</source>
          <volume>174</volume>
          (
          <issue>5</issue>
          ),
          <fpage>710</fpage>
          –
          <lpage>718</lpage>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Kamdar</surname>
            ,
            <given-names>M.R.</given-names>
          </string-name>
          , et al.:
          <article-title>Enabling web-scale data integration in biomedicine through linked open data</article-title>
          .
          <source>NPJ digital medicine</source>
          <volume>2</volume>
          (
          <issue>1</issue>
          ),
          <fpage>1</fpage>
          –
          <lpage>14</lpage>
          (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Kamdar</surname>
            ,
            <given-names>M.R.</given-names>
          </string-name>
          , et al.:
          <article-title>Text snippets to corroborate medical relations: An unsupervised approach using a knowledge graph and embeddings</article-title>
          .
          <source>AMIA Summits on Translational Science Proceedings</source>
          p.
          <fpage>288</fpage>
          (
          <year>2020</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Noy</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          , et al.:
          <article-title>Industry-scale knowledge graphs: lessons and challenges</article-title>
          .
          <source>Communications of the ACM</source>
          <volume>62</volume>
          (
          <issue>8</issue>
          ),
          <fpage>36</fpage>
          –
          <lpage>43</lpage>
          (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>