<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Knowledge Enhanced Representations for Clinical Decision Support</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Stefano Marchesin</string-name>
          <email>stefano.marchesin@unipd.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Maristella Agosti</string-name>
          <email>maristella.agosti@unipd.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Information Engineering, University of Padua, Padua</institution>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>The study presents a methodology that helps reduce the semantic gap in clinical decision support systems. The methodology integrates semantic information, provided by external knowledge resources, into unsupervised neural Information Retrieval (IR) models. The objective is to design and develop innovative methods that can be effective in real-case medical scenarios.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1 INTRODUCTION AND RELATED WORK</title>
      <p>
        Clinicians struggle to keep up with the pace at which medical
literature is growing. This has given rise to Clinical Decision
Support (CDS) systems, which are designed to assist
clinicians in providing patient care, e.g. formulating diagnoses and deciding
treatments. Among the different tasks that CDS systems
perform, biomedical literature search is pivotal. However, very few
existing tools specifically target the clinical environment. To foster
their growth, the Text REtrieval Conference (TREC) introduced
a CDS track in 2014. The TREC CDS track triggered the creation of
tools and resources to evaluate Information Retrieval (IR) systems
designed for CDS tasks. In 2017, the TREC Precision Medicine (PM) track
succeeded the CDS track. Focused on an important
use case in clinical decision support, it provides useful precision
medicine-related information to clinicians treating cancer patients.
The TREC CDS and PM tracks highlight that the large number of
synonyms and polysemous words found in biomedical literature and
clinical trials – along with the use of context-specific expressions –
significantly reduces the effectiveness of retrieval systems [
        <xref ref-type="bibr" rid="ref1 ref10">1, 10</xref>
        ].
Such features of medical language widen the semantic gap, which is
a long-standing problem in IR and Natural Language
Processing (NLP). In IR, the semantic gap reflects the difference
between the low-level description of document and query contents and
the high-level interpretation of their meaning.
      </p>
      <p>
        To bridge the semantic gap, semantic models have long been
used in IR. Recent advances in neural language models [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] have led
the IR community to adopt them for retrieval tasks. Approaches
that inject low-dimensional text representations learned by neural
models within state-of-the-art IR models have emerged [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], along
with approaches that learn representations of words and documents
from scratch and use them directly for retrieval [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. However,
distributed representations learned by neural language models suffer
from two main limitations: (i) polysemy [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] and (ii) synonymy [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. Few
approaches have been proposed in IR to address these problems.
In [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], relational semantics are used to constrain word
representations applied in a document re-ranking scenario. In [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], latent
representations are built upon concepts linked to knowledge
resources and injected in a text-to-text matching process – according
to a query expansion technique. In [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], a tripartite neural language
model is proposed that relies on external knowledge resources to
jointly constrain word, concept and document representations. The
model is then used for query expansion and document re-ranking.
      </p>
      <p>The methodology proposed to address the semantic gap in CDS
is briefly presented below and is part of the H2020
ExaMode project,<sup>2</sup> whose objective is to provide knowledge discovery
for exascale medical data. In Section 2, we present the structure and
contents of the histopathology clinical reports used in the ExaMode
project, while in Section 3 we introduce the methodology that can
help reduce the semantic gap in CDS tasks.</p>
    </sec>
    <sec id="sec-2">
      <title>2 CLINICAL REPORT CONTENTS</title>
      <p>A histopathology report is a document that is written and signed by
a pathologist. It contains the results of the analyses performed on
specific tissues or cells to obtain a pathological-clinical diagnosis
that can lead to appropriate treatment options in case of disease.
To provide a general structure for pathology reports, the College of
American Pathologists (CAP) has set a series of guidelines.<sup>3</sup> The
structure and contents of pathology reports are described below.</p>
      <p>Patient Identifier and Clinical Information: contains the
patient’s identifier and specific information such as name, date of
birth, hospital and medical record number. Clinical information
provides details such as symptoms, medical conditions or data about
the target specimen. The source of the specimen sample is also
provided. The pathologist’s name and signature, along
with the laboratory name and address, are specified in this
section as well.</p>
      <p>Macroscopic Description: describes how a specimen appears to
the naked eye and details what portions of the target specimen are
examined under the microscope. The description includes the size,
colour, number of tissue samples and, when appropriate, the weight
of the specimen. In the presence of multiple tissues or organs within
the specimen, each one is described and sampled. Each sample
produces a microscope slide that is listed in the pathology report.</p>
      <p>Microscopic Description: describes how the specimen looks
under the microscope compared to normal cells. It also describes
whether the cancer has invaded nearby tissues. All the information
derived from microscopic descriptions can help provide guidelines
for treatments.</p>
      <p><sup>2</sup> https://www.examode.eu/</p>
      <p><sup>3</sup> https://www.cap.org/cancerprotocols/</p>
      <p>Diagnosis: reports the final pathology diagnosis issued by
the pathologist following specimen examination. Cancer diagnoses
describe multiple aspects associated with the specific tumour(s). For
most of these diagnoses, the grade of the tumour – determined by
applying tumour-specific criteria to the microscopic features – is
included.</p>
      <p>Comments: contains additional information used by the
pathologist to describe challenging cases. This section may contain
additional data such as images, molecular studies, references and
addendum information, useful to the care team.</p>
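      <p>The report structure described above can be sketched as a simple record type. This is only an illustration under stated assumptions, not a format defined by CAP or by the ExaMode project: the class name <monospace>HistopathologyReport</monospace>, its field names and the <monospace>searchable_text</monospace> helper are hypothetical, as is the choice to leave patient identifiers out of the text used for retrieval.</p>
      <preformat>
```python
from dataclasses import dataclass


@dataclass
class HistopathologyReport:
    """Hypothetical container with one field per report section."""

    patient_and_clinical_info: str
    macroscopic_description: str
    microscopic_description: str
    diagnosis: str
    comments: str = ""  # optional section, empty when absent

    def searchable_text(self) -> str:
        """Join the non-empty descriptive sections into one string.

        Assumption: the patient and clinical information section is
        excluded, so identifiers never enter the query text.
        """
        parts = [
            self.macroscopic_description,
            self.microscopic_description,
            self.diagnosis,
            self.comments,
        ]
        return " ".join(p.strip() for p in parts if p.strip())
```
      </preformat>
      <p>A report built this way could serve as the query clinical report, with <monospace>searchable_text()</monospace> providing the text passed to a retrieval model.</p>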
      <p>The methodology we propose will be evaluated on real-case
medical scenarios, where clinical reports follow the above-mentioned
structure. Moreover, clinical reports provided by ExaMode cover
different use cases and are written in multiple languages. Methods
capable of effectively representing the underlying medical data
are fundamental to reducing the semantic gap and retrieving relevant
medical data that support pathologists in their decision making.</p>
    </sec>
    <sec id="sec-3">
      <title>3 METHODOLOGY</title>
      <p>This study explores how the semantic information contained within
external knowledge resources can be integrated into unsupervised
neural IR models. This offers the opportunity to design, develop
and evaluate novel approaches that increase the understanding of
the semantic gap and its relation to retrieval effectiveness in the
real-case CDS scenarios provided by the ExaMode project.</p>
      <p>
        Our research is driven by the following research question:
<italic>How can external knowledge be integrated in
document/query representations so that, given a query
clinical report, the semantic gap between the query
and the documents can be reduced to effectively
retrieve medical knowledge?</italic>
We first addressed the research question in [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], proposing a retrieval
approach for CDS based on document-level semantic networks,
comprising two steps: (i) automatic creation of document-level
semantic networks, and (ii) retrieval of relevant medical knowledge
using document-level semantic networks. The approach provides a
semantic-aware representation of documents and queries by means
of semantic networks embodying semantic concepts and relations.
Its aim is to reduce the semantic gap, especially with respect to
polysemy and synonymy. Nevertheless, the representation of
documents and queries as semantic networks, derived from a
reference Knowledge Base (KB), has three main limitations. Firstly, it
requires concept and relation extraction algorithms to achieve a
high level of accuracy, since the noise introduced when creating a document-level
semantic network is likely to propagate to the retrieval step.
Secondly, most state-of-the-art techniques to extract biomedical
relations are developed to detect specific relationships such as
protein-protein interactions, gene-disease interactions and so on. They
therefore cover only a fraction of the biomedical domain, which
is not wide enough for CDS. Thirdly, the complexity of concept
and relation extraction algorithms makes it difficult to scale them
efficiently to IR collections, which are typically orders of magnitude
larger than NLP collections. While preserving the initial idea, we
started investigating alternative approaches to effectively integrate
concepts and relations from external knowledge sources into the
retrieval process. In [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], we proposed an IR framework that
combines implicit and explicit representations of documents to reduce
the semantic gap for CDS. Implicit representations are obtained
through distributional learning, whereas explicit representations
are derived from external knowledge sources. The combination of
these representations aims to enrich the semantic
understanding of documents, which in turn reduces the semantic gap
between documents and queries.
      </p>
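      <p>One simple way to picture the combination of implicit and explicit representations is a weighted sum of two similarity scores. The sketch below is an assumption made for illustration and is not the framework proposed in [6]: the function names, the interpolation weight <monospace>alpha</monospace>, the use of cosine similarity, and the encoding of explicit representations as concept-to-weight dictionaries are all hypothetical.</p>
      <preformat>
```python
import math
from typing import Dict, Sequence


def cosine_dense(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two dense (implicit) vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def cosine_sparse(u: Dict[str, float], v: Dict[str, float]) -> float:
    """Cosine similarity between two concept-weight (explicit) vectors."""
    dot = sum(w * v[c] for c, w in u.items() if c in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def combined_score(q_implicit: Sequence[float],
                   d_implicit: Sequence[float],
                   q_explicit: Dict[str, float],
                   d_explicit: Dict[str, float],
                   alpha: float = 0.5) -> float:
    """Interpolate implicit and explicit query-document similarity.

    alpha is an assumed mixing weight: 1.0 uses only the learned
    (implicit) vectors, 0.0 only the knowledge-based (explicit) ones.
    """
    implicit = cosine_dense(q_implicit, d_implicit)
    explicit = cosine_sparse(q_explicit, d_explicit)
    return alpha * implicit + (1.0 - alpha) * explicit
```
      </preformat>
      <p>With such a score, a document that shares knowledge-base concepts with the query can still rank well when its wording differs, which is the intuition behind combining the two kinds of representation.</p>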
      <p>
        Implicit representations can thus capture the latent semantics
existing between words (and documents) relying only on the
document collection as knowledge source. However, such
representations are hindered by two main limitations that knowledge-based
representations can reduce: (i) distributional learning models fail to
discriminate polysemous words [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]; and (ii) distributional learning
models fail to learn close representations for synonymous words
occurring in different contexts [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. Therefore, we are currently
performing an evaluation of state-of-the-art neural representation
models for IR. An in-depth evaluation of their effectiveness is
fundamental to understanding how neural representation models can be
combined effectively with external knowledge sources, so as to
reduce the semantic gap and increase retrieval effectiveness. Based
on [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] and on this analysis, we are developing an unsupervised
neural model to learn knowledge enhanced latent representations
of words, concepts and documents. To reduce prominent aspects of
the semantic gap, the model integrates relational semantics from
external knowledge sources in the learning process.
      </p>
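      <p>To illustrate how relational semantics could enter a learning process, the sketch below applies one gradient step that pulls the vectors of words linked by a synonymy relation closer together. It is a generic constraint written for illustration, not the model under development: the function <monospace>synonymy_step</monospace>, the squared-distance penalty and the learning rate <monospace>lr</monospace> are assumptions, and in practice such a term would be added to a distributional training objective rather than used alone.</p>
      <preformat>
```python
from typing import Dict, Iterable, List, Tuple

Vector = List[float]


def synonymy_step(vectors: Dict[str, Vector],
                  synonym_pairs: Iterable[Tuple[str, str]],
                  lr: float = 0.1) -> None:
    """Apply one gradient step on 0.5 * ||v_a - v_b||^2 per pair.

    The gradient with respect to v_a is (v_a - v_b) and with respect
    to v_b is (v_b - v_a), so both vectors move towards each other.
    Vectors are updated in place in the given dictionary.
    """
    for a, b in synonym_pairs:
        va, vb = vectors[a], vectors[b]
        diff = [x - y for x, y in zip(va, vb)]
        vectors[a] = [x - lr * d for x, d in zip(va, diff)]
        vectors[b] = [y + lr * d for y, d in zip(vb, diff)]
```
      </preformat>
      <p>The synonym pairs would come from an external knowledge resource, for example two terms mapped to the same concept. Repeating the step shrinks the distance between their vectors, which targets the synonymy limitation discussed above.</p>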
    </sec>
    <sec id="sec-4">
      <title>ACKNOWLEDGMENTS</title>
      <p>The work is supported by the ExaMode project, as part of the
European Union H2020 program under Grant Agreement no. 825292.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>M.</given-names>
            <surname>Agosti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. M.</given-names>
            <surname>Di Nunzio</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.</given-names>
            <surname>Marchesin</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>An Analysis of Query Reformulation Techniques for Precision Medicine</article-title>
          .
          <source>In Proc. of the 42nd ACM SIGIR (in print)</source>
          .
          <source>ACM.</source>
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Q.</given-names>
            <surname>Ai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Guo</surname>
          </string-name>
          , and
          <string-name>
            <given-names>W. B.</given-names>
            <surname>Croft</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>Improving Language Estimation with the Paragraph Vector Model for Ad-Hoc Retrieval</article-title>
          .
          <source>In Proc. of the 39th ACM SIGIR. ACM</source>
          ,
          <fpage>869</fpage>
          -
          <lpage>872</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>I.</given-names>
            <surname>Iacobacci</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. T.</given-names>
            <surname>Pilehvar</surname>
          </string-name>
          , and
          <string-name>
            <given-names>R.</given-names>
            <surname>Navigli</surname>
          </string-name>
          .
          <year>2015</year>
          .
          <article-title>SENSEMBED: Learning Sense Embeddings for Word and Relational Similarity</article-title>
          .
          <source>In Proc. of the 53rd ACL and the 7th IJCNLP</source>
          , Vol.
          <volume>1</volume>
          .
          <fpage>95</fpage>
          -
          <lpage>105</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>X.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. Y.</given-names>
            <surname>Nie</surname>
          </string-name>
          ,
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>Sordoni</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>Constraining Word Embeddings by Prior Knowledge-Application to Medical Information Retrieval</article-title>
          . In AIRS. Springer,
          <fpage>155</fpage>
          -
          <lpage>167</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>S.</given-names>
            <surname>Marchesin</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Case-Based Retrieval Using Document-Level Semantic Networks</article-title>
          .
          <source>In Proc. of the 41st ACM SIGIR. ACM</source>
          ,
          <fpage>1451</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>S.</given-names>
            <surname>Marchesin</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Implicit-Explicit Representations for Case-Based Retrieval</article-title>
          .
          <source>In Proc. of DESIRES</source>
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>T.</given-names>
            <surname>Mikolov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Sutskever</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. S.</given-names>
            <surname>Corrado</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>Dean</surname>
          </string-name>
          .
          <year>2013</year>
          .
          <article-title>Distributed Representations of Words and Phrases and their Compositionality</article-title>
          .
          <source>In Proc. of NIPS</source>
          <year>2013</year>
          .
          <fpage>3111</fpage>
          -
          <lpage>3119</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>G. H.</given-names>
            <surname>Nguyen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Tamine</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Soulier</surname>
          </string-name>
          , and
          <string-name>
            <given-names>N.</given-names>
            <surname>Souf</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>Learning Concept-Driven Document Embeddings for Medical Information Search</article-title>
          .
          <source>In Proc. of AIME 2017</source>
          . Springer,
          <fpage>160</fpage>
          -
          <lpage>170</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>G. H.</given-names>
            <surname>Nguyen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Tamine</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Soulier</surname>
          </string-name>
          , and
          <string-name>
            <given-names>N.</given-names>
            <surname>Souf</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>A Tri-Partite Neural Document Language Model for Semantic Information Retrieval</article-title>
          .
          <source>In Proc. of ESWC 2018</source>
          . Springer,
          <fpage>445</fpage>
          -
          <lpage>461</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>K.</given-names>
            <surname>Roberts</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Simpson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Demner-Fushman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Voorhees</surname>
          </string-name>
          , and
          <string-name>
            <given-names>W.</given-names>
            <surname>Hersh</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>State-of-the-Art in Biomedical Literature Retrieval for Clinical Cases: a Survey of the TREC 2014 CDS track</article-title>
          .
          <source>IRJ 19</source>
          ,
          <issue>1-2</issue>
          (
          <year>2016</year>
          ),
          <fpage>113</fpage>
          -
          <lpage>148</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>C.</given-names>
            <surname>Van Gysel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>de Rijke</surname>
          </string-name>
          , and
          <string-name>
            <given-names>E.</given-names>
            <surname>Kanoulas</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Neural Vector Spaces for Unsupervised Information Retrieval</article-title>
          .
          <source>ACM TOIS 36</source>
          ,
          <issue>4</issue>
          (
          <year>2018</year>
          ),
          <fpage>38</fpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>