=Paper= {{Paper |id=Vol-2167/short12 |storemode=property |title=Implicit-Explicit Representations for Case-Based Retrieval |pdfUrl=https://ceur-ws.org/Vol-2167/short12.pdf |volume=Vol-2167 |authors=Stefano Marchesin |dblpUrl=https://dblp.org/rec/conf/desires/Marchesin18 }} ==Implicit-Explicit Representations for Case-Based Retrieval== https://ceur-ws.org/Vol-2167/short12.pdf
Implicit-Explicit Representations for Case-Based Retrieval
Stefano Marchesin
Department of Information Engineering
University of Padua, Italy
stefano.marchesin@unipd.it

ABSTRACT
We propose an IR framework that combines the implicit representations of documents (identified using distributional representation techniques) with their explicit representations (derived from external knowledge sources) to improve medical case-based retrieval. Combining implicit and explicit representations of documents aims at enriching the semantic understanding of documents and reducing the semantic gap between documents and queries.

CCS CONCEPTS
• Information systems → Document representation; Query representation; Information extraction;

KEYWORDS
Relation-based information retrieval; Semantic gap

1 MOTIVATIONS AND METHODOLOGY
The volume of medical literature published every year keeps growing rapidly, while clinicians have limited time to retrieve relevant information from it. Standard Information Retrieval (IR) systems cannot cope with both the amount of literature and the limited time available to clinicians. Therefore, there has been strong interest in Clinical Decision Support (CDS) systems designed to produce effective and timely knowledge that can help clinicians in the decision-making process. Such systems are known as case-based retrieval systems. Given a medical case of interest, a case-based retrieval system should retrieve highly related literature from a large collection of medical literature.
   A key characteristic of the medical literature is the extensive use of synonyms and context-specific expressions. Such characteristics increase the semantic gap between documents and queries.
   To tackle the problems above, both deep representation learning methods and external knowledge sources have been used. Deep representation learning effectively discovers hidden structures that relate, through latent semantic features, the different textual components, be they words, sentences or documents [3]. External knowledge resources, such as ontologies and knowledge bases, provide factual knowledge about the meaning of words and their semantic relationships.
   However, by formalizing the semantic relationships between different concepts, knowledge sources provide only a partial (and specific) representation of the world. Therefore, knowledge sources do not necessarily capture the implicit relations that appear in documents. By learning distributional representations of words and phrases, implicit relations within documents can be considered too. Thus, distributional and knowledge-based representations of complex semantics (i.e., words, sentences and documents) identify complementary semantic aspects of the underlying documents.
   We propose to integrate documents' knowledge-based representations in case-based retrieval [2], as a form of complementary refinement for distributional representations. Recent approaches exploit semantic relations to enhance the quality of learned word or concept representations [1, 4]; we instead propose to explicitly leverage semantic relations to model document representations. Therefore, our approach extracts concepts from documents (and queries) and connects them using the semantic relations contained within a reference knowledge source, creating a knowledge graph representation for the document (or query). The intuition is that semantic relations carry high informative power that can boost precision.
   The knowledge graph representation can reduce the contextual dependency of distributional representations and help discriminate more effectively between semantically similar and semantically dissimilar texts. Besides, since the concepts considered are only those extracted from a document or a query, the problem of topic drift, which occurs when the query is expanded with concepts that are not pertinent to the information need, is reduced.
   We propose to combine implicit and explicit representations for case-based retrieval in two different ways: (i) considering document-level knowledge graphs as additional inputs for end-to-end neural scoring models that learn the relevance of document-query pairs via semantic features; (ii) considering document-level knowledge graphs with pseudo relevance feedback to boost documents in top positions whose graph is more similar to the query graph. Both approaches aim at reducing the semantic gap between queries and documents.

ACKNOWLEDGMENTS
Supported by the CDC-STARS project of the University of Padua.

REFERENCES
[1] Manaal Faruqui, Jesse Dodge, Sujay K. Jauhar, Chris Dyer, Eduard Hovy, and Noah A. Smith. 2014. Retrofitting word vectors to semantic lexicons. arXiv preprint arXiv:1411.4166 (2014).
[2] Stefano Marchesin. 2018. Case-Based Retrieval Using Document-Level Semantic Networks. In 41st ACM SIGIR (SIGIR ’18). ACM, New York, NY, USA, 1451.
[3] Bhaskar Mitra and Nick Craswell. 2017. Neural Models for Information Retrieval. arXiv preprint arXiv:1705.01509 (2017).
[4] Gia-Hung Nguyen, Lynda Tamine, Laure Soulier, and Nathalie Souf. 2018. A Tri-Partite Neural Document Language Model for Semantic Information Retrieval. In European Semantic Web Conference. Springer, 445–461.
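
As an illustration of the graph-based idea in Section 1 (extract concepts, connect them through the relations of a reference knowledge source, then favour top-ranked documents whose graph resembles the query graph), the following is a minimal sketch under strong assumptions and not the paper's implementation. The toy `KNOWLEDGE_SOURCE`, the substring-based `extract_concepts`, the Jaccard overlap of relation triples, and the `top_k` and `weight` parameters are all hypothetical choices made for this example; the sketch only re-ranks an initial result list and omits the feedback step of pseudo relevance feedback itself. Initial scores are assumed to lie in [0, 1].

```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]          # (subject, relation, object)
Result = Tuple[str, str, float]        # (doc_id, text, score)

# Hypothetical toy knowledge source; a real system would use a medical ontology.
KNOWLEDGE_SOURCE: Set[Triple] = {
    ("aspirin", "treats", "headache"),
    ("aspirin", "may_cause", "gastric bleeding"),
    ("ibuprofen", "treats", "headache"),
    ("metformin", "treats", "type 2 diabetes"),
}


def extract_concepts(text: str) -> Set[str]:
    """Naive concept extraction: substring lookup of known concept names."""
    lowered = text.lower()
    vocabulary = {c for s, _, o in KNOWLEDGE_SOURCE for c in (s, o)}
    return {c for c in vocabulary if c in lowered}


def build_graph(text: str) -> Set[Triple]:
    """Keep only the relations whose two endpoints both occur in the text."""
    concepts = extract_concepts(text)
    return {(s, r, o) for s, r, o in KNOWLEDGE_SOURCE
            if s in concepts and o in concepts}


def jaccard(a: Set[Triple], b: Set[Triple]) -> float:
    """Overlap of two edge sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rerank(query: str, ranked: List[Result],
           top_k: int = 10, weight: float = 0.5) -> List[Result]:
    """Re-score the top_k results by mixing in graph similarity to the query."""
    query_graph = build_graph(query)
    head, tail = ranked[:top_k], ranked[top_k:]
    rescored = [
        (doc_id, text,
         (1 - weight) * score + weight * jaccard(query_graph, build_graph(text)))
        for doc_id, text, score in head
    ]
    rescored.sort(key=lambda item: item[2], reverse=True)
    return rescored + tail  # documents below top_k keep their positions


query = "aspirin for headache with gastric bleeding risk"
ranked = [
    ("d1", "Ibuprofen relieves headache in adults.", 0.6),
    ("d2", "Aspirin eases headache but may lead to gastric bleeding.", 0.5),
    ("d3", "Metformin in type 2 diabetes.", 0.4),
]
print([doc_id for doc_id, _, _ in rerank(query, ranked)])  # ['d2', 'd1', 'd3']
```

In this toy run, d2 overtakes d1 because its graph contains both relations found in the query graph, whereas d1 shares only a concept (headache) and no relation with the query. Because only concepts present in the text enter the graph, no concept outside the document or query is introduced.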


DESIRES 2018, August 2018, Bertinoro, Italy
© 2018 Copyright held by the author(s).