<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>MeSHx-Notes: Web-System for Clinical Notes Information Extraction</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Henrique D. P. dos Santos</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Rafael O. Nunes</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>João E. Soares</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Renata Vieira</string-name>
          <email>renata.vieira@pucrs.br</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>School of Technology at Pontifical Catholic University of Rio Grande do Sul</institution>
        </aff>
      </contrib-group>
      <abstract>
        <p>We present MeSHx-Notes (MeSH eXtended for clinical notes), a multi-language web system built with the Django framework that presents information extracted from clinical notes. MeSHx-Notes extends Medical Subject Headings (MeSH) terms with semantically or syntactically similar words obtained from Word Embeddings. Since MeSH is available in 15 languages, MeSHx-Notes can easily be extended to another language by replacing the MeSH thesaurus with the one for the target language. In this demo, we show examples in Portuguese and English.</p>
      </abstract>
      <kwd-group>
        <kwd>Multi-language</kwd>
        <kwd>Web System</kwd>
        <kwd>Clinical Notes</kwd>
        <kwd>Information Extraction</kwd>
        <kwd>Word Embeddings</kwd>
        <kwd>MeSH</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Electronic Health Records (EHR) play an important role in hospital
environments, bringing many benefits in terms of patient safety, effectiveness and
efficiency of care, and patient satisfaction [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Records of health care practices in
hospitals generate a large and rich body of patient information, capturing intrinsic
relations among symptoms, diseases, drug interactions, and diagnoses that
may be used for many purposes [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ].
      </p>
      <p>This study aims to help healthcare professionals understand what is
reported in a clinical note, using Natural Language Processing (NLP) combined with the MeSH
dictionary. The system is a web application that displays the meaning and
the related words of the main terms used in clinical notes, making what is
reported easier to understand.</p>
      <p>
        Other systems, such as cTAKES [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], rely on several UMLS sources for English
to extract various kinds of information from clinical notes. We instead focus on a
user-friendly, easy-to-use interface delivered as a web application and portable to languages
other than English through a language-specific MeSH thesaurus.
      </p>
      <p>In this context, we present an easy-to-use system that gives users additional
knowledge about the information contained in clinical notes. Because it is a web application,
it can be used by anyone with Internet access. The system receives clinical notes, identifies
the main terms, and returns their definitions, similar words, and a link
to the MeSH dictionary. It is built with Python, Django, Pandas,
Bootstrap, jQuery, Word Embeddings, XPath, and the MeSH thesaurus.</p>
      <sec id="sec-1-1">
        <title>Data Source</title>
        <p>Three resources are used to develop MeSHx-Notes, as described below. In
addition, we describe the process used to generate the word embedding vectors.</p>
        <p>
          Electronic Health Records. The Portuguese dataset was obtained from
Hospital Nossa Senhora da Conceição (HNSC). The English dataset was obtained
from the i2b2 challenges [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ] held from 2008 to 2012, a set of nine datasets from shared
tasks organized by Informatics for Integrating Biology and the Bedside
(i2b2).
        </p>
        <p>Medical Subject Headings (MeSH). MeSH is a controlled vocabulary thesaurus
developed by the National Library of Medicine (NLM). As of
2013, MeSH had 54,935 entries, each with a unique tree number, and it comprised
26,851 main headings and 213,000 entry terms that increase its power to
classify medical documents.</p>
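        <p>The tree numbers mentioned above encode the MeSH hierarchy: they are dot-separated codes whose first letter gives the top-level category (for example, A for Anatomy and C for Diseases) and whose successive prefixes identify the ancestors of a heading. The short sketch below is illustrative and not part of MeSHx-Notes; it reads the ancestors and the category letter off a tree number, and the example tree number is hypothetical.</p>
        <preformat preformat-type="code"><![CDATA[
```python
# Minimal sketch: deriving the ancestors of a MeSH heading from its tree number.
# Tree numbers are dot-separated; each proper prefix names an ancestor.


def tree_ancestors(tree_number: str) -> list[str]:
    """Return the ancestor tree numbers, top level first, excluding the number itself."""
    parts = tree_number.split(".")
    return [".".join(parts[:i]) for i in range(1, len(parts))]


def tree_category(tree_number: str) -> str:
    """Return the top-level category letter (e.g. 'A' for Anatomy, 'C' for Diseases)."""
    return tree_number[0]


if __name__ == "__main__":
    example = "C01.830.025"  # hypothetical tree number, for illustration only
    print(tree_ancestors(example))  # ['C01', 'C01.830']
    print(tree_category(example))   # C
```
]]></preformat>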
        <p>
          Word Embeddings. Word vectors map words into a numerical vector
space: a latent syntactic/semantic vector for each word is induced from a large
unlabeled corpus. The Portuguese and English word embedding models
were trained with Word2Vec [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ]. For the Portuguese version, we used 21 million
sentences from HNSC's medical records, with 50 dimensions per word
and a minimum word count of 100. This training resulted in 63 thousand word vectors,
which serve as the semantic model for the dictionary enrichment described below. For the English version,
we used 171 thousand sentences from the i2b2 challenge dataset, with 50
dimensions and a minimum word count of 10, resulting in 17 thousand word vectors.
        </p>
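        <p>The minimum word count determines which words receive a vector at all: Word2Vec only learns vectors for words occurring at least that many times in the training corpus. The stdlib sketch below reproduces just this vocabulary-pruning step on a hypothetical toy corpus; it does not train embeddings. The paper does not say which Word2Vec implementation was used; with gensim 4.x, for instance, the Portuguese settings would correspond to <monospace>Word2Vec(sentences, vector_size=50, min_count=100)</monospace>.</p>
        <preformat preformat-type="code"><![CDATA[
```python
# Minimal sketch of Word2Vec-style vocabulary pruning by minimum word count.
# Only words occurring at least `min_count` times in the corpus keep a vector.
from collections import Counter


def build_vocabulary(sentences: list[list[str]], min_count: int) -> dict[str, int]:
    """Count tokens over all sentences and keep those that meet min_count."""
    counts = Counter(token for sentence in sentences for token in sentence)
    return {word: n for word, n in counts.items() if n >= min_count}


if __name__ == "__main__":
    # Hypothetical tokenized clinical sentences.
    corpus = [
        ["paciente", "com", "dor", "abdominal"],
        ["dor", "abdominal", "intensa"],
        ["abd", "doloroso", "dor"],
    ]
    print(sorted(build_vocabulary(corpus, min_count=2)))  # ['abdominal', 'dor']
```
]]></preformat>
        <p>Raising the threshold from 10 to 100 discards rare spellings and abbreviations, which trades coverage for more reliable vectors on the much larger Portuguese corpus.</p>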
      </sec>
      <sec id="sec-1-2">
        <title>Back-end</title>
        <p>First, the MeSH dictionary is generated from data previously saved in an
XML file containing each heading's ID, name, scope note, terms, and qualifiers. The dictionary is
then enriched with similar words identified through Word Embeddings, which widens the range of
terms covered; these words are stored in the terms field. Only words with the highest
similarity degrees are selected. An initial evaluation of this enrichment by
the authors resulted in 67% accuracy.</p>
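        <p>A minimal sketch of the dictionary-generation step, assuming the XML file follows the layout of NLM's MeSH descriptor XML (<monospace>DescriptorRecord</monospace>, <monospace>DescriptorUI</monospace>, <monospace>ScopeNote</monospace>, and so on); the paper does not specify the file's schema, and the record below, including its ID, is hypothetical. It uses the limited XPath support in Python's <monospace>xml.etree.ElementTree</monospace> to collect the ID, name, scope note, terms, and qualifiers of each heading.</p>
        <preformat preformat-type="code"><![CDATA[
```python
# Minimal sketch: building a term dictionary from MeSH descriptor XML using the
# limited XPath support of xml.etree.ElementTree. Element names follow the NLM
# descriptor layout (an assumption); the sample record is hypothetical.
import xml.etree.ElementTree as ET

SAMPLE_XML = """
<DescriptorRecordSet>
  <DescriptorRecord>
    <DescriptorUI>D999999</DescriptorUI>
    <DescriptorName><String>Abscess</String></DescriptorName>
    <AllowableQualifiersList>
      <AllowableQualifier>
        <QualifierReferredTo>
          <QualifierName><String>diagnosis</String></QualifierName>
        </QualifierReferredTo>
      </AllowableQualifier>
    </AllowableQualifiersList>
    <ConceptList>
      <Concept PreferredConceptYN="Y">
        <ScopeNote>Illustrative scope note.</ScopeNote>
        <TermList>
          <Term><String>Abcesso</String></Term>
          <Term><String>Absceso</String></Term>
        </TermList>
      </Concept>
    </ConceptList>
  </DescriptorRecord>
</DescriptorRecordSet>
"""


def parse_mesh(xml_text: str) -> dict[str, dict]:
    """Map each lowercased heading name to its ID, name, scope note, terms and qualifiers."""
    root = ET.fromstring(xml_text.strip())
    dictionary: dict[str, dict] = {}
    for record in root.findall("DescriptorRecord"):
        name = record.findtext("DescriptorName/String", default="")
        dictionary[name.lower()] = {
            "id": record.findtext("DescriptorUI", default=""),
            "name": name,
            # Scope note of the preferred concept (empty string if absent).
            "scope": record.findtext(
                "ConceptList/Concept[@PreferredConceptYN='Y']/ScopeNote", default=""
            ).strip(),
            # Entry terms of all concepts, lowercased for matching against notes.
            "terms": [
                s.text.lower()
                for s in record.findall("ConceptList/Concept/TermList/Term/String")
                if s.text
            ],
            "qualifiers": [
                s.text for s in record.findall(".//QualifierName/String") if s.text
            ],
        }
    return dictionary


if __name__ == "__main__":
    print(parse_mesh(SAMPLE_XML)["abscess"]["terms"])  # ['abcesso', 'absceso']
```
]]></preformat>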
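        <table-wrap id="tab1">
          <label>Table 1</label>
          <caption>
            <p>MeSH headings with their original terms and the new similar terms identified through Word Embeddings.</p>
          </caption>
          <table>
            <thead>
              <tr>
                <th>Heading</th>
                <th>Original Terms</th>
                <th>New Similar Terms</th>
              </tr>
            </thead>
            <tbody>
              <tr>
                <td>Abdomem</td>
                <td>abdomem, belly</td>
                <td>abd, abdome...</td>
              </tr>
              <tr>
                <td>Celecoxib</td>
                <td>celecoxib, celebrex</td>
                <td>norvasc, losartan...</td>
              </tr>
              <tr>
                <td>Abscess</td>
                <td>abcesso, absceso</td>
                <td>abscess, abscesses...</td>
              </tr>
            </tbody>
          </table>
        </table-wrap>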
        <p>Table 1 shows some concepts that are commonly used in clinical notes.
For each concept, it lists the heading in the MeSH dictionary, its original terms, and the newly
identified terms. For example, the heading "Abscesso" had "abcesso" and "absceso"
as the original terms, and "abscess" and "abscesses" were added as new terms.</p>
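        <p>The enrichment illustrated in Table 1 can be sketched as a nearest-neighbor search in the embedding space: for each original term, words whose cosine similarity to it reaches a threshold are added as new terms. This is an assumed reconstruction; the paper only states that the highest similarity degrees are used, so the threshold (0.8), the <monospace>top_k</monospace> cutoff, and the toy three-dimensional vectors below are hypothetical stand-ins for the 50-dimensional Word2Vec models.</p>
        <preformat preformat-type="code"><![CDATA[
```python
# Minimal sketch of dictionary enrichment by cosine similarity between word vectors.
# VECTORS, the 0.8 threshold and top_k=5 are hypothetical values for illustration.
import math

VECTORS: dict[str, list[float]] = {
    "abcesso": [0.90, 0.10, 0.00],
    "absceso": [0.85, 0.15, 0.05],
    "abscess": [0.88, 0.12, 0.02],
    "abscesses": [0.80, 0.20, 0.10],
    "losartan": [0.00, 0.10, 0.95],
}


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def similar_terms(term: str, vectors: dict[str, list[float]],
                  threshold: float = 0.8, top_k: int = 5) -> list[str]:
    """Return up to top_k words most similar to `term` whose similarity is at least threshold."""
    if term not in vectors:
        return []
    scored = sorted(
        ((cosine(vectors[term], vec), word) for word, vec in vectors.items() if word != term),
        reverse=True,
    )
    return [word for score, word in scored[:top_k] if score >= threshold]


def enrich(entry: dict, vectors: dict[str, list[float]], threshold: float = 0.8) -> list[str]:
    """Collect new similar words for an entry, skipping its original terms and duplicates."""
    new_terms: list[str] = []
    for term in entry["terms"]:
        for word in similar_terms(term, vectors, threshold):
            if word not in entry["terms"] and word not in new_terms:
                new_terms.append(word)
    return new_terms


if __name__ == "__main__":
    print(enrich({"terms": ["abcesso", "absceso"]}, VECTORS))  # ['abscess', 'abscesses']
```
]]></preformat>
        <p>A fixed threshold keeps low-similarity neighbors such as "losartan" out of the entry even when few close neighbors exist, at the cost of missing some valid variants.</p>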
        <p>Next, the web application, developed with the Django framework, reads the
clinical notes using Pandas. Each word found in the
dictionary is captured, and its lists of original and new similar words are stored.</p>
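        <p>The term-capture step can be sketched as follows. MeSHx-Notes reads the notes with Pandas; to stay dependency-free, this sketch reads a CSV with Python's <monospace>csv</monospace> module instead, tokenizes each note with a simple regular expression, and looks each lowercased token up in an index of original and new terms. The column names (<monospace>note_id</monospace>, <monospace>text</monospace>), the sample notes, and the dictionary contents are hypothetical.</p>
        <preformat preformat-type="code"><![CDATA[
```python
# Minimal sketch: capturing dictionary words in clinical notes.
# Uses csv instead of Pandas; column names and data are hypothetical.
import csv
import io
import re

TOKEN_RE = re.compile(r"\w+")  # Unicode-aware word tokens

DICTIONARY = {
    "abscess": {"terms": ["abcesso", "absceso"], "new_terms": ["abscess", "abscesses"]},
}

NOTES_CSV = """note_id,text
1,Paciente com abcesso em regiao abdominal.
2,Sem queixas.
"""


def build_index(dictionary: dict) -> dict[str, str]:
    """Map every original and new term to the heading it belongs to."""
    index: dict[str, str] = {}
    for heading, entry in dictionary.items():
        for term in entry["terms"] + entry.get("new_terms", []):
            index.setdefault(term, heading)
    return index


def find_terms(text: str, index: dict[str, str]) -> list[tuple[int, int, str, str]]:
    """Return (start, end, word, heading) for each token found in the index."""
    matches = []
    for m in TOKEN_RE.finditer(text):
        heading = index.get(m.group().lower())
        if heading is not None:
            matches.append((m.start(), m.end(), m.group(), heading))
    return matches


if __name__ == "__main__":
    index = build_index(DICTIONARY)
    for row in csv.DictReader(io.StringIO(NOTES_CSV)):
        print(row["note_id"], find_terms(row["text"], index))
    # 1 [(13, 20, 'abcesso', 'abscess')]
    # 2 []
```
]]></preformat>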
      </sec>
      <sec id="sec-1-3">
        <title>Front-end</title>
        <p>When a clinical note is shown to the user, the words found in the (enriched)
dictionary are highlighted in different colors according to their class: medication,
diagnosis, procedure, or anatomy. A navigation bar also lets users browse through the
desired clinical notes. The page is built with jQuery and Bootstrap.
MeSHx-Notes is demonstrated on Portuguese and English samples, with Word
Embeddings used for dictionary expansion in both languages.</p>
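        <p>The colored highlighting could be produced as HTML spans, as in the sketch below. The actual front-end uses jQuery and Bootstrap, and the paper names the four classes without saying how a term is assigned to one; this sketch therefore assumes a mapping from the MeSH top-level tree category (A Anatomy, C Diseases, D Chemicals and Drugs, E Analytical, Diagnostic and Therapeutic Techniques and Equipment) to anatomy, diagnosis, medication, and procedure. The colors, CSS class names, and example tree number are hypothetical.</p>
        <preformat preformat-type="code"><![CDATA[
```python
# Minimal sketch: wrapping matched terms in colored HTML spans by class.
# The category-to-class rule and the colors are assumptions, not the paper's method.
import html

CLASS_COLORS = {  # hypothetical colors
    "anatomy": "#1f77b4",
    "diagnosis": "#d62728",
    "medication": "#2ca02c",
    "procedure": "#9467bd",
}

# Assumed mapping from MeSH top-level tree categories to the four classes.
CATEGORY_TO_CLASS = {"A": "anatomy", "C": "diagnosis", "D": "medication", "E": "procedure"}


def highlight(text: str, matches: list[tuple[int, int, str, str]]) -> str:
    """Escape `text` and wrap each (start, end, heading, tree_number) match in a span."""
    parts: list[str] = []
    last = 0
    for start, end, heading, tree_number in sorted(matches):
        cls = CATEGORY_TO_CLASS.get(tree_number[:1])
        if cls is None:
            continue  # outside the four highlighted classes; left as plain text
        parts.append(html.escape(text[last:start]))
        parts.append(
            f'<span class="term {cls}" style="color:{CLASS_COLORS[cls]}" '
            f'data-heading="{html.escape(heading)}">{html.escape(text[start:end])}</span>'
        )
        last = end
    parts.append(html.escape(text[last:]))
    return "".join(parts)


if __name__ == "__main__":
    # Hypothetical tree number for the heading.
    print(highlight("Dor abdominal e abcesso.", [(16, 23, "Abscess", "C01.830")]))
```
]]></preformat>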
        <p>The web page provides buttons to navigate between clinical notes and
to change the language. Each clinical note is displayed with data
about the patient record and its modification date, alongside a legend
explaining the classification of the terms. Identified
words are underlined according to their class; when clicked, they
show their technical name, ID, description, terms with similar meanings, and a
link to the MeSH description website.</p>
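        <p>The details shown when a word is clicked amount to a small record per heading. The sketch below assembles that record as a JSON-serializable dictionary, as a Django view might return it through <monospace>django.http.JsonResponse</monospace>; the field names and the sample entry are hypothetical, and the MeSH Browser URL pattern (<monospace>https://meshb.nlm.nih.gov/record/ui?ui=</monospace>) is an assumption about which MeSH page is linked.</p>
        <preformat preformat-type="code"><![CDATA[
```python
# Minimal sketch: the pop-up payload for one clicked term.
# Field names are hypothetical; the MeSH Browser URL pattern is an assumption.
import json

MESH_URL = "https://meshb.nlm.nih.gov/record/ui?ui={id}"


def term_details(entry: dict) -> dict:
    """Collect the technical name, ID, description, similar terms and MeSH link."""
    return {
        "name": entry["name"],
        "id": entry["id"],
        "description": entry["scope"],
        "similar_terms": entry["terms"] + entry.get("new_terms", []),
        "mesh_link": MESH_URL.format(id=entry["id"]),
    }


if __name__ == "__main__":
    entry = {
        "name": "Abscess",
        "id": "D999999",  # hypothetical ID
        "scope": "Illustrative scope note.",
        "terms": ["abcesso", "absceso"],
        "new_terms": ["abscess", "abscesses"],
    }
    print(json.dumps(term_details(entry), ensure_ascii=False, indent=2))
```
]]></preformat>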
      </sec>
    </sec>
    <sec id="sec-2">
      <title>Conclusion and Further Work</title>
      <p>MeSHx-Notes provides health professionals and non-specialists alike with
a simple tool for understanding the terms used in clinical
notes in a clear, concise, and accessible way. The source code is available on the
project's GitHub page<sup>1</sup>, and the demo is available on the group's website<sup>2</sup>. As
further work, we plan to use bigram and trigram embeddings to find similar
multi-word expressions.</p>
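      <p>One possible route to these multi-word expressions, assumed here rather than taken from the paper, is the phrase-detection pass described alongside Word2Vec [<xref ref-type="bibr" rid="ref2">2</xref>]: a bigram ab is scored as (count(ab) − δ) / (count(a) × count(b)), and pairs above a threshold are merged into single tokens before training; repeating the pass over the merged text can form trigrams. The sketch implements one such pass; the δ and threshold values and the toy sentences are hypothetical.</p>
      <preformat preformat-type="code"><![CDATA[
```python
# Minimal sketch of bigram phrase detection (scoring from Mikolov et al.):
#   score(a, b) = (count(a b) - delta) / (count(a) * count(b))
# delta, threshold and SENTENCES are hypothetical values for illustration.
from collections import Counter

SENTENCES = [
    ["dor", "abdominal", "intensa"],
    ["refere", "dor", "abdominal"],
    ["dor", "abdominal", "leve"],
    ["dor", "de", "cabeca"],
]


def find_phrases(sentences: list[list[str]], delta: float = 1.0,
                 threshold: float = 0.1) -> set[tuple[str, str]]:
    """Return adjacent word pairs whose phrase score exceeds the threshold."""
    unigrams = Counter(w for s in sentences for w in s)
    bigrams = Counter(pair for s in sentences for pair in zip(s, s[1:]))
    return {
        (a, b)
        for (a, b), n in bigrams.items()
        if (n - delta) / (unigrams[a] * unigrams[b]) > threshold
    }


def merge_phrases(sentence: list[str], phrases: set[tuple[str, str]]) -> list[str]:
    """Join detected pairs into single tokens such as 'dor_abdominal'."""
    out: list[str] = []
    i = 0
    while i < len(sentence):
        if i + 1 < len(sentence) and (sentence[i], sentence[i + 1]) in phrases:
            out.append(sentence[i] + "_" + sentence[i + 1])
            i += 2
        else:
            out.append(sentence[i])
            i += 1
    return out


if __name__ == "__main__":
    phrases = find_phrases(SENTENCES)
    print(phrases)                                                # {('dor', 'abdominal')}
    print(merge_phrases(["refere", "dor", "abdominal"], phrases))  # ['refere', 'dor_abdominal']
```
]]></preformat>
      <p>The discount δ keeps pairs that co-occur only once from being merged, which matters for noisy clinical text with many rare abbreviations.</p>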
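      <p>Acknowledgments. This work was partially supported by CAPES (Coordenação
de Aperfeiçoamento de Pessoal de Nível Superior) Foundation (Brazil), PUCRS
(Pontifical Catholic University of Rio Grande do Sul), and UFRGS (Federal
University of Rio Grande do Sul).</p>
      <p><sup>1</sup> <ext-link ext-link-type="uri" xlink:href="https://github.com/nlp-pucrs/meshx-notes">https://github.com/nlp-pucrs/meshx-notes</ext-link></p>
      <p><sup>2</sup> <ext-link ext-link-type="uri" xlink:href="http://grupopln.inf.pucrs.br/meshx">http://grupopln.inf.pucrs.br/meshx</ext-link></p>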
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Buntin</surname>
            ,
            <given-names>M.B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Burke</surname>
            ,
            <given-names>M.F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hoaglin</surname>
            ,
            <given-names>M.C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Blumenthal</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>The benefits of health information technology: a review of the recent literature shows predominantly positive results</article-title>
          .
          <source>Health Affairs</source>
          <volume>30</volume>
          (
          <issue>3</issue>
          ),
          <fpage>464</fpage>
          –
          <lpage>471</lpage>
          (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Mikolov</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sutskever</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Corrado</surname>
            ,
            <given-names>G.S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dean</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Distributed representations of words and phrases and their compositionality</article-title>
          .
          In:
          <source>Advances in Neural Information Processing Systems</source>
          , pp.
          <fpage>3111</fpage>
          –
          <lpage>3119</lpage>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Savova</surname>
            ,
            <given-names>G.K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Masanz</surname>
            ,
            <given-names>J.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ogren</surname>
            ,
            <given-names>P.V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zheng</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sohn</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kipper-Schuler</surname>
            ,
            <given-names>K.C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chute</surname>
            ,
            <given-names>C.G.</given-names>
          </string-name>
          :
          <article-title>Mayo clinical Text Analysis and Knowledge Extraction System (cTAKES): architecture, component evaluation and applications</article-title>
          .
          <source>Journal of the American Medical Informatics Association</source>
          <volume>17</volume>
          (
          <issue>5</issue>
          ),
          <fpage>507</fpage>
          –
          <lpage>513</lpage>
          (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Silveira</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nogueira</surname>
            ,
            <given-names>V.B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rodrigues</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          :
          <article-title>Ferramentas e tecnologias para a integração e extração de informação hospitalar</article-title>
          . INF - Artigos em Livros de Actas/Proceedings (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Uzuner</surname>
            ,
            <given-names>Ö.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>South</surname>
            ,
            <given-names>B.R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shen</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>DuVall</surname>
            ,
            <given-names>S.L.</given-names>
          </string-name>
          :
          <article-title>2010 i2b2/VA challenge on concepts, assertions, and relations in clinical text</article-title>
          .
          <source>Journal of the American Medical Informatics Association</source>
          <volume>18</volume>
          (
          <issue>5</issue>
          ),
          <fpage>552</fpage>
          –
          <lpage>556</lpage>
          (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>