<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Identification of Disease Symptoms in Multilingual Sentences: An Ontology-Driven Approach</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>DIBRIS</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Università degli Studi di Genova</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Italy angelo.ferrando@dibris.unige.it</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>silviobeux@gmail.com</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>viviana.mascardi@unige.it</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>PRHLT</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Universitat Politècnica de València</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Spain prosso@dsic.upv.es</string-name>
        </contrib>
      </contrib-group>
      <fpage>6</fpage>
      <lpage>15</lpage>
      <abstract>
        <p>In this paper we present a Multilingual Ontology-Driven framework for Text Classification (MOoD-TC). This framework is highly modular and can be customized to create applications based on Multilingual Natural Language Processing for classifying domain-dependent content. To show the potential of MOoD-TC, we present a case study in the e-Health domain.</p>
      </abstract>
      <kwd-group>
        <kwd>Multilingual Natural Language Processing</kwd>
        <kwd>Ontology-Driven Text Classification</kwd>
        <kwd>BabelNet</kwd>
        <kwd>Symptom Disease Identification</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        The large amount of digital data made available in recent years by a wide
variety of sources raises the need for automatic methods to extract meaningful
information from them. The extracted information is valuable for many purposes,
especially commercial ones. Jackson and Moulinier [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] observe that
"there is no question concerning the commercial value of being able to classify
documents automatically by content. There are myriad potential applications of
such a capability for corporate Intranets, government departments, and Internet
publishers".
      </p>
      <p>
        The problem of classifying multilingual text has been addressed since the
end of the last millennium [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ], but it remains a significant problem because each
language has its own peculiar features, which makes the automatic management of
multilingualism an open issue.
      </p>
      <p>
        The use of ontologies to classify multilingual texts [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] is a good alternative
to standard machine learning approaches whenever a training set of documents
is not available or is too small to properly train the classifier.
Ontology-driven text classification does not depend on the existence of a
training set, as it relies solely on the entities, their relationships, and the
taxonomy of categories represented in an ontology, which becomes the driver of
the classification. Another advantage of ontology-driven classification is that
ontology concepts are organized into hierarchies, which makes it possible to
identify the category (or categories) that best classify the document's content
by traversing the hierarchical structure.
      </p>
      <p>The first author of this paper is a PhD student in Computer Science at the
University of Genova, Italy. The work of the last author was in the framework of
the SomEMBED MINECO TIN2015-71147-C2-1-P research project.</p>
      <p>
        In this paper we present MOoD-TC (Multilingual Ontology-Driven Text
Classifier [
        <xref ref-type="bibr" rid="ref13 ref3">3, 13</xref>
        ]), a highly modular system conceived, designed and implemented so that
the system developer can customize it to obtain different domain-dependent
behaviors, always centered around the multilingual text classification process.
The original contribution of this paper is the exploitation of the core
"multilingual word identification" functionalities of MOoD-TC in a challenging
e-Health scenario, where classification is a by-product of identifying disease
symptoms in multilingual text, driven by a standard symptoms ontology. We also
describe a customization of MOoD-TC with an ad-hoc module equipped with pre- and
post-processing facilities suitable for the scenarios that motivate our work.
      </p>
      <p>The paper is organized as follows: Section 2 introduces three motivating
scenarios where ontology-driven multilingual text classification may prove
useful, Section 3 analyzes the state of the art, Section 4 describes MOoD-TC,
Section 5 provides examples and experimental results, and Section 6 concludes.</p>
    </sec>
    <sec id="sec-2">
      <title>Motivating scenarios</title>
      <p>Alice is enjoying her holidays in Stockholm. Suddenly, she feels a painful spasm
in her stomach, and within a few minutes a strong feeling of nausea appears. The
spasms go on for half an hour, and she starts to feel worried. She does not think
it is necessary to go to the hospital, but she would at least like to ask for
advice over the phone. However, she cannot speak Swedish and, in the stressful
situation she is experiencing, she cannot recall how to express her health problems
in English. She could speak in her native Italian, but it is unlikely that the
doctor speaks Italian as well.</p>
      <p>Bob is taking a walk in his town. He notices a young man doubled over, with a
scared expression on his face. He runs to help him and understands that the
problem is with his chest. The young man speaks only French and Bob cannot
understand him: he calls the first aid emergency number and explains what he is
seeing and what he supposes is taking place. If he could understand what the young
man says, he would definitely be more helpful.</p>
      <p>Carol is a volunteer in Honduras. She is neither a physician nor a nurse.
She has a very basic knowledge of first aid procedures and a first aid kit with
medicines that she knows how to administer, given a clear diagnosis. A woman
runs towards her asking for assistance. The woman's little boy has a problem
with his head and a high fever but, without understanding the other symptoms
that the woman is trying to explain in Spanish, Carol cannot recognize and
classify the problem. In the remote place where she is, she cannot contact a
doctor. Carol would need to understand the other symptoms besides fever and
headache in order to select the correct medicine.</p>
      <p>The three scenarios above are all characterized by the impossibility for a
doctor to visit the patient on the fly and by the need for the patient to be
understood despite language barriers, in order to get advice for minor problems or
to speed up the assistance procedure for major ones. The patient's need could be
suitably addressed by identifying the symptoms in her language and translating
them into the language of the assistant or the doctor. If automatic tools for this
task were available, for example as an app installed on a mobile phone, the three
situations could evolve as follows.</p>
      <p>Scenario 1: through an app, the person needing care communicates with the
"health emergency" software application in her own language. The application
performs speech-to-text conversion, identifies the symptoms in the text based on
a standard ontological representation of symptoms, and sends the list of
symptoms, expressed in the doctor's language, to a center where they are handled
either by intelligent software agents or by human experts.</p>
      <p>Scenario 2: the "health emergency" software application is not used directly
by the person needing care, but by the person assisting her. As before, the
assisted person can "tell" her problems to the application, which performs
speech-to-text conversion and identifies the symptoms in the text that are
represented in a domain ontology. The symptoms, translated into the language of
the person giving first aid, can be read on the screen. That person can call the
national first aid number, reporting what is happening, what she sees, and the
symptoms that have been understood, classified, and translated by the app.</p>
      <p>Scenario 3: also in this case, besides speech-to-text conversion, the
symptoms expressed in the patient's language are identified w.r.t. a symptoms
ontology and translated into the target language. Using this information may
require a further automatic processing stage if the doctor cannot be involved in
the loop and the person providing aid needs automatic support for making a
diagnosis and identifying the right therapy to administer.</p>
      <p>In all three situations above, a standard machine translation application
combined with a symptom classifier based on machine learning might not be
powerful enough: the pre- and post-processing stages require a machine-readable,
explicit representation of symptoms, in a vocabulary agreed upon by all the
application components and by the humans involved in the loop, so that symptoms
can be shared among the application components (at both the client and the server
side) and reasoned about if needed. A multilingual ontology-driven text
classification approach seems the right way to face these challenging scenarios.</p>
    </sec>
    <sec id="sec-3">
      <title>State of the art</title>
      <p>
        According to [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], in 1996 more than 80% of Internet users were native English
speakers. This percentage dropped to 55% in 2000 and to 27.3% in 2010.
However, about 80% of the digital resources available today on the Web
(including the deep Web and digital libraries) are in English [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. This makes it urgent to establish multilingual information systems and
Cross-Language Information Retrieval (CLIR) facilities. How to manage the large
volume of multilingual data has now become a major research question.
      </p>
      <p>
        In this paper we are interested in Natural Language Processing (NLP)
techniques for solving multilingual term identification and text classification
problems in the e-Health domain, where extracting information from clinical notes
has been the focus of a growing body of research in recent years [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ]. Common
characteristics of the narrative text used by physicians in electronic health
records make the automatic extraction of meaningful information hard. NLP
techniques are needed to convert data from unstructured text into a structured
form readily processable by computers [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ]. This structured representation can be used
to extract meaning and to enable Clinical Decision Support systems that assist
healthcare professionals and improve health outcomes [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ].
      </p>
      <p>
        Signs and symptoms have seldom been studied in their own right in the field
of biomedical information extraction. Indeed, they are often included in more
general categories such as "clinical concepts" [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ], "medical problems" [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ] or
"phenotypic information" [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ]. Moreover, most of the available studies are based
on clinical reports or narrative corpora. For instance, [
        <xref ref-type="bibr" rid="ref11 ref18">11, 18</xref>
        ] aim at extracting
symptoms from clinical records, and in [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ] the authors identify risk
factors for heart disease based on the automated analysis of narrative clinical
records of diabetic patients.
      </p>
      <p>Another recent project in the e-Health NLP context is IBM Watson for
Oncology<sup>1</sup>. It can analyze the meaning and context of structured and
unstructured data in clinical notes and reports, assimilating key patient
information written in plain English that may be critical for selecting a
treatment pathway. These works differ from ours because they do not address
multilingual aspects and because they have to manage the differences between
"signs", which are identified by clinicians, and "symptoms", which can be
described directly by the sick person.</p>
      <p>In our work we do not have to manage clinical records, but rather the
information provided directly by the person who feels sick. This difference is
crucial for an ontology-driven approach, because clinical reports contain many
more technical words<sup>2</sup> than a text written (or a sentence spoken) by a
layperson describing how she feels. This allows us to use simpler ontologies.
Especially from the multilingual viewpoint, an ontology containing simple
concepts and omitting unnecessary technicalities yields better results with less
effort, since technical words may be less well supported by the tools used in our
text classification pipeline.</p>
      <p>
        MOoD-TC relies on the assumption that ontologies are available in the domain
of interest. Even though the application developer might design and implement her
own domain ontology from scratch, integrating well-founded and widely used
ontologies into MOoD-TC would be the most modular, reusable and scientifically
sound approach. Luckily, many domain ontologies exist, particularly in the
biomedical domain. Panacea [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], the Ontology for General
Medical Science<sup>3</sup>, and the Gene Ontology<sup>4</sup> are just a few recent
examples, besides the "symptoms ontology" used for our experiments and discussed
in Section 5.
      </p>
      <p><sup>1</sup> http://www.ibm.com/smarterplanet/us/en/ibmwatson/watson-oncology.html
<sup>2</sup> A clinical report is written by a doctor.
<sup>3</sup> https://bioportal.bioontology.org/ontologies/OGMS
<sup>4</sup> http://geneontology.org/
      </p>
    </sec>
    <sec id="sec-4">
      <title>MOoD-TC</title>
      <p>
        MOoD-TC was developed as part of Silvio Beux's Master's thesis [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ],
starting from [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]. Its aim is to classify multilingual textual documents according to
classes described in a domain ontology. MOoD-TC consists of the Text
Classifier (TC) and the Application Domain Module (ADM). It provides a set of
core modules offering functionalities common to any text classification problem
(text pre-processing, tagging, classification), plus a customizable structure for
the modules that the developer can implement to offer application-specific
functionalities. It returns a classification of the text w.r.t. the ontology
taken as input. The classification is performed by TC, which is implemented in
Java and exploits the Language Detector Library<sup>5</sup>, BabelNet
[
        <xref ref-type="bibr" rid="ref16">16</xref>
        ], and TreeTagger<sup>6</sup>.
      </p>
      <p>The Language Detector Library detects 53 languages with a precision greater
than 99%, using naive Bayesian filters. It is used to recognize the language
L<sub>o</sub> of the ontology o and the language L<sub>d</sub> of the textual
document d. The TreeTagger tool tags d in order to obtain, for each word w ∈ d
that is not a stop word, its lemma (the canonical form of the word) and its part
of speech (POS). This information is used by BabelNet to translate w into the
ontology language. Finally, the translated word w′ is searched for inside the
ontology and contributes to the classification of d in the category modeled by the
ontology concept c that has the same semantics as w′. The ClassifierObject is the
object that stores a correctly classified word of the document d (and additional
information) with respect to o. TC returns a list of such objects. ADM specializes
the text classification task by implementing functionalities for pre- and
post-processing a multilingual textual document. If an ADM is used, the entire
system specializes its behaviour to the domain represented by that particular ADM
(e.g., from text classifier to disease recognizer). In our system TC can work
alone, whereas an ADM is meant to work in close connection with the core system.
The core modules are implemented to work for European languages (which share some
common features such as, for example, the relationship between noun and
adjective), but they could be extended to cope with the peculiar features of other
languages: thanks to the modularity of the system, it is possible to integrate
different algorithms created specifically to handle those peculiarities without
modifying the entire system. The ADM processes the TC input and output in order to
obtain a new domain-oriented tool. An ADM is composed of two sub-components:
pre-processing and post-processing.</p>
      <sec id="sec-4-1">
        <p>The pre-processing component takes as input a digital object (for example a
spoken sentence, in the scenarios discussed in Section 2) and returns a new
processed text, while the post-processing component takes as input the TC output
and returns a domain-dependent result. Figure 1 shows the entire pipeline of the
integration process between the TC and the ADM.</p>
        <p><sup>5</sup> https://code.google.com/p/language-detection/
<sup>6</sup> http://code.google.com/p/tt4j/</p>
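        <p>The TC steps described in this section can be illustrated with a small Python sketch. It is not the MOoD-TC implementation, which is written in Java. The Language Detector Library, TreeTagger and BabelNet are replaced by hypothetical toy stand-ins (<monospace>detect_language</monospace>, <monospace>TOY_LEXICON</monospace>, <monospace>TOY_TRANSLATIONS</monospace>), and the ontology is reduced to a set of English concept labels. The fields of <monospace>ClassifierObject</monospace> are assumed. The sketch only shows the order of the steps: detect the document language, lemmatize and drop stop words, translate each lemma into the ontology language, and keep the words whose translation matches an ontology concept.</p>
        <preformat preformat-type="code">
```python
from collections import Counter
from dataclasses import dataclass

# Hypothetical toy resources standing in for the real tools.
# TOY_LEXICON plays the role of TreeTagger: it maps a surface form to (lemma, POS).
TOY_LEXICON = {
    "ho": ("avere", "VERB"),
    "la": ("il", "DET"),
    "febbre": ("febbre", "NOUN"),
    "e": ("e", "CONJ"),
    "tosse": ("tosse", "NOUN"),
}
STOP_WORDS = {"il", "e"}
# TOY_TRANSLATIONS plays the role of BabelNet: (lemma, POS, source, target) maps to a lemma.
TOY_TRANSLATIONS = {
    ("febbre", "NOUN", "it", "en"): "fever",
    ("tosse", "NOUN", "it", "en"): "cough",
}


@dataclass
class ClassifierObject:
    """A correctly classified word of document d (field names are assumed)."""
    word: str         # word w as it appears in d
    translation: str  # w' in the ontology language
    concept: str      # ontology concept c with the same semantics as w'


def tokenize(text):
    for mark in ".,;:!?":
        text = text.replace(mark, " ")
    return text.lower().split()


def detect_language(text):
    # Hypothetical stand-in for the Language Detector Library.
    return "it" if any(w in TOY_LEXICON for w in tokenize(text)) else "en"


def classify(document, ontology_concepts, ontology_language="en"):
    """Return one ClassifierObject per document word matching an ontology concept."""
    doc_language = detect_language(document)
    results = []
    for token in tokenize(document):
        lemma, pos = TOY_LEXICON.get(token, (token, "UNKNOWN"))
        if lemma in STOP_WORDS:
            continue  # stop words are not classified
        if doc_language == ontology_language:
            translated = lemma
        else:
            translated = TOY_TRANSLATIONS.get((lemma, pos, doc_language, ontology_language))
        if translated in ontology_concepts:
            # Concepts are identified by their English label in this sketch.
            results.append(ClassifierObject(token, translated, translated))
    return results


symptoms = {"fever", "cough", "nausea"}
found = classify("Ho la febbre e tosse.", symptoms)
print(Counter(obj.concept for obj in found))  # occurrences per concept
```
        </preformat>
        <p>Counting the concepts of the returned objects mirrors the per-concept occurrence counts that the TC GUI displays. For the Italian sample, the sketch reports one occurrence each of fever and cough. The verb "ho" (lemma "avere") is discarded because the toy dictionary has no translation for it.</p>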
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Exploiting MOoD-TC for Symptom Identification</title>
      <p>As illustrated in Section 2, the scenarios we aim to address require that disease
symptoms appearing in a text are correctly identified w.r.t. a domain ontology.
The pre-processing stage consists of moving from a spoken sentence to a text,
and the post-processing stage consists of translating the identified symptoms into
a target language and, depending on the scenario, moving back from text to speech
and/or reasoning over them. In the sequel we discuss the experiments related to
our main task, namely symptom identification.</p>
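      <p>A symptom-identification ADM can be pictured as a pre-processing function and a post-processing function wrapped around the TC. The Python sketch below is a hypothetical illustration, not the authors' module. Here <monospace>speech_to_text</monospace> is a placeholder for an external speech recognizer, and the TC is passed in as any function that returns English concept labels. <monospace>TOY_TARGET_LABELS</monospace> is an invented stand-in for translating concepts into the target language with BabelNet.</p>
      <preformat preformat-type="code">
```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical stand-in for BabelNet: English concept label mapped to labels per target language.
TOY_TARGET_LABELS = {
    "fever": {"sv": "feber", "fr": "fièvre"},
    "nausea": {"sv": "illamående", "fr": "nausée"},
}


def speech_to_text(audio):
    # Placeholder pre-processing: a real ADM would call a speech recognizer here.
    # In this sketch the "audio" argument is already a transcript string.
    return audio


@dataclass
class SymptomADM:
    classify: Callable[[str], List[str]]  # the TC, returning ontology concept labels
    target_language: str

    def preprocess(self, audio):
        return speech_to_text(audio)

    def postprocess(self, concepts):
        # Drop duplicates (keeping first-seen order), then translate each concept;
        # concepts without a known translation are passed through unchanged.
        unique = list(dict.fromkeys(concepts))
        return [TOY_TARGET_LABELS.get(c, {}).get(self.target_language, c) for c in unique]

    def run(self, audio):
        return self.postprocess(self.classify(self.preprocess(audio)))


def toy_tc(text):
    # Hypothetical TC stand-in: reports each known concept mentioned in the text.
    return [label for label in ("fever", "nausea") if label in text]


adm = SymptomADM(classify=toy_tc, target_language="sv")
print(adm.run("nausea since this morning, and fever"))
```
      </preformat>
      <p>The TC is a plain function parameter, which keeps the modularity described in Section 4: the same ADM can wrap the real classifier or, as here, a toy one. With Swedish as the target language, as in Alice's scenario, the sketch turns the identified concepts into "feber" and "illamående".</p>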
      <p>The domain ontology used for describing symptoms is a standard ontology
named the symptoms ontology<sup>7</sup>, partially shown in Figure 2. It is an ontology
of disease symptoms, where symptoms encompass perceived changes in function,
sensations or appearance reported by a patient and indicative of a disease. We
stress that exploiting MOoD-TC for symptom identification did not require building
any new ontology. Rather, consistently with the good principle of reusing existing
software whenever available and, in particular, reusing existing ontologies, we
just passed the symptoms ontology as input to the TC, obtaining the results
discussed below.</p>
      <p>In the sequel we discuss our initial experiments with phrases in five different
languages (English, French, German, Italian, Spanish), where symptoms are
identified by the TC module.</p>
      <p><sup>7</sup> http://purl.obolibrary.org/obo/symp.owl</p>
      <sec id="sec-5-1">
        <p>The classification of two sample sentences is shown
below, where the TC GUI screenshot associated with each sentence shows the
ontology concepts which appear in the text along with the number of their
occurrences in the text.</p>
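        <p>Phrase 1 (Italian): "Credo di avere la febbre, continuo a sudare e ho i
brividi. Non la smetto di tossire e fatico a mangiare a causa del male alla gola,
come un forte bruciore. Mi sento stanchissimo e ho dolore a tutti i muscoli."
(English: "I think I have a fever, I keep sweating and I have the chills. I cannot
stop coughing and I struggle to eat because of my sore throat, which feels like a
strong burning. I feel exhausted and all my muscles ache.")</p>
        <p>Phrase 3 (Spanish): "Me siento fatal. Tengo temperatura, vomito y diarrea.
Hace dos días que no consigo comer nada. Tengo nausea y mareos." (English: "I feel
terrible. I have a temperature, vomiting and diarrhea. I have not been able to eat
anything for two days. I feel nauseous and dizzy.")</p>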
        <p>
          The experiments were carried out on 32 sentences for each of the 5
languages, for a total of 160 sentences. Each sentence describes symptoms
related to one of the following sixteen diseases: tinnitus, food allergy, cervical,
dehydration, hyperthyroidism, flu, appendicitis, food poisoning, labyrinthitis,
narcolepsy, pneumonia, diabetes type 1, hyperglycemia, hypoglycemia,
bronchitis, jet lag (two sentences for each disease). To cover the widest range of
cases, we considered the diseases with the most varied symptoms. The description of
the symptoms associated with each disease was retrieved from [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ], and each
sentence contains between 2 and 9 symptom words. The sentences were created
manually by the authors.
        </p>
        <p>Since the final purpose of this work is to provide an automatic diagnostic
system with as many symptoms as possible, in order to devise the correct
diagnosis, we were mainly interested in symptoms that appear in the text but are
not identified by our classifier (false negatives). We also looked for false
positives, but their number is so low as to be irrelevant for our experiments.
Moreover, false positives are due to under-classification rather than to an
actually wrong classification: if the text contains the "abdominal cramp" symptom,
for example, and it is classified with the more general "abdominal symptom"
concept, we consider this result a false positive, as a more specific concept could
have been returned. Figure 3 shows the average number of symptoms that should
have been identified w.r.t. the correctly identified symptoms in the five
considered languages. Figure 4 shows the number of false negatives (y axis) per
disease (x axis). Figure 3 demonstrates that the results vary greatly with the
disease. For example, symptoms related to tinnitus are rarely classified, but this
can easily be explained by observing the ontology we used, where problems related
to ears are not modeled at all. By carefully analyzing the obtained results, we
also realized that the performance of the classifier is sometimes degraded when a
symptom appears in the text with a different grammatical role than the symptom in
the ontology (usually a noun), making their matching impossible although the word
root and the meaning are the same. For example, the ontology contains the noun
"irritability", but if the text contains the adjective "irritable" (in any
language), the identification fails. This problem is due to the way the root of a
word is computed, and to the way words are managed in BabelNet.</p>
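        <p>The grammatical-role mismatch can be reproduced in a few lines of Python. The sketch assumes, as a simplification, that matching compares a lemma with ontology labels by exact string equality. The function <monospace>naive_root</monospace> is a hypothetical suffix-stripping heuristic for English, not the behaviour of TreeTagger or BabelNet. It only shows one possible way to relax matching so that an adjective and a noun sharing a root can be matched.</p>
        <preformat preformat-type="code">
```python
# Toy ontology labels (nouns, as in the symptoms ontology).
ONTOLOGY_LABELS = {"irritability", "fatigue", "nausea"}

# Hypothetical, deliberately small list of English derivational suffixes.
SUFFIXES = ("ability", "ibility", "able", "ible", "ity", "ness")


def naive_root(word):
    """Strip the first matching suffix, keeping at least 4 characters of root."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 4:
            return word[: -len(suffix)]
    return word


def exact_match(lemma):
    # Matching by string equality: an adjective never equals a noun label.
    return lemma if lemma in ONTOLOGY_LABELS else None


def relaxed_match(lemma):
    # Fall back to comparing roots when the exact match fails.
    roots = {naive_root(label): label for label in ONTOLOGY_LABELS}
    return exact_match(lemma) or roots.get(naive_root(lemma))


print(exact_match("irritable"))    # None: the adjective does not match the noun
print(relaxed_match("irritable"))  # irritability
```
        </preformat>
        <p>Such a heuristic is language-specific and can also over-match unrelated words that happen to share a root. A real fix would therefore need per-language rules or a proper morphological analyzer.</p>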
        <p>What emerges from Figure 4 is that false negatives behave very similarly
regardless of the language of the sentence. This is again due to the two reasons
discussed above. Despite these problems, which have a clearly understood
motivation and can be addressed by extending the ontology and by refining the
management of word root extraction, MOoD-TC has proved to be a flexible and
ready-to-use approach for multilingual symptom identification driven by a
standard ontology retrieved from the Web.</p>
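        <p>The distinction between false negatives and under-classification can be made precise with a small Python sketch that walks up the concept hierarchy. The parent map is a hypothetical fragment modeled on the "abdominal cramp" / "abdominal symptom" example; it is not an excerpt of the symptoms ontology. The concept "ringing in ears" is an invented label for a symptom the ontology does not model, and the sample sentences are also invented, so no experimental data are reproduced. The text treats under-classified symptoms as false positives. Following that, the sketch counts them separately from false negatives, which is an assumption about how the two counts interact.</p>
        <preformat preformat-type="code">
```python
from collections import defaultdict

# Hypothetical fragment of a symptom hierarchy: concept mapped to its parent concept.
PARENT = {
    "abdominal cramp": "abdominal symptom",
    "nausea": "digestive system symptom",
    "abdominal symptom": "symptom",
    "digestive system symptom": "symptom",
}


def ancestors(concept):
    """List all ancestors of a concept, nearest first."""
    result = []
    while concept in PARENT:
        concept = PARENT[concept]
        result.append(concept)
    return result


def evaluate(sentences):
    """sentences: list of (disease, gold concepts, predicted concepts)."""
    false_negatives = defaultdict(int)
    under_classified = defaultdict(int)
    for disease, gold, predicted in sentences:
        for concept in gold:
            if concept in predicted:
                continue  # correctly identified
            # A predicted ancestor means the symptom was found but classified too generally.
            if any(a in predicted for a in ancestors(concept)):
                under_classified[disease] += 1
            else:
                false_negatives[disease] += 1
    return dict(false_negatives), dict(under_classified)


sample = [
    ("food poisoning", {"abdominal cramp", "nausea"}, {"abdominal symptom", "nausea"}),
    ("tinnitus", {"ringing in ears"}, set()),
]
print(evaluate(sample))
```
        </preformat>
        <p>Summing the per-disease false negatives over the sentences of each disease gives the kind of count plotted in Figure 4. Walking the hierarchy is also what allows a more general concept to be recognized as a less specific, rather than wrong, classification.</p>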
      </sec>
    </sec>
    <sec id="sec-6">
      <title>Conclusions and Future Work</title>
      <p>
        In this paper we presented the MOoD-TC architecture, showing its possible use
for the symptom identification problem. The speech-to-text pre-processing stage
can be handled using existing tools, and the post-processing stage that translates
the identified symptoms into the doctor's language can be addressed using
BabelNet, in the same way we exploit BabelNet to bridge the text, whatever its
language, and the ontology. The more challenging post-processing stage of
supporting the user in providing a diagnosis given a set of identified symptoms
could be addressed by means of sophisticated expert systems, such as the old and
well-known MYCIN [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], and more recent projects (http://www.easydiagnosis.com/,
https://www.diagnose-me.com/, [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]), some of which are ontology-driven [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
      </p>
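      <p>As a rough illustration of the diagnosis-support stage, the Python sketch below ranks candidate diseases by the share of each disease's symptom profile covered by the identified symptoms. It is much simpler than a rule-based expert system such as MYCIN: there are no rules, certainty factors or explanations. <monospace>DISEASE_PROFILES</monospace> is a hypothetical toy knowledge base, not data from the paper.</p>
      <preformat preformat-type="code">
```python
# Hypothetical toy knowledge base: disease mapped to a set of characteristic symptoms.
DISEASE_PROFILES = {
    "food poisoning": {"nausea", "vomiting", "diarrhea", "abdominal cramp"},
    "flu": {"fever", "cough", "chills", "muscle pain", "fatigue"},
    "dehydration": {"thirst", "dizziness", "fatigue"},
}


def rank_diseases(identified):
    """Return (disease, coverage) pairs, highest share of observed profile symptoms first."""
    scores = []
    for disease, profile in DISEASE_PROFILES.items():
        coverage = len(profile.intersection(identified)) / len(profile)
        if coverage:  # skip diseases with no observed symptom
            scores.append((disease, round(coverage, 2)))
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


print(rank_diseases({"fever", "chills", "cough", "fatigue"}))
```
      </preformat>
      <p>In this example, flu covers four of its five profile symptoms and ranks first, ahead of dehydration, which shares only fatigue. An ontology-driven variant could replace the flat sets with concepts from the symptoms ontology. The identified symptoms and the knowledge base would then share one vocabulary, which is the interoperability argument made in Section 2.</p>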
      <p>
        Our framework does not address many well-known open problems in multilingual
text classification and information extraction, such as negation [
        <xref ref-type="bibr" rid="ref23">23</xref>
        ] and named
entities; rather, it provides a flexible and modular approach ready to integrate,
with limited effort, the results and algorithms addressing these problems that
come from the research community.
      </p>
      <p>In the short term, our work will be devoted to overcoming the problems that
limit the performance of MOoD-TC in the considered scenario: we will make word
identification more flexible and we will extend the symptoms ontology with the
symptoms that have not been modeled so far.</p>
      <p>In the future, it would be interesting to run an experimental comparison
between our approach and a machine learning one. With a limited number of labeled
examples, in fact, it would be feasible to apply semi-supervised learning methods.
Depending on the results of the comparison, we will also consider combining both
approaches, using a domain ontology to improve the results of a traditional
machine learning approach.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>B.</given-names>
            <surname>Al-Hamadani</surname>
          </string-name>
          .
          <article-title>CardioOWL: An ontology-driven expert system for diagnosing coronary artery diseases</article-title>
          .
          <source>In 2014 IEEE Conference on Open Systems (ICOS)</source>
          , pages
          <fpage>128</fpage>
          –
          <lpage>132</lpage>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>R. P.</given-names>
            <surname>Ambilwade</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. R.</given-names>
            <surname>Manza</surname>
          </string-name>
          , and
          <string-name>
            <given-names>B. P.</given-names>
            <surname>Gaikwad</surname>
          </string-name>
          .
          <article-title>Medical expert systems for diabetes diagnosis: A survey</article-title>
          .
          <source>Int. J. of ARCSSE</source>
          ,
          <volume>4</volume>
          (
          <issue>11</issue>
          ),
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>S.</given-names>
            <surname>Beux</surname>
          </string-name>
          .
          <article-title>MOoD-TC: A general purpose multilingual ontology driven text classifier</article-title>
          .
          <source>Master's Degree Thesis in Computer Science</source>
          , University of Genova, Italy,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>B. G.</given-names>
            <surname>Buchanan</surname>
          </string-name>
          and
          <string-name>
            <given-names>E. H.</given-names>
            <surname>Shortliffe</surname>
          </string-name>
          .
          <source>Rule Based Expert Systems: The Mycin Experiments of the Stanford Heuristic Programming Project. Addison-Wesley</source>
          ,
          <year>1984</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>G.</given-names>
            <surname>de Melo</surname>
          </string-name>
          and
          <string-name>
            <given-names>S.</given-names>
            <surname>Siersdorfer</surname>
          </string-name>
          .
          <article-title>Multilingual text classification using ontologies</article-title>
          .
          <source>In ECIR Conference, Proceedings</source>
          , volume
          <volume>4425</volume>
          of LNCS, pages
          <fpage>541</fpage>
          –
          <lpage>548</lpage>
          . Springer,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>D.</given-names>
            <surname>Demner-Fushman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Chapman</surname>
          </string-name>
          , and
          <string-name>
            <given-names>C.</given-names>
            <surname>McDonald</surname>
          </string-name>
          .
          <article-title>What can natural language processing do for clinical decision support?</article-title>
          <source>Journal of Biomedical Informatics</source>
          ,
          <volume>42</volume>
          (
          <issue>5</issue>
          ):
          <fpage>760</fpage>
          –
          <lpage>772</lpage>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <given-names>C.</given-names>
            <surname>Doulaverakis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Nikolaidis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Kleontas</surname>
          </string-name>
          , and
          <string-name>
            <given-names>I.</given-names>
            <surname>Kompatsiaris</surname>
          </string-name>
          .
          <article-title>Panacea, a semantic-enabled drug recommendations discovery framework</article-title>
          .
          <source>J. Biomedical Semantics</source>
          ,
          <volume>5</volume>
          :
          <fpage>13</fpage>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <given-names>Global</given-names>
            <surname>Reach</surname>
          </string-name>
          .
          <source>Global internet statistics (by language)</source>
          .
          <source>Technical report, Global Reach</source>
          ,
          <year>June 2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <given-names>H. W.</given-names>
            <surname>Griffith</surname>
          </string-name>
          .
          <source>Complete guide to symptoms, illness &amp; surgery for people over 50</source>
          . Body Press/Perigee New York, NY,
          <year>1992</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <given-names>G.-W.</given-names>
            <surname>Bian</surname>
          </string-name>
          and
          <string-name>
            <given-names>H.-H.</given-names>
            <surname>Chen</surname>
          </string-name>
          .
          <article-title>Cross-language information access to multilingual collections on the Internet</article-title>
          .
          <source>Journal of the American Society for Information Science</source>
          ,
          <volume>51</volume>
          ,
          <year>2000</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <given-names>H.</given-names>
            <surname>Harkema</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Roberts</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Gaizauskas</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Hepple</surname>
          </string-name>
          .
          <article-title>Information extraction from clinical records</article-title>
          . In S. Cox, editor,
          <source>Proceedings of the 4th UK e-Science All Hands Meeting</source>
          , Nottingham, UK,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <given-names>P.</given-names>
            <surname>Jackson</surname>
          </string-name>
          and
          <string-name>
            <given-names>I.</given-names>
            <surname>Moulinier</surname>
          </string-name>
          .
          <source>Natural Language Processing for Online Applications: Text Retrieval, Extraction &amp; Categorization</source>
          . John Benjamins
          ,
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <given-names>M.</given-names>
            <surname>Leotta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Beux</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Mascardi</surname>
          </string-name>
          , and
          <string-name>
            <given-names>D.</given-names>
            <surname>Briola</surname>
          </string-name>
          .
          <article-title>My MOoD, a multimedia and multilingual ontology driven MAS: design and first experiments in the sentiment analysis domain</article-title>
          .
          In
          <source>ESSEM Workshop, Proceedings</source>
          , pages
          <fpage>51</fpage>
          –
          <lpage>66</lpage>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <given-names>S. M.</given-names>
            <surname>Meystre</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. K.</given-names>
            <surname>Savova</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K. C.</given-names>
            <surname>Kipper-Schuler</surname>
          </string-name>
          ,
          and
          <string-name>
            <given-names>J. F.</given-names>
            <surname>Hurdle</surname>
          </string-name>
          .
          <article-title>Extracting information from textual documents in the electronic health record: a review of recent research</article-title>
          .
          <source>Yearbook of medical informatics</source>
          , pages
          <fpage>128</fpage>
          –
          <lpage>144</lpage>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <given-names>P.</given-names>
            <surname>Nadkarni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Ohno-Machado</surname>
          </string-name>
          , and
          <string-name>
            <given-names>W.</given-names>
            <surname>Chapman</surname>
          </string-name>
          .
          <article-title>Natural language processing: An introduction</article-title>
          .
          <source>Journal of the American Medical Informatics Association</source>
          ,
          <volume>18</volume>
          (
          <issue>5</issue>
          ):
          <fpage>544</fpage>
          –
          <lpage>551</lpage>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <given-names>R.</given-names>
            <surname>Navigli</surname>
          </string-name>
          and
          <string-name>
            <given-names>S. P.</given-names>
            <surname>Ponzetto</surname>
          </string-name>
          .
          <article-title>BabelNet: The automatic construction, evaluation and application of a wide-coverage multilingual semantic network</article-title>
          .
          <source>Artif. Intell.</source>
          ,
          <volume>193</volume>
          :
          <fpage>217</fpage>
          –
          <lpage>250</lpage>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <given-names>D. W.</given-names>
            <surname>Oard</surname>
          </string-name>
          and
          <string-name>
            <given-names>B. J.</given-names>
            <surname>Dorr</surname>
          </string-name>
          .
          <article-title>A survey of multilingual text retrieval</article-title>
          .
          <source>Technical report</source>
          , College Park, MD, USA,
          <year>1996</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <given-names>A.</given-names>
            <surname>Rosier</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Burgun</surname>
          </string-name>
          , and
          <string-name>
            <given-names>P.</given-names>
            <surname>Mabo</surname>
          </string-name>
          .
          <article-title>Using regular expressions to extract information on pacemaker implantation procedures from clinical reports</article-title>
          .
          In
          <source>Proceedings of the AMIA Annual Symposium</source>
          , Washington DC, USA,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <given-names>B. R.</given-names>
            <surname>South</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Shen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Jones</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. H.</given-names>
            <surname>Garvin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. H.</given-names>
            <surname>Samore</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W. W.</given-names>
            <surname>Chapman</surname>
          </string-name>
          ,
          and
          <string-name>
            <given-names>A. V.</given-names>
            <surname>Gundlapalli</surname>
          </string-name>
          .
          <article-title>Developing a manually annotated clinical document corpus to identify phenotypic information for inflammatory bowel disease</article-title>
          .
          <source>BMC Bioinformatics</source>
          , 10(S-9):
          <fpage>12</fpage>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20.
          <string-name>
            <given-names>A.</given-names>
            <surname>Stubbs</surname>
          </string-name>
          , C. Kot la, H. Xu, and Ozlem Uzuner.
          <article-title>Identifying risk factors for heart disease over time: Overview of 2014 i2b2/UTHealth shared task Track 2</article-title>
          .
          <source>Journal of Biomedical Informatics</source>
          ,
          <volume>58</volume>
          , Supplement,
          <fpage>S67</fpage>
          –
          <lpage>S77</lpage>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          21.
          <string-name>
            <given-names>Ö.</given-names>
            <surname>Uzuner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B. R.</given-names>
            <surname>South</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Shen</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S. L.</given-names>
            <surname>DuVall</surname>
          </string-name>
          .
          <article-title>2010 i2b2/VA challenge on concepts, assertions, and relations in clinical text</article-title>
          .
          <source>JAMIA</source>
          ,
          <volume>18</volume>
          (
          <issue>5</issue>
          ):
          <fpage>552</fpage>
          –
          <lpage>556</lpage>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          22.
          <string-name>
            <given-names>K. B.</given-names>
            <surname>Wagholikar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Torii</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Jonnalagadda</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Liu</surname>
          </string-name>
          , et al.
          <article-title>Pooling annotated corpora for clinical concept extraction</article-title>
          .
          <source>J. Biomedical Semantics</source>
          ,
          <volume>4</volume>
          :
          <fpage>3</fpage>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          23.
          <string-name>
            <given-names>M.</given-names>
            <surname>Wiegand</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Balahur</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Roth</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Klakow</surname>
          </string-name>
          , and
          <string-name>
            <given-names>A.</given-names>
            <surname>Montoyo</surname>
          </string-name>
          .
          <article-title>A survey on the role of negation in sentiment analysis</article-title>
          .
          In
          <source>Proceedings of the Workshop on Negation and Speculation in Natural Language Processing</source>
          , NeSp-NLP '10, pages
          <fpage>60</fpage>
          –
          <lpage>68</lpage>
          ,
          <publisher-loc>Stroudsburg, PA, USA</publisher-loc>
          ,
          <year>2010</year>
          .
          <publisher-name>Association for Computational Linguistics</publisher-name>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>