<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Annotation and Extraction of Relations from Italian Medical Records</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Giuseppe Attardi</string-name>
          <email>attardi@di.unipi.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Vittoria Cozza</string-name>
          <email>cozza@di.unipi.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Daniele Sartiano</string-name>
          <email>sartiano@di.unipi.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Dipartimento di Informatica, Università di Pisa</institution>
          ,
          <addr-line>Largo B. Pontecorvo 3, I-56127 Pisa</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>We address the problem of extracting knowledge from large-scale clinical records written in Italian by physicians. We recognize relevant entities such as symptoms, diseases, treatments, measurements and drugs, and then determine their semantic relations. We developed suitable training corpora in order to apply machine learning techniques to this task. We report on experiments performed on medical data provided in the context of a regional research project on technologies for health care.</p>
      </abstract>
      <kwd-group>
        <kwd>Information extraction</kwd>
        <kwd>Natural language processing</kwd>
        <kwd>Semantic analysis</kwd>
        <kwd>Medical ontologies</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Clinical records are a vast potential source of information for healthcare systems:
their analysis may produce valuable data for building systems to support diagnosis,
to predict drug risks and to estimate the effectiveness of treatments. An electronic
medical record (EMR) provides detailed information on patient history, laboratory tests
and findings of a patient consultation, often expressed in a narrative style. Such
records abound in mentions of clinical conditions, anatomical sites, medications and
procedures. Many different surface forms are used to represent the same concept, and
the mentions are interleaved with modifiers (e.g., adjectives, verbs or adverbs) or are
abbreviated. Sophisticated language analysis techniques are required to
recognize these mentions. To be amenable to further analysis and data
mining, the extracted data has to be normalized, for example by mapping or linking entities to their
definitions in a widely used standard taxonomy, e.g., SNOMED CT or ICD9, or, more
generally, to their key terminology in the UMLS Metathesaurus [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. Finally, certain
information must be contextualized, for example by relating it to a temporal duration or to a
statement that explicitly assesses its validity. All these issues pose significant challenges
for current machine reading techniques.
      </p>
      <p>In this paper we report on our approach for dealing with the following tasks:
medical entity recognition, mapping entities to a thesaurus, extracting measurements together with
the entities they refer to, and identifying whether the context of an expression is positive
or negative. We exploit both supervised machine learning techniques, which require
annotated training corpora, and unsupervised deep learning techniques, in order to
leverage unlabeled data.</p>
      <p>
        For English, several manually annotated medical corpora with syntactic and semantic
information are available, such as the Shared Annotated Resources (ShARe) [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ], while in
Italian there is a lack of such resources.
      </p>
      <p>
        Within the RIS project [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] we had access to a relevant number of medical records
from the Italian healthcare system that we used for building in a semiautomatic way a
training corpus annotated with medical entities [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] and temporal expressions [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. In
this paper we extend the corpus annotations, including expressions denoting
physiological measures and the entity to which they refer. We also developed a corpus
containing annotations about entities present within a negative context. These corpora
have been used for training several classifiers, identifying medical entities in clinical
records, linking entities to UMLS CUIs (Concept Unique Identifiers), associating
them to measurements and identifying negative or speculative expressions.
      </p>
    </sec>
    <sec id="sec-2">
      <title>Related Work</title>
      <p>Named entity recognition, normalization and linking to thesauri are essential
preliminary tasks in biomedical record analysis.</p>
      <p>Approaches to the extraction of clinical concepts range from early symbolic NLP
systems, strongly dependent on domain knowledge, to machine learning systems
driven by the increasing availability of annotated clinical corpora.</p>
      <p>
        The 2014 SemEval Task 7 presented a challenge on the analysis of clinical records
from the ShARe resource [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ]. The task focused on the recognition and normalization
of named entity mentions belonging to the Disorders semantic group [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] in
UMLS. In [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] a survey of all the systems used for the task is available; the
state-of-the-art solutions are those using machine learning approaches, and the most widely applied
tools are those based on Conditional Random Fields (CRF), Support Vector Machines
(SVM) and DNorm.
      </p>
      <p>
        The best results were obtained by Tang et al. [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ] using an ensemble of learning-based
systems, i.e., a CRF NER and a Structural Support Vector Machine (SSVM), for
disorder entity recognition. They developed a Vector Space Model (VSM) based
approach to find the most suitable CUI for a given disorder entity: the disorder entity was
used as the query and all the UMLS terms were treated as documents, and the
cosine similarity score was used to rank the candidate terms. A novelty of their approach was
the investigation of three different types of word representation (WR) features for the NER:
clustering-based representations, distributional representations and word
embeddings [
        <xref ref-type="bibr" rid="ref1 ref10">1, 10</xref>
        ]. They achieved a precision of 84.3%, a recall of 78.6% and an F-score
of 81.3% [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ].
      </p>
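      <p>As an illustration only, the VSM ranking step described above can be sketched as follows. This is not the code of Tang et al.: the whitespace tokenization, the raw term-frequency weighting, the toy term list and the identifiers CUI_A, CUI_B and CUI_C are all assumptions made for the example.</p>
      <preformat>
```python
import math
from collections import Counter

def vectorize(text):
    """Bag-of-words term-frequency vector for a lowercased, whitespace-tokenized string."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def rank_candidates(mention, terms):
    """Rank (cui, term) pairs by cosine similarity to the mention, best first."""
    query = vectorize(mention)
    scored = [(cosine(query, vectorize(term)), cui, term) for cui, term in terms]
    return sorted(scored, key=lambda item: item[0], reverse=True)

# Toy term list; the identifiers are placeholders, not real UMLS CUIs.
TERMS = [
    ("CUI_A", "pressure ulcer"),
    ("CUI_B", "gastric ulcer"),
    ("CUI_C", "pericardial effusion"),
]

ranking = rank_candidates("pressure ulcer of the heel", TERMS)
best_score, best_cui, best_term = ranking[0]
print(best_cui, best_term, round(best_score, 3))
```
      </preformat>
      <p>The mention plays the role of the query and each thesaurus term the role of a document; the candidate with the highest score would be proposed as the normalized concept.</p>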
      <p>
        In [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] the authors explore two approaches to information
extraction from medical documents in Italian: (i) a cascaded, two-stage method that pipelines two taggers
generated via the well-known Linear-Chain Conditional Random Fields (LC-CRFs)
learner and (ii) a confidence-weighted ensemble method that combines standard
LC-CRFs with the two-stage method above. They experiment on a dataset of 500
radiology reports in Italian annotated with 9 broad topics by two annotators working independently,
190 reports each. They build individual binary classifiers for each tag and evaluate
them separately: this assumes independence of tags, which does not hold for all the cases
dealt with in this paper. An average F1 score of 79.3% is obtained by applying the
ensemble method to the two test sets annotated by the same annotator (~119 reports).
      </p>
      <p>
        As for the relation extraction task, the approach presented in this work is similar to
the one presented in [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ]. The authors proposed a supervised machine learning
approach to discover relations among medical problems, treatments and medical tests
mentioned in electronic medical records. A rich set of features was developed for the
classifier; their experiments showed that lexical and contextual features are very
relevant for relation extraction. They validated their techniques in the 2010 i2b2
Challenge and obtained the highest F-score for the relation extraction task, 73.7%.
      </p>
      <p>As for the task of detecting negative and speculative information, this is a very
common problem in medical report analysis, since these language forms are widely used
to express impressions, hypotheses, or explanations of experimental results.</p>
      <p>
        The authors of [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] focused on developing a machine-learning system
that identifies negation and speculation signals and their scope in clinical
texts. The proposed system works in two consecutive phases: first, a classifier decides
whether each token in a sentence is a negation/speculation signal. Then another
classifier determines, at sentence level, the tokens affected by the signals previously
identified. The system was trained and evaluated on the clinical texts of the BioScope
corpus, a freely available resource consisting of medical and biological texts:
full-length articles, scientific abstracts and clinical reports. In the signal detection task,
the F-score was 97.3% for negation and 94.9% for speculation. In the
scope-finding task, a token was considered correctly classified if it had been properly identified as being
inside or outside the scope of all the negation signals present in the sentence. The system
achieved an F-score of 93.2% for negation and 80.9% for speculation.
      </p>
    </sec>
    <sec id="sec-3">
      <title>Medical Training Corpus</title>
      <p>Our approach to the analysis of clinical records relies on machine learning techniques,
which require either annotated corpora for supervised training or a large set of
unannotated documents for unsupervised learning.</p>
      <p>
        Since Italian corpora annotated with mentions of medical entities are not easily
available, we created a corpus of Italian medical reports (IMR) [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], annotated with
mentions of active ingredient, body part, sign or symptom, disease or syndrome, drug
and treatment.
      </p>
      <p>
        As detailed in [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], the distribution of categories in the annotated IMR is listed in
Table 1.
      </p>
      <sec id="sec-3-1">
        <title>Entity Type</title>
        <p>Table 1. Entity types in the annotated IMR: Active Ingredient, Body part, Disease or Syndrome, Drug, Sign or Symptom, Treatment.</p>
        <p>
          The IMR also contains mentions of temporal expressions, extracted as in [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ].
        </p>
        <p>Unstructured medical texts may also refer to various kinds of physiological
measurements. To extract this valuable information we used a NER for measurements. To
this end, the IMR corpus has been annotated with a basic rule-based approach
(regular expressions).</p>
        <p>We started from the list of units in the metric system (see
http://en.wikipedia.org/wiki/Metric_system) and selected the subset of those actually
used in the IMR for measuring the following quantities: area, amount of substance,
energy, frequency, length, mass, power, pressure, speed, time and volume. To these
we added units for aerobic capacity, concentration, dosage and flow, as well as simple
numeric quantities and percentages. We applied a regular expression matcher to
identify expressions consisting of numeric values in combination with these units. The
matcher detected 82,240 occurrences of measurements within the IMR, distributed
among the measure categories above.</p>
        <p>Regular expression matching falls short of identifying all possible variants of
measurement expressions used by physicians. For example, the dosage of a drug or a
therapy is written in many variants, like: “1 cpr/die per 10 giorni”, “80
ml/ora”, “0,125 mg/die”, “5 mg/kg ogni 8 ore per 5-7 giorni”, “40 mg
in 250 cc”, “1800 Kcal/die”. It is also hard to identify measurements expressed
only by numbers without any indication of units (e.g., “classe nyha: III”), by a
partitive, or with unusual units (“una bustina /die”, “due fl di
Lasix”).</p>
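        <p>A minimal sketch of such a matcher is given below, to make the idea concrete. It is not the expression actually used on the IMR: the unit list is a small hypothetical subset, and the names UNITS, MEASURE_RE and find_measurements are introduced only for this example.</p>
        <preformat>
```python
import re

# Hypothetical subset of units; the list actually used for the IMR is not given in the text.
UNITS = ["mg/kg", "mg/die", "ml/ora", "Kcal/die", "mg", "ml", "mm", "cm", "cc", "g", "%"]

# Longest units first, so that "mg/die" is preferred over "mg".
UNIT_PATTERN = "|".join(re.escape(u) for u in sorted(UNITS, key=len, reverse=True))

# A number with an optional decimal part (Italian texts use a comma), then a unit
# that is not immediately followed by another letter.
MEASURE_RE = re.compile(r"(\d+(?:[.,]\d+)?)\s*(" + UNIT_PATTERN + r")(?![A-Za-z])")

def find_measurements(text):
    """Return (value, unit) pairs for every numeric value followed by a known unit."""
    return [(m.group(1), m.group(2)) for m in MEASURE_RE.finditer(text)]

print(find_measurements("versamento pericardico diffuso, fino a 16 mm"))
print(find_measurements("0,125 mg/die"))
print(find_measurements("una bustina /die"))
```
        </preformat>
        <p>The last call returns an empty list, which illustrates the limitation discussed above: a quantity written as a word with an unusual unit is not captured by a pattern of this kind.</p>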
        <p>
          The annotations obtained in this way are to be considered only as a baseline
annotated corpus. We are planning to extend the corpus with manual annotations either by
experts or by crowd-sourcing as discussed in [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ]. As mentioned earlier, a
supervised machine learning approach is much more promising than
handcrafted rules. We explain later how we developed a tagger for measurements: while the
tagger has been trained on the baseline corpus, it can easily be retrained on a corpus
annotated with richer or more varied kinds of expressions.
        </p>
        <p>The IMR has been annotated by adding a separate column for each group
of annotations, according to the IOB format (guidelines: http://en.wikipedia.org/wiki/Inside_Outside_Beginning): the first
additional column is for body parts and treatments, the second for active ingredients,
diseases, drugs and signs, and the last for measurements. For example, a sentence may contain
three entities with different annotations: "ecocardiogramma" as a treatment,
"versamento pericardico" as a disease or syndrome, and "16 mm" as a length.</p>
        <p>To extract information from clinical records, we performed two steps of
analysis: (i) recognition of bio-medical entity mentions; (ii) mapping of entities to their
unique UMLS CUI (Concept Unique Identifier), when applicable.</p>
        <p>For example, in a sentence containing “ulcere da decubito” we must identify “ulcera
da decubito”, even if it is expressed in a different number, and then map it to its
UMLS CUI, in this case: “C0011127”. The UMLS CUI allows obtaining the
corresponding ICD9-CM code, in this case “707.0”, which is important, since ICD9-CM is
the official annotation for diseases and treatments used in Italian healthcare systems.</p>
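        <p>The chain from surface form to CUI to ICD9-CM code in the example above can be pictured with a small lookup sketch. Only the single term, CUI and code come from the text; the tables LEMMAS, TERM_TO_CUI and CUI_TO_ICD9CM, the word-by-word lemma substitution and the function names are assumptions, and a real system would rely on UMLS resources and a proper lemmatizer.</p>
        <preformat>
```python
# Toy lookup tables. Only the entry below comes from the text; a real system would
# load these from UMLS and from an ICD9-CM mapping, and would use a proper lemmatizer.
LEMMAS = {"ulcere": "ulcera"}            # hypothetical plural -> singular table
TERM_TO_CUI = {"ulcera da decubito": "C0011127"}
CUI_TO_ICD9CM = {"C0011127": "707.0"}

def normalize(mention):
    """Lowercase the mention and replace each word by its lemma when one is known."""
    return " ".join(LEMMAS.get(word, word) for word in mention.lower().split())

def link(mention):
    """Return (normalized term, CUI, ICD9-CM code); missing values are None."""
    term = normalize(mention)
    cui = TERM_TO_CUI.get(term)
    icd = CUI_TO_ICD9CM.get(cui) if cui is not None else None
    return term, cui, icd

print(link("ulcere da decubito"))
```
        </preformat>
        <p>Mentions with no entry in the thesaurus simply yield no CUI, which corresponds to the "when applicable" case of the mapping step.</p>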
        <p>After mention identification, a further step is to discover semantic relations
between entities or the presence of negations.</p>
        <p>To identify entities of interest in text we used three classifiers: NER A, for body
parts and treatments; NER B for other medical entities; NER C for measurements.
NER A and NER B are used on sets of disjoint categories, i.e., each mention belongs
to a single category. NER C is applied to the output of the two other classifiers.</p>
        <p>The IMR corpus has been annotated with mentions as detailed in the previous
section.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Experiments</title>
      <p>We built three specialized Named Entity recognizers, one for extracting mentions of
body parts and treatments, one for extracting other clinical entities, one for
recognizing measurements. The first two were built separately since there are occurrences of
body parts within diseases or symptoms, e.g., “dolore alla spalla destra” is a symptom
and “spalla destra” is a body part.</p>
      <p>For the experiments we split the annotated corpus into train, development and test
sets, of size 80%, 10% and 10% respectively.</p>
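      <p>For concreteness, an 80/10/10 split can be obtained as in the sketch below. The text does not say whether the split was random or made at sentence or document level, so the shuffling, the fixed seed and the function name split_corpus are assumptions.</p>
      <preformat>
```python
import random

def split_corpus(sentences, seed=0):
    """Shuffle a list of sentences and split it 80/10/10 into train, dev and test."""
    items = list(sentences)
    random.Random(seed).shuffle(items)
    n_train = int(len(items) * 0.8)
    n_dev = int(len(items) * 0.1)
    train = items[:n_train]
    dev = items[n_train:n_train + n_dev]
    test = items[n_train + n_dev:]
    return train, dev, test

# Integers stand in for annotated sentences.
train, dev, test = split_corpus(range(100))
print(len(train), len(dev), len(test))
```
      </preformat>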
      <p>
        In our previous works [
        <xref ref-type="bibr" rid="ref4 ref6 ref8">4,6,8</xref>
        ] we tested different NE recognizers. In the current
experiments we used the Tanl NER [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], a generic, customizable statistical sequence
labeler. The tagger implements a Conditional Markov Model and can be configured to
use different classification algorithms and to specify templates for extracting features.
In our experiments, it gave the best overall results in a configuration using an
L2-regularized L2-loss support vector classifier.
      </p>
      <p>
        We experimented with various feature sets, including word shape features, as in
[
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], dictionary features, prefix and suffix features, bigrams, last words, first words and
frequent words, all extracted from the training corpus.
      </p>
      <p>
        Extracting mentions can be useful for certain statistical analyses of the content of
clinical records, for example counting occurrences and computing correlations.
However, some aspects of the content might be missed or interpreted incorrectly.
For example, certain mentions may appear within a negative (assenza di febbre)
or speculative (probabile trauma) context. Accurate analysis of a report requires
distinguishing these cases, which in turn requires identifying relations between parts
of the text, not just individual components like mentions.
      </p>
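      <p>Some of the feature families mentioned above for the NE tagger (word shape, prefixes, suffixes, bigrams) can be illustrated with the following sketch. It does not reproduce the Tanl feature templates: the shape alphabet, the affix length of 3, the BOS marker and the position flags are our own simplifications for the example.</p>
      <preformat>
```python
def word_shape(token):
    """Map characters to X (upper), x (lower), d (digit), keeping other symbols."""
    shape = []
    for ch in token:
        if ch.isupper():
            shape.append("X")
        elif ch.islower():
            shape.append("x")
        elif ch.isdigit():
            shape.append("d")
        else:
            shape.append(ch)
    return "".join(shape)

def token_features(tokens, i, affix_len=3):
    """Features for the token at position i of a tokenized sentence."""
    token = tokens[i]
    previous = tokens[i - 1] if i > 0 else "BOS"
    return {
        "shape": word_shape(token),
        "prefix": token[:affix_len].lower(),
        "suffix": token[-affix_len:].lower(),
        "bigram": previous.lower() + "_" + token.lower(),
        "is_first": i == 0,
        "is_last": i == len(tokens) - 1,
    }

tokens = ["Ipertrofia", "totale", "cuore", "(", "500", "g", ")"]
print(token_features(tokens, 0))
print(token_features(tokens, 4))
```
      </preformat>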
      <p>
        We explored the identification of relations of this kind in two particular cases:
negation identification and association of measures to entities. Both of these analyses
were based on examining the parse tree of a sentence, which we obtained by using the
dependency parser DeSR [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ].
      </p>
      <p>The features to be extracted from parse trees in order to perform this analysis
should also be learned from a training corpus.</p>
      <p>For this reason we manually annotated a small subset of the IMR, about 10%. The
corpus is useful for a preliminary analysis and for validating the effectiveness of our
approach, but it will have to be extended in the future.</p>
      <p>To prepare the training corpus for negation, the IMR corpus was
extended with a column, in IOB format, holding a negation tag if the entity
is in a negative context. For instance, in a sentence denying both conditions, the entities “diabete” and
“ipertensione” are marked as being in a negative context.</p>
      <p>To represent relations between entities, each annotated entity is assigned a
sequence number that uniquely identifies it within the sentence. This id is added
as an extra attribute to each token, represented as an extra column in the tab-separated
IOB file format for the NE tagger; ‘_’ means that the token is not involved in a relation. In the
example below, the length measurement is associated with the disease mention “versamento
pericardico”:</p>
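      <table-wrap>
        <table>
          <thead>
            <tr><th>FORM</th><th>B</th><th>C</th><th>RELATION</th></tr>
          </thead>
          <tbody>
            <tr><td>versamento</td><td>B-DISO</td><td>O</td><td>1</td></tr>
            <tr><td>pericardico</td><td>I-DISO</td><td>O</td><td>1</td></tr>
            <tr><td>diffuso</td><td>O</td><td>O</td><td>_</td></tr>
            <tr><td>,</td><td>O</td><td>O</td><td>_</td></tr>
            <tr><td>fino</td><td>O</td><td>O</td><td>_</td></tr>
            <tr><td>a</td><td>O</td><td>O</td><td>_</td></tr>
            <tr><td>16</td><td>O</td><td>B-LENGTH</td><td>1</td></tr>
            <tr><td>mm</td><td>O</td><td>I-LENGTH</td><td>1</td></tr>
          </tbody>
        </table>
      </table-wrap>
      <p>We trained an extractor on mentions from the output of classifiers NER B and NER
C, i.e., associating measurements with medical entities other than treatments and body
parts. Other combinations might be exploited as well, given a suitable training set: for
example, the outputs of NER A and NER B to extract relationships between diseases and
body parts, or NER A with NER C for relationships between body parts and
measurements.</p>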
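      <p>The relation column of the example above can be read back into entity groups with a few lines of code. The sketch below is only illustrative: the four-column layout mirrors the example, while the function name read_relations and the choice of returning a dictionary keyed by relation id are assumptions.</p>
      <preformat>
```python
from collections import defaultdict

# The example sentence: FORM, NER B tag, NER C tag, relation id.
ROWS = [
    ("versamento", "B-DISO", "O", "1"),
    ("pericardico", "I-DISO", "O", "1"),
    ("diffuso", "O", "O", "_"),
    (",", "O", "O", "_"),
    ("fino", "O", "O", "_"),
    ("a", "O", "O", "_"),
    ("16", "O", "B-LENGTH", "1"),
    ("mm", "O", "I-LENGTH", "1"),
]
LINES = ["\t".join(row) for row in ROWS]

def read_relations(lines):
    """Group entity mentions by relation id; '_' marks tokens outside any relation."""
    relations = defaultdict(list)
    current = None  # (relation id, label, words) of the mention being read
    for line in lines:
        form, tag_b, tag_c, rel = line.split("\t")
        tag = tag_b if tag_b != "O" else tag_c
        if tag.startswith("B-") and rel != "_":
            current = (rel, tag[2:], [form])
            relations[rel].append(current)
        elif tag.startswith("I-") and current is not None and rel == current[0]:
            current[2].append(form)
        else:
            current = None
    return {rel: [(label, " ".join(words)) for _, label, words in mentions]
            for rel, mentions in relations.items()}

print(read_relations(LINES))
```
      </preformat>
      <p>For the example sentence, relation 1 groups the disease mention “versamento pericardico” with the length “16 mm”.</p>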
      <sec id="sec-4-1">
        <title>Negative context</title>
        <p>To identify negative contexts in clinical reports, we trained on the above
corpus an SVM-based negation tagger, which we are currently evaluating.</p>
        <p>Given a sentence and a named entity target, the tagger classifies the context of the
entity as positive or negative.</p>
        <p>
          The classifier uses as features patterns on dependency parse trees for negative
expressions, similar to those in [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ], which represent the syntactic context.
Examples of these patterns are negated verbs (la patologia non è presente),
negative verbs (il paziente nega di avere la patologia), negative adjectives
(il paziente è privo di patologia) and negative nouns (assenza di
patologia).
        </p>
      </sec>
      <sec id="sec-4-2">
        <title>Measurement Associations</title>
        <p>The measurements extracted in the medical reports are considered as relevant only
when it is possible to detect a direct link to the entity they refer to. The task of
associating medical entities to measurements is performed by exploiting a binary SVM
classifier trained to recognize whether two mentions are related.</p>
        <p>The training instances for the pair-wise learner consist of all pairs, within a
sentence, of a mention of a symptom, disease, active ingredient or drug and a
measurement (frequency, weight and so forth). A positive instance is created if the two terms
are associated, a negative one otherwise.</p>
        <p>
          The classifier was trained using the following features, extracted for each pair as
described above:
          <list list-type="bullet">
            <list-item><p>Distance features: token distance (the quantized distance between the two words) and tree distance (the distance between the two words on the parse tree);</p></list-item>
            <list-item><p>NER features: the entity types of the pair of words;</p></list-item>
            <list-item><p>Syntactic features: the POS tags of the pair of words.</p></list-item>
          </list>
For computing the distance features we preprocessed the corpus by using the DeSR
parser [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ]. For each pair of words in a parsed sentence that are tagged as mentions,
features are extracted and passed to the classifier.
        </p>
        <p>For instance, from the parse tree of the sentence “Ipertrofia totale cuore (500 g)
particolarmente evidente”, in Fig. 1, and given that ipertrofia is a mention of a
disease and 500 g is a mass, for tokens ipertrofia and 500 we extract these
features:
          <list list-type="simple">
            <list-item><p>Pos_feat(Ipertrofia, 500) = Sfs-N</p></list-item>
            <list-item><p>Ner_feat(Ipertrofia, 500) = DISOMASS</p></list-item>
            <list-item><p>Sentence_distance(Ipertrofia, 500) = 4</p></list-item>
            <list-item><p>Tree_distance(Ipertrofia, 500) = 2</p></list-item>
          </list>
        </p>
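        <p>The four feature values above can be reproduced with the sketch below. Since the parse tree of Fig. 1 is not available here, the head indices are an assumed tree in which “500” depends on “g” and “g” on “Ipertrofia”; POS tags other than Sfs and N are left as placeholders, the token distance is not quantized, and all function names are ours.</p>
        <preformat>
```python
def path_to_root(heads, i):
    """Indices visited going from token i up to the root (whose head is -1)."""
    path = [i]
    while heads[path[-1]] != -1:
        path.append(heads[path[-1]])
    return path

def tree_distance(heads, i, j):
    """Number of edges on the path between tokens i and j in the dependency tree."""
    up_i = path_to_root(heads, i)
    up_j = path_to_root(heads, j)
    depth_in_j = {node: depth for depth, node in enumerate(up_j)}
    for depth_i, node in enumerate(up_i):
        if node in depth_in_j:
            return depth_i + depth_in_j[node]
    raise ValueError("tokens are not in the same tree")

def pair_features(pos, ner, heads, i, j):
    """POS, NER and distance features for the pair of tokens at positions i and j."""
    return {
        "pos": pos[i] + "-" + pos[j],
        "ner": ner[i] + ner[j],
        "token_distance": abs(i - j),
        "tree_distance": tree_distance(heads, i, j),
    }

tokens = ["Ipertrofia", "totale", "cuore", "(", "500", "g", ")"]
pos = ["Sfs", "_", "_", "_", "N", "_", "_"]      # only Sfs and N come from the text
ner = ["DISO", "O", "O", "O", "MASS", "MASS", "O"]
heads = [-1, 0, 0, 5, 5, 0, 5]                   # assumed tree, not the one in Fig. 1
print(pair_features(pos, ner, heads, 0, 4))
```
        </preformat>
        <p>The tree distance is computed by walking from both tokens towards the root and summing the two depths at their lowest common ancestor.</p>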
        <p>If the classifier assigns a probability greater than a given threshold, the two words
are combined into a larger mention. The process is then repeated trying to further
extend each relation with additional terms by combining mentions that share a word.</p>
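        <p>One possible reading of this merging step is sketched below: pairs scored above the threshold are accepted, and accepted pairs that share a word are folded into the same group. The threshold of 0.5, the probabilities and the function name merge_pairs are invented for the example and do not come from our classifier.</p>
        <preformat>
```python
def merge_pairs(scored_pairs, threshold=0.5):
    """Merge word pairs scored above the threshold into groups sharing a word."""
    groups = []
    for (first, second), probability in scored_pairs:
        if not probability > threshold:
            continue
        merged = {first, second}
        # Fold in every existing group that already contains one of the two words.
        for group in [g for g in groups if first in g or second in g]:
            merged |= group
            groups.remove(group)
        groups.append(merged)
    return groups

# Probabilities are invented for the example; they are not classifier output.
scored = [
    (("Ipertrofia", "500"), 0.9),
    (("500", "g"), 0.8),
    (("cuore", "1"), 0.2),
    (("polmonare", "1"), 0.7),
]
print(merge_pairs(scored))
```
        </preformat>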
        <p>The classifier has been trained with a small corpus, manually annotated. The
results can be improved using a richer corpus and trying other algorithms besides SVM.</p>
        <p>For example, given the sentence:
Ipertrofia totale cuore ( 500 g ) particolarmente evidente a
carico del ventricolo destro ( cuore polmonare , spessore 1 cm
).</p>
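        <p>Applying classifiers B and C we identify the following entities:
          <list list-type="bullet">
            <list-item><p>Ipertrofia (B-DISO)</p></list-item>
            <list-item><p>500 (B-MASS)</p></list-item>
            <list-item><p>g (I-MASS)</p></list-item>
            <list-item><p>cuore (B-DISO)</p></list-item>
            <list-item><p>polmonare (I-DISO)</p></list-item>
            <list-item><p>1 (B-LENGTH)</p></list-item>
            <list-item><p>cm (I-LENGTH)</p></list-item>
          </list>
The relation extraction classifier identifies these two relations:
          <list list-type="bullet">
            <list-item><p>Ipertrofia (DISO) ↔ 500 g (MASS)</p></list-item>
            <list-item><p>cuore polmonare (DISO) ↔ 1 cm (LENGTH)</p></list-item>
          </list>
Further examples of relations retrieved from the IMR are:
          <list list-type="bullet">
            <list-item><p>rigurgito (SIGN) ↔ 5 % (PERC)</p></list-item>
            <list-item><p>stenosi (DISO) ↔ 50 % (PERC)</p></list-item>
            <list-item><p>versamento pericardico (DISO) ↔ 8 mm (LENGTH)</p></list-item>
            <list-item><p>Flumazelin (ACTI) ↔ 1 fl (QUANTITY)</p></list-item>
          </list>
        </p>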
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Applications</title>
      <p>Once we are able to extract hidden knowledge in data, several tools can be built to
help physicians.</p>
      <p>For instance, an interactive tool can be developed for helping physicians when
writing clinical records by suggesting a standard code, e.g., those from the ICD9
taxonomy, for the pathologies mentioned in the record.</p>
      <p>A visual tool might be provided for correlating the entities on a statistical basis.
This tool could be used to visualise the correlations between signs and
diseases with different degrees of probability, helping doctors in formulating diagnoses,
as in the example of Fig. 2.</p>
      <p>We presented an approach, based on linguistic analysis of biomedical text, for
annotating and extracting information from medical records written by Italian clinicians.</p>
      <p>Our experiments were carried out within the context of a project on technologies
for healthcare, where we had access to a sample of real medical records over a period
of 3 years for patients with a pair of major pathologies. The aim of the project was to
determine from these data the conditions that might lead to the evolution of these
pathologies into a chronic disease. We have extracted a significant amount of data from
these records, which are being fed to a data mining system for further analysis.</p>
      <p>The results obtained are promising, though the corpora we produced need to be
further extended.</p>
      <p>Acknowledgment. Partial support for this work was provided by project RIS (POR RIS of the
Regione Toscana, CUP n° 6408.30122011.026000160).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>R.</given-names>
            <surname>Al-Rfou'</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Perozzi</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.</given-names>
            <surname>Skiena</surname>
          </string-name>
          .
          <year>2013</year>
          .
          <article-title>Polyglot: Distributed Word Representations for Multilingual NLP</article-title>
          .
          <source>In Proc. of Conference on Computational Natural Language Learning</source>
          ,
          <source>CoNLL 2013</source>
          , pp.
          <fpage>183</fpage>
          -
          <lpage>192</lpage>
          , Sofia, Bulgaria.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2. G. Attardi et al.,
          <year>2009</year>
          .
          <article-title>Tanl (Text Analytics and Natural Language Processing)</article-title>
          . SemaWiki project: http://medialab.di.unipi.it/wiki/SemaWiki
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>G.</given-names>
            <surname>Attardi</surname>
          </string-name>
          , et al.
          <year>2009</year>
          .
          <article-title>The Tanl Named Entity Recognizer at Evalita 2009</article-title>
          .
          <source>In Proc. of Workshop Evalita'09 - Evaluation of NLP and Speech Tools for Italian</source>
          , Reggio Emilia, ISBN 978-88-903581-1-1.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>G.</given-names>
            <surname>Attardi</surname>
          </string-name>
          .
          <year>2006</year>
          .
          <article-title>Experiments with a Multilanguage Non-Projective Dependency Parser</article-title>
          ,
          <source>Proc. of the Tenth Conference on Natural Language Learning</source>
          , New York, (NY).
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>G.</given-names>
            <surname>Attardi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Buzzelli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Sartiano</surname>
          </string-name>
          .
          <year>2013</year>
          .
          <article-title>Machine Translation for Entity Recognition across Languages in Biomedical Documents</article-title>
          .
          <source>Proc. of CLEF-ER 2013 Workshop</source>
          , September 23-26, Valencia, Spain.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>G.</given-names>
            <surname>Attardi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Cozza</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Sartiano</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>UniPi: Recognition of Mentions of Disorders in Clinical Text</article-title>
          .
          <source>Proc. of the 8th International Workshop on Semantic Evaluation. SemEval</source>
          <year>2014</year>
          , pp.
          <fpage>754</fpage>
          -
          <lpage>760</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <given-names>G.</given-names>
            <surname>Attardi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Baronti</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>Experiments in Identification of Temporal Expressions in Evalita 2014</article-title>
          .
          <source>Proc. of Evalita</source>
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <given-names>G.</given-names>
            <surname>Attardi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Cozza</surname>
          </string-name>
          and
          <string-name>
            <given-names>D.</given-names>
            <surname>Sartiano</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>Adapting Linguistic Tools for the Analysis of Italian Medical Records</article-title>
          .
          <source>Vol. I: First Italian Conference on Computational Linguistics, CLiC-it 2014</source>
          , 9-10 December 2014, Pisa.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <given-names>O.</given-names>
            <surname>Bodenreider</surname>
          </string-name>
          .
          <year>2004</year>
          .
          <article-title>The Unified Medical Language System (UMLS): integrating biomedical terminology</article-title>
          .
          <source>Nucleic Acids Research</source>
          , vol.
          <volume>32</volume>
          , no.
          <issue>supplement 1</issue>
          ,
          <fpage>D267</fpage>
          -
          <lpage>D270</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <given-names>R.</given-names>
            <surname>Collobert</surname>
          </string-name>
          et al.
          <year>2011</year>
          .
          <article-title>Natural Language Processing (Almost) from Scratch</article-title>
          .
          <source>Journal of Machine Learning Research</source>
          ,
          <volume>12</volume>
          , pp.
          <fpage>2461</fpage>
          -
          <lpage>2505</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <given-names>N. P.</given-names>
            <surname>Cruz Díaz</surname>
          </string-name>
          , et al. "
          <article-title>A machine‐learning approach to negation and speculation detection in clinical texts</article-title>
          ."
          <source>Journal of the American Society for Information Science and Technology</source>
          <volume>63</volume>
          .
          <issue>7</issue>
          (
          <year>2012</year>
          ):
          <fpage>1398</fpage>
          -
          <lpage>1410</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <given-names>A.</given-names>
            <surname>Esuli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Marcheggiani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Sebastiani</surname>
          </string-name>
          ,
          <article-title>An enhanced CRFs-based system for information extraction from radiology reports</article-title>
          ,
          <source>Journal of Biomedical Informatics</source>
          <volume>46</volume>
          (
          <issue>3</issue>
          )
          ,
          <fpage>425</fpage>
          -
          <lpage>435</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <given-names>S.</given-names>
            <surname>Pradhan</surname>
          </string-name>
          , et al.
          <year>2014</year>
          .
          <article-title>SemEval-2014 Task 7: Analysis of Clinical Text</article-title>
          .
          <source>Proc. of the 8th International Workshop on Semantic Evaluation (SemEval 2014)</source>
          , August 2014, Dublin, Ireland, pp.
          <fpage>54</fpage>
          -
          <lpage>62</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <given-names>B.</given-names>
            <surname>Rink</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Harabagiu</surname>
          </string-name>
          , and
          <string-name>
            <given-names>K.</given-names>
            <surname>Roberts</surname>
          </string-name>
          .
          <article-title>Automatic extraction of relations between medical concepts in clinical texts</article-title>
          .
          <source>Journal of the American Medical Informatics Association</source>
          ,
          <volume>18</volume>
          (
          <issue>5</issue>
          ):
          <fpage>594</fpage>
          -
          <lpage>600</lpage>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <article-title>RIS: Ricerca e innovazione nella sanità</article-title>
          .
          <year>2014</year>
          .
          <source>POR RIS of the Regione Toscana</source>
          . Homepage: http://progetto-ris.it/
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <given-names>M.</given-names>
            <surname>Saeed</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Lieu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Raber</surname>
          </string-name>
          , and
          <string-name>
            <given-names>R.G.</given-names>
            <surname>Mark</surname>
          </string-name>
          .
          <year>2002</year>
          .
          <article-title>MIMIC II: a massive temporal ICU patient database to support research in intelligent patient monitoring</article-title>
          .
          <source>Comput Cardiol</source>
          ,
          <volume>29</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <given-names>S.</given-names>
            <surname>Sohn</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. G.</given-names>
            <surname>Chute</surname>
          </string-name>
          . "
          <article-title>Dependency parser-based negation detection in clinical narratives</article-title>
          ."
          <source>AMIA Summits on Translational Science Proceedings</source>
          <volume>2012</volume>
          (
          <year>2012</year>
          ):
          <fpage>1</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Tang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Jiang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Xu</surname>
          </string-name>
          . “
          <article-title>UTH_CCB: A Report for SemEval 2014 - Task 7 Analysis of Clinical Text</article-title>
          ”,
          <source>Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014)</source>
          , pages
          <fpage>802</fpage>
          -
          <lpage>806</lpage>
          , Dublin, Ireland, August 23-24,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>