<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Some Remarks on Automatic Semantic Annotation of a Medical Corpus</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Agnieszka Mykowiecka</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Małgorzata Marciniak</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Institute of Computer Science, Polish Academy of Sciences</institution>
          ,
          <addr-line>J. K. Ordona 21, 01-237 Warsaw</addr-line>
          ,
          <country country="PL">Poland</country>
        </aff>
      </contrib-group>
      <fpage>35</fpage>
      <lpage>42</lpage>
      <abstract>
<p>In this paper we present arguments that elaborating a rule-based information extraction system is a good starting point for obtaining a semantically annotated corpus of medical data. Our claim is supported by the evaluation results of the automatic annotation of a corpus containing hospital discharge reports of diabetic patients.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Many current methods of recognizing various types of information included
within natural language texts are based on statistical and machine learning
approaches. Such applications need specially prepared domain data for training
and testing. Clinical texts are hard to obtain because of privacy laws; in
particular, no Polish corpus includes texts of this type. Corpora available
during the past decade more often contain biomedical than clinical texts (e.g.
corpora described in [
        <xref ref-type="bibr" rid="ref1">1</xref>
]). Recently, creating corpora containing clinical data has
started to attract much more attention, e.g. the Cincinnati Pediatric Corpus
(http://computationalmedicine.org/cincinnati-pediatric-corpus-available),
[
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], or the data collected within Informatics for Integrating Biology and the
Bedside (i2b2, https://www.i2b2.org/NLP/DataSets/Main.php). This year, the
Text REtrieval Conference (TREC) added the Medical Records Track devoted
to exploring methods for searching unstructured information in patient medical
records. In nearly all existing resources, semantic annotation is absent or very
limited. One of the few exceptions is CLEF [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] which contains cancer
patient records annotated with information about clinical relations, entities, and
temporal information.
      </p>
      <p>
        There are two main approaches to the task of annotating new linguistic data
– manual annotation, and manual correction of automatically assigned labels.
The traditional annotation methodology consists in preparing and accepting
annotation guidelines, annotating every text by at least two annotators and finally,
resolving differences by a third experienced annotator. This approach, applied
to part-of-speech annotation, is described in [
        <xref ref-type="bibr" rid="ref9">9</xref>
]; manual semantic annotation is
described in [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] or [
        <xref ref-type="bibr" rid="ref12">12</xref>
]. Manual annotation is a time-consuming and expensive
process; moreover, manual work is error-prone. Manually constructed data are
very hard to extend and modify – every change imposes extra effort for
checking the consistency of the result. Therefore, providing automatic methods to
facilitate the task is very important. Automatic annotation is much faster and
although it also does not guarantee complete correctness, the cost of correcting
already labeled data is lower than the cost of entirely manual annotation.
Automatic annotation of data was applied in the MUCHMORE project [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]. The
methods described in [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] can support automatic annotation of textual contents
with SNOMED concepts.
      </p>
      <p>
A good starting point for automatic annotation is provided by the methods of Information
Extraction (see [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]) based on regular expressions and lexicons (e.g., [
        <xref ref-type="bibr" rid="ref3">3</xref>
]), which do
not require annotated corpora as machine learning techniques do. In this paper
we discuss the results of annotating a corpus of Polish diabetic records with a
set of complex semantic labels consisting of about 50 attributes. For this task
we reused an already existing rule-based IE system. In section 2 we
present the method used to create the annotated corpus and the methodology
adopted for the evaluation process. Then, in section 3 we describe the results
obtained. The paper concludes with a discussion of the evaluation results.
      </p>
    </sec>
    <sec id="sec-2">
      <title>Method</title>
      <sec id="sec-2-1">
        <title>Data description</title>
        <p>The corpus consists of 460 hospital discharge reports of diabetic patients,
collected from 2001 to 2006 in one of the hospitals in Warsaw. Each document is
about 1.5 – 2.5 pages long and written in MS Word. The documents are converted
into plain text files to facilitate their linguistic analysis and corpus construction.
As the data include information serving identification purposes (names and
addresses), these items were substituted with symbolic codes before the documents
were made accessible for analysis. This anonymization was performed in order to
make the data available for scientific purposes.</p>
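<p>The substitution step can be sketched as a simple replacement pass. This is a minimal illustration only; the identifier list, the code scheme (PAT-/ADDR-) and the function name are assumptions, as the paper does not describe the actual anonymization mechanism:</p>

```python
import re

def anonymize(text, identifiers):
    # Replace each identifying string with a stable symbolic code;
    # word boundaries prevent replacing substrings of longer words.
    for surface, code in identifiers.items():
        text = re.sub(r'\b' + re.escape(surface) + r'\b', code, text)
    return text

codes = {"Jan Kowalski": "PAT-001", "Ordona 21": "ADDR-001"}
print(anonymize("Jan Kowalski, zam. Ordona 21", codes))
```

<p>Using a fixed mapping (rather than generating codes on the fly) keeps the substitution stable across documents, so the same patient receives the same code everywhere.</p>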
        <p>The entire dataset contains about 1,800,000 characters in more than 450,000
tokens, out of which 55% are words, abbreviations and acronyms, while 45% are
numbers, punctuation marks and other symbols.
</p>
      </sec>
      <sec id="sec-2-2">
        <title>Automatic annotation process</title>
        <p>
          In contrast to many annotated text corpora which were built by manually
assigning labels to appropriate text fragments, we decided to adopt an existing IE
system [
          <xref ref-type="bibr" rid="ref8">8</xref>
] for the task. However, after inspecting the IE system’s results it turned
out that they did not contain all the information needed. For the IE system, the
main goal was to find out whether a particular piece of information is present
in an analyzed text, while the task of text annotation requires identifying the
boundaries of text fragments which are to be assigned a given label. To solve the
problem, the idea of combining two extraction grammars was introduced. On
the basis of the existing grammar a simplified version, consisting of a subset of
the original rules, was created. The final information associating text fragments
with semantic labels is the effect of a comparison of the results of these
correlated IE grammars. The boundaries of text fragments representing attribute values
are recognized by the simplified grammar, while their correctness is confirmed by
the more complex grammar rules which describe the contexts in which a particular
phrase has a desired meaning. Thus, the annotation process (described in detail
in [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ]) consists of the following steps:
– parsing the text with the existing full extraction grammar,
– parsing the entire text using the simplified grammar,
– removing unnecessary information from the output of both grammars,
– comparing and combining the results – only structures that are represented
in both results are represented in the final corpus data together with
information on boundaries of the entire phrase and its subphrases,
– combining the semantic information with morphological information (see [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ])
to create a set of corpus XML files,
– manual correction of annotations.
        </p>
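<p>The comparing-and-combining step above can be sketched as follows. This is a minimal sketch: the (label, start, end) tuple layout and the overlap criterion are assumptions, not the system’s actual data structures:</p>

```python
def combine(full_hits, simple_hits):
    # Keep a phrase found by the simplified grammar only if the full,
    # context-checking grammar confirmed the same label on an overlapping
    # span; the boundaries are taken from the simplified grammar.
    confirmed = []
    for label, start, end in simple_hits:
        if any(fl == label and fs < end and start < fe
               for fl, fs, fe in full_hits):
            confirmed.append((label, start, end))
    return confirmed

full = [("hba1c", 10, 16)]                        # context-validated span
simple = [("hba1c", 12, 14), ("d_type", 0, 2)]    # boundary-level spans
print(combine(full, simple))
```

<p>Here the d_type candidate is dropped because the full grammar never confirmed it in context, while the hba1c phrase keeps the tighter boundaries proposed by the simplified grammar.</p>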
      </sec>
      <sec id="sec-2-3">
        <title>Annotated data</title>
<p>Within the semantic annotation layer, about 50 simple attributes, 11 complex
structures and 3 list types are defined. Below, they are described in the same
groups as in the evaluation of the annotation given in Table 1.
– Identification of a patient’s visit in hospital: visit identification number and
information whether it is a main document or a continuation; the date of the
document; dates when the hospitalization took place.
– Patient information: a structure with the patient’s identifier, sex and simple
attributes representing age, height, weight (in numbers or words) and BMI.
– Data about diabetes (in some cases grouped in a feature_l_str structure),
e.g.: type (d_type); whether the illness is balanced (d_control); when diabetes
was first diagnosed (expressed as an absolute or relative date); reasons for
hospitalization (as a list of attributes); and results of basic tests: HbA1c, acetone,
LDL, levels of microalbuminuria and creatinine.
– Complications and other illnesses, including autoimmunological and accompanying
illnesses, which may be correlated with diabetes.
– Diabetes treatment described by: insulin_treat_str, which contains insulin type
and its doses; a description of continuous insulin infusion therapy (ins_inf_treat);
a description of oral medications; information that insulin therapy was started.
The applied therapy is sometimes given as a list of information that is
represented by a cure_l_str list of attributes.
– Diet description represented by diet_str, which contains information on the type
of diet (diet_type), a structure describing how many calories are
recommended, and a similar structure representing the number of meals.
– Information on therapy given in text form, e.g.: patient’s education, diet
observing, therapy modification, self-monitoring.</p>
<p>Some of the attributes have values representing dates, e.g. the hospit structure
has two substructures describing the beginning and the end of a hospital stay
(h_from and h_to). To correctly label these attributes it is necessary to
recognize the different formats of dates and the appropriate contexts indicating the
meaning of a date. Dates are also recognized at the beginning of a document, and
for representing the date when diabetes was first diagnosed.</p>
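<p>A date-context rule of this kind can be sketched as a regular expression. One numeric format and one context cue are shown; both are assumptions for illustration, as the actual grammar covers more formats and contexts:</p>

```python
import re

# 'od <date> do <date>' ('from ... to ...') as a hospitalization context;
# dd.mm.yyyy is one of several date formats a real grammar would handle.
SPAN = re.compile(
    r'od\s+(\d{1,2}\.\d{1,2}\.\d{4})\s+do\s+(\d{1,2}\.\d{1,2}\.\d{4})')

def hospitalization_dates(text):
    m = SPAN.search(text)
    # returns the pair corresponding to (h_from, h_to), or None
    return (m.group(1), m.group(2)) if m else None

print(hospitalization_dates("hospitalizowany od 12.03.2005 do 19.03.2005"))
```

<p>The surrounding words od/do are what gives the two dates their meaning; the same date string elsewhere (e.g. as a diagnosis date) would need a different context rule.</p>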
        <p>Most attributes representing results of tests have numbers as values. They are
usually attached to short phrases consisting of an introductory phrase indicating
a type of a test and its value, sometimes after one of the following characters:
‘=, :, -’. Values can also be given in brackets. Only the results of LDL cholesterol
levels need a wide context, because they are represented in a table form together
with other test results. This explains the average length of 27 tokens for a phrase
representing lipid_str, which indicates the context of the ldl attribute.</p>
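<p>A rule of the short-phrase kind can be sketched as a regular expression. The pattern below is an illustration for HbA1c only, covering the separator characters and bracket option mentioned above; it is not the system’s actual rule:</p>

```python
import re

# introductory phrase, an optional '=', ':' or '-', then a number with a
# comma as the decimal separator, possibly in brackets
HBA1C = re.compile(r'HbA1c\s*[=:\-]?\s*\(?(\d+(?:,\d+)?)\s*%?\)?')

def hba1c_value(text):
    m = HBA1C.search(text)
    # normalize the Polish decimal comma to a dot
    return m.group(1).replace(',', '.') if m else None

print(hba1c_value("HbA1c = 7,8%"))   # → 7.8
print(hba1c_value("HbA1c: 9%"))      # → 9
```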
<p>Some attributes, having boolean values, label relatively short phrases like
results of acetone tests. For example, a negative value is attached to the
following strings: ac. (-), ac. -, ac. /-, ac. nieobecny ‘absent’, bez acetonurii ‘without
acetone in urine’, ustąpiła acetonuria or ustąpienie acetonurii ‘acetone in urine
subsided’. Boolean-valued attributes can also be represented by many
different, sometimes long, phrases. For example, the information whether the
diabetes therapy was modified is represented in the corrected test set by 23
different phrases with an average length of 4.3 tokens.</p>
<p>Attributes of the last group have many values of different types. For example,
the attribute complication has 17 different values. It is usually attached to a short
phrase (avg. 2.2 tokens) representing just the complication name. Longer phrases
(avg. 5 tokens) represent the opposite information (n_comp), i.e. that a particular
complication was not diagnosed or there are no complications. These phrases
have to contain a phrase like nie wykryto ‘not diagnosed’.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Results</title>
      <p>In the corpus consisting of 460 patient records, 66165 occurrences of simple
attributes were labeled. To check the quality of the results, manual verification
of a randomly selected 10% of the corpus (46 records, 46439 tokens) was done
by two annotators, who were given the following guidelines:
– Structures should be assigned to continuous phrases, i.e. to all tokens
between the first and last tokens of the phrase.
– Boundaries of a phrase to which a label is assigned are determined on the
basis of sets of words that may start and end the phrase.
– In the case of phrases that represent information that should be taken into
account but was not predicted by the grammar designer, annotators have
to rely on their own judgment as to which words belong to such a phrase. If
possible, rules similar to those described in the guidelines should be applied.
– Annotators have to point out information that is understandable to human
readers, so phrases with spelling errors should be annotated.</p>
      <p>The results of the manual corrections of the system’s output made by the two
annotators were then compared and the agreed version was accepted as a
Gold-standard version. The final number of differences between the automatically
obtained annotation and the Gold-standard concerned 596 token labels (1.3%).
Human corrections mainly concerned the addition of new labels (79 structures
– 554 tokens). Deletions of mistakenly recognized structures were much less
frequent (4 structures – 20 tokens); very few changes concerned only the boundaries
or the name of a structure. 283 corrections were proposed consistently by both
annotators. The kappa coefficient for inter-annotator agreement, computed for all
word-label pairs, was equal to 0.976 if empty labels were counted (for a total of 46439
occurrences) and 0.966 when they were ignored (9031 occurrences). The
agreement between the corrected version and the automatically annotated set was
equal to 0.94. Inter-annotator agreement computed only for structure beginnings
(3308) was equal to 0.976.</p>
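<p>For reference, the kappa coefficient reported above can be computed as in the following textbook implementation of Cohen’s kappa, shown on toy data (this is not the actual evaluation script):</p>

```python
def cohen_kappa(pairs):
    # pairs: one (label_a, label_b) tuple per token
    n = len(pairs)
    p_o = sum(a == b for a, b in pairs) / n          # observed agreement
    labels = {l for pair in pairs for l in pair}
    p_e = sum(                                       # chance agreement
        (sum(a == l for a, _ in pairs) / n) *
        (sum(b == l for _, b in pairs) / n)
        for l in labels)
    return (p_o - p_e) / (1 - p_e)

toy = [("hba1c", "hba1c"), ("O", "O"), ("O", "d_type"), ("d_type", "d_type")]
print(round(cohen_kappa(toy), 3))   # → 0.636
```

<p>Counting or ignoring the empty label "O" changes both the observed and the chance agreement, which is why the paper reports both variants.</p>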
<p>The corrected results were compared with the automatically annotated data.
In general, the verification of 9057 non-empty labels showed that the automatic
annotation achieved an accuracy of 0.987, precision of 0.995, recall of 0.936
and an F-measure of 0.966. Precision was equal to 1.00 for all attributes but
doc_dat and comp, and for all structures but dose_str and insulin_treat_str. Recall and
F-measure values for all attributes and structures which occurred in the
evaluation set are given in Table 1. Errors can be classified into 3 groups:
– Omissions and mistakes of the system: dieta cukrzycowa wysokobiałkowa 1800
kcal, 3 posiłki ‘diabetic high protein diet 1800 kcal, 3 meals’ – we did not
recognize a diet of type ‘diabetic and high protein’; the system did not
label information on a patient’s obesity when it was expressed in Latin
(‘obesitas’) instead of Polish (‘otyłość’).
– Spelling or punctuation errors in the original data in words that are
crucial for the rules: wlew podsttawowy instead of podstawowy ‘base infusion’;
pRetinopathia; masa ciała103 ‘weight103’.
– Information represented by phrases not predicted by the extraction
grammars, or difficult to label by the system because of ambiguity (examples
are discussed in section 4).</p>
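<p>The reported figures follow the standard definitions of precision, recall and the balanced F-measure, which for completeness are (the counts below are illustrative, not the evaluation’s actual confusion counts):</p>

```python
def prf(tp, fp, fn):
    # precision, recall and (balanced) F-measure from a confusion count
    p = tp / (tp + fp)
    r = tp / (tp + fn)
    return p, r, 2 * p * r / (p + r)

p, r, f = prf(tp=90, fp=2, fn=8)    # hypothetical counts
print(round(p, 3), round(r, 3), round(f, 3))
```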
      <p>As evaluations based on verifying system output can be biased towards types
of phrases which are recognized by the system and may result in the omission
of other types of phrases which represent the same information, we performed a
second type of evaluation. We manually compared the automatically generated
annotation with a manual annotation which was done without seeing the system
results. For this purpose, 5 discharge records randomly selected from the
Gold-standard subcorpus were annotated manually. It took a well-trained person 250
minutes (correction of the automatic annotation took less than 1 hour), and the
F-measure of the results in comparison to the Gold-standard annotation was
equal to 0.86. The kappa coefficient between the manually obtained annotation and the
corrected system output was equal to 0.87 when all word-label pairs were counted
and 0.82 for structure beginnings. The lower coefficient value was due to annotator
inattention, which resulted in omissions of information or in assigning a label to an
inappropriate text fragment. The agreement between the corrected version and the
automatically annotated set was equal to 0.94.</p>
    </sec>
    <sec id="sec-4">
      <title>Discussion and Conclusions</title>
      <p>Standard information given as numbers or dates is often easy to recognize
automatically by any rule based system. The vast majority of such data is
labeled correctly, yet sometimes there are problems as a result of unpredicted long
phrases representing the desired information. These errors should be corrected
during manual verification of the corpus.</p>
<p>For example, the phrase HbA1c przy przyjęciu do Kliniki wynosiło 7,8%
‘HbA1c level at the time of admission to hospital was 7.8%’ contains
information that is usually represented by ‘HbA1c = 7,8%’. As rule-based systems
are greedy, rules have to be relaxed carefully. For example, if we allow several
tokens between the introductory string HbA1c and a number in the rule assigning
the hba1c attribute, it may recognize another number as the value (for HbA1C
9 %, HbA1 11,3 % the value assigned would be 11.3%). It is possible to relax
the extraction grammar rules by imposing restrictions on the tokens that appear
between the ‘HbA1c’ token and its value, e.g. a word whose base form is
przyjęcie ‘admission’.</p>
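<p>The restricted relaxation can be sketched as a bounded gap whose tokens must come from an approved context vocabulary. The word list and the gap limit below are assumptions chosen for this one phrase, not the grammar’s real inventory:</p>

```python
import re

# Gap tokens restricted to an approved context vocabulary; a naive gap
# of arbitrary tokens could skip ahead and capture the wrong number,
# as in the 'HbA1C 9 %, HbA1 11,3 %' example above.
ALLOWED = r'(?:przy|przyjęciu|do|Kliniki|wynosiło)'
RELAXED = re.compile(
    r'HbA1c\s+(?:' + ALLOWED + r'\s+){0,5}(\d+(?:,\d+)?)\s*%')

m = RELAXED.search('HbA1c przy przyjęciu do Kliniki wynosiło 7,8%')
print(m.group(1))   # → 7,8
```

<p>Because the gap admits only the listed context words, a digit sequence belonging to a different measurement cannot be reached by skipping over it.</p>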
      <p>The second reason for attribute omission is paraphrasing. Natural language
allows us to express the same information in many ways. Thus, it is extremely
difficult to write a system that correctly recognizes all possible phrases. For
instance, in the interpretation of the following phrase: pacjentka z cukrzycą typu
1 została przyjęta do Kliniki z powodu chwiejnego przebiegu choroby ‘patient with
diabetes type 1 was hospitalized in the Clinic because of the unstable course of
illness’ it is necessary to know that the illness mentioned in the second part of
the sentence refers to diabetes, to recognize the reason for hospitalization.</p>
      <p>Another example that is easy for a human annotator but caused problems
in automatic annotation was when context was disregarded for a test result. We
assume that phrases like cukrzyca typu 2 ‘diabetes type 2’ indicate the type of the
patient’s diabetes. But for the following phrase pacjent obciążony rodzinnie, mama
i babcia z cukrzycą typu 2 ‘patient with a family history – mother and grandmother
with diabetes type 2’ this is not true. Another difficult example is the phrase
dawka dodatkowa 21.00 - 2j. Humalog ‘additional dose at 21.00 – 2 units of Humalog’,
where the string ‘21.00’ was recognized not as a time description but as a dose.</p>
      <p>The biggest problem for automatic rule based semantic annotation stems
from phrases that require a very wide context. For example, it is impossible to
correctly interpret the following phrase: Wprowadzono intensywną
insulinoterapię ‘Intensive insulin therapy was introduced’. This phrase is a candidate for
i_therapy_beg, which indicates the introduction of insulin into a patient’s therapy.
Unfortunately, from this phrase alone we do not know whether the verb ‘introduce’ refers
to ‘insulin’ or to the word ‘intensive’ – a feature of the therapy. This problem
can be resolved only by a human annotator (and not always), after an
analysis of other information in the document. For example, if there is information
on newly diagnosed diabetes or previous oral therapy, the phrase should be
labeled with the i_therapy_beg attribute, whereas if there is information that the
patient was treated with continuous insulin infusion therapy, the phrase should
not be labeled with it.</p>
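<p>The document-level heuristic described above can be sketched as follows. The flag names in doc_facts are hypothetical attributes assumed to have been extracted elsewhere in the document:</p>

```python
def label_i_therapy_beg(doc_facts):
    # doc_facts: set of attributes already found in the document
    if "ins_inf_treat" in doc_facts:
        return False      # insulin was already being administered
    if {"newly_diagnosed_diabetes", "previous_oral_therapy"} & doc_facts:
        return True       # insulin is genuinely being introduced
    return None           # undecidable without further evidence

print(label_i_therapy_beg({"previous_oral_therapy"}))   # → True
print(label_i_therapy_beg({"ins_inf_treat"}))           # → False
```

<p>Returning None for the undecidable case mirrors the observation that even a human annotator cannot always resolve the ambiguity.</p>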
      <p>
The semantic annotation of text corpora is domain- and application-specific.
As a new annotation is usually necessary for each new purpose, all methods of
increasing the efficiency of the annotation procedure are highly desirable. In the
paper we presented the evaluation results of a corpus annotation obtained using
IE grammars. The results turned out to be of a quality good enough for statistical
purposes [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. The advantage of designing an IE system instead of preparing
only guidelines for manual annotation is its flexibility – the set of rules may be
changed and a slightly different resource with a high degree of consistency can be
produced, whilst changing a manually annotated resource is more error-prone
and time-consuming.
      </p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Cohen</surname>
            ,
            <given-names>K.B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fox</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ogren</surname>
            ,
            <given-names>P.V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hunter</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>Corpus design for biomedical natural language processing</article-title>
          .
          <source>In: ACL-ISMB Workshop on Linking Biological Literature, Ontologies and Databases: Mining Biological Semantics</source>
          . pp.
          <fpage>38</fpage>
          -
          <lpage>45</lpage>
          . Detroit (
          <year>2005</year>
          ), http://www.aclweb.org/anthology/W/W05/W05-1306
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Dalianis</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hassel</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Velupillai</surname>
            ,
            <given-names>S.:</given-names>
          </string-name>
          <article-title>The Stockholm EPR corpus - characteristics and some initial findings</article-title>
          .
          <source>In: Proceedings of the 14th International Symposium for Health Information Management Research</source>
          . pp.
          <fpage>14</fpage>
          -
          <lpage>16</lpage>
          (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Gold</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Elhadad</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhu</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cimino</surname>
            ,
            <given-names>J.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hripcsak</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          :
          <article-title>Extracting Structured Medication Event Information from Discharge Summaries</article-title>
          .
          <source>In: AMIA Annual Symposium Proceedings</source>
          . p.
          <fpage>237</fpage>
          -
          <lpage>241</lpage>
          (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Marciniak</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mykowiecka</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Construction of a medical corpus based on information extraction results</article-title>
          .
          <source>Control and Cybernetics</source>
          (in print) (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Marciniak</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mykowiecka</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Towards Morphologically Annotated Corpus of Hospital Discharge Reports in Polish</article-title>
          .
          <source>In: Proceedings of BioNLP</source>
          <year>2011</year>
          (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Meystre</surname>
            ,
            <given-names>S.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Savova</surname>
            ,
            <given-names>G.K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kipper-Schuler</surname>
            ,
            <given-names>K.C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hurdle</surname>
            ,
            <given-names>J.F.</given-names>
          </string-name>
          :
          <article-title>Extracting information from textual documents in the electronic health record: A review of recent research</article-title>
          .
          <source>IMIA Yearbook</source>
          <year>2008</year>
          : Access to Health Information pp.
          <fpage>128</fpage>
          -
          <lpage>144</lpage>
          (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Mykowiecka</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Marciniak</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Automatic semantic labeling of medical texts with feature structures</article-title>
          .
          <source>In: Text, Speech and Dialogue. Proceedings of the TSD</source>
          <year>2011</year>
          , Plzen, Czech Republic,
          <year>2011</year>
          , LNAI, Springer (
          <year>2011</year>
          , accepted for publication)
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Mykowiecka</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Marciniak</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kupść</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Rule-based information extraction from patients' clinical data</article-title>
          .
          <source>Journal of Biomedical Informatics</source>
          <volume>42</volume>
          ,
          <fpage>923</fpage>
          -
          <lpage>936</lpage>
          (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Pakhomova</surname>
            ,
            <given-names>S.V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Codenb</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chutea</surname>
            ,
            <given-names>C.G.</given-names>
          </string-name>
          :
          <article-title>Developing a corpus of clinical notes manually annotated for part-of-speech</article-title>
          .
          <source>International Journal of Medical Informatics</source>
          <volume>75</volume>
          ,
          <fpage>418</fpage>
          -
          <lpage>429</lpage>
          (
          <year>2006</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Roberts</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gaizauskas</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hepple</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Demetriou</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Guo</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Roberts</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Setzer</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Building a semantically annotated corpus of clinical texts</article-title>
          .
          <source>Journal of Biomedical Informatics</source>
          <volume>42</volume>
          (
          <issue>5</issue>
          ),
          <fpage>950</fpage>
          -
          <lpage>966</lpage>
          (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Ruch</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gobeill</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lovis</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          , Geissbühler, A.:
          <article-title>Automatic medical encoding with SNOMED categories</article-title>
          .
          <source>BMC Medical Informatics and Decision Making</source>
          <volume>8</volume>
          (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>South</surname>
            ,
            <given-names>B.R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jones</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Garvin</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Samore</surname>
            ,
            <given-names>M.H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chapman</surname>
            ,
            <given-names>W.W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gundlapalli</surname>
            ,
            <given-names>A.V.</given-names>
          </string-name>
          :
          <article-title>Developing a manually annotated clinical document corpus to identify phenotypic information for inflammatory bowel disease</article-title>
          .
          <source>BMC Bioinformatics</source>
          ,
          <volume>10</volume>
          (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Vintar</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Buitelaar</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ripplinger</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sacaleanu</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Raileanu</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Prescher</surname>
            ,
            <given-names>D.:</given-names>
          </string-name>
          <article-title>An efficient and flexible format for linguistic and semantic annotation</article-title>
          .
          <source>In: Third International Language Resources and Evaluation Conference</source>
          , Las Palmas (
          <year>2002</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>