<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Context Related Extraction of Conceptual Information from Electronic Health Records</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Svetla Boytcheva</string-name>
          <email>svetla.boytcheva@gmail.com</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ivelina Nikolova</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Elena Paskaleva</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Institute for Parallel Processing, Bulgarian Academy of Sciences</institution>
          ,
          <addr-line>25A Acad. G. Bonchev Str., 1113 Sofia</addr-line>
          ,
          <country country="BG">Bulgaria</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>State University of Library Studies and Information Technologies</institution>
          ,
          <addr-line>119 Tzarigradsko Shose Blvd., 1784 Sofia</addr-line>
          ,
          <country country="BG">Bulgaria</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>This paper discusses language technologies for the automatic processing of Electronic Health Records (EHRs) in Bulgarian, with the aim of extracting multi-layer conceptual chunks from medical texts. We take an Information Extraction view of text processing, in which semantic information is extracted using predefined templates. In the first step the templates are filled in with information about the patient status. Afterwards the system extracts or infers temporal relations between the events described in the EHR text. Then cause-effect relations are made explicit and, finally, implicit knowledge is derived from the medical records by reasoning. Because the subject is too complex to handle in a single pass, we propose a cascade approach for the extraction of multi-layer knowledge representation statements. In this article we present laboratory prototypes for the first two tasks and discuss typical examples of conceptual structures, which cover the most challenging tasks in the extraction scenario: the recognition of cause-effect relations and temporal structures. This work in progress is part of the research project EVTIMA (2009-2011), which aims at the design and implementation of technologies for efficient search of conceptual patterns in medical information.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>The case history is one of the most important medical documents and has been
created, processed and stored since ancient times. Patient records make it
possible to study disease causes and symptoms and to search for
effective treatment methods. Unfortunately, most of these medical documents in
Bulgaria are available only as paper archives or, in digital form, as separate
text files.</p>
      <p>The need for more precise clinical research on the causes of diseases and their
prevention, as well as for their effective treatment, gave rise to large
repositories of Electronic Health Records (EHRs) which can be explored by computers.
Modern medical informatics requires effective methods for
conceptual information extraction from text documents and for the creation of EHR
databases in an appropriate format, thus facilitating the application of advanced
methods and techniques for knowledge discovery in medical documents.
However, the unstructured text of the medical records and the various ways used to
refer to the same medical condition (e.g. disease, symptom, examination results)
make automated analysis a challenging task.</p>
      <p>There are different types of Electronic Medical Records: one managed by
the patient's GP, others containing epicrises (discharge summaries) issued by
hospitals, and so on. Here we work with EHRs produced for hospital treatments.
Because of the particular requirements for personal data protection and the
restricted access to medical documents, the project works on anonymous patient
data. The pseudonymisation
is done by the Hospital Information System of the University Specialised Hospital
for Active Treatment of Endocrinology "Acad. I. Penchev" (part of the Medical
University, Sofia). The project deals with the automatic processing of epicrises
of patients who are diagnosed with different forms of diabetes.</p>
      <p>The paper is structured as follows. Section 2 overviews some related research
and discusses basic language technologies which are used for Information
Extraction (IE) in the medical domain. Section 3 describes the raw data features
and outlines the data processing towards knowledge acquisition. Section 4
discusses the main types of patient status data and some techniques for their
extraction. Section 5 presents the module for extraction of temporal structures
from EHRs. Section 6 briefly sketches the ideas and techniques behind the
modules for cause-effect relation detection and for the inference of more complex
relations. Examples and assessment figures present the current experiments.
Section 7 contains some discussion and the conclusion.</p>
    </sec>
    <sec id="sec-2">
      <title>Related work</title>
      <p>
        The main approach for partial text processing is called Information Extraction
(IE). The goal of IE systems is to search for a given event in the input
documents [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Different IE systems have been developed for various domains,
and they process text in different natural languages. Over the last two decades
Natural Language Processing (NLP) methods have started to penetrate the medical
field and support the extraction of medical entities and their classification.
Today medical IE is integrated in various kinds of applications, for instance entity
extraction from patient records starting from some initial medical ontology [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]
or the study of large medical databases to identify diseases which commonly
co-occur [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. The most challenging task is free text processing in the
medical domain. Techniques used for it include:
      </p>
      <list list-type="bullet">
        <list-item>
          <p>matching an existing domain ontology to patient record texts [4-6]:
knowledge-based IE is a very difficult task since the conceptual resources are very
often incomplete, so important terms from the EHR text might be missing in the
ontology. Moreover, the concept labels are usually presented in the domain
ontologies in their canonical form only. These systems are not very successful
in the recognition of paraphrases, compound concepts, and concepts that
incorporate critical modifiers;</p>
        </list-item>
        <list-item>
          <p>rule-based processing [
        <xref ref-type="bibr" rid="ref7 ref8">7, 8</xref>
        ]: these systems usually have drawbacks like (i)
difficulties in generalisation/specialisation of the manually prepared rules,
(ii) rule set management and consistency support and (iii) adaptation of
the rule set to another subdomain;</p>
        </list-item>
        <list-item>
          <p>phrase spotting [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]: these systems search for specific key terms or phrases
in the medical records, but they often fail to recognise paraphrases and
compound concepts. Another problem is that the particular context of the key
term can differ from the required one; moreover, the key term can also occur
in a negated context [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ];</p>
        </list-item>
        <list-item>
          <p>mapping the input text to an internal language-independent representation,
as in the systems RECIT [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] and GMT [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]: this approach also has
limitations in the analysis of complex language constructions.</p>
        </list-item>
      </list>
      <p>
        More generally, semantic interpretation of medical data can be done by
encoding knowledge structures as Conceptual Graphs (CG). Croitoru et al. [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]
present a CG approach to describing data in a distributed decision support
system for brain tumour diagnosis. They use CG for capturing the static model
rather than the inference procedures. Specific CG extensions like the notion of
"actor" prove to be useful because they provide higher-level functionality on
top of the knowledge statements. In [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] Delugach proposes to model multiple
medical views by CG; he introduces "actors" to represent some relationships in
the data flow diagrams for medical information. The "actors" in [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] are applied
to automate requirements consistency checks by CG representation; this idea
is useful in the medical domain too because of its many requirements.
Pfeiffer and Hartley [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ] present a Conceptual Programming environment, where
graph definitions contain relational functions, a kind of "actors", which allow
state, event, and temporal processing.
      </p>
      <p>
        Some IE systems in the medical domain are considered best practice,
for instance BADGER [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ], a text analysis system which summarizes medical
patient records by extracting concepts (diagnoses, symptoms, physical findings,
test results, and therapeutic treatments) based on linguistic context, and AMBIT
[
        <xref ref-type="bibr" rid="ref18">18</xref>
        ], a system for information extraction from biomedical texts.
      </p>
    </sec>
    <sec id="sec-3">
      <title>Project settings - from data to knowledge</title>
      <p>In Bulgaria the EHR content is standardised by state regulatory documents.
The length of EHR text is usually 2-3 pages and the text is separated into the
following parts according to the medical standards: General information for the
patient, Diagnosis, Anamnesis, Accompanying/Past diseases; Family anamnesis;
Risk factors, Status, Examinations and Clinical Data; Consultations; Debate.</p>
      <p>One of the major problems in processing Bulgarian EHRs is the variety of
terminological expressions. The text contains medical terminology in Latin (about
1%), sometimes with different transcriptions in Cyrillic. Several wordforms per
term are used for most of the medical terminology in Bulgarian (66%) and there
are specific abbreviations of the medical terminology both in Bulgarian and Latin
(about 3%). The EHRs also contain a lot of numerical values of analyses and
clinical test data (about 16%). Another problem is the specific language style
of the medical professionals. The major part of the text consists of sentence
phrases; complete sentences are rare. Sometimes there is no agreement between
the sentence parts. Taken together, these obstacles are not easy to overcome. They
prevent the effective application of standard NLP techniques like deep syntactic
analysis. Moreover, especially for Bulgarian, there is a lack of well-developed,
stable language technologies with satisfactory precision for partial syntactic
analysis. Another essential problem is the rich temporal structure of the patient
descriptions: about 2/3 of the whole text describes connected events which
happened in different periods of time. For all the reasons listed above, it is
necessary to elaborate NLP techniques for the extraction of conceptual structures
using partial analysis of the Bulgarian EHRs.</p>
      <p>The desired knowledge structures to be extracted from EHRs can be grouped
in the following types: (i) characteristics for filling in templates for
patient status, diagnoses, symptoms, treatment etc.; (ii) cause-effect
relations between symptoms, diagnosis, treatment etc.; (iii) analysis
of the effect of the different treatments on the patient status.</p>
      <p>This information is represented at different levels of abstraction; therefore
it is necessary to construct a multi-layered knowledge representation hierarchy
(Fig. 1). The lowest level consists of data about the patient status: symptoms,
clinical data and diagnosis. At this level we use IE techniques for filling in
templates with information about body parts and their current status, symptoms
and/or diagnosis; IE operates on the text words. The middle level represents
temporal relations between events (data and events' order). The next level shows
cause-effect relations (meaning of data), and the uppermost level corresponds
to implicit knowledge extracted from medical records after reasoning
(understanding the data). Thus there are several challenging tasks in solving complex
relations recognition: filling in templates with the patient's data, determination
of temporal structures (by extraction of sequences of events) and recognition of
the cause-effect paradigm.</p>
    </sec>
    <sec id="sec-4">
      <title>Patient Status Data Extraction</title>
      <p>Patient-related documents show significant deviation from regular text, where
coherent discourse is built by adjacent sentences. The EHR texts are an even more
specific case, because they are a kind of non-official, non-edited hospital
documentation which is entered by different authors. These observations are
supported by statistical analyses of the tokens in the experimental corpus.</p>
      <p>The primary, raw text corpus consists of 106 EHRs with 73 600 word
occurrences. After the elimination of repeated identification records
and long digit expressions (tables, enumerations etc.) the corpus was reduced
to 65 600 tokens. About 10% of them are non-language units: Latin words, digits and
mixed letter-digit strings. The rest of the corpus, 58 920 Bulgarian words, was
annotated with a large Bulgarian grammatical dictionary (1 100 000 wordforms
of 75 000 lemmas). The results of the morphosyntactic annotation showed a
large percentage of unrecognised words (13 370 tokens or 22.6% of all running
Bulgarian words). The 45 550 analysed tokens are represented by 2 600 wordforms
and their 1 840 lemmas.</p>
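      <p>The proportions quoted above can be re-derived from the raw counts. The short Python sketch below is only a sanity check on the arithmetic; the variable names are ours and are not part of the system.</p>
      <preformat preformat-type="code"><![CDATA[
```python
# Counts quoted in the text above (tokens unless noted otherwise).
bulgarian_words = 58_920   # Bulgarian running words after clean-up
unrecognised = 13_370      # words missing from the grammatical dictionary
wordforms = 2_600          # distinct wordforms among the analysed tokens

# Tokens that received a morphosyntactic analysis.
analysed = bulgarian_words - unrecognised

# Share of unrecognised words and the wordform/token ratio, in percent.
unrecognised_share = 100 * unrecognised / bulgarian_words
wordform_ratio = 100 * wordforms / analysed

print(analysed)
print(f"{unrecognised_share:.2f}%  {wordform_ratio:.2f}%")
```
]]></preformat>
      <p>The wordform/token ratio computed this way is the "poverty of expression" measure that is compared with narrative texts.</p>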
      <p>The results given in Fig. 2 show the high percentage of extra-language and
extra-dictionary data in the medical texts (30% in total). For comparison, the
corresponding proportions for narrative texts are given in Fig. 2 as well (no
extra-language units and only 5.8% extra-dictionary units). The figures also show the
"poverty of expression" in medical texts: the distinct wordforms amount to 5.7%
of the running words, while the same value for narrative texts is 19.4%.</p>
      <p>
        These specific text features prevent the extraction of conceptual information
by the classical deep NLP techniques (e.g., tagging and deep parsing), because
the key elements (for the text analysis and contents) are coded by non-language
means. The usual approach in such cases is to use statistical methods for text
analysis. But these methods require a large text corpus for
observation, training and testing, which does not exist for Bulgarian EHRs. That
is why we apply only partial chunking for extraction of Noun Phrases (NPs)
wherever possible and reduce the wordforms of the extracted NPs to common
stems using stemming rules [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ]. Later on, these NPs and the disease definitions
given in the International Classification of Diseases (ICD-10) are treated as key
phrases for further language analysis of the patient EHRs. We build templates
to recognise some elements that are crucial for text comprehension. By filling these
templates we obtain information about body parts and their current status,
symptoms and/or diagnosis in expressions like:
      </p>
      <list list-type="bullet">
        <list-item>
          <p>diagnosis: "Захарен диабет тип 1" (diabetes type 1);</p>
        </list-item>
        <list-item>
          <p>risk factors: "пушач" (smoker);</p>
        </list-item>
        <list-item>
          <p>body parts: "много суха кожа" (very dry skin); "женски тип окосмяване"
(female-type hair distribution); "крайници - без отоци" (limbs - no oedema).</p>
        </list-item>
      </list>
      <p>The Bulgarian language, with its rich morphological structure, enables the
application of unsupervised knowledge extraction methods. Here we discuss some
linguistic techniques and language-oriented surface rules which support the
construction of semantic links in the template of risk factors.</p>
      <p>Analysing the descriptions of risk factors in 106 EHR texts, we find some
linguistic dependencies between the lexical units and the concepts they represent.
The communicative role of that text fragment is to serve as a link between the
disease and its originating and aggravating factors (drinking, smoking) as well
as between the medication and the factors for its failure or complications (e.g.
allergies). A main risk factor such as smoking can be expressed by the verb пуша
(to smoke) as well as by its related words: the nouns пушач (smoker), пушене
(smoking), тютюнопушене (tobacco smoking). In this example the common
semantic element "smoke" can be extracted from all stems of the word family, so
stemming facilitates the meaning extraction from the EHR.</p>
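      <p>The idea of mapping a whole word family to one risk-factor concept can be sketched in a few lines of Python. This is a minimal illustration, not the stemming rules used in the project: the lexicon <italic>STEM_TO_CONCEPT</italic> and the function <italic>risk_concepts</italic> are hypothetical names, and a plain substring test stands in for real stemming.</p>
      <preformat preformat-type="code"><![CDATA[
```python
# Hypothetical stem lexicon: each stem points to the risk-factor concept
# shared by its word family. The entries are illustrative assumptions.
STEM_TO_CONCEPT = {
    "пуш": "smoking",     # пуша, пушач, пушене, тютюнопушене
    "алкохол": "alcohol",
}


def risk_concepts(text):
    """Return the set of risk-factor concepts whose stem occurs in the text."""
    found = set()
    for token in text.lower().split():
        word = token.strip(".,;:()\"'")
        for stem, concept in STEM_TO_CONCEPT.items():
            # Substring test, so that compounds such as тютюнопушене match too.
            if stem in word:
                found.add(concept)
    return found


print(risk_concepts("Пушач от 10 години."))
```
]]></preformat>
      <p>A substring test over-generates (any unrelated word containing the same letters would match) and it ignores negation, so in practice it would have to be combined with proper stemming rules and with the negation handling discussed later in this section.</p>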
      <p>Besides the morphologically related words, the text expressions of the same
risk factor can be realised through semantically related words like 'cigarettes'
and 'box', used as a measure of the risk level. This level is normally expressed
in numbers accompanying the focal words (e.g. 2-3 cigarettes vs. 2-3 boxes per
day). We explicitly code the relation between the risk action and the
measurable units of the observed action. The elements of this relation are obtained by
a bottom-up study of the possible semantic relations in a representative text
corpus. For example, the risk factor 'alcohol drinking' is registered in the EHRs as
"uses 100 g concentrate a day", which presumes a priori correlations between the
notions of 'concentrate' and 'drinking' (even when the second word is missing in
the particular text).</p>
      <p>
        When building the semantic templates, special attention is given to the
various language means of expressing negation [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ]. Negation recognition methods
should work at text level, because the negated element (keyword) might be
extracted wrongly if the negation scope goes beyond the standard negative
interpretation of immediate adjacency (X - no X, not X). Standard negation links
denoting the absence of risk are expressed by verbs such as 'отрича' (denies), 'не
съобщава' (does not report), 'не са забелязани' (are not noticed).
      </p>
      <p>Example: Fig. 3 presents a sample template for detecting the risk factor
"smoking", built on the basis of all epicrises in our corpus. Using concordances
and partial chunking we have semi-automatically collected the linguistic features
which signify the presence, absence or frequency of smoking. Since it is impossible
to perform a deep syntactico-semantic analysis of the epicrisis text (and to obtain
the detailed structure of the connected text units), the template in Fig. 3 is
applied as a specific search window. Further analysis of EHRs is implemented
by a shallow method which presumes the definition of text limits in the search
procedure: a text fragment between two punctuation marks.</p>
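      <p>The search-window idea can be illustrated with a small Python sketch. It is a simplified assumption-laden example rather than the template of Fig. 3: the names <italic>SMOKING_STEM</italic>, <italic>NEGATION_CUES</italic> and <italic>smoking_status</italic> are hypothetical, the cue list contains only the three negation verbs quoted above, and the window is approximated by splitting the text at punctuation marks.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import re

SMOKING_STEM = "пуш"
# Negation links quoted in the text; a real template would list more.
NEGATION_CUES = ("отрича", "не съобщава", "не са забелязани")


def smoking_status(text):
    """Classify smoking as 'absent', 'present' or None (not mentioned)."""
    # Each search window is the text between two punctuation marks.
    for fragment in re.split(r"[.,;:!?]", text.lower()):
        if SMOKING_STEM not in fragment:
            continue
        # A negation cue anywhere in the same window negates the keyword.
        if any(cue in fragment for cue in NEGATION_CUES):
            return "absent"
        return "present"
    return None


print(smoking_status("Рискови фактори: тютюнопушене отрича."))
```
]]></preformat>
      <p>Restricting both the keyword and the negation cue to one window keeps a negation in a neighbouring fragment from being attached to the wrong keyword, at the price of missing negations whose scope crosses a punctuation mark.</p>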
    </sec>
    <sec id="sec-5">
      <title>Temporal Structures Extraction</title>
      <p>Temporal events can be grouped in the following types: diagnosis, symptoms,
and treatments. The event features are: start of the event, end of the event,
type, characteristic, effect.</p>
      <p>The Temporal Structures Extraction Module has a pipeline architecture
including submodules for: Annotation Analyses &amp; Chunking; Temporal
Information Extraction and Filling Templates. This module processes the EHRs
separately one by one.</p>
      <p>The first submodule splits the EHRs into their main sub-sections. After that
the sections are grouped into five major segments according to the tense of the
events that will be searched for in the segment: past events, events at the moment
of entering the hospital, events at the moment of leaving the hospital, continuous
events, events in the future.</p>
      <p>If some EHR part contains information about events that can be associated
with more than one tense type, then that EHR part is duplicated and recorded
in each of the corresponding tense segments. After that the annotation submodule
starts the morphological analysis based on a large dictionary (described in
section 4) and grammar rules. For each wordform, the module finds its basic form
(lexeme) with the associated lexical and grammatical features. The lexicon is
extended with a terminological bank for medical terms, derived from the ICD-10
in Bulgarian. This additional resource contains 10 970 terms, a partial taxonomy
of body parts, and a medical goods list.</p>
      <p>The annotation submodule performs also chunking of the main sentence
phrases, which is a partial syntactic analysis, and recognizes the noun phrases.
Each sentence in every segment is numbered separately following the sentence
occurrences in EHR text.</p>
      <p>The second submodule extracts temporal structures from the text by using
keywords to locate the necessary information and rules for determining the scope
of the event.</p>
      <p>The keywords are grouped in the following categories: time/date, events
with alternating and/or undefined time, treatment events, diagnosis events,
symptom events, effect.</p>
      <p>
        The extraction of the temporal features is done by applying rules for
discovering the temporal structures. The rules operate on the words and on the
annotations inserted in the text by the partial syntax analysis module, and they
determine the category of the events (past, present, future, continuous) in the
sentences which contain the corresponding "keywords". The next step is to partially order
the events according to their start/end times. The system tries to find
previous patient records (if any) in the medical archive, in order to build a
complete picture of the past patient case history and the newly generated events. The
medical records resource bank contains patients' information in XML format. The
submodule for template filling determines the scope of each event. Templates are
manually generated after joint discussions with medical experts. One sentence
might contain information about a sequence of events, but it is also possible to
have the information about one event spread out among several adjacent
sentences. For this reason, to determine the scope of the event X, the module
analyses all events generated from the sentence where X is described and from
its previous and next sentences, i.e. the events ordering module processes the
discourse structure of the text [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ]. For each event the module determines its
characteristics and effect (if any). During the process of template filling it is
necessary to define correlations between the events, depending on their sequence.
The final step is to update the patient information in the resource bank of
medical records.
      </p>
      <p>Example for Filling of Temporal Conceptual Templates.
Let an EHR for some patient be given. We would like our IE system to extract
from "Anamnesis" (the selected part) information about events in "past tense"
and "present continuous tense". Let us consider the selected text. As a result
of the Annotation Analysis and Chunking module, after the morphological and
partial syntax analysis, we obtain for the given text the annotation in Fig. 4.</p>
      <p>In the paragraph "Захарният диабет е установен през 1998 год. на
фона на наднормено тегло. Първоначално приемала Новонорм в
комбинация със Сиофор, после Диапрел MR със Сиофор, но поради липса
на ефект и изразени странични реакции към Метформин, от януари
2005 г. преминала на Инсулин Ново Микс в двукратни апликации."
(Diabetes mellitus was diagnosed in 1998 against a background of overweight.
Initially she took Novonorm in combination with Siofor, then Diaprel MR with
Siofor, but owing to lack of effect and pronounced side reactions to Metformin,
from January 2005 she switched to Insulin Novo Mix in two daily applications.)
the IE system can recognise the following "keywords" signalling past events
and event sequences - for tense (1998 год./year 1998, първоначално/at
first, после/afterwards, януари 2005 г./January 2005), for effect - diagnosis
(установен/determined), treatment (приемала/took, преминала на/changed
to), symptoms (реакции/reactions, ефект/effect, наднормено/over).</p>
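      <p>The keyword spotting step on this example can be sketched as follows, with the example paragraph written in Cyrillic script. The sketch is only an illustration under our own assumptions: the dictionary <italic>KEYWORDS</italic> holds just the keywords named above, written as regular expressions, and <italic>spot_keywords</italic> is a hypothetical helper, not a module of the system.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import re

# Keyword groups taken from the example above; a real lexicon would be larger.
KEYWORDS = {
    "time": [r"\d{4}\s*год\.", r"първоначално", r"после",
             r"януари\s+\d{4}\s*г\."],
    "diagnosis": [r"установен"],
    "treatment": [r"приемала", r"преминала на"],
    "symptom": [r"реакции", r"ефект", r"наднормено"],
}


def spot_keywords(text):
    """Return (position, category, matched text) triples in text order."""
    hits = []
    for category, patterns in KEYWORDS.items():
        for pattern in patterns:
            for m in re.finditer(pattern, text, flags=re.IGNORECASE):
                hits.append((m.start(), category, m.group(0)))
    return sorted(hits)


SAMPLE = ("Захарният диабет е установен през 1998 год. на фона на "
          "наднормено тегло. Първоначално приемала Новонорм в комбинация "
          "със Сиофор, после Диапрел MR със Сиофор, но поради липса на "
          "ефект и изразени странични реакции към Метформин, от януари "
          "2005 г. преминала на Инсулин Ново Микс в двукратни апликации.")

for position, category, match in spot_keywords(SAMPLE):
    print(position, category, match)
```
]]></preformat>
      <p>Sorting the hits by position preserves the order in which the markers occur in the text, which is what the subsequent event ordering relies on.</p>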
      <p>As characteristics for these events, the system identifies drug entities after
checking the medical goods list in the repository bank. The first event has no
associated start time marker, so the IE module looks for time information
in previous sentences in the text, by processing the local discourse. If no
marker is found, the module takes into account the so-called narrative convention:
two past events expressed consecutively have happened in the same order in which
they are written. Thus we obtain the filled templates shown in Fig. 5. An important
project task at present is to evaluate the accuracy of the partial temporal analysis
which is sketched in this section.</p>
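      <p>The event features listed at the beginning of this section and the narrative convention can be combined in a short Python sketch. It is a simplification under stated assumptions: the class <italic>Event</italic> and the function <italic>order_events</italic> are hypothetical names, start times are reduced to years, and an event without a time marker simply inherits the start of the closest preceding marked event.</p>
      <preformat preformat-type="code"><![CDATA[
```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Event:
    kind: str                    # diagnosis, symptom or treatment
    characteristic: str          # e.g. the drug name
    start: Optional[int] = None  # start year; None when no marker was found
    end: Optional[int] = None
    effect: Optional[str] = None


def order_events(events):
    """Partially order events, given in text order, by start time."""
    keyed = []
    last_start = float("-inf")
    for index, event in enumerate(events):
        # Narrative convention: an unmarked event inherits the start of the
        # closest preceding marked event, so it stays right after it.
        if event.start is not None:
            last_start = event.start
        keyed.append((last_start, index, event))
    # Sort by (start, text position) only; Event objects are not compared.
    return [event for _, _, event in sorted(keyed, key=lambda k: k[:2])]
```
]]></preformat>
      <p>Using the text position as a tie-breaker means that events sharing the same (possibly inherited) start time keep the order in which they are written.</p>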
    </sec>
    <sec id="sec-6">
      <title>Cause-Effect and Complex Relations</title>
      <p>
        On the next level of our knowledge representation model we aim at the extraction
of cause-effect relations. As on the previous level, several "keywords" for detection of
such relations are used, as well as manually prepared rules and templates for filling in
the appropriate information from the EHRs. These templates have slots like:
Cause, Effect, Type, Degree, Evidence, Condition. The IE approach we apply is
mostly based on pattern matching. The keywords are classified into four
main types based on the comprehensive typology of causal links of Altenberg
[
        <xref ref-type="bibr" rid="ref20">20</xref>
        ] as follows: the adverbial link (e.g. hence, therefore), the prepositional link
(e.g. because of, on account of), subordination (e.g. because, as, since, for, so)
and the clause-integrated link (e.g. that is why, the result was). Causative verbs
are transitive action verbs that express a causal relation between the subject
and the object or prepositional phrase of the verb. Some of these causal links were
reconsidered for Bulgarian due to the specific language syntax and semantics.
The system has to be able to recognise several causal expressions (paraphrases)
representing one and the same causal event/situation.
      </p>
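      <p>A pattern-matching classifier over the four link types can be sketched as below. The sketch uses only the English cue words quoted above, for readability; the Bulgarian keyword lists of the system are not reproduced here, and the names <italic>CAUSAL_LINKS</italic> and <italic>causal_link_type</italic> are hypothetical.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import re

CAUSAL_LINKS = {
    "adverbial": ["hence", "therefore"],
    "prepositional": ["because of", "on account of"],
    "subordination": ["because", "as", "since", "for", "so"],
    "clause-integrated": ["that is why", "the result was"],
}

# Longest cues first, so that "because of" is tried before "because".
_CUES = sorted(
    ((cue, link) for link, cues in CAUSAL_LINKS.items() for cue in cues),
    key=lambda pair: len(pair[0]),
    reverse=True,
)


def causal_link_type(sentence):
    """Return (link type, cue) for the longest cue found, or None."""
    lowered = sentence.lower()
    for cue, link in _CUES:
        # Word boundaries keep "as" from matching inside "was".
        if re.search(r"\b" + re.escape(cue) + r"\b", lowered):
            return link, cue
    return None


print(causal_link_type("The therapy was changed because of side effects."))
```
]]></preformat>
      <p>Short cues such as "as", "for" and "so" are highly ambiguous, which is one reason why keyword matching alone is not sufficient and has to be complemented by rules and templates.</p>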
      <p>On top of all structures recognised so far, the system infers implicit
knowledge. Complex relations extraction is a challenging task that relates to the
recognition of implicit relations. This approach will help us to synthesise new knowledge
by chaining extracted cause-effect relations. Here we make use of logical
representation forms of the cause-effect relations as well as inference rules. At this
level the processing goes beyond the data of a single patient and extends to
cause-effect relations extracted from several patients' EHRs. These complex relations
are statistically observed and additional relations are recognised. The same
relations are later set as hypotheses which can be used in the cascade process as
a basis for the next step of the inference procedure. Since there is no predefined
expected result, and inference over all possible combinations of results is too
complex and takes a long time to compute, it is necessary to set in advance
the depth level of inference and the type of the expected relations. It is also
possible that the system extracts inappropriate results in the unsupervised process,
and that is why the inferred relations (hypotheses) need some level of supervised
revision before they are saved in the system and further utilised.</p>
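      <p>Chaining with a depth limit fixed in advance can be sketched as follows. This is a minimal illustration, assuming that cause-effect relations are stored as plain (cause, effect) pairs; the function <italic>chain_relations</italic> is a hypothetical name, and the logical representation and inference rules of the system are not modelled.</p>
      <preformat preformat-type="code"><![CDATA[
```python
def chain_relations(relations, max_depth):
    """Derive implicit cause-effect pairs by chaining up to max_depth links.

    relations: set of (cause, effect) pairs extracted from the texts.
    Returns only the new pairs, to be reviewed as hypotheses.
    """
    known = set(relations)
    frontier = set(relations)      # chains of length 1
    for _ in range(max_depth - 1):
        # Extend every chain found so far by one extracted relation.
        frontier = {
            (cause, effect2)
            for cause, effect in frontier
            for cause2, effect2 in relations
            if effect == cause2 and cause != effect2
        } - known
        if not frontier:
            break
        known |= frontier
    return known - set(relations)


# Placeholder labels, not medical facts.
extracted = {("A", "B"), ("B", "C"), ("C", "D")}
print(sorted(chain_relations(extracted, max_depth=2)))
```
]]></preformat>
      <p>The depth bound keeps the number of candidate pairs manageable, and because the function returns only derived pairs, they can be passed to an expert for revision before being stored.</p>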
    </sec>
    <sec id="sec-7">
      <title>Conclusion and Future Work</title>
      <p>In this initial stage of the system design and implementation we discuss the
obstacles in dealing with unstructured EHR texts and ideas for overcoming them by
using language dependent techniques like related words, morphological analysis,
stemming. We propose bottom-up and top-down approaches for patient status
observation and give directions for future development of the algorithms.</p>
      <p>We discuss an algorithm for extracting temporal conceptual structures from
the raw text, corresponding to sequences of consecutive events which represent the
development of the patient's disease. At this early stage we cannot yet report the
precision of our modules, but we give an outline for future development. Precise
evaluation is a goal of one of the next project phases.</p>
      <p>The project objective is to develop algorithms for discovering more complex
cause-effect relations and other dependencies that are not explicitly given in
the text. The modules for further analysis of more complex relations will be
developed in future project stages.</p>
    </sec>
    <sec id="sec-8">
      <title>Acknowledgements</title>
      <p>This work is part of the project EVTIMA ("Effective search of conceptual
information with applications in medical informatics"), which is funded by the
Bulgarian National Science Fund under grant No DO 02-292/December 2008.</p>
      <p>The patient records for the project are kindly provided by the University
Specialised Hospital for Active Treatment of Endocrinology "Acad. I. Penchev",
which is part of the Medical University, Sofia.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Cunningham</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <article-title>Information extraction - a user guide</article-title>
          .
          <source>Research Memo CS-99-07</source>
          , Computer Science Department, University of Sheffield,
          <year>1999</year>
          (http://www.dcs.shef.ac.uk/ hamish/IE).
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Rao</surname>
            ,
            <given-names>R.B.</given-names>
          </string-name>
          ,
          <string-name>
            <given-names>L.V.</given-names>
            <surname>Lita</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. D.</given-names>
            <surname>Raileanu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. S.</given-names>
            <surname>Niculescu</surname>
          </string-name>
          .
          <article-title>Medical Entity Extraction From Patient Data</article-title>
          . See patent information at http://www.faqs.org/patents/app/20080228769.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Goldacre</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kurina</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yeates</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Seagroatt</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Gill</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          .
          <article-title>Use of large medical databases to study associations between diseases</article-title>
          ,
          <source>Q J Med</source>
          ,
          <volume>93</volume>
          (
          <issue>10</issue>
          ):
          <fpage>669</fpage>
          -
          <lpage>675</lpage>
          ,
          <year>2000</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Baud</surname>
            ,
            <given-names>R. H</given-names>
          </string-name>
          .
          <article-title>A natural language based search engine for ICD10 diagnosis encoding</article-title>
          ,
          <source>Med Arh.</source>
          <year>2004</year>
          ;
          <volume>58</volume>
          (
          <issue>1 Suppl 2</issue>
          ):
          <fpage>79</fpage>
          -
          <lpage>80</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <article-title>MedLEE - A Medical Language Extraction and Encoding System</article-title>
          , http://lucid.cpmc.columbia.edu/medlee/
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Lee</surname>
            ,
            <given-names>C.H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Khoo</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          , and
          <string-name>
            <given-names>J.C.</given-names>
            <surname>Na</surname>
          </string-name>
          ,
          <article-title>Automatic identification of treatment relations for medical ontology learning: An exploratory study</article-title>
          . In I.C.
          <string-name>
            <surname>McIlwaine</surname>
          </string-name>
          (Ed.),
          <source>Knowledge Organization and the Global Information Society: Proc. of the Eighth International ISKO Conference</source>
          . Germany: Ergon Verlag,
          <year>2004</year>
          , pp.
          <fpage>245</fpage>
          -
          <lpage>250</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <given-names>Christopher S. G.</given-names>
            <surname>Khoo</surname>
          </string-name>
          , Syin Chan, Yun Niu,
          <article-title>Extracting Causal Knowledge from a Medical Database Using Graphical Patterns</article-title>
          .
          <source>In Proc. of ACL</source>
          ,
          Hong Kong
          ,
          <year>2000</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Leroy</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Martinez</surname>
            ,
            <given-names>J.D.</given-names>
          </string-name>
          ,
          <article-title>A Shallow Parser Based on Closed-class Words to Capture Relations in Biomedical Text</article-title>
          .
          <source>Journal of Biomedical Informatics (JBI)</source>
          vol.
          <volume>36</volume>
          , pp.
          <fpage>145</fpage>
          -
          <lpage>158</lpage>
          , June
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <article-title>Natural Language Processing in Medical Coding</article-title>
          .
          White paper of Language and Computing (www.landcglobal.com). April
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Boytcheva</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Strupchanska</surname>
          </string-name>
          , E. Paskaleva, and
          <string-name>
            <given-names>D.</given-names>
            <surname>Tcharaktchiev</surname>
          </string-name>
          ,
          <article-title>Some Aspects of Negation Processing in Electronic Health Records</article-title>
          .
          <source>In Proc. of International Workshop Language and Speech Infrastructure for Information Access in the Balkan Countries</source>
          ,
          <year>2005</year>
          , Borovets, Bulgaria, pp.
          <fpage>1</fpage>
          -
          <lpage>8</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Rassinoux</surname>
            ,
            <given-names>A.-M.</given-names>
          </string-name>
          ,
          <string-name>
            <given-names>R.H.</given-names>
            <surname>Baud</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.-R.</given-names>
            <surname>Scherrer</surname>
          </string-name>
          .
          <article-title>A multilingual analyser for medical texts</article-title>
          . In:
          <string-name>
            <surname>Tepfenhart</surname>
            ,
            <given-names>W.M.</given-names>
          </string-name>
          ,
          <string-name>
            <given-names>J. R.</given-names>
            <surname>Dick</surname>
          </string-name>
          and
          <string-name>
            <given-names>J.</given-names>
            <surname>Sowa</surname>
          </string-name>
          (Eds.) Conceptual Structures: Current Practices,
          <source>Proceedings of the 2nd Int. Conf. on Conceptual Structures</source>
          , Springer, LNCS Volume
          <volume>835</volume>
          ,
          <year>1994</year>
          ,
          <fpage>84</fpage>
          -
          <lpage>96</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
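          <string-name>
            <surname>Votruba</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Miksch</surname>
          </string-name>
          , and
          <string-name>
            <given-names>R.</given-names>
            <surname>Kosara</surname>
          </string-name>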
          ,
          <article-title>Linking Clinical Guidelines with Formal Representations</article-title>
          .
          <source>In Proc. - 9th Conf. on Artificial Intelligence in Medicine in Europe (AIME</source>
          <year>2003</year>
          ), p.
          <fpage>152</fpage>
          -
          <lpage>157</lpage>
          , Springer,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Croitoru</surname>
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <given-names>Bo</given-names>
            <surname>Hu</surname>
          </string-name>
          , Srinandan Dasmahapatra,
          <string-name>
            <given-names>Paul</given-names>
            <surname>Lewis</surname>
          </string-name>
          , David Dupplaw,
          <string-name>
            <given-names>Liang</given-names>
            <surname>Xiao</surname>
          </string-name>
          .
          <article-title>A Conceptual Graph Description of Medical Data for Brain Tumour Classification</article-title>
          ,
          <source>15th International Conference on Conceptual Structures (ICCS</source>
          <year>2007</year>
          ), Sheffield, UK.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Delugach</surname>
            ,
            <given-names>Harry S.</given-names>
          </string-name>
          ,
          <article-title>An Approach To Conceptual Feedback In Multiple Viewed Software Requirements Modeling</article-title>
          ,
          <source>Proc. Viewpoints 96: Intl. Workshop on Multiple Perspectives in Software Development</source>
          , Oct. 14-15,
          <year>1996</year>
          , San Francisco.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Smith</surname>
            ,
            <given-names>B. J.</given-names>
          </string-name>
          ,
          <string-name>
            <given-names>Harry S.</given-names>
            <surname>Delugach</surname>
          </string-name>
          :
          <article-title>A Framework for Analyzing and Testing Requirements with Actors in Conceptual Graphs</article-title>
          .
          <source>ICCS</source>
          <year>2006</year>
          :
          <fpage>401</fpage>
          -
          <lpage>412</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16. Pfeiffer, H.D. and R.T. Hartley: Semantic Additions to Conceptual Programming,
          <source>in Proceedings of the Fourth Annual Workshop on Conceptual Graphs, Detroit, MI, 6.0.7-1 - 8</source>
          (
          <year>1989</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Soderland</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Aronow</surname>
          </string-name>
          , D. Fisher, J. Aseltine, and
          <string-name>
            <given-names>W.</given-names>
            <surname>Lehnert</surname>
          </string-name>
          ,
          <article-title>Machine Learning of Text Analysis Rules for Clinical Records</article-title>
          .
          <source>CIIR Technical Report</source>
          , University of Massachusetts Amherst,
          <year>1995</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <surname>Gaizauskas</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Hepple</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Davis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Guo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Harkema</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Roberts</surname>
          </string-name>
          , and
          <string-name>
            <given-names>I.</given-names>
            <surname>Roberts</surname>
          </string-name>
          ,
          <article-title>AMBIT: Acquiring medical and biological information from text</article-title>
          . In S. J. Cox, editor,
          <source>Proceedings of the UK e-Science All Hands Meeting</source>
          , UK,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <surname>Mani</surname>
            <given-names>I.</given-names>
          </string-name>
          ,
          <article-title>Recent Developments in Temporal Information Extraction</article-title>
          ,
          <source>In Proceedings of RANLP-03</source>
          ,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20.
          <string-name>
            <surname>Altenberg</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          <article-title>Causal linking in spoken and written English</article-title>
          .
          <source>Studia Linguistica</source>
          ,
          <year>1984</year>
          ,
          <volume>38</volume>
          (
          <issue>1</issue>
          ),
          <fpage>20</fpage>
          -
          <lpage>69</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          21.
          <string-name>
            <surname>Nakov</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <article-title>BulStem: Design and Evaluation of Inflectional Stemmer for Bulgarian</article-title>
          .
          <source>Proceedings of Workshop on Balkan Language Resources and Tools (1st Balkan Conference in Informatics)</source>
          . Thessaloniki, Greece,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>