<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>BiTeM at CLEF eHealth Evaluation Lab 2016 Task 2: Multilingual Information Extraction</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Luc Mottin</string-name>
          <email>luc.mottin@hesge.ch</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Julien Gobeill</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Anaïs Mottaz</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Emilie Pasche</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Arnaud Gaudinat</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Patrick Ruch</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>BiTeM group, HES-SO/HEG Geneva, Information Science Department</institution>
          ,
          <addr-line>17 rue de la Tambourine, CH-1227, Carouge</addr-line>
          ,
          <country country="CH">Switzerland</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>HUG, Geneva University Hospitals</institution>
          ,
          <addr-line>4 rue Gabrielle-Perret-Gentil, CH-1205, Genève</addr-line>
          ,
          <country country="CH">Switzerland</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>SIB Text Mining, Swiss Institute of Bioinformatics</institution>
          ,
          <addr-line>1 rue Michel-Servet, CH-1206, Genève</addr-line>
          ,
          <country country="CH">Switzerland</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>BiTeM/SIB Text Mining (http://bitem.hesge.ch/) is a university research group carrying out activities in semantic and text analytics applied to health and life sciences. This paper reports on the participation of our team in the CLEF eHealth 2016 evaluation lab. The processing applied to each evaluation corpus (QUAERO and CépiDC) was originally very similar. Our method is based on an Automatic Text Categorization (ATC) system. First, the system is configured with a specific input ontology (French UMLS), and ATC assigns a ranked list of related concepts to each input document. Then, a second module relocates all of the positive matches in the text and normalizes the extracted entities. For the CépiDC corpus, the system was loaded with the Swiss ICD-10 GM thesaurus. However, a last-minute data transformation issue forced us to implement an ad hoc solution based on simple pattern matching to comply with the constraints of the CépiDC challenge. We obtained an average precision of 62% on the QUAERO entity extraction (over MEDLINE/EMEA texts, and exact/inexact matches), 48% on normalizing these entities, and 59% on the CépiDC subtask. Enhancing the recall by expanding the coverage of the terminologies could be an interesting way to improve this system at moderate labour cost.</p>
      </abstract>
      <kwd-group>
        <kwd>Named-Entity Recognition</kwd>
        <kwd>Automatic Text Categorization</kwd>
        <kwd>Discontinuous Entity Extraction</kwd>
        <kwd>Relocation</kwd>
        <kwd>Statistical Training</kwd>
        <kwd>Concept Normalization</kwd>
        <kwd>UMLS</kwd>
        <kwd>ICD-10</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Biomedical data contain a large quantity and diversity of valuable knowledge for
medical research and practice. Text-mining tools such as named-entity
recognizers have therefore been developed to access textual content effectively and efficiently.
A dynamic way to improve these systems is to compare them on
specific shared tasks, such as those organized within CLEF [
        <xref ref-type="bibr" rid="ref1 ref2 ref3">1-3</xref>
        ]. In 2016, the challenge was divided
into three subtasks: entity recognition and normalization on the QUAERO corpus,
entity extraction on the CépiDC corpus, and a replication track [
        <xref ref-type="bibr" rid="ref4 ref5">4-5</xref>
        ]. Both
corpora are in French and are related to biomedical literature.
      </p>
      <p>In this paper we report the contribution of our group to the eHealth Task 2
(Multilingual Information Extraction) within the CLEF 2016 competition. Our team participated in
most of these tracks, including MEDLINE and EMEA entity extraction (respectively
labelled 2.Q.1 and 2.Q.2), MEDLINE and EMEA normalized entity recognition (2.Q.3 and
2.Q.4), the CépiDC coding (2.C), and the replication track.</p>
      <p>Our approach was to integrate an existing automatic categorizer (Ruch 2006) into
the processing of the corpora. By providing a ranked list of concepts for each unit of a
corpus, we aim to test the accuracy of this tool within a Named-Entity Recognition
(NER) task.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Methods</title>
    </sec>
    <sec id="sec-3">
      <title>2.1. QUAERO 2.1.1. Data</title>
      <p>
        The QUAERO French medical corpus provided for this task includes two datasets
[
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. The first one is composed of 833 article titles from MEDLINE. The second
dataset contains four sets of instructions for use of medicines from the European
Medicines Agency (EMEA), which are split into 15 free-text documents. Additionally, two other
datasets had previously been supplied to train, evaluate and adjust the systems.
      </p>
      <p>Written in a controlled language with strict rules, EMEA instructions provide
a good test for the extraction of entities embedded in free text. MEDLINE
extracts contain fewer concepts, but may be challenging since they come from different
authors and journals, which implies diverse writing styles.</p>
      <p>
        The Unified Medical Language System (UMLS) is a compilation of ontologies,
software and services [
        <xref ref-type="bibr" rid="ref7 ref8">7-8</xref>
        ]. Because it is required for entity normalization, we used the standard
French release of the UMLS Metathesaurus as the exclusive dictionary to extract the
biomedical entities with their Concept Unique Identifiers (CUIs) [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. To set up
our application, we used the release that was freely available in April 2016 from the
National Library of Medicine website (www.nlm.nih.gov). With 397203 entries including
synonyms, and 139771 unique concepts, this terminology groups concepts from nine
sources in their French versions; Table 1 presents the distribution for each source.
      </p>
      <p>
        Ten groups of clinical entities are defined from the UMLS semantic types to
provide a consistent categorization of biomedical concepts and support their
normalization [
        <xref ref-type="bibr" rid="ref10 ref11">10-11</xref>
        ]. These semantic groups are: Anatomy, Chemicals &amp; Drugs, Devices,
Disorders, Geographic Areas, Living Beings, Objects, Phenomena, Physiology, and
Procedures. Because nested entities can be assigned to different groups [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], we
used the training data to statistically assign the semantic types to the ten categories.
For the semantic types with no mapping, or weakly represented in the training data, we
resorted to manual curation.
      </p>
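      <p>One plausible reading of this statistical assignment is a majority vote over the training annotations. The sketch below is only an illustration under that assumption: the function name, the input format (pairs of semantic type and semantic group) and the toy data are hypothetical and are not taken from our system.</p>
      <preformat>
```python
from collections import Counter, defaultdict


def learn_type_to_group(training_pairs):
    """Map each semantic type to the group it is most often annotated with.

    training_pairs: iterable of (semantic_type, semantic_group) tuples, one per
    annotated training entity (a hypothetical input format).
    """
    counts = defaultdict(Counter)
    for semantic_type, group in training_pairs:
        counts[semantic_type][group] += 1
    # most_common(1) yields the single most frequent group for each type
    return {t: c.most_common(1)[0][0] for t, c in counts.items()}


# Toy data: semantic type identifiers paired with group labels (illustrative only)
pairs = [
    ("T047", "DISO"), ("T047", "DISO"), ("T047", "PHYS"),
    ("T121", "CHEM"), ("T121", "CHEM"),
]
mapping = learn_type_to_group(pairs)
print(mapping)
```
      </preformat>
      <p>Semantic types that never occur in the training pairs receive no entry in the resulting dictionary, which corresponds to the cases left for manual curation.</p>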
    </sec>
    <sec id="sec-4">
      <title>2.1.2. Automatic Text Categorization</title>
      <p>
        CLEF 2016 was an opportunity to evaluate a tool we developed several years ago
[
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]. Based on a specific thesaurus, the Categorizer processes each text of the
corpus one by one and provides a ranked list of concepts. This ranking process combines
a regular expression classifier with a vector-space classifier, as described in Ruch (2006).
      </p>
    </sec>
    <sec id="sec-5">
      <title>2.1.3. Entity relocation</title>
      <p>The second phase of our system aims at matching the new list of concepts with the
input text using patterns. Each line is divided into word tokens, and the program
considers that multi-word entities can be discontinuous, with one or more intervening words.
Concretely, the system successively tries to find each word of an identified biomedical
concept among the tokens of the line. This also requires handling repeated and
overlapping entities. When a prediction is completely retrieved in the text, the system
recovers its offsets (positions of the first and last characters) and prepares a
new entry in the output in the BRAT format.</p>
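      <p>The relocation step can be sketched as follows. This is a simplified illustration and not our implementation: it searches the words of a term greedily from left to right, ignores repeated and overlapping entities, and the names <monospace>tokenize</monospace>, <monospace>relocate</monospace> and <monospace>to_brat</monospace> as well as the group label are hypothetical. Discontinuous fragments are written with semicolon-separated offsets, as in the BRAT standoff format.</p>
      <preformat>
```python
import re


def tokenize(line):
    """Return (lowercased token, start, end) triples with character offsets."""
    return [(m.group().lower(), m.start(), m.end()) for m in re.finditer(r"\w+", line)]


def relocate(term, line):
    """Find the words of `term` in order in `line`, allowing gaps between them.

    Returns a list of (start, end) fragments, or None when a word is missing.
    """
    tokens = tokenize(line)
    position = 0
    spans = []  # (start, end, index of the last token in the fragment)
    for word in term.lower().split():
        while position != len(tokens) and tokens[position][0] != word:
            position += 1
        if position == len(tokens):
            return None
        _, start, end = tokens[position]
        # Extend the previous fragment when the two words are adjacent tokens
        if spans and spans[-1][2] == position - 1:
            spans[-1] = (spans[-1][0], end, position)
        else:
            spans.append((start, end, position))
        position += 1
    return [(start, end) for start, end, _ in spans]


def to_brat(index, group, fragments, line):
    """Format one text-bound annotation line; fragments are joined with ';'."""
    offsets = ";".join(f"{s} {e}" for s, e in fragments)
    text = " ".join(line[s:e] for s, e in fragments)
    return f"T{index}\t{group} {offsets}\t{text}"


line = "Cancer du sein et du poumon"
fragments = relocate("cancer du poumon", line)
print(to_brat(1, "DISO", fragments, line))
```
      </preformat>
      <p>For this example line, the term "cancer du poumon" is recovered as two fragments, at offsets 0 to 9 and 21 to 27, skipping the intervening words "sein et du".</p>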
    </sec>
    <sec id="sec-6">
      <title>2.1.4. Entity normalization</title>
      <p>Normalization was performed together with the matching. As the ATC predicts a list
of possible entities derived from the UMLS concepts, a UMLS CUI is associated
with every proposition. Thus, for each prediction matched in the text, the system can
immediately assign a unique CUI.</p>
    </sec>
    <sec id="sec-7">
      <title>2.2. CépiDC 2.2.1. Data</title>
      <p>The CépiDC corpus compiles, within a single CSV file, 110869 lines describing causes
of death as reported by physicians. The corpus is structured in such a way that
a sentence is repeated when it contains multiple causes that must be encoded separately.</p>
      <p>
        The International Classification of Diseases (ICD), maintained by the World
Health Organisation (WHO), is an international standard that includes causes of
mortality. The ICD-10 GM is the Swiss national version of this vocabulary [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ], and we used
it as the basis for configuring the system. To expand the coverage of this primary thesaurus,
we extended it with new entities (new translations from the English ICD-10, see
examples in Table 2). We also included new synonyms from the training dictionaries
(from 2006 to 2013) with their ICD-10 codes. Finally, to avoid false positives
potentially induced by short terms and acronyms, the expansion was limited to terms longer
than three characters.
      </p>
      <table-wrap id="tab-2">
        <label>Table 2</label>
        <caption>
          <p>U84 and children translation granularity from ICD-10 GM.</p>
        </caption>
        <table>
          <thead>
            <tr><th>Code</th><th>ICD-10 WHO (English)</th><th>ICD-10 GM (French)</th><th>Translation proposed</th></tr>
          </thead>
          <tbody>
            <tr><td>U84</td><td>Resistance to other antimicrobial drugs</td><td>Virus de l’Herpès résistants aux virostatiques</td><td>Résistance aux autres antimicrobiens</td></tr>
            <tr><td>U84.0</td><td>Resistance to antiparasitic drug(s)</td><td/><td>Résistance aux médicaments antiparasitaires</td></tr>
            <tr><td>U84.1</td><td>Resistance to antifungal drug(s)</td><td/><td>Résistance aux médicaments antifongiques</td></tr>
            <tr><td>U84.2</td><td>Resistance to antiviral drug(s)</td><td/><td>Résistance aux médicaments antiviraux</td></tr>
            <tr><td>U84.3</td><td>Resistance to tuberculostatic drug(s)</td><td/><td>Résistance aux médicaments antituberculeux</td></tr>
            <tr><td>U84.7</td><td>Resistance to multiple antimicrobial drugs</td><td/><td>Résistance à de multiples médicaments antimicrobiens</td></tr>
            <tr><td>U84.8</td><td>Resistance to other specified antimicrobial drug</td><td/><td>Résistance à un autre antimicrobien précisé</td></tr>
            <tr><td>U84.9</td><td>Resistance to unspecified antimicrobial drugs</td><td/><td>Résistance à un antimicrobien non précisé</td></tr>
          </tbody>
        </table>
      </table-wrap>
    </sec>
    <sec id="sec-8">
      <title>2.2.2. Pattern Matching</title>
      <p>Our system uses pattern matching to test the concepts of the thesaurus
against each line of the input. This method first gives priority to an exact match that covers the
whole text of the line, and then to the longest entities.</p>
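      <p>The priority order described above can be sketched as follows. This is a minimal illustration rather than our actual code: the function name <monospace>code_line</monospace>, the dictionary layout (lowercased term mapped to an ICD-10 code) and the three entries of the toy thesaurus are assumptions chosen for the example.</p>
      <preformat>
```python
def code_line(line, thesaurus):
    """Return the ICD-10 codes found in `line`.

    thesaurus: dict mapping a lowercased term to its ICD-10 code
    (a hypothetical layout chosen for this sketch).
    """
    text = " ".join(line.lower().split())
    # 1. An exact match on the whole line takes priority
    if text in thesaurus:
        return [thesaurus[text]]
    # 2. Otherwise try the terms from the longest to the shortest
    codes = []
    for term in sorted(thesaurus, key=len, reverse=True):
        padded = f" {text} "
        if f" {term} " in padded:
            codes.append(thesaurus[term])
            # Remove the matched words so that shorter nested terms do not fire again
            text = padded.replace(f" {term} ", " ", 1).strip()
    return codes


# Illustrative entries only
thesaurus = {
    "cancer du poumon": "C34.9",
    "insuffisance cardiaque": "I50.9",
    "cancer": "C80.9",
}
print(code_line("Cancer du poumon", thesaurus))
print(code_line("Insuffisance cardiaque sur cancer du poumon", thesaurus))
```
      </preformat>
      <p>In the second call, the longer term "cancer du poumon" is matched before the shorter term "cancer", so the generic code is not emitted for the same words.</p>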
    </sec>
    <sec id="sec-9">
      <title>3. Results and discussion</title>
      <p>
        System performance is evaluated with the metrics commonly used in
Natural Language Processing [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ]. Precision is the proportion of retrieved
concepts that exactly match the gold benchmark prepared for these documents, while
Recall is the proportion of relevant concepts that were exactly extracted by
the system. F-measure, the harmonic mean of Precision and Recall, summarizes both in a
single score. With TP, FP and FN denoting the numbers of true positives, false positives
and false negatives, the scores are calculated as Precision = TP / (TP + FP),
Recall = TP / (TP + FN), and F-measure = 2 × Precision × Recall / (Precision + Recall).
      </p>
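      <p>As a quick check of these standard definitions, the sketch below recomputes the three scores from the counts reported for BITEM-run1 on the MEDLINE exact-match entity recognition (TP 1376, FP 1032, FN 1741). The function names are ours and serve only the illustration.</p>
      <preformat>
```python
def precision(tp, fp):
    """Proportion of retrieved concepts that are correct."""
    return tp / (tp + fp)


def recall(tp, fn):
    """Proportion of relevant concepts that were retrieved."""
    return tp / (tp + fn)


def f_measure(p, r):
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r)


tp, fp, fn = 1376, 1032, 1741
p = precision(tp, fp)
r = recall(tp, fn)
print(round(p, 4), round(r, 4), round(f_measure(p, r), 4))
```
      </preformat>
      <p>The rounded values, 0.5714, 0.4415 and 0.4981, agree with the precision, recall and F1 reported for that run.</p>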
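      <p>Moreover, an exact match is attributed for the entity recognition when the entity
type and span (starting position + ending position) correspond to the gold benchmark.
Regarding the normalized entity recognition, the UMLS CUIs must also coincide with
the reference benchmark. An inexact match is credited when at least one word
of the prediction overlaps the span in the gold benchmark.</p>
      <p>The results from the competitive phase, disclosed in mid-May, are reported in
Figures 1 to 5. Our system provides substantially better results on the MEDLINE corpus than on the
EMEA corpus, with F-scores of respectively 50% and 27% on the plain entity
recognition. However, the low recall suggests that the basic French UMLS limits the
coverage. This terminology is clearly not sufficient to extract all the concepts of interest,
especially on the EMEA corpus, which involves more drugs and pharmaceuticals.</p>
      <p>On the other hand, pre-processing the ontology must have played a significant role
in reaching an F-score of 55% (precision 59% and recall 53%) with the ad hoc
solution deployed for the CépiDC coding task.</p>
      <table-wrap id="fig1">
        <label>Figure 1</label>
        <caption>
          <p>System results for the plain entity recognition and the normalized entity recognition tasks on the QUAERO/EMEA corpus, regarding the exact matches.</p>
        </caption>
        <table>
          <thead>
            <tr><th>Run</th><th>TP</th><th>FP</th><th>FN</th><th>Precision</th><th>Recall</th><th>F1</th></tr>
          </thead>
          <tbody>
            <tr><th colspan="7">Overall: entities (5 teams, 9 runs)</th></tr>
            <tr><td>BITEM-run1</td><td>406</td><td>371</td><td>1798</td><td>0.5225</td><td>0.1842</td><td/></tr>
            <tr><td>Average scores</td><td/><td/><td/><td>0.525</td><td>0.4114</td><td/></tr>
            <tr><td>Median scores</td><td/><td/><td/><td>0.5998</td><td>0.3787</td><td/></tr>
            <tr><th colspan="7">Overall: normalized entities (3 teams, 5 runs)</th></tr>
            <tr><td>BITEM-run1</td><td>347</td><td>430</td><td>1856</td><td>0.4466</td><td>0.1575</td><td>0.2329</td></tr>
            <tr><td>Average scores</td><td/><td/><td/><td>0.4762</td><td>0.3215</td><td>0.3761</td></tr>
            <tr><td>Median scores</td><td/><td/><td/><td>0.4466</td><td>0.2687</td><td>0.3148</td></tr>
          </tbody>
        </table>
      </table-wrap>
      <table-wrap id="fig2">
        <label>Figure 2</label>
        <caption>
          <p>System results for the plain entity recognition and the normalized entity recognition tasks on the QUAERO/EMEA corpus, regarding the inexact matches.</p>
        </caption>
        <table>
          <thead>
            <tr><th>Run</th><th>TP</th><th>FP</th><th>FN</th><th>Precision</th><th>Recall</th><th>F1</th></tr>
          </thead>
          <tbody>
            <tr><th colspan="7">Overall: entities (5 teams, 9 runs)</th></tr>
            <tr><td>BITEM-run1</td><td>489</td><td>288</td><td>1649</td><td>0.6293</td><td>0.2287</td><td>0.3355</td></tr>
            <tr><td>Average scores</td><td/><td/><td/><td>0.6377</td><td>0.5141</td><td>0.5423</td></tr>
            <tr><td>Median scores</td><td/><td/><td/><td>0.7175</td><td>0.4808</td><td>0.5564</td></tr>
            <tr><th colspan="7">Overall: normalized entities (3 teams, 5 runs)</th></tr>
            <tr><td>BITEM-run1</td><td>363</td><td>415</td><td>1840</td><td>0.4666</td><td>0.1648</td><td>0.2435</td></tr>
            <tr><td>Average scores</td><td/><td/><td/><td>0.4968</td><td>0.4341</td><td>0.4405</td></tr>
            <tr><td>Median scores</td><td/><td/><td/><td>0.4666</td><td>0.2842</td><td>0.3324</td></tr>
          </tbody>
        </table>
      </table-wrap>
      <table-wrap id="fig3">
        <label>Figure 3</label>
        <caption>
          <p>System results for the plain entity recognition and the normalized entity recognition tasks on the QUAERO/MEDLINE corpus, regarding the exact matches.</p>
        </caption>
        <table>
          <thead>
            <tr><th>Run</th><th>TP</th><th>FP</th><th>FN</th><th>Precision</th><th>Recall</th><th>F1</th></tr>
          </thead>
          <tbody>
            <tr><th colspan="7">Overall: entities (5 teams, 9 runs)</th></tr>
            <tr><td>BITEM-run1</td><td>1376</td><td>1032</td><td>1741</td><td>0.5714</td><td>0.4415</td><td>0.4981</td></tr>
            <tr><td>Average scores</td><td/><td/><td/><td>0.503</td><td>0.4264</td><td>0.4455</td></tr>
            <tr><td>Median scores</td><td/><td/><td/><td>0.6166</td><td>0.4375</td><td>0.4981</td></tr>
            <tr><th colspan="7">Overall: normalized entities (3 teams, 5 runs)</th></tr>
            <tr><td>BITEM-run1</td><td>1185</td><td>1220</td><td>1912</td><td>0.4927</td><td>0.3826</td><td>0.4308</td></tr>
            <tr><td>Average scores</td><td/><td/><td/><td>0.5006</td><td>0.376</td><td>0.4287</td></tr>
            <tr><td>Median scores</td><td/><td/><td/><td>0.4927</td><td>0.3826</td><td>0.4308</td></tr>
          </tbody>
        </table>
      </table-wrap>
      <table-wrap id="fig4">
        <label>Figure 4</label>
        <caption>
          <p>System results for the plain entity recognition and the normalized entity recognition tasks on the QUAERO/MEDLINE corpus, regarding the inexact matches.</p>
        </caption>
        <table>
          <thead>
            <tr><th>Run</th><th>TP</th><th>FP</th><th>FN</th><th>Precision</th><th>Recall</th><th>F1</th></tr>
          </thead>
          <tbody>
            <tr><th colspan="7">Overall: entities (5 teams, 9 runs)</th></tr>
            <tr><td>BITEM-run1</td><td>1778</td><td>630</td><td>1351</td><td>0.7384</td><td>0.5682</td><td>0.6422</td></tr>
            <tr><td>Average scores</td><td/><td/><td/><td>0.6387</td><td>0.5707</td><td>0.5859</td></tr>
            <tr><td>Median scores</td><td/><td/><td/><td>0.7394</td><td>0.5682</td><td>0.6422</td></tr>
            <tr><th colspan="7">Overall: normalized entities (3 teams, 5 runs)</th></tr>
            <tr><td>BITEM-run1</td><td>1214</td><td>1185</td><td>1885</td><td>0.506</td><td>0.3917</td><td>0.4416</td></tr>
            <tr><td>Average scores</td><td/><td/><td/><td>0.5181</td><td>0.4757</td><td>0.4917</td></tr>
            <tr><td>Median scores</td><td/><td/><td/><td>0.506</td><td>0.3917</td><td>0.4416</td></tr>
          </tbody>
        </table>
      </table-wrap>
      <table-wrap id="fig5">
        <label>Figure 5</label>
        <caption>
          <p>System results for the coding task on the CépiDC corpus.</p>
        </caption>
        <table>
          <thead>
            <tr><th>Run</th><th>TP</th><th>FP</th><th>FN</th><th>Precision</th><th>Recall</th><th>F1</th></tr>
          </thead>
          <tbody>
            <tr><th colspan="7">Overall, exact match (5 teams, 7 runs)</th></tr>
            <tr><td>BITEM-run1</td><td>57265</td><td>40650</td><td>51562</td><td>0.5848</td><td>0.5262</td><td/></tr>
            <tr><td>Average scores</td><td/><td/><td/><td>0.7878</td><td>0.6636</td><td/></tr>
            <tr><td>Median scores</td><td/><td/><td/><td>0.811</td><td>0.6554</td><td/></tr>
          </tbody>
        </table>
      </table-wrap>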
    </sec>
    <sec id="sec-10">
      <title>4. Conclusion</title>
      <p>Our results in the QUAERO subtask could certainly be improved by working with
the English version of the UMLS, which covers many more terminologies (128
sources), such as the NCI Thesaurus or drug-specific dictionaries. Text samples
would be translated into English using APIs (such a method has been proposed in past
CLEF eHealth workshops), and the resulting coverage improvement could be
significant. Another way to improve our system on QUAERO might have been to exploit
the training datasets to train the Categorizer.</p>
      <p>Regarding the CépiDC corpus, ATC did not achieve good results (e.g. missing
many exact matches) due to an issue at the data pre-processing stage. Our ad hoc pattern
matching method gave relatively good results for both precision and
recall, but it would be interesting to prepare a subsequent run using the Categorizer.</p>
    </sec>
    <sec id="sec-11">
      <title>5. References</title>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <label>1</label>
        <mixed-citation>
          <string-name>
            <surname>Braschler</surname>
            <given-names>M.</given-names>
          </string-name>
          (
          <year>2000</year>
          )
          <article-title>CLEF 2000 - Overview of Results</article-title>
          .
          <source>In Cross-Language Information Retrieval and Evaluation</source>
          , Springer Berlin Heidelberg,
          <volume>2069</volume>
          ,
          <fpage>89</fpage>
          -
          <lpage>101</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <label>2</label>
        <mixed-citation>
          <string-name>
            <surname>Goeuriot</surname>
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kelly</surname>
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Suominen</surname>
            <given-names>H.</given-names>
          </string-name>
          , et al. (
          <year>2015</year>
          )
          <article-title>Overview of the CLEF eHealth Evaluation Lab 2015</article-title>
          . In Information Access Evaluation. Multilinguality, Multimodality, and Interaction, Springer International Publishing,
          <volume>9283</volume>
          ,
          <fpage>429</fpage>
          -
          <lpage>443</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <label>3</label>
        <mixed-citation>
          <string-name>
            <surname>Huang</surname>
            <given-names>C.C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lu</surname>
            <given-names>Z.</given-names>
          </string-name>
          (
          <year>2015</year>
          )
          <article-title>Community challenges in biomedical text mining over 10 years: success, failure and the future</article-title>
          .
          <source>In Brief Bioinform</source>
          ,
          <volume>17</volume>
          ,
          <fpage>132</fpage>
          -
          <lpage>144</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <label>4</label>
        <mixed-citation>
          <article-title>Overview of the CLEF eHealth Evaluation Lab 2016</article-title>
          .
          <article-title>Upcoming publication</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <label>5</label>
        <mixed-citation>
          <article-title>CLEF eHealth Evaluation Lab 2016 Task 2: Multilingual Information Extraction</article-title>
          .
          <source>In CLEF 2016 Working Notes. CEUR-WS</source>
          , Vol-
          <volume>1609</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <label>6</label>
        <mixed-citation>
          <string-name>
            <surname>Névéol</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Grouin</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Leixa</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          , et al. (
          <year>2014</year>
          )
          <article-title>The Quaero French medical corpus: A Resource for Medical Entity Recognition and Normalization</article-title>
          .
          <source>In Proceedings of the Fourth Workshop on Building and Evaluating Resources for Health and Biomedical Text Processing</source>
          ,
          <fpage>24</fpage>
          -
          <lpage>30</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <label>7</label>
        <mixed-citation>
          <string-name>
            <surname>Barr</surname>
            <given-names>C.E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Komorowski</surname>
            <given-names>H.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pattison-Gordon</surname>
            <given-names>E.</given-names>
          </string-name>
          , et al. (
          <year>1988</year>
          )
          <article-title>Conceptual Modeling for the Unified Medical Language System</article-title>
          .
          <source>In Proceedings of the Annual Symposium on Computer Application in Medical Care</source>
          ,
          <year>1988</year>
          ,
          <fpage>148</fpage>
          -
          <lpage>151</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <label>8</label>
        <mixed-citation>
          <string-name>
            <surname>Humphreys</surname>
            <given-names>B.L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lindberg</surname>
            <given-names>D.A.B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schoolman</surname>
            <given-names>H.M.</given-names>
          </string-name>
          , et al. (
          <year>1998</year>
          )
          <article-title>The Unified Medical Language System: an informatics research collaboration</article-title>
          .
          <source>In JAMIA</source>
          ,
          <volume>5</volume>
          (
          <issue>1</issue>
          ),
          <fpage>1</fpage>
          -
          <lpage>11</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <label>9</label>
        <mixed-citation>
          <string-name>
            <surname>Tuttle</surname>
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sherertz</surname>
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Erlbaum</surname>
            <given-names>M.</given-names>
          </string-name>
          , et al. (
          <year>1989</year>
          )
          <article-title>Implementing Meta-1: The First Version of the UMLS Metathesaurus</article-title>
          .
          <source>In Proceedings of the Annual Symposium on Computer Application in Medical Care</source>
          ,
          <year>1989</year>
          ,
          <fpage>483</fpage>
          -
          <lpage>487</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <label>10</label>
        <mixed-citation>
          <string-name>
            <surname>McCray</surname>
            <given-names>A.T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Burgun</surname>
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bodenreider</surname>
            <given-names>O.</given-names>
          </string-name>
          (
          <year>2001</year>
          )
          <article-title>Aggregating UMLS Semantic Types for Reducing Conceptual Complexity</article-title>
          .
          <source>In Studies in health technology and informatics, 84(0 1)</source>
          ,
          <fpage>216</fpage>
          -
          <lpage>220</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <label>11</label>
        <mixed-citation>
          <string-name>
            <surname>Chen</surname>
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gu</surname>
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Perl</surname>
            <given-names>Y.</given-names>
          </string-name>
          , et al. (
          <year>2008</year>
          )
          <article-title>Structural group auditing of a UMLS semantic type's extent</article-title>
          .
          <source>In Journal of Biomedical Informatics</source>
          ,
          <volume>42</volume>
          (
          <issue>1</issue>
          ),
          <fpage>41</fpage>
          -
          <lpage>52</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <label>12</label>
        <mixed-citation>
          <string-name>
            <surname>Ruch</surname>
            <given-names>P.</given-names>
          </string-name>
          (
          <year>2006</year>
          )
          <article-title>Automatic assignment of biomedical categories: toward a generic approach</article-title>
          . In Bioinformatics,
          <volume>22</volume>
          (
          <issue>6</issue>
          ),
          <fpage>658</fpage>
          -
          <lpage>664</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <label>13</label>
        <mixed-citation>
          <string-name>
            <surname>Jetté</surname>
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Quan</surname>
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hemmelgarn</surname>
            <given-names>B.</given-names>
          </string-name>
          , et al. (
          <year>2010</year>
          )
          <article-title>The development, evolution, and modifications of ICD-10: challenges to the international comparability of morbidity data</article-title>
          .
          <source>Medical Care</source>
          ,
          <volume>48</volume>
          (
          <issue>12</issue>
          ),
          <fpage>1105</fpage>
          -
          <lpage>1110</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <label>14</label>
        <mixed-citation>
          <string-name>
            <surname>Manning</surname>
            <given-names>C.D.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Schütze</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          (
          <year>1999</year>
          )
          <article-title>Foundations of Statistical Natural Language Processing</article-title>
          . The MIT Press, Cambridge,
          <fpage>268</fpage>
          -
          <lpage>269</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>