<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Erasmus MC at CLEF eHealth 2016: Concept Recognition and Coding in French Texts</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Erik M. van Mulligen</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Zubair Afzal</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Saber A. Akhondi</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Dang Vo</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jan A. Kors</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Medical Informatics, Erasmus University Medical Center</institution>
          ,
          <addr-line>Rotterdam</addr-line>
          ,
          <country country="NL">The Netherlands</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>We participated in task 2 of the CLEF eHealth 2016 challenge. Two subtasks were addressed: entity recognition and normalization in a corpus of French drug labels and Medline titles, and ICD-10 coding of French death certificates. For both subtasks we used a dictionary-based approach. For entity recognition and normalization, we used Peregrine, our open-source indexing engine, with a dictionary based on French terms in the Unified Medical Language System (UMLS) supplemented with English UMLS terms that were translated into French with automatic translators. For ICD-10 coding, we used the Solr text tagger, together with one of two ICD-10 terminologies derived from the task training material. To reduce the number of false-positive detections, we implemented several post-processing steps. On the challenge test set, our best system obtained F-scores of 0.702 and 0.651 for entity recognition in the drug labels and in the Medline titles, respectively. For entity normalization, F-scores were 0.529 and 0.474. On the test set for ICD-10 coding, our system achieved an F-score of 0.848 (precision 0.886, recall 0.813). These scores were substantially higher than the average score of the systems that participated in the challenge.</p>
      </abstract>
      <kwd-group>
        <kwd>Entity recognition</kwd>
        <kwd>Concept identification</kwd>
        <kwd>ICD-10 Coding</kwd>
        <kwd>Term translation</kwd>
        <kwd>French terminology</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        We addressed these tasks using dictionary-based indexing approaches. For
entity recognition and normalization we used the system that we developed
for the same task in the CLEF eHealth 2015 challenge [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], but we trained it
on the data that was made available in this year's challenge. Central to our
approach is indexing with French terminologies from the UMLS supplemented
with automatically translated English UMLS terms, followed by several
post-processing steps to reduce the number of false-positive detections. For ICD-10
coding, we used a terminology that was constructed from the training data
and again applied post-processing to improve precision. We describe our systems
and their evaluation for each subtask. On the test data, our results for both tasks
are well above the average performance of the systems that participated in the
CLEF eHealth 2016 task 2 challenge [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
      </p>
    </sec>
    <sec id="sec-2">
      <title>Methods</title>
      <p>In the following, we describe the corpora, terminologies, indexing, and
post-processing steps separately for each subtask.</p>
      <sec id="sec-2-1">
        <title>Corpora</title>
      </sec>
      <sec id="sec-2-2">
        <title>Entity recognition and normalization</title>
        <p>
          The training and test data are based on the Quaero medical corpus, a French annotated resource for medical entity
recognition and normalization [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ]. The Quaero corpus consists of three
subcorpora: titles from French Medline abstracts, drug labels from the European
Medicines Agency (EMEA), and patents from the European Patent Office. For
the CLEF eHealth challenge, only Medline titles and EMEA documents were
made available. The training set consisted of 1665 Medline titles and 6 full
EMEA documents (comprising the training and test data previously released in
CLEF eHealth 2015); a new test set contained 834 Medline titles and 4 EMEA
documents.
        </p>
        <p>The annotations in the Quaero corpus are based on a subset of the UMLS. An
entity in the Quaero corpus was only annotated if the concept belonged to one of
the following ten semantic groups (SGs) in the UMLS: Anatomy, Chemicals and
drugs, Devices, Disorders, Geographic areas, Living beings, Objects, Phenomena,
Physiology, and Procedures. Nested or overlapping entities were all annotated,
as were ambiguous entities (i.e., if an entity could refer to more than one concept,
all concepts were annotated). Discontinuous spans of text that refer to a
single entity could also be annotated.</p>
        <p>Coding. The data set for the coding of death certificates is called the CepiDC
corpus. Each certificate consists of one or more lines of text and some metadata,
including age and gender of the deceased, and location of death. The training
set contained 65,843 certificates from the period 2006 to 2012, and the test set
contained 27,850 certificates from 2013.</p>
        <p>The annotations in the CepiDC corpus consist of codes from the ICD-10
and were assigned per text line. For each code that was assigned by the human
coder, a term that supports the selection of the code was provided. This term
was an excerpt of the text line or an entry of a coding dictionary (see below).
Furthermore, the human coder provided for each code the duration that the
deceased had been suffering from the coded cause, and a code rank with respect
to the cause of death.</p>
      </sec>
      <sec id="sec-2-3">
        <title>Terminologies</title>
      </sec>
      <sec id="sec-2-4">
        <title>Entity recognition and normalization</title>
        <p>
          We used the terminology that performed best in the CLEF eHealth 2015 concept recognition task [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ]. This
terminology was constructed from all French terms in the ten relevant SGs of UMLS
version 2014AB (77,995 concepts with 161,910 terms). To increase the coverage
of this baseline terminology, English UMLS terms were automatically translated
into French. We used two translators, Google Translate (GT) [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ] and Microsoft
Translator (MT) [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ], and only included terms that had the same translation in
the baseline terminology. The resultant French terminology contained 136,127
concepts with 386,617 terms. Finally, we expanded the terminology with terms
from concepts in the training set that were not recognized by our indexing system
(false negatives).
        </p>
        <p>
          Coding. For coding, we constructed two ICD-10 terminologies. A baseline
terminology was made by compiling the terms corresponding to each annotated
ICD-10 code in the training corpus. The number of times that each code had
been assigned in the training corpus was also determined. For ambiguous terms,
i.e., terms that corresponded to more than one ICD-10 code, the term was
removed for those codes that occurred less than half as often as the most frequent
code with that term. A second, expanded terminology was based on the
baseline terminology, but also incorporated codes and terms from four versions of a
manually curated ICD-10 dictionary. These dictionary versions have been
developed at the Centre d'épidémiologie sur les causes médicales de décès (CepiDC)
[
          <xref ref-type="bibr" rid="ref10">10</xref>
          ] and were made available by the task organizers. They contained many
additional ICD-10 codes and terms that were not present in the training corpus (and
thus were lacking in the baseline terminology). If a term was present in both the
baseline terminology and a CepiDC dictionary but the corresponding codes were
different, the code in the dictionary version was not included in the expanded
terminology to avoid introducing ambiguity. If the term in the baseline
terminology was ambiguous (had multiple codes), only the term-code combinations in
the baseline terminology that were also present in the CepiDC dictionary were
incorporated in the expanded terminology.
        </p>
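        <p>The disambiguation rule for the baseline terminology can be illustrated with a short Python sketch. It is not the code used in the challenge: the function name, the input layout, and the example codes and counts are hypothetical, and we assume here that the frequency of a code is the total number of times it was assigned in the training corpus.</p>
        <preformat><![CDATA[
```python
from collections import defaultdict


def prune_ambiguous_terms(term_codes, code_counts, ratio=0.5):
    """Keep, per term, only the codes that are frequent enough.

    term_codes:  dict mapping a term to the set of ICD-10 codes seen with it
    code_counts: dict mapping a code to how often it was assigned in training
    ratio:       a code is dropped if its count is below ratio * top count
    """
    pruned = defaultdict(set)
    for term, codes in term_codes.items():
        top = max(code_counts[c] for c in codes)
        for code in codes:
            # Codes occurring "less than half as often" are removed,
            # so a code with exactly half the top count is kept.
            if code_counts[code] >= ratio * top:
                pruned[term].add(code)
    return dict(pruned)


# Hypothetical terms, codes and counts, for illustration only
term_codes = {"insuffisance cardiaque": {"I509", "I500"}, "avc": {"I64"}}
code_counts = {"I509": 1000, "I500": 300, "I64": 800}
result = prune_ambiguous_terms(term_codes, code_counts)
```
]]></preformat>
        <p>In this toy example the ambiguous term keeps only its most frequent code, because the second code was assigned less than half as often; the unambiguous term is left unchanged.</p>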
      </sec>
      <sec id="sec-2-5">
        <title>Indexing</title>
      </sec>
      <sec id="sec-2-6">
        <title>Entity recognition and normalization</title>
        <p>
          The Quaero corpus was indexed with Peregrine, our dictionary-based concept recognition system [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ]. Peregrine
removes stopwords (we used a small list of (in)definite articles and, for French,
partitive articles) and tries to match the longest possible text phrase to a concept.
It uses the Lexical Variant Generator tool of the National Library of Medicine
to reduce a token to its stem before matching [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ]. Peregrine is freely available
[
          <xref ref-type="bibr" rid="ref13">13</xref>
          ].
        </p>
        <p>Peregrine can find partially overlapping concepts, but it cannot detect nested
concepts (it only returns the concept corresponding to the longest term). We
therefore implemented an additional indexing step. For each term found by
Peregrine and consisting of n words (n &gt; 1), all subsets of 1 to n-1 words were
generated, under the condition that for subsets consisting of more than one word,
the words had to be adjacent in the original term. All word subsets were then
also indexed by Peregrine. We did not try to find discontinuous terms since their
frequency was very low.</p>
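        <p>The generation of word subsets for nested concepts can be sketched in Python as follows. This is only an illustration: the function name is hypothetical, and splitting on whitespace is an assumption that may differ from the tokenization used by Peregrine.</p>
        <preformat><![CDATA[
```python
def contiguous_subphrases(term):
    """Return all runs of 1 to n-1 adjacent words of an n-word term."""
    words = term.split()
    n = len(words)
    subphrases = []
    for length in range(1, n):  # sub-phrase lengths 1 .. n-1
        for start in range(n - length + 1):
            subphrases.append(" ".join(words[start:start + length]))
    return subphrases


# A three-word term yields three single words and two adjacent word pairs
phrases = contiguous_subphrases("cancer du sein")
```
]]></preformat>
        <p>Each generated sub-phrase would then be passed to the indexer again, so that shorter concepts nested inside the longest match can also be found.</p>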
        <p>
          Coding. For the coding task, we employed the open-source Solr text tagger
[
          <xref ref-type="bibr" rid="ref14">14</xref>
          ], using the ICD-10 terminologies to index the death certificates. Several
preprocessing steps were performed, including stopword filtering (using the default
Solr stopword list for French), ASCII folding (converting non-ASCII Unicode
characters to their ASCII equivalents, if existing), elision filtering (removing
abbreviated articles that are contracted with terms), and stemming (using the
French Snowball stemmer). Words were matched case-insensitively, except for
selected abbreviations that were expanded using a synonym list prior to matching.
        </p>
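        <p>To make the effect of these preprocessing steps concrete, the following Python sketch approximates three of them (elision filtering, ASCII folding, and stopword filtering) with the standard library only. It is not the Solr analysis chain itself: the elision and stopword lists are abbreviated and hypothetical, the function names are ours, and stemming is omitted because it requires a separate stemmer library.</p>
        <preformat><![CDATA[
```python
import re
import unicodedata

# Hypothetical, abbreviated lists; the default Solr lists for French are longer
ELISIONS = ("l", "d", "m", "t", "s", "n", "j", "c", "qu")
STOPWORDS = {"le", "la", "les", "de", "du", "des", "un", "une", "et"}

# An elided article is one of the prefixes above followed by an apostrophe
ELISION_RE = re.compile(r"\b(?:" + "|".join(ELISIONS) + r")['\u2019]")


def ascii_fold(text):
    """Replace accented characters by their unaccented base letters."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def preprocess(line):
    """Lowercase, strip elided articles, fold to ASCII, drop stopwords."""
    text = ELISION_RE.sub("", line.lower())
    text = ascii_fold(text)
    tokens = re.findall(r"[a-z0-9]+", text)
    return [tok for tok in tokens if tok not in STOPWORDS]


tokens = preprocess("Arrêt cardiaque et embolie de l'artère pulmonaire")
```
]]></preformat>
        <p>After such normalization, a certificate line and a terminology entry that differ only in accents, case, or contracted articles map to the same token sequence, which is what allows a dictionary lookup to match them.</p>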
      </sec>
      <sec id="sec-2-7">
        <title>Post-processing</title>
      </sec>
      <sec id="sec-2-8">
        <title>Entity recognition and normalization</title>
        <p>
          To reduce the number of false-positive detections that resulted from the indexing, we applied several
post-processing steps. First, we removed terms that were part of an exclusion list.
The list was manually created by indexing the French part of the Mantra
corpus, a large multilingual biomedical corpus developed as part of the Mantra
project [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ], and selecting the incorrect terms from the 2,500 top-ranked terms.
        </p>
        <p>Second, for any term-SG-CUI combination and SG-CUI combination (CUI being the UMLS concept unique identifier) that
was found by Peregrine in the training data, we computed precision scores: true
positives / (true positives + false positives). For a given term, only
term-SG-CUI combinations with a precision above a certain threshold value were kept. If
multiple combinations qualified, only the two with the highest precision scores
were selected. If for a given term none of the found term-SG-CUI combinations
had been annotated in the training data, but precision scores were available
for the SG-CUI combinations, a term-SG-CUI combination was still kept if the
precision of the SG-CUI combination was higher than the threshold. If multiple
combinations qualified, the two with the highest precision were kept if they had
the same SG; otherwise, only the combination with the highest precision was
kept. If none of the SG-CUI combinations had been annotated, a single
term-SG-CUI combination was selected, taking into account whether the term was
the preferred term for a CUI, and the CUI number (lowest first).</p>
        <p>Coding. To reduce the number of false-positive codes that were generated
during the indexing step, we computed a precision score for each term-code
combination that was recognized by the Solr tagger in the training data: true positives
/ (true positives + false positives). All codes that resulted from term-code
combinations with precision values below a given threshold value were removed.</p>
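        <p>The precision filter for the coding subtask can be sketched in Python as follows. This is a minimal illustration and not the code used in the challenge: the function names, the tuple layout of the detections, and the toy terms and codes are hypothetical. Keeping term-code combinations that were never seen in the training data (unseen_precision=1.0) is likewise an assumption.</p>
        <preformat><![CDATA[
```python
from collections import Counter


def precision_table(detections, gold):
    """Compute TP / (TP + FP) for every detected (term, code) combination.

    detections: iterable of (line_id, term, code) produced by the tagger
    gold:       set of (line_id, code) pairs from the reference annotation
    """
    tp, fp = Counter(), Counter()
    for line_id, term, code in detections:
        if (line_id, code) in gold:
            tp[(term, code)] += 1
        else:
            fp[(term, code)] += 1
    return {key: tp[key] / (tp[key] + fp[key]) for key in set(tp) | set(fp)}


def filter_detections(detections, precision, threshold=0.4, unseen_precision=1.0):
    """Remove detections whose (term, code) precision is below the threshold."""
    kept = []
    for line_id, term, code in detections:
        # Assumption: combinations without a training precision are kept
        score = precision.get((term, code), unseen_precision)
        if score >= threshold:
            kept.append((line_id, term, code))
    return kept


# Toy data, for illustration only
detections = [
    (1, "infarctus", "I219"), (2, "infarctus", "I219"), (3, "infarctus", "I219"),
    (1, "choc", "R579"), (2, "choc", "T794"), (3, "choc", "R579"),
]
gold = {(1, "I219"), (2, "I219"), (1, "R579")}
precision = precision_table(detections, gold)
kept = filter_detections(detections, precision, threshold=0.4)
```
]]></preformat>
        <p>In this toy example one term-code combination is never correct in the reference annotation, so its precision is 0 and it is removed, while the other two combinations pass the threshold.</p>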
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Results</title>
      <sec id="sec-3-1">
        <title>Entity recognition and normalization</title>
        <p>
          We indexed the Quaero training data, and added the false-negative terms to our
terminology. We then ran the system on the Quaero test data, and submitted
two runs for both the entity recognition and normalization tasks: one run using
the system with a precision threshold of 0.3 (run1; this threshold was also used
in our submission to the same task last year [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ]), the other with a precision
threshold of 0.4 (run2). Table 1 shows our performance results for exact match
on the test set.
        </p>
        <p>As expected, run1 (the system with the lower precision threshold) has lower
precision and higher recall than run2, but the differences are small and the
F-scores are nearly identical. The results are well above the average and median
of the scores from all runs of the challenge participants.</p>
        <p>To determine the optimal precision threshold for the coding task, we split the
CepiDC training data into two equally sized sets. Precision scores were
generated for all term-code combinations that were found using the expanded ICD-10
terminology in one half of the training data and were then used to filter the
recognized term-code combinations in the other half of the training data. Table
2 shows the performance for different threshold values on the second half.</p>
        <p>Without precision filtering (threshold 0.0), an F-score of 0.773 (precision
0.732, recall 0.818) was obtained. The highest F-score (0.827) was achieved for
a threshold of 0.4, mainly because precision greatly improved (0.863), while
recall only slightly deteriorated (0.795). The same optimal threshold value was
obtained when we used the baseline ICD-10 terminology.</p>
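        <p>As a check on how these figures relate, the following Python sketch recomputes an F-score from precision and recall, assuming that the F-score is the balanced harmonic mean of the two (F1). The function name is ours. Because the reported precision and recall are rounded, a recomputed value can differ from a reported F-score in the last digit.</p>
        <preformat><![CDATA[
```python
def f_score(precision, recall):
    """Balanced F-score: the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall reported for the run without precision filtering
unfiltered = round(f_score(0.732, 0.818), 3)
```
]]></preformat>
        <p>With the unfiltered precision of 0.732 and recall of 0.818, this reproduces the reported F-score of 0.773.</p>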
        <p>We submitted two runs on the CepiDC test set, one using the expanded
ICD-10 terminology (run1), the other using the baseline terminology (run2).
For both runs, precision scores were derived from all the training data and the
threshold for precision filtering was set at 0.4. Table 3 shows the performance
of our system, together with the average and median performance scores of the
runs of all task participants.</p>
        <p>Our results indicate that the baseline terminology (run2) performed slightly
better than the expanded terminology (run1) in terms of F-score. Remarkably,
the baseline terminology had higher recall than the expanded terminology.
Overall, our performance results, in particular recall, are considerably better than the
average and median score of all submitted runs.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Discussion</title>
      <p>
        We retrained our system for entity recognition and normalization that we
developed for the same task in the CLEF eHealth 2015 challenge, and newly developed
a dictionary-based system for ICD-10 coding of death certificates. For both
systems, the performance on the test sets proved to be substantially better than
the averaged results of all systems that participated in the challenge [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
      </p>
      <p>
        Our system for entity recognition and normalization performed better on the
EMEA subcorpus than on the Medline subcorpus, primarily because of a higher
recall. This is in line with the results for this task in the CLEF eHealth 2015
challenge [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. However, whereas we had expected the performance of our system
to be similar to or even better than last year's (because of the larger training set this
year), results were actually worse, in particular for entity normalization [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. We
are currently investigating what may have caused this performance decrease.
      </p>
      <p>The baseline and expanded ICD-10 terminologies that we developed for our
coding system produced nearly identical F-scores. Remarkably, although the
expanded terminology contained many more codes and terms than the baseline
terminology, recall for the baseline terminology was slightly higher. A probable
explanation is that the term disambiguation that we performed when expanding
the baseline terminology with the ICD-10 dictionaries supplied by the task
organizers effectively prevented some term-code combinations seen in the training
data from being included in the expanded terminology. Moreover, since the codes
and terms in the baseline terminology were derived from a very large training
set, there may have been few new codes and terms in the test set.</p>
      <p>Our coding system achieved a very high precision of 0.886, with a recall of
0.813. Reasons for the lower recall include disambiguation errors, spelling
mistakes and typos in the death certificates, and missing terms in the terminology.
Also, we noticed that some gold-standard code annotations erroneously
corresponded with a line that preceded or followed the line that contained the term to
be coded, which resulted in false-negative detections (and possibly false-positive
detections that were actually correct). Further improvement of our system may
be possible by using better curated terminologies and applying spelling
correction techniques.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Kelly</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Goeuriot</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Suominen</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Neveol</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Palotti</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zuccon</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          :
          <article-title>Overview of the CLEF eHealth Evaluation Lab 2016</article-title>
          .
          <source>In: CLEF 2016 - 7th Conference and Labs of the Evaluation Forum. Lecture Notes in Computer Science (LNCS)</source>
          . Springer, Heidelberg (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Neveol</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Goeuriot</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kelly</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cohen</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Grouin</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hamon</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lavergne</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rey</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Robert</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tannier</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zweigenbaum</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>Clinical Information Extraction at the CLEF eHealth Evaluation Lab 2016</article-title>
          .
          <source>CLEF 2016 Online Working Notes</source>
          , CEUR-WS (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Bodenreider</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          :
          <article-title>The Unified Medical Language System (UMLS): Integrating Biomedical Terminology</article-title>
          .
          <source>Nucleic Acids Res</source>
          .
          <volume>32</volume>
          ,
          <fpage>D267</fpage>
          –
          <lpage>D270</lpage>
          (
          <year>2004</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Neveol</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Grouin</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tannier</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hamon</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kelly</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Goeuriot</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zweigenbaum</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>CLEF eHealth Evaluation Lab 2015 Task 1b: Clinical Named Entity Recognition</article-title>
          .
          <source>CLEF 2015 Online Working Notes</source>
          , CEUR-WS (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>5. International Classification of Diseases, http://www.who.int/classifications/icd/en/</mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Afzal</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Akhondi</surname>
            ,
            <given-names>S.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>van Haagen</surname>
            ,
            <given-names>H.H.B.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>van Mulligen</surname>
            ,
            <given-names>E.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kors</surname>
            ,
            <given-names>J.A.</given-names>
          </string-name>
          :
          <article-title>Biomedical Concept Recognition in French Text Using Automatic Translation of English Terms</article-title>
          .
          <source>CLEF 2015 Online Working Notes</source>
          , CEUR-WS (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Neveol</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Grouin</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Leixa</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rosset</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zweigenbaum</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>The QUAERO French Medical Corpus: a Ressource for Medical Entity Recognition and Normalization</article-title>
          .
          <source>In: Fourth Workshop on Building and Evaluating Resources for Health and Biomedical Text Processing (BioTxtM)</source>
          , pp.
          <fpage>24</fpage>
          –
          <lpage>30</lpage>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <source>Google Translate</source>
          , https://translate.google.com
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <source>Microsoft Translator</source>
          , http://www.bing.com/translator
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Pavillon</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Laurent</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          :
          <article-title>Certification et Codification des Causes Médicales de Décès</article-title>
          .
          <source>Bulletin Epidémiologique Hebdomadaire. 30/31</source>
          ,
          <fpage>134</fpage>
          –
          <lpage>138</lpage>
          (
          <year>2003</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Schuemie</surname>
            ,
            <given-names>M.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jelier</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kors</surname>
            ,
            <given-names>J.A.</given-names>
          </string-name>
          :
          <article-title>Peregrine: Lightweight Gene Name Normalization by Dictionary Lookup</article-title>
          .
          <source>Proceedings of the BioCreAtIvE II Workshop</source>
          ; Madrid, Spain. pp.
          <fpage>131</fpage>
          –
          <lpage>133</lpage>
          (
          <year>2007</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Divita</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Browne</surname>
            ,
            <given-names>A.C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rindflesch</surname>
            ,
            <given-names>T.C.</given-names>
          </string-name>
          :
          <article-title>Evaluating Lexical Variant Generation to Improve Information Retrieval</article-title>
          .
          <source>Proceedings of the American Medical Informatics Association Symposium</source>
          , pp.
          <fpage>775</fpage>
          –
          <lpage>779</lpage>
          (
          <year>1998</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>13. Peregrine Indexer, https://trac.nbic.nl/data-mining</mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>14. Solr Text Tagger, https://github.com/OpenSextant/SolrTextTagger</mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <article-title>Mantra project website</article-title>
          , http://www.mantra-project.eu
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>