<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>SIBM at CLEF eHealth Evaluation Lab 2017: Multilingual Information Extraction with CIM-IND</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Chloe Cabot</string-name>
          <email>chloe.cabot@chu-rouen.fr</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Lina F. Soualmia</string-name>
          <email>lina.soualmia@chu-rouen.fr</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Stefan J. Darmoni</string-name>
          <email>stefan.darmoni@chu-rouen.fr</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>French National Institute for Health</institution>
          ,
          <addr-line>INSERM, LIMICS UMR-1142</addr-line>
          ,
          <country country="FR">France</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Normandie Univ., SIBM, TIBS - LITIS EA 4108, Rouen University and Hospital</institution>
          ,
          <country country="FR">France</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>This paper presents SIBM's participation in Task 1: Multilingual Information Extraction - ICD10 coding of the CLEF eHealth 2017 evaluation initiative, which focuses on named entity recognition in French and English death certificates. We addressed the identification of relevant clinical entities within the International Classification of Diseases, version 10 (ICD10) in the CepiDC and CDC datasets with our CIM-IND system. CIM-IND is a multilingual system designed to recognize named entities in French and English texts using a dictionary-based approach together with natural language processing and fuzzy matching methods. The evaluation was performed for two cases: (i) for all ICD10 codes, the main evaluation for the task, and (ii) for ICD10 codes addressing a particular type of death, called external causes or violent deaths. On the English test set, our system obtained F-scores of 0.81 for all ICD10 codes and 0.4066 for external causes. On the French aligned test set, our system obtained F-scores of 0.8038 for all ICD10 codes and 0.5011 for external causes. On the French raw test set, our system obtained F-scores of 0.7636 for all ICD10 codes and 0.4897 for external causes. These scores were substantially higher than the average scores of the systems that participated in the challenge.</p>
      </abstract>
      <kwd-group>
        <kwd>Information extraction</kwd>
        <kwd>Entity recognition</kwd>
        <kwd>Lexical semantics</kwd>
        <kwd>Natural Language Processing</kwd>
        <kwd>International Classification of Diseases</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Since the amount of digital medical documents has expanded widely in the last
twenty years, information retrieval from such heterogeneous documents has
become a significant challenge for a large variety of tasks in clinical and
biomedical research as well as personalized medicine. Named entity recognition
(NER) is a basic sub-task of information extraction that aims to extract and
classify entity names from text. The NER problem has been studied widely in
the last decade in the biomedical field as well as in others such as social media
[
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] or speech data [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. As the use of NER services has expanded,
state-of-the-art algorithms have improved on formal medical text for English [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. However,
NER algorithms struggle to adapt to free text because they are designed
for formal text and rely on features present in well-formed text such as
biomedical articles. Free text in medical notes contains spelling errors and
incorrect use of punctuation, grammar and capitalization [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. In other languages, free
text can also present incorrect use of diacritical marks. In medical reports, text
usually consists of short or incomplete sentences, similar to note-taking, with
substantial use of ambiguous abbreviations. Clinical records are usually
created in a rush without any proofreading. Consequently, a large number of spelling
errors occurs. These errors should be attributed not only to the complexity of the
language but also to characteristics of the medical domain. Siklosi et al. found
that the most frequent types of errors are unintentional mistyping,
grammatical errors, sentence fragments, and non-standardized abbreviations [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. In fact,
as opposed to formal text, abbreviations are rarely defined in medical reports.
Despite the efforts made in NER, even in the biomedical domain, information
extraction from clinical notes still faces several challenges [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ].
      </p>
      <p>
        Since 1995, the department of BioMedical Informatics of the Rouen
University Hospital (SIBM, URL: www.cismef.org) has been working on developing
tools to access health knowledge (information retrieval and automatic indexing)
in French [7-10]. More recently, our team has worked on the evaluation of health
information systems and on information retrieval and indexing in Electronic Health
Records (EHRs) [
        <xref ref-type="bibr" rid="ref11 ref12">11, 12</xref>
        ]. In this context, a multilingual system called CIM-IND
has been developed. CIM-IND is designed to recognize named entities in French
and English texts using a dictionary-based approach together with natural language
processing and fuzzy matching methods. The main objective of this system is to
deal accurately and efficiently with the informal and noisy nature of free text in
medical reports. To assess the performance of CIM-IND, our team participated
in the CLEF eHealth 2016 Task 2 [
        <xref ref-type="bibr" rid="ref13 ref14">13, 14</xref>
        ], which aimed at fully automatically
identifying clinically relevant entities in French death certificates, and obtained
average results [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ]. While death certificates are standardized documents filled in
by physicians to report the death of a patient, they usually present spelling or
typing errors, abbreviations, and, in French, non-diacritized text or a mix of
cases and diacritized text. Our main motivation for participating was to improve
the functionalities of the tool and to determine the progress achieved since our
participation last year and our ability to address the issues detected then. As
Task 1: Multilingual Information Extraction - ICD10 coding of the CLEF
eHealth 2017 evaluation initiative involved assigning codes from the
International Classification of Diseases, version 10 (ICD10) to both French and English
death certificates [
        <xref ref-type="bibr" rid="ref16 ref17">16, 17</xref>
        ], we were also able to test our multilingual approach.
      </p>
      <p>The rest of the paper is organized as follows. In Section 2 we introduce our
extraction approach and the tools used in this task, and we describe our experimental
setup. Section 3 reports our results. Section 4 presents error analyses and
reflections, draws concluding remarks, and outlines future work.</p>
    </sec>
    <sec id="sec-2">
      <title>Material and methods</title>
      <sec id="sec-2-1">
        <title>Test datasets</title>
        <p>French CepiDC datasets Since 1968, the CepiDC, a laboratory of the French National Institute
for Health and Medical Research (Inserm), has been dedicated to producing
annually the national statistics on medical causes of death, in association with the
French National Institute for Statistics and Economic Studies (Insee), to
disseminating the data, and to conducting studies and research on the medical causes of
death. These statistics are built from information found in death certificates. The
CepiDC team maintains a database containing more than 18,000,000 death records
[18]. The task consists of extracting ICD10 codes from the raw lines of death
certificate text: it is an information extraction task that relies on the supplied text
to extract ICD10 codes from the certificates, line by line. Two datasets
are provided for the task. The first dataset is called the "aligned dataset" and the
second the "raw dataset". As the structure of the files provided by these
two sets differs, some minor adjustments were necessary to process them.
Aligned dataset The dataset includes 31,690 death certificates processed by
CepiDC in 2014, totalling 91,962 lines. The annotations in the CepiDC corpus
consist of ICD10 codes and were assigned per text line. The dataset is supplied
in one CSV-formatted file. Each row contains twelve information fields associated
with a raw line of text from an original death certificate, as follows:
- DocID: death certificate ID
- YearCoded: year the death certificate was processed by CepiDC
- Gender: gender of the deceased
- Age: age at the time of death, rounded to the nearest five-year age group
- LocationOfDeath: location of death
- LineID: line number within the death certificate
- RawText: raw text entered in the death certificate
- IntType: type of time interval the patient had been suffering from the coded
cause, according to the following categories: minutes, hours, days, months,
years
- IntValue: length of time the patient had been suffering from the coded cause
- CauseRank: rank of the ICD10 code
- StandardText: dictionary entry or excerpt of the raw text that supports the
selection of an ICD10 code (if any)
- ICD10: ICD10 code associated with the certificate corresponding to the
DocID and LineID
The output comprises the nine input fields plus two text fields (CauseRank and
StandardText) used to report evidence text supporting the ICD10 code supplied
in the twelfth, final field.</p>
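<p>Under the field layout listed above, loading the aligned file can be sketched as follows. This is a minimal illustration: the semicolon delimiter and the exact column order are assumptions inferred from this description and the sample output lines, not the official task specification, and the sample row is fabricated.</p>

```python
import csv
from io import StringIO

# Column layout for the aligned CepiDC dataset, as described in the paper
# (order and delimiter are assumptions, not the official format).
FIELDS = ["DocID", "YearCoded", "Gender", "Age", "LocationOfDeath", "LineID",
          "RawText", "IntType", "IntValue", "CauseRank", "StandardText", "ICD10"]

def read_aligned(stream):
    """Yield one dict per certificate line of the aligned CSV."""
    reader = csv.reader(stream, delimiter=";")
    for row in reader:
        yield dict(zip(FIELDS, row))

# Toy example with a single fabricated row:
sample = "64185;2013;2;85;2;5;SYNDROME DE GLISSEMENT;4;3;6-1;syndrome glissement;R453\n"
rows = list(read_aligned(StringIO(sample)))
```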
        <p>Raw dataset The data from 31,683 death certificates is distributed over three
CSV-formatted files. The first file includes the following fields: DocID, YearCoded,
LineID, RawText, IntType, IntValue. The second file includes the following
fields: DocID, YearCoded, Gender, PrimCauseCode, Age, LocationOfDeath. The
third file includes the following fields: DocID, YearCoded, LineID.
English CDC dataset The data from 6,665 death certificates is distributed
over three CSV-formatted files. The first file includes the following fields:
DocID, YearCoded, LineID, RawText, IntType, IntValue. The second file includes
the following fields: DocID, YearCoded, Gender, PrimCauseCode, Age,
LocationOfDeath. The third file includes the following fields: DocID, YearCoded,
LineID.</p>
      </sec>
      <sec id="sec-2-2">
        <title>Dictionaries</title>
        <p>The French CepiDC corpus includes six versions of a manually curated ICD10
dictionary developed at CepiDC corresponding to years: 2006-2010, 2011, 2012,
2013, 2014 and 2015. The English CDC corpus includes a manually curated
ICD10 dictionary developed by the CDC providing 170,285 entries. These
resources were used to build spelling dictionaries. Moreover, the training sets were
used to complete these dictionaries.</p>
        <p>Spelling dictionaries For each language, the dictionary versions were merged if
necessary. Each ICD term was split into words and duplicates were removed. The two
lists of unique words obtained provided a spelling dictionary for each language.
Additional dictionaries Then, an additional dictionary was computed from each
training set by extracting ICD10 code and term combinations. The number of
times an ICD10 code was used in the training corpus was also determined. For
ambiguous terms, i.e. terms that corresponded to more than one ICD10 code,
the most frequently used code was kept. Each additional dictionary was merged with
the dictionaries provided in the corresponding corpus. If a term was present in both
the additional dictionary and a corpus dictionary but the corresponding codes
were different, the code from the additional dictionary was removed to avoid
introducing ambiguity between dictionary versions. This processing helped to
complete the provided dictionaries, especially with abbreviations that were otherwise missing.</p>
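<p>The additional-dictionary construction described above (keep the most frequent code for ambiguous terms, then drop entries that conflict with the corpus dictionary) can be sketched as follows. This is a rough illustration under the assumption that (term, code) pairs have already been extracted from the training set; the example terms and codes are hypothetical.</p>

```python
from collections import Counter

def build_additional_dictionary(training_pairs, corpus_dict):
    """training_pairs: iterable of (term, icd10_code) pairs from the training set.
    corpus_dict: provided term -> code dictionary for the corpus."""
    counts = Counter(training_pairs)          # frequency of each (term, code) pair
    best = {}
    for (term, code), n in counts.items():
        # For ambiguous terms, keep the most frequently used code.
        if term not in best or n > best[term][1]:
            best[term] = (code, n)
    merged = dict(corpus_dict)
    for term, (code, _) in best.items():
        # Drop codes that would contradict the corpus dictionary,
        # to avoid introducing ambiguity between dictionary versions.
        if term in corpus_dict and corpus_dict[term] != code:
            continue
        merged[term] = code
    return merged

# Hypothetical training pairs, including one ambiguous term and one conflict:
pairs = [("acfa", "I48"), ("acfa", "I48"), ("acfa", "I489"),
         ("oap", "J81"), ("oedeme aigu du poumon", "J960")]
merged = build_additional_dictionary(pairs, {"oedeme aigu du poumon": "J81"})
```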
      </sec>
      <sec id="sec-2-3">
        <title>Extracting ICD10 concepts from death certificates with CIM-IND</title>
        <p>CIM-IND is designed to match terms from input text against the relevant
version of the ICD10. The extraction is performed at the phrase level of the
text using natural language processing techniques. The system is built using
Python and Python/C extensions and provides a response in CSV format for
each identified concept with: (i) the entry text, (ii) the offsets of the first and the
final word contained in the health concept, (iii) the ICD10 identifier and (iv)
the ICD10 term. CIM-IND performs three main steps to identify ICD10 terms:
normalization, candidate selection and candidate ranking.</p>
        <p>Normalization Several pre-processing steps are performed, including stop word
filtering (using the default NLTK stop word lists for both French and English
[19]) and elision filtering (removing abbreviated articles that are contracted with
terms). Words are matched case-insensitively. Diacritics in French texts are
preserved and Unicode is used for matching. Finally, spell checking is performed
with the Enchant library using the manually built dictionary.</p>
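<p>The normalization chain above might be sketched as follows. This is a minimal illustration: the stop-word list here is a tiny hand-written stand-in for the NLTK lists, and the Enchant spell-checking step is omitted since it depends on the manually built dictionary.</p>

```python
import re

# Tiny stand-in stop-word list (the real system uses NLTK's French and English lists).
STOP_WORDS = {"de", "du", "la", "le", "les", "un", "une", "et", "en", "sur"}

def normalize(line):
    """Lowercase, strip elided articles (l', d', ...), filter stop words.
    Diacritics are preserved, as in CIM-IND."""
    text = line.lower()
    text = re.sub(r"\b[ldjmnst]'", "", text)      # elision filtering
    tokens = re.findall(r"\w+", text)             # Unicode word characters kept
    return [t for t in tokens if t not in STOP_WORDS]

print(normalize("Hémorragie digestive basse sur surdosage en AVK"))
```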
        <p>Candidate selection A method based on the phonetic encoding algorithm
Double Metaphone (DM) [20] is used to perform a first approximate term search.
The DM phonetic encoding algorithm is the second generation of the Metaphone
algorithm. It is designed primarily to encode American English names while
taking into account the fact that such words can have more than one acceptable
pronunciation. Double Metaphone can compute a primary and a secondary
encoding for a given word or name to indicate both the most likely pronunciation as
well as an optional alternative pronunciation (hence the "double" in the name).
DM tries to account for myriad irregularities in English as well as in Slavic,
Germanic, Celtic, Greek, French, Italian, Spanish, Chinese, and other languages.
Though powerful, DM has its limitations and drawbacks. DM was
designed for searching lists of proper names rather than large amounts of text, and
it may not match grossly misspelled words that seriously alter the phonetic
structure of the word. Despite these limitations, the DM algorithm, which is free
to use and open source, still holds up as a flexible and powerful phonetic encoding
system today, especially in a multilingual approach.</p>
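<p>The general idea of phonetic pre-filtering can be illustrated with a deliberately simplified phonetic key. Note that this toy key is only a stand-in for Double Metaphone (which is far more elaborate), and the dictionary entries are hypothetical.</p>

```python
def phonetic_key(word):
    """Deliberately simplified phonetic key, a stand-in for Double Metaphone:
    uppercase, collapse doubled letters, drop non-initial vowels."""
    word = word.upper()
    out = []
    for i, ch in enumerate(word):
        if out and ch == out[-1]:
            continue                      # collapse doubled letters
        if i > 0 and ch in "AEIOUY":
            continue                      # drop non-initial vowels
        out.append(ch)
    return "".join(out)

# Keys pre-computed for every word of every dictionary term, mirroring the way
# CIM-IND stores encodings for each ICD10 version dictionary in a database.
DICTIONARY = {"syndrome glissement": "R453", "hemorragie digestive": "K921"}
INDEX = {term: {phonetic_key(w) for w in term.split()} for term in DICTIONARY}

def candidates(phrase_words):
    """Retrieve dictionary terms sharing at least one phonetic key with the phrase."""
    keys = {phonetic_key(w) for w in phrase_words}
    return [term for term, term_keys in INDEX.items() if term_keys & keys]

# The misspelling "glisement" shares the key of "glissement", so the candidate
# is still retrieved despite the typo.
print(candidates(["glisement"]))
```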
        <p>First, CIM-IND computes the DM encoding of each word included in the
normalized phrase. Then, ICD10 term candidates with matching DM encodings are
retrieved. This step quickly provides a list of relevant ICD10 term candidates
and makes it possible to perform time-consuming analyses on a reduced set of terms in the
final step. To this end, our system relies on a database storing the pre-computed
DM encoding of each word available in each ICD10 version dictionary.
Candidate ranking Finally, a Weighted Distance Score (WDS) algorithm has
been developed to rank the list of candidate terms. The WDS algorithm returns
a similarity score scaled from 0 to 100 for each candidate, 100 representing a
perfect match. The most likely term, i.e. the one with the highest score, is retained as the
matching ICD10 term. As either one or multiple ICD10 terms can be present in
a phrase, two cases are considered. First, if the length of the candidate sequence s1 is
similar to the length of the processed line s2 (i.e. only one ICD10 term is expected),
two scores are computed: (i) a base score (BS) and (ii) a set score (SeS). The
BS is computed by determining the Levenshtein distance between the sequences
s1 and s2, scaled from 0 to 100. The SeS finds all alphanumeric tokens in each
string and treats them as a set. Two strings are then constructed by concatenating,
on the one hand, the sorted intersection and, on the other hand, the sorted
remainder. The distance between these strings is then computed, handling any
unordered partial matches.</p>
        <p>Otherwise, if one of the sequences is 1.5 times longer than the other, two partial
scores are computed: (i) a partial base score (PBS) and (ii) a partial set score
(PSeS). The PBS returns the distance of the most similar substring as a number
between 0 and 100. First, each block representing a sequence of matching
characters in a string is determined. Then, the best partial match is the one
aligning with at least one of those blocks. The PSeS computes the PBS for each
string built from the sorted intersection and the sorted remainder of s1 and s2.
To ensure that only full matches can return a perfect score, partial scores are
scaled based on the lengths of s1 and s2. All set scores are scaled by 0.95. Finally,
the WDS score is determined as the highest of these scores.</p>
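<p>The scoring scheme above can be sketched as follows. This is a minimal approximation: difflib's SequenceMatcher stands in for the normalized Levenshtein distance of the actual system, and the length-based scaling of partial scores is left out for brevity.</p>

```python
import re
from difflib import SequenceMatcher

def ratio(a, b):
    """Similarity scaled 0-100 (difflib's ratio, a stand-in for a
    normalized Levenshtein similarity)."""
    return 100.0 * SequenceMatcher(None, a, b).ratio()

def partial_ratio(a, b):
    """Partial base score sketch: best score of the shorter string against
    same-length windows of the longer one, anchored at matching blocks."""
    short, long_ = (a, b) if len(a) <= len(b) else (b, a)
    best = 0.0
    for block in SequenceMatcher(None, short, long_).get_matching_blocks():
        start = max(0, block.b - block.a)
        best = max(best, ratio(short, long_[start:start + len(short)]))
    return best

def set_strings(a, b):
    """Build the sorted-intersection + sorted-remainder strings of the set score."""
    ta, tb = set(re.findall(r"\w+", a)), set(re.findall(r"\w+", b))
    inter = " ".join(sorted(ta & tb))
    return ((inter + " " + " ".join(sorted(ta - tb))).strip(),
            (inter + " " + " ".join(sorted(tb - ta))).strip())

def wds(candidate, line):
    """WDS sketch: base + set scores when lengths are similar, partial
    variants otherwise; set scores are scaled by 0.95."""
    s1, s2 = candidate.lower(), line.lower()
    sa, sb = set_strings(s1, s2)
    if max(len(s1), len(s2)) <= 1.5 * min(len(s1), len(s2)):
        scores = [ratio(s1, s2), 0.95 * ratio(sa, sb)]                    # BS, SeS
    else:
        scores = [partial_ratio(s1, s2), 0.95 * partial_ratio(sa, sb)]    # PBS, PSeS
    return max(scores)
```

A candidate identical to the line scores 100, and closer candidates outrank unrelated ones, which is all the ranking step needs.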
        <p>64185;2013;2;85;2;5;SYNDROME DE GLISEMENT AVEC GRABATISATION DEPUIS
OCTOBRE 2012;4;3;6-1;syndrome glissement;R453
64185;2013;2;85;2;5;SYNDROME DE GLISEMENT AVEC GRABATISATION DEPUIS
OCTOBRE 2012;4;3;6-1;grabatisation;R263
79317;2013;2;85;2;6;heuorragie digestive basse sur surdosage en
AVK;3;5;6-3;hemorragie digestive basse;K921
79317;2013;2;85;2;6;heuorragie digestive basse sur surdosage en
AVK;3;5;6-3;surdosage avk;X44
64370;2013;1;80;2;5;ABCES CERVICAL . LARYNGECTOMIE TOTALE.ATCD
D'IDM.;NULL;NULL;;laryngectomie totale;Z900
64370;2013;1;80;2;5;ABCES CERVICAL . LARYNGECTOMIE TOTALE.ATCD
D'IDM.;NULL;NULL;;abces cervical;L021
64370;2013;1;80;2;5;ABCES CERVICAL . LARYNGECTOMIE TOTALE.ATCD
D'IDM.;NULL;NULL;;antecedent infarctus myocarde;I258</p>
        <p>Figure 1 gives an example of processing French texts with CIM-IND. The
seventh field contains the text to annotate, the eleventh field the ICD10 dictionary entry
matching the text, and the last field the corresponding ICD10 code. Similarly,
Figure 2 gives an example of processing English texts with CIM-IND.</p>
        <p>For example, in Figure 1, lines 1-2 contain the misspelled word "glisement"
(for French "glissement") and lines 3-4 contain the misspelled word "heuorragie"
(for French "hemorragie"). The first error is correctly processed by the DM
algorithm, which yields the same encoding for both the misspelled and correct words.
However, the second error is not properly processed: as the misspelling
profoundly alters the phonetics of the word, the DM algorithm produces a different
encoding than for the correct word. This highlights the importance of spell
checking the normalized text to remove grossly misspelled words before
the DM processing and thus secure a proper list of candidates.</p>
        <p>Regarding execution time, CIM-IND processes a line in 50 to 300
ms, depending on its length.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Results</title>
      <sec id="sec-3-1">
        <title>French CepiDC datasets</title>
        <p>CIM-IND was run on both French test sets and one run was submitted for
each of these datasets. Table 1 shows the results obtained on the raw dataset
together with the average and median performance scores of the runs of all task
participants. Table 2 shows the results obtained on the aligned dataset.</p>
        <p>On the raw dataset, CIM-IND achieved a precision of 0.8568 and a recall
of 0.6886 (F1 = 0.7636) for all ICD10 codes. Regarding only the ICD10 codes
corresponding to external causes (meaning violent deaths), CIM-IND achieved
substantially lower performance, with a precision of 0.567 and a recall of 0.431
(F1 = 0.4897).</p>
        <p>On the aligned dataset, CIM-IND achieved a precision of 0.8346 and a
recall of 0.7751 (F1 = 0.8038) for all ICD10 codes. Regarding only the ICD10 codes
corresponding to external causes, CIM-IND again achieved lower performance,
with a precision of 0.5343 and a recall of 0.4717 (F1 = 0.5011).</p>
        <p>Since the main difference between these two datasets was related to
formatting, quite similar results were expected. Remarkably, however, the
aligned dataset yields a higher recall than the raw dataset. It should also
be noted that performance is considerably lower when regarding only external-cause-related
ICD10 codes, for both test sets. Overall, our performance results are
considerably better than the average and median scores of all submitted runs.</p>
      </sec>
      <sec id="sec-3-2">
        <title>English CDC dataset</title>
        <p>One run was submitted for the English CDC set. Table 3 shows the results
obtained on this dataset together with the average and median performance
scores of the runs of all task participants.</p>
        <p>CIM-IND achieved a precision of 0.8393 and a recall of 0.7827 (F1 = 0.81) for
all ICD10 codes. Regarding only the ICD10 codes corresponding to external causes,
CIM-IND achieved lower performance, with a precision of 0.4261 and a recall
of 0.3889 (F1 = 0.4066).</p>
        <p>Regarding all ICD10 codes, these results are slightly better than the results
obtained on the French raw dataset but remarkably similar to those obtained
on the aligned dataset. Again, there is a significant performance drop when regarding
only external-cause-related ICD10 codes. In this case, results are lower than
those obtained on both French datasets, for both precision and recall. Overall,
in both evaluations, our results are higher than the average and median scores of
all submitted runs.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Discussion and conclusion</title>
      <p>
        The development of CIM-IND started last year and the system was evaluated in
the corresponding CLEF eHealth 2016 task, on a single French corpus. In 2016,
CIM-IND obtained an F1 score of 0.6795, which was slightly below the average
results [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ]. Since then, various improvements have been developed, concerning
especially the ranking of ICD10 term candidates and CIM-IND's ability to deal
with free text inconsistencies. This year's results demonstrate these
improvements, with a 12% increase in F1 score on the French raw dataset and an
18% increase in F1 score on the French aligned dataset. Moreover, this year's
challenge demonstrated that CIM-IND performed broadly as well in both
English and French, achieving above-average results in both languages.
      </p>
      <p>However, some aspects of our results should be investigated. Although
CIM-IND achieved satisfactory results, we noticed that some errors due to
disambiguation or to misspellings and inconsistencies remain. In particular, significant
misspellings occurring in words that are not part of the spelling dictionary
result in incorrect DM encodings, and thus an improper list of candidate
terms.</p>
      <p>For English text, our results could be slightly improved with a more complete
terminology or a larger training set to cover some missing terms, especially
abbreviations. Moreover, the performance drop regarding external-cause-related
ICD10 codes, which seems to affect all submitted runs, should be investigated.
External causes present a specific context and often a specific terminology related
to accidents, violent deaths or treatment-induced overdoses. They occur more
rarely in the training sets: only 2,440 lines in the French training set
(110,869 lines) and 313 lines in the English training set (39,333 lines) appear to
be related to external causes (ICD10 codes V01 to Y98). This can explain the
reduced performance to some extent. Also, in some cases, the ICD10 codes
associated with a given line rely on the context provided in other lines of the same
death certificate. CIM-IND processes each line independently and thus was not
able to properly annotate such lines.</p>
      <p>The main conclusion of this work and the obtained results is that
improvements can still be made, first to enhance the processing of the given
terminologies and disambiguation-related issues, and second to improve the recognition and
processing of spelling errors. We plan to deepen these two aspects and to participate
in further challenges in order to keep track of our developments.</p>
      <p>18. Pavillon, G., Laurent, F.: Certification et codification des causes medicales de
deces. Bulletin epidemiologique hebdomadaire (2003)
19. Bird, S., Klein, E., Loper, E.: Natural Language Processing with Python. O'Reilly
Media, Inc (2009)
20. Philips, L.: The double metaphone search algorithm. C/C++ Users Journal 18(6)
(June 2000) 38-43</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Derczynski</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Maynard</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rizzo</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>van Erp</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gorrell</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Troncy</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Petrak</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bontcheva</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>Analysis of named entity recognition and linking for tweets</article-title>
          .
          <source>Information Processing &amp; Management</source>
          <volume>51</volume>
          (
          <issue>2</issue>
          ) (
          <year>March 2015</year>
          )
          <fpage>32</fpage>
          -
          <lpage>49</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Ma</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kim</surname>
            ,
            <given-names>J.j.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bigot</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Khan</surname>
            ,
            <given-names>T.M.</given-names>
          </string-name>
          :
          <article-title>Feature-enriched word embeddings for named entity recognition in open-domain conversations</article-title>
          .
          <source>In: 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)</source>
          ,
          <source>IEEE</source>
          (
          <year>2016</year>
          )
          <fpage>6055</fpage>
          -
          <lpage>6059</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Mork</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Aronson</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Demner Fushman</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>12 years on - Is the NLM medical text indexer still useful and relevant?</article-title>
          <source>Journal of Biomedical Semantics</source>
          <volume>8</volume>
          (
          <issue>1</issue>
          ) (
          <year>February 2017</year>
          )
          <fpage>8</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Lai</surname>
            ,
            <given-names>K.H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Topaz</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Goss</surname>
            ,
            <given-names>F.R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhou</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>Automated misspelling detection and correction in clinical free-text records</article-title>
          .
          <source>Journal of Biomedical Informatics</source>
          <volume>55</volume>
          (
          <year>June 2015</year>
          )
          <fpage>188</fpage>
          -
          <lpage>195</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Siklosi</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Novak</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Proszeky</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          :
          <article-title>Context-aware correction of spelling errors in Hungarian medical documents</article-title>
          .
          <source>Computer Speech &amp; Language</source>
          <volume>35</volume>
          (
          <year>January 2016</year>
          )
          <fpage>219</fpage>
          -
          <lpage>233</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Menasalvas</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gonzalo-Martin</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>Challenges of Medical Text and Image Processing: Machine Learning Approaches</article-title>
          .
          <source>In: Machine Learning for Health Informatics</source>
          . Springer International Publishing, Cham (
          <year>2016</year>
          )
          <fpage>221</fpage>
          -
          <lpage>242</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Darmoni</surname>
            ,
            <given-names>S.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Thirion</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Leroy</surname>
            ,
            <given-names>J.P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Douyère</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lacoste</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Godard</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rigolle</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Brisou</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Videau</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Goupy</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Piot</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Quéré</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ouazir</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Abdulrab</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          :
          <article-title>A search tool based on 'encapsulated' MeSH thesaurus to retrieve quality health resources on the internet</article-title>
          .
          <source>Medical Informatics and the Internet in Medicine</source>
          <volume>26</volume>
          (
          <issue>3</issue>
          ) (July
          <year>2001</year>
          )
          <fpage>165</fpage>
          -
          <lpage>178</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Névéol</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rogozan</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Darmoni</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Automatic indexing of online health resources for a French quality controlled gateway</article-title>
          .
          <source>Information Processing &amp; Management</source>
          <volume>42</volume>
          (
          <issue>3</issue>
          ) (May
          <year>2006</year>
          )
          <fpage>695</fpage>
          -
          <lpage>709</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Soualmia</surname>
            ,
            <given-names>L.F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sakji</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Letord</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rollin</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Massari</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Darmoni</surname>
            ,
            <given-names>S.J.</given-names>
          </string-name>
          :
          <article-title>Improving information retrieval with multiple health terminologies in a quality-controlled gateway</article-title>
          .
          <source>Health Information Science and Systems</source>
          <volume>1</volume>
          (
          <issue>1</issue>
          ) (
          <year>2013</year>
          )
          <fpage>8</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Chebil</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Soualmia</surname>
            ,
            <given-names>L.F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Omri</surname>
            ,
            <given-names>M.N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Darmoni</surname>
            ,
            <given-names>S.J.</given-names>
          </string-name>
          :
          <article-title>Indexing biomedical documents with a possibilistic network</article-title>
          .
          <source>JASIST</source>
          <volume>67</volume>
          (
          <issue>4</issue>
          ) (
          <year>2016</year>
          )
          <fpage>928</fpage>
          -
          <lpage>941</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Cabot</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lelong</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Grosjean</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Soualmia</surname>
            ,
            <given-names>L.F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Darmoni</surname>
            ,
            <given-names>S.J.</given-names>
          </string-name>
          :
          <article-title>Retrieving Clinical and Omic Data from Electronic Health Records</article-title>
          .
          <source>Stud Health Technol Inform</source>
          <volume>221</volume>
          (
          <year>2016</year>
          )
          <fpage>115</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Lelong</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cabot</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Soualmia</surname>
            ,
            <given-names>L.F.</given-names>
          </string-name>
          :
          <article-title>Semantic Search Engine to Query into Electronic Health Records with a Multiple-Layer Query Language</article-title>
          .
          In:
          <source>Proceedings of the 2nd SIGIR workshop on Medical Information Retrieval (MedIR)</source>
          . (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Kelly</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Goeuriot</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Suominen</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Névéol</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Palotti</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zuccon</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          :
          <article-title>Overview of the CLEF eHealth Evaluation Lab 2016</article-title>
          .
          In:
          <source>Experimental IR Meets Multilinguality, Multimodality, and Interaction</source>
          . Springer, Cham (
          <year>September 2016</year>
          )
          <fpage>255</fpage>
          -
          <lpage>266</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Névéol</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Goeuriot</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kelly</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>Clinical information extraction at the CLEF eHealth evaluation lab 2016</article-title>
          .
          In:
          <source>Proceedings of CLEF 2016 Evaluation Labs and Workshop: Online Working Notes</source>
          . (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Cabot</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Soualmia</surname>
            ,
            <given-names>L.F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dahamna</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Darmoni</surname>
            ,
            <given-names>S.J.</given-names>
          </string-name>
          :
          <article-title>SIBM at CLEF eHealth Evaluation Lab 2016: Extracting Concepts in French Medical Texts with ECMT and CIMIND</article-title>
          .
          In:
          <source>CEUR-WS Working Notes of the Conference and Labs of the Evaluation Forum CLEF</source>
          . (
          <year>2016</year>
          )
          <fpage>47</fpage>
          -
          <lpage>60</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Goeuriot</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kelly</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Suominen</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Névéol</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Robert</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kanoulas</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Spijker</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Palotti</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zuccon</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          :
          <article-title>CLEF 2017 eHealth Evaluation Lab Overview</article-title>
          .
          In:
          <source>CLEF - 8th Conference and Labs of the Evaluation Forum, Lecture Notes in Computer Science LNCS</source>
          , Springer. (
          <year>September 2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Névéol</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Anderson</surname>
            ,
            <given-names>R.N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cohen</surname>
            ,
            <given-names>K.B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Grouin</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lavergne</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rey</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Robert</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rondet</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zweigenbaum</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>CLEF eHealth 2017 Multilingual Information Extraction task overview: ICD10 coding of death certificates in English and French</article-title>
          . In:
          <source>CLEF Evaluation Labs and Workshop Online Working Notes, CEUR-WS</source>
          . (
          <year>September 2017</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>