  BiTeM at CLEF eHealth Evaluation Lab 2016
   Task 2: Multilingual Information Extraction
        Luc Mottin1,2,*, Julien Gobeill1,2, Anaïs Mottaz1,3, Emilie Pasche1,2, Arnaud
                                 Gaudinat1,2, Patrick Ruch1,2
    1
     BiTeM group, HES-SO/HEG Geneva, Information Science Department, 17 rue de la
                       Tambourine, CH-1227, Carouge, Switzerland
 2SIB Text Mining, Swiss Institute of Bioinformatics, 1 rue Michel-Servet, CH-1206, Genève,

                                         Switzerland
    3HUG, Geneva University Hospitals, 4 rue Gabrielle-Perret-Gentil, CH-1205, Genève,

                                         Switzerland
       *Corresponding author: Tel: +41 22 38 81 848; Fax: +41 22 38 81 701; Email:
                                     luc.mottin@hesge.ch




   Abstract: BiTeM/SIB Text Mining (http://bitem.hesge.ch/) is a university research group carrying out activities in semantic and text analytics applied to health and life sciences. This paper reports on the participation of our team in the CLEF eHealth 2016 evaluation lab. The processing applied to each evaluation corpus (QUAERO and CépiDC) was originally very similar. Our method is based on an Automatic Text Categorization (ATC) system. First, the system is set up with a specific input ontology (the French UMLS), and ATC assigns a ranked list of related concepts to each input document. Then, a second module relocates all of the positive matches in the text and normalizes the extracted entities. For the CépiDC corpus, the system was loaded with the Swiss ICD-10 GM thesaurus. However, a last-minute data transformation issue forced us to implement an ad hoc solution based on simple pattern matching to comply with the constraints of the CépiDC challenge. We obtained an average precision of 62% on the QUAERO entity extraction (over MEDLINE/EMEA texts, exact and inexact matches), 48% on normalizing these entities, and 59% on the CépiDC subtask. Enhancing the recall by expanding the coverage of the terminologies could be an interesting approach to improve this system at moderate labour costs.

   Keywords: Named-Entity Recognition, Automatic Text Categorization, Discontinuous Entity Extraction, Relocation, Statistical Training, Concept Normalization, UMLS, ICD-10.
1. Introduction

   Biomedical data carries a large diversity and quantity of valuable knowledge for medical research and practice. Text-mining tools such as named-entity recognizers have therefore been developed to access textual contents effectively and efficiently. A productive way to improve these systems is to compare them on specific shared tasks, as in CLEF [1-3]. In 2016, the challenge was divided into three subtasks: entity recognition and normalization on the QUAERO corpus, entity extraction on the CépiDC corpus, plus a replication track [4-5]. Both corpora are in French and related to the biomedical domain.

   We report in this paper the contribution of our group to the eHealth Task 2 (Multilingual Information Extraction) within the CLEF 2016 competition. Our team participated in most of its tracks, including MEDLINE and EMEA entity extraction (labelled 2.Q.1 and 2.Q.2 respectively), MEDLINE and EMEA normalized entity recognition (2.Q.3 and 2.Q.4), the CépiDC coding task (2.C), and the replication track.

   Our approach was to integrate an existing automatic categorizer [12] into the processing of the corpora. By providing a ranked list of concepts for each unit of a corpus, we aimed to test the accuracy of this tool within a Named-Entity Recognition (NER) task.



2. Methods

    2.1. QUAERO

         2.1.1. Data

   The QUAERO French medical corpus provided for this task includes two datasets [6]. The first is composed of 833 article titles from MEDLINE. The second contains four sets of instructions for use of medicines from the European Medicines Agency (EMEA), which are split into 15 free texts. Additionally, two other datasets had previously been supplied to train, evaluate and tune the systems.

   Written in a controlled language with strict rules, the EMEA instructions are a good test bed for extracting entities embedded in free text. The MEDLINE extracts contain fewer concepts, but they are challenging because they come from different authors and journals, which implies diverse writing styles.

   The Unified Medical Language System (UMLS) is a compilation of ontologies together with software and services [7-8]. For entity normalization, we used the standard French release of the UMLS Metathesaurus as the exclusive dictionary to extract biomedical entities with their Concept Unique Identifiers (CUIs) [9]. We set up our application with the release freely available in April 2016 from the National Library of Medicine website (www.nlm.nih.gov). With 397203 entries including synonyms, covering 139771 unique concepts, this terminology groups concepts from nine sources in their French versions; Table 1 presents the distribution for each source.


  Table 1: Distribution of terms in the French UMLS Metathesaurus.
Source                                    # terms
MSHFRE                                    112571
MDRFRE                                    97896
LNC-FR-FR                                 88306
LNC-FR-BE                                 44451
LNC-FR-CA                                 42766
LNC-FR-CH                                 4940
WHOFRE                                    3717
MTHMSTFRE                                 1833
ICPCFRE                                   723


   Ten groups of clinical entities are defined from the UMLS semantic types to provide a consistent categorization of biomedical concepts and support their normalization [10-11]. These semantic groups are: Anatomy, Chemicals & Drugs, Devices, Disorders, Geographic Areas, Living Beings, Objects, Phenomena, Physiology, and Procedures. Since nested entities can be assigned to different groups [2], we used the training data to statistically map the semantic types to the ten categories, as sketched below. For semantic types with no mapping, or with too few training observations, we resorted to manual curation.
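
   As an illustration, here is a minimal sketch of this statistical mapping, assuming the training annotations have already been parsed into (semantic type, annotated group) pairs (all names below are hypothetical, not our actual implementation):

```python
from collections import Counter, defaultdict

def learn_type_to_group(pairs):
    """Assign each UMLS semantic type to the semantic group it is most
    often annotated with in the training data (majority vote)."""
    counts = defaultdict(Counter)
    for semantic_type, group in pairs:
        counts[semantic_type][group] += 1
    return {stype: groups.most_common(1)[0][0]
            for stype, groups in counts.items()}

# Hypothetical observations extracted from the training annotations.
training_pairs = [
    ("T121", "Chemicals & Drugs"),
    ("T121", "Chemicals & Drugs"),
    ("T121", "Procedures"),
    ("T023", "Anatomy"),
]
mapping = learn_type_to_group(training_pairs)
# {'T121': 'Chemicals & Drugs', 'T023': 'Anatomy'}
```

   Types that never occur in the training data fall outside this mapping and are handled by the manual curation mentioned above.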


         2.1.2. Automatic Text Categorization

   CLEF 2016 was an opportunity to evaluate a tool we developed several years ago [12]. Given a specific thesaurus, the Categorizer operates on each text of the corpus, one by one, and provides a ranked list of concepts. This ranking process combines a regular expression classifier with a vector-space classifier, as described in Ruch (2006) [12].
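
   The Categorizer itself is described in [12]; the following is only a rough sketch of the underlying idea, where a regular expression classifier and a vector-space classifier each score candidate concepts before the two scores are merged (the weights and scoring functions are illustrative assumptions, not the published implementation):

```python
import re
from math import sqrt

def regex_score(text, term):
    """1.0 if the term occurs verbatim as a whole expression, else 0.0."""
    pattern = r"\b" + re.escape(term) + r"\b"
    return 1.0 if re.search(pattern, text, re.IGNORECASE) else 0.0

def vector_space_score(text, term):
    """Cosine similarity between bag-of-words sets of text and term."""
    t, q = set(text.lower().split()), set(term.lower().split())
    return len(t & q) / sqrt(len(t) * len(q)) if t and q else 0.0

def rank_concepts(text, thesaurus, w_regex=0.5, w_vs=0.5):
    """Return thesaurus concepts ranked by a weighted sum of both classifiers."""
    scored = [(w_regex * regex_score(text, term)
               + w_vs * vector_space_score(text, term), term, cui)
              for term, cui in thesaurus]
    return sorted([entry for entry in scored if entry[0] > 0], reverse=True)
```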


         2.1.3. Entity relocation

   The second phase of our system aims at matching the list of predicted concepts against the input text using patterns. Each line is divided into word tokens, and the program considers that multi-word entities can be discontinuous, with one or more nested words in between. Concretely, the system successively tries to find each term of the predicted biomedical concept among the line tokens. This also implies handling repeated and overlapping entities. When a prediction is completely retrieved in the text, the system recovers the offset positions (positions of the first and last characters) and prepares a new entry in the output respecting the BRAT format.
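
   A minimal sketch of this relocation step for a single predicted term and a single line (the entity label DISO and all names are illustrative):

```python
import re

def relocate(line, term):
    """Find the words of `term`, in order, among the tokens of `line`,
    allowing nested (skipped) words in between; return the character
    offsets of the first and last matched tokens, or None."""
    tokens = [(m.group(0).lower(), m.start(), m.end())
              for m in re.finditer(r"\w+", line)]
    words, spans, i = term.lower().split(), [], 0
    for tok, start, end in tokens:
        if i < len(words) and tok == words[i]:
            spans.append((start, end))
            i += 1
    return (spans[0][0], spans[-1][1]) if i == len(words) else None

line = "recidive d'une infection superficielle urinaire"
span = relocate(line, "infection urinaire")   # discontinuous match
if span:
    start, end = span
    # BRAT-like entity line: identifier, type, offsets, surface text
    print(f"T1\tDISO {start} {end}\t{line[start:end]}")
```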


         2.1.4. Entity normalization

   Normalization was processed directly along with the matching. As the ATC predicts a list of possible entities derived from UMLS concepts, a UMLS CUI is associated with every proposition. Thus, for each prediction matched in the text, the system can immediately assign a unique CUI.
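
   Since every thesaurus entry carries its CUI, normalization reduces to propagating that identifier to the matched span. A minimal standalone sketch (the term/CUI pair and the BRAT-style normalization line are illustrative; the discontinuous case would reuse the relocation sketch above):

```python
line = "recidive d'une infection urinaire"
predictions = [("infection urinaire", "C0042029")]  # illustrative term/CUI pair

annotations = []
for idx, (term, cui) in enumerate(predictions, start=1):
    start = line.lower().find(term.lower())   # contiguous case, for brevity
    if start != -1:
        end = start + len(term)
        # Entity line, then its normalization line pointing to the CUI.
        annotations.append(f"T{idx}\tDISO {start} {end}\t{line[start:end]}")
        annotations.append(f"N{idx}\tReference T{idx} UMLS:{cui}")
```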


    2.2. CépiDC

         2.2.1. Data

   The CépiDC corpus compiles, within a single CSV file, 110869 lines related to causes of death as reported by physicians. The corpus is structured in such a way that a sentence is repeated when multiple causes should be encoded distinctly.
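
   For illustration, a minimal sketch of reading such a file, assuming a hypothetical column name and delimiter (each repetition of a raw-text line expects one distinct ICD-10 code):

```python
import csv
from collections import defaultdict

# Count how many ICD-10 codes each raw-text line is expected to receive:
# a line repeated n times in the CSV should be coded with n distinct causes.
expected_codes = defaultdict(int)
with open("corpus.csv", newline="", encoding="utf-8") as f:
    for row in csv.DictReader(f, delimiter=";"):   # delimiter is an assumption
        expected_codes[row["RawText"]] += 1        # "RawText" is hypothetical
```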

    The International Classification of Diseases (ICD), maintained by the World Health Organization (WHO), is an international standard that includes causes of mortality. The ICD-10 GM is the Swiss national version of this vocabulary [13], and we used it as the basis of our system. To expand the coverage of this primary thesaurus, we enriched it with new entries (new translations from the English ICD-10; see examples in Table 2). We also included new synonyms, with their ICD-10 codes, from the training dictionaries (from 2006 to 2013). Finally, to avoid false positives potentially induced by short terms and acronyms, the expansion was limited to terms longer than three characters. A minimal sketch of this expansion is given after Table 2.
   Table 2: U84 and its children, translation granularity from ICD-10 GM.

ICD-10 code | ICD-10 WHO (English)                              | ICD-10 GM (French)                             | Translation proposed
U84         | Resistance to other antimicrobial drugs           | Virus de l'Herpès résistants aux virostatiques | Résistance aux autres antimicrobiens
U84.0       | Resistance to antiparasitic drug(s)               | -                                              | Résistance aux médicaments antiparasitaires
U84.1       | Resistance to antifungal drug(s)                  | -                                              | Résistance aux médicaments antifongiques
U84.2       | Resistance to antiviral drug(s)                   | -                                              | Résistance aux médicaments antiviraux
U84.3       | Resistance to tuberculostatic drug(s)             | -                                              | Résistance aux médicaments antituberculeux
U84.7       | Resistance to multiple antimicrobial drugs        | -                                              | Résistance à de multiples médicaments antimicrobiens
U84.8       | Resistance to other specified antimicrobial drug  | -                                              | Résistance à un autre antimicrobien précisé
U84.9       | Resistance to unspecified antimicrobial drugs     | -                                              | Résistance à un antimicrobien non précisé
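
   A minimal sketch of this expansion, assuming each source has already been parsed into (term, ICD-10 code) pairs (function and variable names are illustrative):

```python
def expand_thesaurus(icd10_gm, icd10_who_translations, training_synonyms):
    """Merge the primary ICD-10 GM thesaurus with translated labels and
    training-dictionary synonyms, skipping terms of three characters or
    fewer to limit false positives from short terms and acronyms."""
    merged = {}
    for source in (icd10_gm, icd10_who_translations, training_synonyms):
        for term, code in source:
            if len(term) > 3:
                merged.setdefault(term.lower(), code)
    return merged

thesaurus = expand_thesaurus(
    [("Résistance aux autres antimicrobiens", "U84")],
    [("Résistance aux médicaments antiviraux", "U84.2")],
    [("multirésistance", "U84.7")],   # hypothetical training synonym
)
```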



         2.2.2. Pattern Matching

   Our system uses pattern matching to test the different concepts from the thesaurus against each line of the input. The method first prioritizes an exact match covering the whole text, and then the longest entities, as sketched below.
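
```python
def code_line(line, thesaurus):
    """Return the ICD-10 code of the best-matching term for one line:
    an exact match on the whole text wins; otherwise the longest
    term contained in the line is preferred. A minimal sketch reusing
    the expanded thesaurus (lower-cased term -> code) from above."""
    text = line.strip().lower()
    if text in thesaurus:                     # 1) exact whole-line match
        return thesaurus[text]
    for term in sorted(thesaurus, key=len, reverse=True):
        if term in text:                      # 2) longest contained term
            return thesaurus[term]
    return None
```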



3. Results and discussion

   The performance of the systems is evaluated with the common metrics used in Natural Language Processing [14]. Precision is the proportion of retrieved concepts that exactly match the gold benchmark prepared for these documents, while Recall is the proportion of relevant concepts that were exactly extracted by the system. The F-measure, the harmonic mean of the two, evaluates the accuracy of the system using both Precision and Recall. Scores are calculated according to the following formulas:

   Precision = TP / (TP + FP)
   Recall = TP / (TP + FN)
   F1 = (2 × Precision × Recall) / (Precision + Recall)
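
   As a worked example, applying these formulas to our CépiDC run (Figure 5):

   Precision = 57265 / (57265 + 40650) ≈ 0.5848
   Recall = 57265 / (57265 + 51562) ≈ 0.5262
   F1 = (2 × 0.5848 × 0.5262) / (0.5848 + 0.5262) ≈ 0.5539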
   Moreover, an exact match is credited for entity recognition when both the entity type and the span (starting and ending positions) correspond to the gold benchmark. For normalized entity recognition, the UMLS CUI must also coincide with the reference. Inexact matches are credited when at least one word of the prediction overlaps the span of the reference benchmark.

   The results of the competitive phase, disclosed in mid-May, are reported in Figures 1 to 5. Our system provides substantially better results on the MEDLINE corpus than on the EMEA corpus, with F-scores of respectively 50% and 27% on plain entity recognition. However, the recall indicates that the basic French UMLS limits the coverage: it is clearly not sufficient to extract all the concepts of interest, especially on the EMEA corpus, which involves more drugs and pharmaceuticals.

   On the other hand, pre-processing the ontology must have played a significant role in reaching an F-score of 55% (precision 59%, recall 53%) with the ad hoc solution deployed for the CépiDC coding task.


                                                  QUAERO (EMEA)
5 teams, 9 runs                    exact match                            overall : entities
                        TP             FP         FN          Precision        Recall          F1
BITEM-run1                   406            371        1798       0.5225          0.1842          0.2724
Average scores                                                     0.525          0.4114           0.435
Median scores                                                     0.5998          0.3787          0.4443
3 teams, 5 runs                    exact match                    overall : normalized entities
                        TP             FP         FN          Precision        Recall          F1
BITEM-run1                   347            430        1856       0.4466          0.1575          0.2329
Average scores                                                    0.4762          0.3215          0.3761
Median scores                                                     0.4466          0.2687          0.3148
Figure 1: System results for the plain entity recognition and the normalized entity recognition tasks on the QUAERO/EMEA corpus, regarding exact matches.
                                                QUAERO (EMEA)
5 teams, 9 runs             inexact match                               overall : entities
                    TP              FP          FN          Precision         Recall          F1
BITEM-run1               489             288         1649       0.6293           0.2287         0.3355
Average scores                                                  0.6377           0.5141         0.5423
Median scores                                                   0.7175           0.4808         0.5564
3 teams, 5 runs             inexact match                       overall : normalized entities
                    TP              FP          FN          Precision         Recall          F1
BITEM-run1               363             415         1840       0.4666           0.1648         0.2435
Average scores                                                  0.4968           0.4341         0.4405
Median scores                                                   0.4666           0.2842         0.3324
Figure 2: System results for the plain entity recognition and the normalized entity recognition tasks on the QUAERO/EMEA corpus, regarding inexact matches.


                                               QUAERO (MEDLINE)
5 teams, 9 runs                 exact match                              overall : entities
                    TP              FP          FN          Precision         Recall          F1
BITEM-run1               1376         1032           1741       0.5714           0.4415         0.4981
Average scores                                                   0.503           0.4264         0.4455
Median scores                                                   0.6166           0.4375         0.4981
3 teams, 5 runs                 exact match                     overall : normalized entities
                    TP              FP          FN          Precision         Recall          F1
BITEM-run1               1185         1220           1912       0.4927           0.3826         0.4308
Average scores                                                  0.5006            0.376         0.4287
Median scores                                                   0.4927           0.3826         0.4308
Figure 3: System results for the plain entity recognition and the normalized entity recognition tasks on the QUAERO/MEDLINE corpus, regarding exact matches.
                                                   QUAERO (MEDLINE)
5 teams, 9 runs                    inexact match                            overall : entities
                       TP               FP          FN          Precision        Recall          F1
BITEM-run1                  1778             630         1351       0.7384          0.5682          0.6422
Average scores                                                      0.6387          0.5707          0.5859
Median scores                                                       0.7394          0.5682          0.6422
3 teams, 5 runs                    inexact match                    overall : normalized entities
                       TP               FP          FN          Precision        Recall          F1
BITEM-run1                  1214          1185           1885        0.506          0.3917          0.4416
Average scores                                                      0.5181          0.4757          0.4917
Median scores                                                        0.506          0.3917          0.4416
Figure 4: System results for the plain entity recognition and the normalized entity recognition tasks on the QUAERO/MEDLINE corpus, regarding inexact matches.


                                                          CépiDC
5 teams, 7 runs                    exact match                                   overall
                       TP               FP          FN          Precision        Recall          F1
BITEM-run1             57265            40650       51562       0.5848           0.5262          0.5539
Average scores                                                  0.7878           0.6636          0.7185
Median scores                                                   0.811            0.6554          0.6997
Figure 5: System results for the coding task on the CépiDC corpus.



4. Conclusion

   Our results in the QUAERO subtask could certainly be improved by working with the English version of the UMLS, which covers far more terminology (128 sources), such as the NCI thesaurus or drug-specific dictionaries. Text samples would be translated into English using APIs (such a method has been proposed in past CLEF eHealth workshops), and the resulting coverage improvement could be significant. Another way to improve our system on QUAERO might have been to exploit the training datasets to train the Categorizer.

   Regarding the CépiDC corpus, the ATC did not achieve good results (e.g. it missed many exact matches) due to an issue at the data pre-processing stage. Our ad hoc pattern matching method brought relatively good results for both precision and recall, but it would be interesting to prepare a subsequent run using the Categorizer.
5. References

   1- Braschler M. (2000) CLEF 2000 — Overview of Results. In Cross-Language
       Information Retrieval and Evaluation, Springer Berlin Heidelberg, 2069, 89-
       101.
   2- Goeuriot L., Kelly L., Suominen H., et al. (2015) Overview of the CLEF
       eHealth Evaluation Lab 2015. In Information Access Evaluation. Multilin-
       guality, Multimodality, and Interaction, Springer International Publishing,
       9283, 429-443.
   3- Huang C.C., Lu Z. (2015) Community challenges in biomedical text mining
       over 10 years: success, failure and the future. In Brief Bioinform, 17, 132-
       144.
   4- Overview of the CLEF eHealth Evaluation Lab 2016. Upcoming publication.
   5- CLEF eHealth Evaluation Lab 2016 Task 2: Multilingual Information Ex-
       traction. In CLEF 2016 Working Notes. CEUR-WS, Vol-1609.
   6- Néveol, A., Grouin, C., Leixa, J., et al. (2014) The Quaero French medical
       corpus: A Resource for Medical Entity Recognition and Normalization. In
       Proceedings of the Fourth Workshop on Building and Evaluating Resources
       for Health and Biomedical Text Processing, 24-30.
   7- Barr C.E., Komorowski H.J., Pattison-Gordon E., et al. (1988) Conceptual
       Modeling for the Unified Medical Language System. In Proceedings of the
       Annual Symposium on Computer Application in Medical Care, 1988, 148-
       151.
   8- Humphreys B.L., Lindberg D.A.B., Schoolman H.M., et al. (1998) The Uni-
       fied Medical Language System: an informatics research collaboration. In
       JAMIA, 5 (1), 1–11.
   9- Tuttle M., Sherertz D., Erlbaum M., et al. (1989) Implementing Meta-1: The
       First Version of the UMLS Metathesaurus. In Proceedings of the Annual
       Symposium on Computer Application in Medical Care, 1989, 483-487.
   10- McCray A.T., Burgun A., Bodenreider O. (2001) Aggregating UMLS Se-
       mantic Types for Reducing Conceptual Complexity. In Studies in health
       technology and informatics, 84(0 1), 216-220.
   11- Chen Y., Gu H., Perl Y., et al. (2008) Structural group auditing of a
       UMLS semantic type’s extent. In Journal of Biomedical Informatics, 42(1),
       41-52.
   12- Ruch P. (2006) Automatic assignment of biomedical categories: toward a
       generic approach. In Bioinformatics, 22 (6), 658-664.
   13- Jetté N., Quan H., Hemmelgarn B., et al. (2010) The development, evolution,
       and modifications of ICD-10: challenges to the international comparability
       of morbidity data. Medical Care, 48(12), 1105-1110.
   14- Manning, C.D. and Schütze, H. (1999) Foundations of Statistical Natural
       Language Processing. The MIT Press, Cambridge, 268-269.