=Paper=
{{Paper
|id=Vol-1179/CLEF2013wn-CLEFeHealth-HervasEt2013
|storemode=property
|title=UCM at CLEF eHealth 2013 Shared Task1a
|pdfUrl=https://ceur-ws.org/Vol-1179/CLEF2013wn-CLEFeHealth-HervasEt2013.pdf
|volume=Vol-1179
|dblpUrl=https://dblp.org/rec/conf/clef/HervasMSD13
}}
==UCM at CLEF eHealth 2013 Shared Task1a==
UCM at CLEF eHealth 2013 Shared Task 1

Lucía Hervás, Víctor Martínez, Irene Sánchez, Alberto Díaz
NIL Group, Universidad Complutense de Madrid, C/ Profesor García Santesmases, Madrid, 28040, Spain
lhervasmartin@gmail.com, victormartinezsimon@gmail.com, irene.sanchzmartinz@gmail.com, albertodiaz@fdi.ucm.es

Abstract. We are developing a system that analyzes medical reports and extracts a SNOMED-CT based concept representation. The most interesting characteristic of our system is that it not only detects the concepts, but also takes into account whether they appear in an affirmative, negative or speculative context. The system also separates the concept representation according to the structure of the document. Our system takes these steps: automatic orthographic correction, acronym and abbreviation detection, negation and speculation phrase detection, and medical concept detection. To participate in Task 1 we adapted our system to obtain the mentions that belong to the Disorder semantic group defined in the guidelines. The approach is based on using MetaMap to detect the concepts and their spans. Our aim was to identify the best way to use MetaMap in our system to solve Task 1.

Keywords: Natural Language Processing, medical report, concept detection, MetaMap, UMLS

1 Introduction

The goal of Task 1 is to analyze clinical text documents and find mentions of disorders. There are two subtasks: (1a) discovering the mention boundaries and (1b) mapping each mention to a UMLS CUI. Normalization/mapping is limited to UMLS CUIs of SNOMED codes. Participants are free to use any UMLS resources [2].

To participate in this task we adapted a system that analyzes medical reports in order to obtain the mentions that belong to the Disorder semantic group defined in the guidelines. The approach is based on using MetaMap to detect the concepts and their spans. Our system extracts a SNOMED-CT based concept representation from a medical report.
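The processing steps listed above (orthographic correction, acronym expansion, negation/speculation detection, concept detection) can be sketched as a simple pipeline. All function names below are hypothetical placeholders for illustration, not the authors' actual implementation; each real stage would of course do substantive work.

```python
# Hypothetical sketch of the pipeline described above. The stage names
# are invented for illustration; only the ordering comes from the paper.

def correct_spelling(text):
    # Stage 1: automatic orthographic correction (stubbed here).
    return text

def expand_abbreviations(text):
    # Stage 2: acronym and abbreviation detection/expansion (stubbed).
    return text

def mark_context(text):
    # Stage 3: negation and speculation phrase detection. A real system
    # would tag each phrase as affirmative, negative or speculative.
    return [(text, "affirmative")]

def detect_concepts(tagged_phrases):
    # Stage 4: medical concept detection, e.g. by calling MetaMap on
    # each phrase while preserving its context tag.
    return [{"phrase": p, "context": c} for p, c in tagged_phrases]

def process_report(report_text):
    text = correct_spelling(report_text)
    text = expand_abbreviations(text)
    return detect_concepts(mark_context(text))
```

A per-section variant would run `process_report` once per document section, since the system keeps a separate concept representation for each section.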
Before the analysis there are two other phases: a language corrector and an acronym expander. The most interesting characteristic of our system is that it not only detects the concepts, but also takes into account whether they appear in an affirmative, negative or speculative context. The system also separates the concept representation according to the structure of the document, that is, there is a different representation for each section of the document.

During our research we discovered the ShARe/CLEF eHealth 2013 Shared Tasks [2]. As these tasks were very close to what we were developing, we decided to participate in order to increase our knowledge and the performance of our system. Our aim was to identify the best way to use MetaMap in our system to solve Task 1.

We submitted runs with no external annotations: two for task 1a and two for task 1b. The only difference between the runs is the database used: the 2012AA USAbase strict model for the first run and the 2011AA USAbase strict model for the second run. Our best results for task 1a show a 0.504 F1-score with strict evaluation and a 0.660 F1-score with relaxed evaluation. Our best results for task 1b show a 0.362 Accuracy with strict evaluation and a 0.871 Accuracy with relaxed evaluation.

2 MetaMap

MetaMap maps biomedical text to concepts in the UMLS Metathesaurus. Several types of lexical/syntactic analysis are performed on the input text to perform this mapping: tokenization, part-of-speech tagging, lexical lookup in the SPECIALIST lexicon and shallow parsing. The following processes are then applied to each noun phrase obtained: variant generation, candidate identification, mapping construction and word sense disambiguation. Final scores are computed for each candidate mapping by combining different measures [1].

MetaMap has different parameters that influence its performance: data options, output options and processing options. The data options allow choosing the level of filtering and the UMLS data.
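Since MetaMap returns several scored candidate mappings per noun phrase, a natural noise-reduction step is to keep only the candidates that share the maximum score. The sketch below illustrates this; the `(cui, name, score)` tuples are a simplified stand-in for MetaMap's actual output format, and the second candidate is invented for illustration.

```python
# Hypothetical sketch: keep only the highest-scoring candidate mappings
# for a phrase. The tuples are a simplification of MetaMap's real output.

def best_candidates(candidates):
    """Return the candidates that share the maximum score."""
    if not candidates:
        return []
    top = max(score for _cui, _name, score in candidates)
    return [c for c in candidates if c[2] == top]

# Example: two candidates for one phrase. C0027051 is the UMLS CUI for
# Myocardial Infarction; the second tuple (CUI, name, score) is made up.
phrase_candidates = [
    ("C0027051", "Myocardial Infarction", 1000),
    ("C0000000", "Some Other Concept", 861),  # invented, for illustration
]
print(best_candidates(phrase_candidates))
```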
The default setting is the Strict model, where all types of filtering are applied. The Relaxed model only includes manual and lexical filtering.

3 Processing

We use the default setting of MetaMap to detect the different concepts and to obtain their CUIs. MetaMap retrieves several candidate concepts, so to reduce the noise we only take the concepts with the highest score. We also use the MedPost/SKR server included in MetaMap to perform word sense disambiguation. Finally, we configure the system to accept only the following semantic types, which correspond to the Disorder semantic group defined in the guidelines:

– Congenital Abnormality
– Acquired Abnormality
– Injury or Poisoning
– Pathologic Function
– Disease or Syndrome
– Mental or Behavioral Dysfunction
– Cell or Molecular Dysfunction
– Experimental Model of Disease
– Anatomical Abnormality
– Neoplastic Process
– Signs and Symptoms

4 Framework evaluation

Participants were provided with training and test datasets. The evaluation for all tasks was conducted using the withheld test data. Teams were allowed to use any outside resources in their algorithms.

4.1 Evaluation Measures

In subtask 1a, boundary detection of disorders, the evaluation measures are F1-score, Recall and Precision, where a TP is counted when the span obtained is the same as the gold standard span, a FP when it is a spurious span, and a FN when it is a missing span. There are two variants, Strict and Relaxed, depending on whether the obtained span must be identical to the reference standard span or only overlap it.

In subtask 1b, identifying the boundaries of disorders and mapping them to a SNOMED-CT code, the evaluation measure is Accuracy, where Correct is the number of disorder named entities with strictly correct span and correctly generated code, and Total is the number of disorder named entities.
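The strict versus relaxed span matching for subtask 1a can be sketched as follows. This is a minimal illustration of the metric definitions, not the official ShARe/CLEF evaluation script; spans are hypothetical `(start, end)` character offsets.

```python
# Minimal sketch of strict vs. relaxed span evaluation (not the official
# ShARe/CLEF script). Spans are (start, end) character-offset pairs.

def overlaps(a, b):
    # Two half-open intervals overlap if each starts before the other ends.
    return a[0] < b[1] and b[0] < a[1]

def span_f1(system, gold, relaxed=False):
    """Precision, Recall and F1-score over predicted vs. gold spans."""
    match = overlaps if relaxed else (lambda a, b: a == b)
    tp = sum(1 for s in system if any(match(s, g) for g in gold))
    fn = sum(1 for g in gold if not any(match(s, g) for s in system))
    p = tp / len(system) if system else 0.0
    r = (len(gold) - fn) / len(gold) if gold else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1

gold = [(0, 10), (20, 30)]
system = [(0, 10), (21, 28), (40, 45)]
print(span_f1(system, gold))           # strict: only (0, 10) counts as a TP
print(span_f1(system, gold, True))     # relaxed: (21, 28) also overlaps gold
```

The relaxed variant credits the overlapping span (21, 28), which strict matching rejects, so relaxed scores are always at least as high as strict ones.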
There are also two variants, Strict and Relaxed, depending on whether Total is the number of reference standard named entities, or the number of named entities with strictly correct span generated by the system. In the first case, the system is penalized for incorrect code assignment for annotations that it did not detect. In the second case, the system is only evaluated on annotations that it detected.

5 Results

We submitted runs with no external annotations: two for task 1a and two for task 1b. The only difference between the runs is the database used: the 2012AA USAbase strict model for the first run and the 2011AA USAbase strict model for the second run. Our best results for task 1a show a 0.504 F1-score with strict evaluation and a 0.660 F1-score with relaxed evaluation. Our best results for task 1b show a 0.362 Accuracy with strict evaluation and a 0.871 Accuracy with relaxed evaluation.

Table 1. Task 1A. No external annotations. Strict

Team, Country             Precision  Recall  F1-score
UTHealth CCB.2, UT, USA   0.800      0.706   0.750
NIL-UCM.2, Spain          0.617      0.426   0.504
NIL-UCM.1, Spain          0.621      0.416   0.498
FAYOLA.1, VW, USA         0.024      0.446   0.046

Table 2. Task 1A. No external annotations. Relaxed

Team, Country             Precision  Recall  F1-score
UTHealth CCB.2, UT, USA   0.925      0.827   0.873
NIL-UCM.2, Spain          0.809      0.558   0.660
NIL-UCM.1, Spain          0.812      0.543   0.651
FAYOLA.1, VW, USA         0.504      0.043   0.079

Table 3. Task 1B. No external annotations. Strict

Team, Country             Accuracy (sn2012)  Accuracy (sn2011)
NCBI.2, MD, USA           0.589              0.584
NIL-UCM.2, Spain          0.362              0.362
NIL-UCM.1, Spain          0.362              0.362
NCBI.2, MD, USA           0.006              0.006

Table 4. Task 1B. No external annotations. Relaxed

Team, Country             Accuracy (sn2012)  Accuracy (sn2011)
AEHRC.1, QLD, Australia   0.939              0.939
NIL-UCM.1, Spain          0.871              0.870
NIL-UCM.2, Spain          0.850              0.850
UTHealth CCB.1, UT, USA   0.728              0.772

6 Discussion

The detection of disorder boundaries offers poor results, mainly due to the limitations of MetaMap in discovering the spans: the best Recall obtained is around 0.42. The main problem is related to discontinuous spans, which MetaMap is not able to process. With respect to the difference between our runs, the second version offers slightly better results, as expected, because it uses the 2011AA USAbase database. With respect to the type of evaluation, higher scores are obtained with relaxed evaluation, mainly on Precision, but Recall only increases to 0.558.

With respect to the mapping of CUIs, the results are low in the strict evaluation but high in the relaxed evaluation. This is due to the penalization for incorrect code assignment for annotations not detected by the system.

Our results show the baseline that can be obtained using MetaMap with the strict model configuration. We can therefore conclude that MetaMap alone is not enough to solve this task.

Acknowledgements

We want to acknowledge the support given by the Shared Annotated Resources (ShARe) project, funded by the United States National Institutes of Health with grant number R01GM090187.

References

1. Aronson, A., Lang, F.M.: An overview of MetaMap: historical perspective and recent advances. Journal of the American Medical Informatics Association, vol. 17, no. 3, pp. 229-236, 2010.
2. Suominen, H., Salantera, S., Velupillai, S.: Three Shared Tasks on Clinical Natural Language Processing. Proceedings of CLEF 2013. To appear.