
TeamUEvora at CLEF eHealth 2014 Task2a

João Sequeira

Nuno Miranda

Teresa Gonçalves

Paulo Quaresma

Computer Science Department, School of Science and Technology, University of Évora, Évora, Portugal


We present our first participation in a ShARe/CLEF eHealth Lab, contributing to task 2a. Task 2 is an extension of task 1 of the 2013 lab and consists of information extraction from clinical texts for Disease/Disorder Template Filling; task 2a aims at predicting each attribute's normalization value. This work constitutes a preliminary approach to the problem of extracting and handling information from clinical texts. More than getting a good result, our priority was to get a first hint of the questions and problems posed within this area. To that end, we developed a system that combines information from the cTAKES output and the training corpus. Performance was measured using accuracy. Our system ranked 7th with an accuracy of 0.802, an F1 of 0.214, a precision of 0.217 and a recall of 0.211.

Keywords: Clinical texts, Template filling, Text normalization, cTAKES, Medical Informatics

1 Introduction

Task 2 of the ShARe/CLEF eHealth Lab 2014 [1,2] is an extension of task 1 of the same lab from 2013 [3] and consists of information extraction from clinical texts with the goal of disease/disorder template filling. For each disease/disorder present in each clinical report there is a template with ten different attributes, and participants have to predict the value of each attribute. There are two subtasks: 2a) assign normalization values to the ten attributes; 2b) assign cue values to the nine attributes with cues.

This is our first participation in a ShARe/CLEF eHealth Lab and we contributed to subtask 2a, building a system that uses previously implemented technologies. Since this is the first time we have worked with medical information, our main priority is to understand the problems associated with information extraction in this area. In this paper we present the system architecture and the decisions made; we also present and analyse the experimental results on the training and test corpora.

The paper is structured as follows: Section 2 introduces the task and the training and test corpora in detail, and Section 3 presents the implemented system. The results are discussed in Section 4, and conclusions and a glimpse of future work are presented in Section 5.

2 Task

As said in Section 1, task 2 is an extension of task 1 of the 2013 lab and aims at filling templates with attribute values and cues.

Files with empty templates for each disease/disorder (mentioned in the corresponding clinical text) were provided to the participants. Each template indicates the Unified Medical Language System Concept Unique Identifier (CUI), the mention boundaries and the ten attributes that need to be filled. Each attribute has two slot types: the normalized value and the lexical cue from the sentence where the normalized value occurred. Task 2a evaluates the systems' ability to predict the normalized value for each attribute, and task 2b the ability to find the right cue slot value for each attribute.

Since we participated only in task 2a (which was mandatory), our templates have default values in all the cue slots. Table 1 presents the template information: a header with the file name, the cue slot of the disease/disorder and its CUI, the nine modifiers associated with the disease/disorder with normalized values (task 2a) and cue slots (task 2b), plus the DocTime modifier, which only has a normalized value.

2.1 Description of the training and test corpora

The training and test corpora provided are composed of clinical texts of four different types: discharge summary, ECG report, ECHO report and radiology report. Their distribution in each corpus is presented in Table 2.

Analysing both corpora, we can observe some differences. In the training corpus, the discharge summary type accounts for 45.82% of the documents, while each of the remaining types accounts for 18.06%; in the test corpus there are only discharge summary documents.

3 System Architecture

This section presents the implementation of our system and the approaches taken to tackle the modifiers.

3.1 cTAKES

As said before, our system uses previously implemented technologies for clinical text analysis and information extraction (this method was also used in task 1 [6,7,8,9,10,11,12,13,14] of the 2013 ShARe/CLEF eHealth Lab).

Table 1. Template information (* marks the default value).

Header: file name; cue slot; Concept Unique Identifier (CUI)

Modifier  2a) Normalized values                                                   2b) Cue slot
NI        yes, no*                                                                if value is yes
SC        patient*, family member, other, null, donor family member, donor other  if different from patient
UI        yes, no*                                                                if value is yes
CC        unmarked*, changed, increased, decreased, improved, worsened, resolved  if different from unmarked
SV        unmarked*, severe, slight, moderate                                     if different from unmarked
CO        true, false*                                                            if value is true
GC        true, false*                                                            if value is true
BL        NULL*, CUI, CUI-less                                                    if different from NULL
DT        unknown*, before, after, overlap, before-overlap                        no slot
TE        none*, date, time, duration, set                                        if different from none

We used the output of the clinical Text Analysis and Knowledge Extraction System (cTAKES) [4] (version 3.1.1). cTAKES (see footnote 2) is an open source linguistic toolkit from the Apache Software Foundation. Some operations done by cTAKES include: Negation and Uncertainty Indicators, Subject and Conditional Classes and Body Location. For the modifiers NI, SC, UI, CO and BL we extracted the information from the cTAKES output. Among the attributes related to the diseases/disorders identified by cTAKES, we found information that could be used directly for some of the modifiers: we used the polarity attribute from cTAKES to identify whether the diseases/disorders were negated and to assign a value to NI; for the SC, UI and CO modifiers, cTAKES has attributes with the same names and we only needed to convert that information into the normalized values of the task modifiers.

For the BL modifier we used a set of rules to determine whether there were body locations in the same sentence as the identified disease/disorder and extracted the respective CUI. We tried to extract the CUI of the most specific body location possible, so we searched for the expression with the largest number of words, on the premise that more information means more specificity.
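The "largest number of words" premise can be illustrated with a minimal sketch. It is written in Python for brevity (the system itself was implemented in Java), and the `BodyLocation` class, the `most_specific_location` helper and the placeholder CUIs are assumptions made for illustration, not the authors' code or real UMLS identifiers.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class BodyLocation:
    text: str  # surface expression found in the sentence
    cui: str   # concept identifier (placeholder values in this sketch)


def most_specific_location(candidates: List[BodyLocation]) -> Optional[BodyLocation]:
    """Return the candidate whose expression has the most words, or None."""
    if not candidates:
        return None
    # max() returns the first maximal element, so ties go to the earliest mention.
    return max(candidates, key=lambda loc: len(loc.text.split()))


# Hypothetical body locations found in the same sentence as a disorder.
same_sentence = [
    BodyLocation("lobe", "CUI_PLACEHOLDER_1"),
    BodyLocation("left lower lobe", "CUI_PLACEHOLDER_2"),
    BodyLocation("lung", "CUI_PLACEHOLDER_3"),
]

best = most_specific_location(same_sentence)
print(best.cui if best else "NULL")
```

When no body location is found, the sketch falls back to NULL, the default value of the BL modifier.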

Course Class and Severity Class. For the CC and SV modifiers we used a mapping approach. From the 299 clinical texts that compose the training corpus, we extracted expressions (without repetition) related to each modifier value.
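A mapping lookup of this kind might look like the sketch below (Python for brevity; the system itself was implemented in Java). The entries in `SEVERITY_MAP` are invented for illustration and are not the expressions extracted from the training corpus; the whole-word matching and the longest-expression-first order are also assumptions.

```python
import re
from typing import Dict

# Hypothetical mapping from expression to normalized value (illustrative only).
SEVERITY_MAP: Dict[str, str] = {
    "severe": "severe",
    "moderate": "moderate",
    "mild": "slight",
}


def lookup_modifier(sentence: str, mapping: Dict[str, str], default: str = "unmarked") -> str:
    """Return the value of the longest mapped expression found as a whole word or phrase."""
    lowered = sentence.lower()
    # Longer expressions are tried first so multi-word entries win over shorter ones.
    for expression in sorted(mapping, key=len, reverse=True):
        if re.search(r"\b" + re.escape(expression) + r"\b", lowered):
            return mapping[expression]
    return default


print(lookup_modifier("Moderate mitral regurgitation.", SEVERITY_MAP))
print(lookup_modifier("No acute distress.", SEVERITY_MAP))
```

Matching on word boundaries is one simple way to reduce out-of-context hits, for example "severely" does not match the entry "severe".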

When using expressions from a mapping approach, there is the risk of matching an expression in the text that is identical to a mapped one but does not occur in the correct context. To determine whether the CC and SV modifiers had this problem, we checked the expressions in each mapping file and concluded that the expressions were not too common and that the probability of identifying wrong expressions was acceptable for our objectives.

Generic Class. The GC modifier had a particular characteristic: there was no example of it in the training corpus. Assuming that the test corpus would follow the same pattern, few or no expressions of this modifier would appear. Based on this assumption we used the default value (false) in every template.

Footnote 2: http://ctakes.apache.org/

DocTime. The DT modifier expresses the temporal relation between the disease/disorder and the time when the clinical text was written. It can have the following values:

- Before-overlap: disease/disorder identified in the past and still present;
- Before: disease/disorder identified and treated in the past;
- Overlap: disease/disorder present, but there is no information about when it was diagnosed or when it will pass;
- After: an action or event that is still to come;
- Unknown: no temporal relation information.

For this modifier we used a purely statistical approach: for every template we selected the most common value in the training corpus, Overlap.
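This is a majority-class baseline, which can be sketched as follows (Python for brevity; the system itself was implemented in Java). The `toy_labels` list is made up for illustration and does not reproduce the training corpus distribution.

```python
from collections import Counter
from typing import List


def majority_value(training_labels: List[str]) -> str:
    """Return the most frequent label; raises IndexError on an empty list."""
    counts = Counter(training_labels)
    value, _ = counts.most_common(1)[0]
    return value


# Toy labels, NOT the real distribution of DT values.
toy_labels = ["overlap", "before", "overlap", "after", "overlap", "before-overlap"]

default_dt = majority_value(toy_labels)
# Every template receives the same value, whatever its content.
predictions = [default_dt for _ in range(3)]
print(predictions)
```

Such a baseline performs well only if the test corpus has a label distribution similar to the training corpus.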

Table 3 presents the percentage of occurrences of each possible DT value in the training corpus; more than half of the occurrences (56.35%) have the Overlap value, so this value was chosen to fill all the templates. The Before value also had a significant number of occurrences, but Overlap more than doubles it.

Temporal Expressions. To identify dates and hours we used regular expressions. At first we thought of using a mapping approach here too, but dates and hours are very specific: if an expression appeared in the same format but one day apart, it would not be identified.

Based on the training corpus, we created four regular expressions to identify DATE and two regular expressions to identify TIME:

- DATE
  - Day/Month/Year (dd/mm/yyyy);
  - Day-Month-Year (dd-mm-yyyy);
  - Year-Month-Day (yyyy-mm-dd);
  - Month-Year (mm-yy).
- TIME
  - 24-hour time (hh:mm);
  - 12-hour time (hh:mm am/pm).
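The six formats listed above could be expressed roughly as in the sketch below (Python for brevity; the system itself was implemented in Java). The exact patterns used by the authors are not given in the paper, so these regular expressions, the `temporal_value` helper and the rule that DATE takes precedence over TIME are all assumptions.

```python
import re

DATE_PATTERNS = [
    re.compile(r"\b\d{2}/\d{2}/\d{4}\b"),           # dd/mm/yyyy
    re.compile(r"\b\d{2}-\d{2}-\d{4}\b"),           # dd-mm-yyyy
    re.compile(r"\b\d{4}-\d{2}-\d{2}\b"),           # yyyy-mm-dd
    re.compile(r"(?<![\d-])\d{2}-\d{2}(?![\d-])"),  # mm-yy, not part of a longer date
]

TIME_PATTERNS = [
    re.compile(r"\b\d{1,2}:\d{2}\s?[ap]\.?m\.?", re.IGNORECASE),  # 12-hour, hh:mm am/pm
    re.compile(r"\b(?:[01]?\d|2[0-3]):[0-5]\d\b"),                # 24-hour, hh:mm
]


def temporal_value(sentence: str) -> str:
    """Return 'date', 'time' or the default 'none' for a sentence."""
    if any(p.search(sentence) for p in DATE_PATTERNS):
        return "date"
    if any(p.search(sentence) for p in TIME_PATTERNS):
        return "time"
    return "none"


print(temporal_value("Admitted on 12/03/2014 with chest pain."))
print(temporal_value("Medication given at 9:15 pm."))
```

Because the patterns describe formats rather than literal strings, two dates one day apart are both recognised, which is the limitation of the mapping approach noted above.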

We did not consider the identification of expressions associated with the remaining values of the modifier: duration and set.

3.3 Implementation

Our system was implemented in the Java programming language. Figure 1 presents the system's architecture: it uses mapping files, regular expressions, decisions based on the training corpus and cTAKES.

XML files were generated by cTAKES, and from them we extracted information using a parser and applied the procedures described in the previous subsection. With the information obtained, the system updated the modifiers' values and printed the templates with the final result.

Next we explain the steps necessary to get the filled templates:

1. run cTAKES with the clinical texts as input;
2. load information from the templates, namely the header (because the rest are the default values), and the map files built for CC and SV;
3. process the XML from cTAKES using a set of rules to extract information;
4. use the information previously gathered to substitute the default values in the templates.

Step 1. The first step can also be called a pre-processing step: the generation of the XML files using cTAKES, which produces an XML file for each clinical text. cTAKES has a large set of specific analysis engines and a set of aggregate ones that combine the specific ones. These aggregate engines describe how particular annotators can be combined, using a set of rules that describe how each annotator uses the analysis of the previous one.

Several aggregate engines were tested and the one that offered the best results (and was used for the participation run) was AggregatePlaintextUMLSProcessor.

Step 2. On startup, the system loads the mapping files of the CC and SV modifiers obtained from the training corpus. It also loads the template information into a data structure that the system can use throughout the process.

Step 3. After steps 1 and 2, the system processes the XML files. We used XPath expressions to extract the information considered necessary for task 2a; this information was stored in data structures suited to subsequent processing. The information is extracted using two approaches:

- the 'strict' one, where the system searches for diseases/disorders that perfectly match the information gathered from cTAKES;
- the 'relaxed' one, which is used when the 'strict' one fails. This one, although less accurate, verifies whether the boundaries of the disease/disorder from the template header lie inside those of the chunk identified by cTAKES.
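The two matching approaches can be sketched over character offsets as follows (Python for brevity; the system itself was implemented in Java). The `Span` type and the `find_annotation` helper are hypothetical names, and representing boundaries as (start, end) offset pairs is an assumption about the underlying data.

```python
from typing import List, Optional, Tuple

Span = Tuple[int, int]  # (start, end) character offsets of a mention


def find_annotation(target: Span, annotations: List[Span]) -> Optional[Tuple[Span, str]]:
    """Match a template mention against annotated chunks, strict first, then relaxed."""
    # Strict: the annotated chunk has exactly the same boundaries.
    for span in annotations:
        if span == target:
            return span, "strict"
    # Relaxed: the template mention lies inside the annotated chunk.
    for span in annotations:
        if span[0] <= target[0] and target[1] <= span[1]:
            return span, "relaxed"
    return None


# Hypothetical chunk boundaries produced by the annotation step.
chunks = [(0, 5), (10, 20)]
print(find_annotation((10, 20), chunks))
print(find_annotation((12, 18), chunks))
print(find_annotation((30, 40), chunks))
```

Running the strict pass over all chunks before the relaxed pass ensures that an exact match is always preferred over a containing chunk.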

The CUI of the body locations associated with the disease/disorder is obtained using a set of rules that joins information from the different data structures maintained. In order to reach the most specific CUI, the system chooses the longest body location term from the cTAKES output.

Step 4. The final step gathers all the information from the previous steps, relying mainly on the coordinates of the diseases/disorders in the text.

To extract the modifier information, the system searches the sentences where the diseases/disorders were identified, looks for the information gathered from cTAKES, replaces the values in the respective template, searches for terms in the mappings, applies the regular expressions and writes the information found to the template. Finally, it writes the values for the DT and GC modifiers (which are the same for all templates).

4 Results

0.071 and TE an improvement of 0.142. For the DT modifier, the training corpus presents a better result, with an improvement of 0.57 over the test corpus.

Comparing the test corpus results with the best accuracy reported in task 2a, we notice that for some modifiers, like SC, UI, CO and TE, the difference is lower than 0.2, and the values for class GC are equal; for modifiers BL, DT and CC there is a bigger discrepancy between the results. Nevertheless, overall our system was 0.082 behind the overall value calculated.

Table 5 presents the F1, precision and recall values for both the training and test corpora. The values do not differ much between the two corpora for most of the modifiers. Modifiers like SC, UI, CO, BL and TE have better results in the test corpus; on the other hand, the NI, CC, SV and DT modifiers have better results in the training corpus.

The DT modifier obtained widely different values, with an F1 of 0.592 in the training corpus and a corresponding value of 0.024 in the test corpus. This can be explained by the fact that the value of this modifier is always the same for every template of the output; this decision was based on the modifier statistics from the training corpus.

We ranked seventh among all the participants of task 2a, as shown in Table 6. The best system had an overall accuracy of 0.868 and our system obtained an overall accuracy of 0.802. This value is lower than the average accuracy of all participants. Our system also obtained below-average values for F1, precision and recall.

5 Conclusions and Future work

This paper presents the design and implementation of our system, developed for participating in task 2a of the 2014 ShARe/CLEF eHealth Lab. The task's main goal was to obtain normalized attribute values for disease/disorder template filling.

5.1 Conclusions

Our participation's main goal was to understand the problems associated with the design and implementation of a system to extract information from medical data. The system gathers knowledge from already implemented technology in the clinical area, namely cTAKES; it also uses resources based on the training corpus, regular expressions and decisions based on modifier statistics.

Among 14 participants, it ranked 7th, with an accuracy of 0.802. Taking our goal into account, we consider this a good result; nevertheless, there is much room for improvement.

5.2 Future work

cTAKES is one of the resources of our system, and we intend to add more sources of medical knowledge in order to improve it. One candidate is MetaMap [5], widely used in task 1 of the 2013 lab. Last year, some participants used only cTAKES [6,8], others used only MetaMap [7,9,10,11,12] and others used a joint approach [13,14].

On the other hand, we intend to complement or replace the approach taken for some modifiers:

- for Course and Severity, we want to try a machine learning approach;
- for temporal expressions, we want to improve the system by also identifying duration and set expressions. For that we intend to use technologies from the area of clinical time identification;
- for DocTime, we intend to incorporate knowledge in order to give different values to different examples (instead of using the same value for all of them);
- for the Generic modifier, we aim to develop a more automatic way to detect this class. Nevertheless, to do that we need some examples of this modifier in the training corpus.

References

1. Kelly, L., Goeuriot, L., Leroy, G., Suominen, H., Schreck, T., Mowery, D. L., Velupillai, S., Chapman, W. W., Zuccon, G., Palotti, J.: Overview of the ShARe/CLEF eHealth Evaluation Lab 2014. Springer-Verlag (2014).

2. Elhadad, N., Chapman, W., O'Gorman, T., Palmer, M., Savova, G.: The ShARe Schema for the Syntactic and Semantic Annotation of Clinical Texts (2014). (Under Review).

3. Suominen, H., Salanterä, S., Velupillai, S., Chapman, W. W., Savova, G., Elhadad, N., Pradhan, S., South, B. R., Mowery, D. L., Jones, G. J., Leveling, J., Kelly, L., Goeuriot, L., Martinez, D., Zuccon, G.: Overview of the ShARe/CLEF eHealth Evaluation Lab 2013. CLEF 2013, Valencia, Spain: Springer Berlin Heidelberg. In: Proceedings of ShARe/CLEF eHealth Evaluation Labs (2013).

4. Savova, G.K., Masanz, J.J., Ogren, P.V., Zheng, J., Sohn, S., Kipper-Schuler, K.C., Chute, C.G.: Mayo clinical Text Analysis and Knowledge Extraction System (cTAKES): architecture, component evaluation and applications. Journal of the American Medical Informatics Association 17 (2010) 507-513.

5. Aronson, A.R., Lang, F.M.: An overview of MetaMap: historical perspective and recent advances. JAMIA 17(3) (2010) 229-236.

6. Cogley, J., Stokes, N., Carthy, J.: Medical Disorder Recognition with Structural Support Vector Machines. In: Proceedings of ShARe/CLEF eHealth Evaluation Labs (2013).

7. Leaman, R., Khare, R., Lu, Z.: NCBI at 2013 ShARe/CLEF eHealth Shared Task: Disorder Normalization in Clinical Notes with DNorm. In: Proceedings of ShARe/CLEF eHealth Evaluation Labs (2013).

8. Gung, J.: Using Relations for Identification and Normalization of Disorders: Team CLEAR in the ShARe/CLEF 2013 eHealth Evaluation Lab. In: Proceedings of ShARe/CLEF eHealth Evaluation Labs (2013).

9. Hervás, L., Martínez, V., Sánchez, I., Díaz, A.: UCM at CLEF eHealth 2013 Shared Task1. In: Proceedings of ShARe/CLEF eHealth Evaluation Labs (2013).

10. Osborne, J. D., Gyawali, B., Solorio, T.: Evaluation of YTEX and MetaMap for clinical concept recognition. In: Proceedings of ShARe/CLEF eHealth Evaluation Labs (2013).

11. Wang, C., Akella, R.: UCSC's System for CLEF eHealth 2013 Task 1. In: Proceedings of ShARe/CLEF eHealth Evaluation Labs (2013).

12. Zuccon, G., Holloway, A., Koopman, B., Nguyen, A.: Identify Disorders in Health Records using Conditional Random Fields and Metamap; AEHRC at ShARe/CLEF 2013 eHealth Evaluation Lab Task 1. In: Proceedings of ShARe/CLEF eHealth Evaluation Labs (2013).

13. Bodnari, A., Deléger, L., Lavergne, T., Névéol, A., Zweigenbaum, P.: A Supervised Named-Entity Extraction System for Medical Text. In: Proceedings of ShARe/CLEF eHealth Evaluation Labs (2013).

14. Xia, Y., Zhong, X., Liu, P., Tan, C., Na, S., Hu, Q., Huang, Y.: Combining MetaMap and cTAKES in Disorder Recognition: THCIB at CLEF eHealth Lab 2013 Task 1. In: Proceedings of ShARe/CLEF eHealth Evaluation Labs (2013).