=Paper= {{Paper |id=Vol-1180/CLEF2014wn-eHealth-Osborne2014 |storemode=property |title=Disease Template Filling using the CTAKES YTEX Branch for ShareClef 2014 Task 2a |pdfUrl=https://ceur-ws.org/Vol-1180/CLEF2014wn-eHealth-Osborne2014.pdf |volume=Vol-1180 |dblpUrl=https://dblp.org/rec/conf/clef/Osborne14 }} ==Disease Template Filling using the CTAKES YTEX Branch for ShareClef 2014 Task 2a== https://ceur-ws.org/Vol-1180/CLEF2014wn-eHealth-Osborne2014.pdf
    Disease Template Filling using the CTAKES
                  YTEX Branch

                               John David Osborne

       University of Alabama at Birmingham, Birmingham AL 35294, USA,
                                ozborn@uab.edu,
                 WWW home page: http://coral.cis.uab.edu/



      Abstract. Using an adapted version of the YTEX branch of CTAKES
      for disease template filling, accuracies of 0.936, 0.974, 0.807 and 0.926
      were achieved for the conditional, generic, negation and subject class
      slots respectively in Task 2a. Overall accuracy was 0.79. Unfortunately,
      substantially poorer F1 score, precision and recall for all 4 of these
      templating tasks indicate that it is not yet possible to get good
      performance using these CTAKES algorithms in this task.


Keywords: CTAKES, YTEX, evaluation, information extraction


1   Approach and Objectives

The YTEX [1] development branch of the CTAKES [2] pipeline was evaluated for
template filling (Task 2a) [3]. The objective was to use the existing CTAKES
tools to populate the template for negation, subject class, conditional qualifiers
and generic references, and to use the YTEX word sense disambiguation and
dictionary lookup component to identify the anatomic location of the disease.
The remaining template filling tasks were not attempted, and the YTEX based
anatomical location lookup was not completed in time for the test data.


2   Methodology

The base system employed was the YTEX branch of CTAKES, specifically revision
1588688 at https://svn.apache.org/repos/asf/ctakes/branches/ytex. Default
settings were used for YTEX, including a concept window length of 10. The 2013AB
version of UMLS was used. Identified annotations matching the appropriate
disease UMLS semantic types were checked for overlap with input disease templates
as defined in the ShARe schema [4]. The CTAKES generated modifiers were then
used to fill the template; where none were available, the default values were
used. No machine learning or training on the provided data took place.
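
The overlap check and default filling described above can be illustrated with a minimal Python sketch. This is not the CTAKES or YTEX code: the Span class, the slot names and the default values in DEFAULTS are hypothetical stand-ins chosen for illustration, and the sketch assumes character offsets with an exclusive end.

```python
from dataclasses import dataclass, field

# Hypothetical default slot values, used when no overlapping annotation exists.
DEFAULTS = {"negation": "no", "subject": "patient",
            "conditional": "false", "generic": "false"}


@dataclass
class Span:
    begin: int                      # inclusive character offset
    end: int                        # exclusive character offset
    modifiers: dict = field(default_factory=dict)


def overlaps(a, b):
    """True when the two character spans share at least one character."""
    return a.begin < b.end and b.begin < a.end


def fill_template(template_span, annotations):
    """Start from the defaults, then apply the modifiers of the first
    annotation that overlaps the input disease template span."""
    filled = dict(DEFAULTS)
    for ann in annotations:
        if overlaps(template_span, ann):
            filled.update(ann.modifiers)
            break
    return filled
```

For example, a template spanning offsets 10 to 20 that overlaps an annotation carrying {"negation": "yes"} would receive that negation value, while its other slots keep their defaults.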
    The system also included some additional non-CTAKES rule-based annotators
from a previous system [5] designed for ShARe/CLEF eHealth 2013 concept
recognition. However, the only role they played was to better match CTAKES
generated annotations to ShARe/CLEF eHealth 2014 disease concepts,
not to fill out the disease templates. The system also included an
annotator capable of recognizing a variety of different section types in clinical
notes. This annotator was developed on a variety of clinical notes at the University
of Alabama at Birmingham (UAB), including discharge summaries, and was not
otherwise modified in time for the test data. It was employed here only to find
family history sections in clinical notes and to change the subject to family for
disease occurrences in this section.
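
The family history rule can be sketched as follows. The UAB regular expressions are not reproduced in this paper, so the two patterns below are hypothetical, as are the function names and the "family_member" and "patient" subject values; the sketch only shows the general shape of a header-based section rule.

```python
import re

# Hypothetical header patterns; the actual UAB expressions are not given here.
FAMILY_HEADER = re.compile(r"^[ \t]*family\s+(?:medical\s+)?history[ \t]*:",
                           re.IGNORECASE | re.MULTILINE)
ANY_HEADER = re.compile(r"^[ \t]*[A-Za-z][A-Za-z /]*:", re.MULTILINE)


def family_history_ranges(text):
    """Return (begin, end) offsets of each family history section, where a
    section runs from its header to the next header-like line or end of text."""
    ranges = []
    for match in FAMILY_HEADER.finditer(text):
        following = ANY_HEADER.search(text, match.end())
        end = following.start() if following else len(text)
        ranges.append((match.start(), end))
    return ranges


def subject_for(offset, text, default="patient"):
    """Assign the family subject to a disease mention whose offset falls
    inside a family history section; otherwise keep the default subject."""
    for begin, end in family_history_ranges(text):
        if begin <= offset < end:
            return "family_member"
    return default
```

A rule of this kind fires only when the note uses a header the pattern anticipates, so notes from another institution with different section conventions may never trigger it.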

3   Results


                  Table 1. CORAL System Task 2a Test Results

             Task            Rank Accuracy F1 Score Precision Recall
             Overall average 10   0.790    0.030    0.240     0.016
             Norm BL         8    0.546    0        0         0
             Norm CC         4    0.961    0        0         0
             Norm CO         5    0.936    0.052    0.500     0.028
             Norm DT         9    0.001    0        0         0
             Norm GC         3    0.974    0        0         0
             Norm NI         12   0.807    0.196    0.746     0.113
             Norm SC         8    0.926    0.161    0.098     0.450
             Norm SV         6    0.942    0        0         0
             Norm TE         1    0.864    0        0         0
             Norm UI         3    0.941    0        0         0




    All template tasks with an F1 score, precision and recall of zero were not
attempted by the CORAL system, with the exception of generic mentions (Norm
GC). In the case of generic mentions, the CTAKES based generic determination
did not identify any in the test data although it was actively searching for them.
In the Norm SC (Subject Class) task, the UAB family history section
identification was not useful: the regular expressions developed for identifying
family history in UAB notes were not triggered on the test data. This
underscores the diversity of clinical notes and the fragility of regular expression
based approaches. Finally, individual results for the tasks that were not
attempted indicate that it is possible to achieve seemingly reasonable accuracy
in this task just by filling in the default value for the template.
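
The gap between accuracy and F1 score can be made concrete with a small Python sketch. It assumes that precision and recall are computed only over slots whose value differs from the default, which is an assumption about the scoring and not something stated in this paper; the function names and the toy data are likewise hypothetical.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall, defined as 0 when both are 0."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


def evaluate(gold, predicted, default):
    """Accuracy over all slots; precision, recall and F1 over the slots
    whose value differs from the default (assumed scoring convention)."""
    pairs = list(zip(gold, predicted))
    accuracy = sum(g == p for g, p in pairs) / len(pairs)
    true_pos = sum(g == p and g != default for g, p in pairs)
    predicted_pos = sum(p != default for p in predicted)
    gold_pos = sum(g != default for g in gold)
    precision = true_pos / predicted_pos if predicted_pos else 0.0
    recall = true_pos / gold_pos if gold_pos else 0.0
    return accuracy, precision, recall, f1_score(precision, recall)


# Toy data: 95 of 100 gold slots hold the default value "no",
# and the system always predicts the default.
gold = ["no"] * 95 + ["yes"] * 5
predicted = ["no"] * 100
accuracy, precision, recall, f1 = evaluate(gold, predicted, default="no")
```

Under these assumptions the always-default system is right on 95 of 100 slots but never finds a non-default value, so its precision, recall and F1 are all zero. The same helper also reproduces the reported Norm NI figure from Table 1: a precision of 0.746 and a recall of 0.113 give an F1 score that rounds to 0.196.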

4   Analysis and Discussion
The overall poor performance of the CTAKES based template filling for the 4
attempted tasks indicates that no off-the-shelf solution exists for this type of
disease concept templating.




Acknowledgements This project was supported by the UAB Center for Clinical
and Translational Science - grant number UL1 RR025777 from the NIH National
Center for Research Resources, and the UAB Office of the Vice President for
Information Technology.


References
1. Garla, V., Re III, V.L., Dorey-Stein, Z., Kidwai, F., Scotch, M., Womack, J., Justice,
   A., Brandt, C.: The Yale cTAKES extensions for document classification: architec-
   ture and application. J. Am. Med. Inform. Ass. 18 614–620 (2011)
2. Savova, G.K., Masanz, J.J., Ogren, P.V., Zheng, J., Sohn, S., Kipper-Schuler,
   K.C., Chute, C.G.: Mayo clinical Text Analysis and Knowledge Extraction Sys-
   tem (cTAKES): architecture, component evaluation and applications. J. Am. Med.
   Inform. Ass. 17 507–513 (2010)
3. Kelly, L., Goeuriot, L., Suominen, H., Schreck, T., Leroy, G., Mowery, D.L.,
   Velupillai, S., Chapman, W.W., Martinez, D., Zuccon, G., Palotti, J.: Overview
   of the ShARe/CLEF eHealth Evaluation Lab 2014. Springer-Verlag.
4. Elhadad, N., Chapman, W., O’Gorman, T., Palmer, M., Savova, G.: The ShARe Schema
   for the Syntactic and Semantic Annotation of Clinical Texts. In preparation.
5. Osborne, J. D., Gyawali, B., Solorio, T.: Evaluation of YTEX and MetaMap for
   clinical concept recognition. arXiv preprint arXiv:1402.1668 (2014)



