Representing and sharing knowledge using SNOMED
Proceedings of the 3rd international conference on Knowledge Representation in Medicine (KR-MED 2008)
R. Cornet, K.A. Spackman (Eds)


            Comparing the Effects of Two Semantic Terminology Models on Classification of
                         Clinical Notes: A Study of Heart Murmur Findings

                             Guoqian Jiang, Ph.D. and Christopher G. Chute, M.D., Dr. P.H.
                  Division of Biomedical Informatics, Mayo Clinic College of Medicine, Rochester, MN
                                          (Jiang.Guoqian@mayo.edu)

Abstract
Objectives: We compared the effects of two semantic terminology models on the classification of clinical notes through a study in the domain of heart murmur findings.
Methods: One schema was established from the existing SNOMED CT model (S-Model) and the other from a template model (T-Model), which uses base concepts and non-hierarchical relationships to characterize murmurs. A corpus of clinical notes (n=309) was collected and annotated using the two schemas. The annotations were coded as features for a decision tree classifier in a text classification task. The standard information retrieval measures of precision, recall, f-score and accuracy, together with the paired t-test, were used for evaluation.
Results: The performance of the S-Model was better than that of the original T-Model (p<0.05 for recall and f-score). A revised T-Model, obtained by extending its structure and corresponding values, performed better than the S-Model (p<0.05 for recall and accuracy).
Conclusion: We found that content coverage is a more important factor than the terminology model for classification; however, a template style facilitates the discovery and completion of content gaps.

Introduction
While modern terminologies have advanced well beyond simple one-dimensional subsumption relationships through the introduction of composite expressions, there is an emerging convergence of approaches toward the use of a concept-based clinical terminology with an underlying formal semantic terminology model (STM) [1]. SNOMED CT, the most comprehensive clinically oriented medical terminology system, currently adopts a foundation based on a description logic (DL) model, using the underlying DL-based structure to formally represent the meanings of concepts and the interrelationships between concepts [2-3]. The existing SNOMED CT model is mainly pre-coordination oriented, i.e. it contains many pre-coordinated terms, but it also supports post-coordination. For example, the compositional expression “[ hypophysectomy (52699005) ] + [ transfrontal approach (65519007) ]” could be used to describe a more specific clinical statement than the term “hypophysectomy (52699005)” alone.
For a specific domain, a template model having a semantic structure with a coherent class of terms can be used as a formal representation [4]. This kind of model is mainly post-coordination oriented: a list of atomic terms is organized within a semantic structure. For example, the latest version of the International Classification of Nursing Practice (ICNP) uses a 7-Axis model to support the representation of nursing concepts and integrates the domain concepts of nursing in a manner suitable for computer processing [5].
One of the main goals of semantic terminology models is to support the capture of structured clinical information that is crucial for computer programs such as information retrieval systems and decision support tools [6]. Structured recording has the potential to improve information retrieval from a patient database in response to clinically relevant questions [1]. However, a functional difference in retrieval performance between these two kinds of semantic terminology model has not been clearly demonstrated.
In this study, we focus on the specific domain of heart murmur findings. Two schemas were established from two different semantic terminology models for evaluation: one schema is extracted from the existing SNOMED CT model (S-Model) and the other is a template model (T-Model) extracted from a concept-dependent attributes model recently published by Green et al. [7]. The objectives of the study are to annotate real clinical notes using the two schemas and to compare and evaluate the effects of the two models on the classification of the clinical notes.

Methods and Materials
Defining the annotation schemas
We defined one schema for the S-Model and one for the T-Model and represented both schemas in Protégé (version 3.2 beta), an ontology editing environment developed by Stanford Medical Informatics [8].
For the S-Model, we established a schema by extracting concept trees from the existing sub-hierarchy of heart murmur findings in the January 2006 version of SNOMED CT (see Fig. 1). One root concept is “Heart murmur (SCTID_88610006)”, which includes 86 sub-concepts of pre-coordinated terms for heart murmur findings. The other root concept is “Anatomical concepts (SCTID_257728006)”, which includes two parts relevant to our schema. One part is the concept “Cardiac internal structure (SCTID_277712000)” and its sub-concepts. The other part contains only those anatomical concepts appearing in our clinical notes corpus, identified by manual review. For all heart murmur concepts, two semantic attributes derived from the SNOMED CT context model for heart murmur findings frame post-coordination. One is “procedure site”, which represents the auscultation site of a heart murmur, and the other is “finding site”, which represents the potential etiological site of a heart murmur. The values of the former were set as instances of “anatomical concepts (SCTID_257728006)” and the values of the latter were set as instances of “Cardiac internal structure (SCTID_277712000)”.
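For illustration only, the S-Model annotation structure just described (one pre-coordinated heart murmur concept plus the two post-coordination slots) can be sketched as a small Python data structure. The class and function names are hypothetical, and the text rendering merely imitates the layout of Table 3; it is not the official SNOMED CT compositional grammar. The concept identifiers are the ones used in the Table 3 examples.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Concept:
    sctid: str
    term: str


@dataclass
class SiteValue:
    site: Concept                         # an anatomical concept
    laterality: Optional[Concept] = None  # optional qualifier, as in Table 3


@dataclass
class SModelAnnotation:
    # Pre-coordinated concept, i.e. a sub-concept of 88610006 |Heart murmur|.
    murmur: Concept
    # Post-coordination slot 1: auscultation site of the murmur.
    procedure_site: Optional[SiteValue] = None
    # Post-coordination slot 2: potential etiological site.
    finding_site: Optional[SiteValue] = None


def _render_site(name: str, value: SiteValue) -> list[str]:
    lines = [f"  {name}: [{value.site.sctid}:{value.site.term}]"]
    if value.laterality is not None:
        lines.append(f"    laterality: [{value.laterality.sctid}:{value.laterality.term}]")
    return lines


def render(ann: SModelAnnotation) -> str:
    """Hypothetical plain-text rendering in the style of Table 3."""
    lines = [f"{ann.murmur.sctid}:{ann.murmur.term}"]
    if ann.procedure_site is not None:
        lines += _render_site("procedure site", ann.procedure_site)
    if ann.finding_site is not None:
        lines += _render_site("finding site", ann.finding_site)
    return "\n".join(lines)


# The "Ejection murmur" annotation of the AS case in Table 3.
ejection = SModelAnnotation(
    murmur=Concept("77197001", "Ejection murmur"),
    procedure_site=SiteValue(
        site=Concept("117144008", "upper parasternal region"),
        laterality=Concept("24028007", "right"),
    ),
)
print(render(ejection))
```

The design keeps the pre-coordinated concept and the post-coordinated slots separate, which mirrors the point that most S-Model content is pre-coordinated and only two attributes are open to post-coordination.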

            Fig. 1 Schema of SNOMED CT Model (S-Model) for heart murmur findings represented in Protégé


            Fig. 2 Schema of Template Model (T-Model) for heart murmur findings represented in Protégé
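The template character of the T-Model (Fig. 2) is what makes content gaps easy to spot: every annotated value must belong to the value set of one of the root concept's attributes. A minimal Python sketch, assuming a plain dictionary from attribute to allowed values, reports the values and attributes a template cannot express and extends the template accordingly, in the spirit of the value-set and attribute extensions of the T-Model described later in the Methods. Only three of the eight attributes are shown, and the value sets are hypothetical subsets, not Green's actual value lists.

```python
# Hypothetical subset of a T-Model template: attribute -> allowed values.
# Only three of the eight attributes are shown; the value sets are illustrative.
TEMPLATE = {
    "has cardiac cycle timing": {"systolic timing", "diastolic timing"},
    "has murmur intensity": {f"intensity grade {g}/VI"
                             for g in ("I", "II", "III", "IV", "V", "VI")},
    "has point of maximum intensity": {"sternal border", "apex"},
}


def content_gaps(template, annotation):
    """List (kind, attribute, value) triples that the template cannot express."""
    gaps = []
    for attribute, value in annotation:
        if attribute not in template:
            gaps.append(("attribute", attribute, value))  # structure gap
        elif value not in template[attribute]:
            gaps.append(("value", attribute, value))      # value-set gap
    return gaps


def extend(template, gaps):
    """Return a copy of the template with missing attributes and values added."""
    extended = {attribute: set(values) for attribute, values in template.items()}
    for _kind, attribute, value in gaps:
        extended.setdefault(attribute, set()).add(value)
    return extended


note = [
    ("has cardiac cycle timing", "systolic timing"),
    ("has murmur intensity", "intensity grade II/VI"),
    ("has point of maximum intensity", "upper sternal border"),
    ("has inferences to", "regurgitant murmur"),
]
gaps = content_gaps(TEMPLATE, note)
print(gaps)  # one value gap and one attribute gap: the note is "incomplete"
print(content_gaps(extend(TEMPLATE, gaps), note))  # no gaps after extension
```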




For the T-Model, a schema was established from the concept-dependent attributes model published in a recent paper by Green et al. [7]. In this schema (see Fig. 2), one root concept is “heart murmur”, which has eight semantic attributes: “has cardiac cycle timing”, “has murmur configuration”, “has murmur duration”, “has murmur intensity”, “has murmur pitch”, “has murmur quality”, “has point of maximum intensity” and “radiates towards”. The corresponding values of these eight attributes were set as the sub-concepts of the other root concept, “cardiac murmur characteristic values”. We adopted the model attributes, as well as their values, directly from Green’s model (the values were kindly provided by Green, personal communication).

Preparing clinical notes corpus
The Mayo Clinic has a repository of approximately twenty million clinical notes, consisting of documents dictated by physicians that are subsequently transcribed and filed as part of the patient’s electronic medical record. The notes were sampled as follows. First, we automatically extracted notes from the Mayo repository that met these criteria: 1) created between January 1, 2005 and January 31, 2005; 2) having a heart murmur description in the Physical Examination section; 3) age ≥ 21; 4) having a Hospital International Classification of Disease Adaptation (HICDA) code for heart valvular disease; and 5) excluding patients with a code for prosthetic valve status or a complication of a prosthetic valve. Second, we flagged extracted documents containing a diagnosis of aortic stenosis (AS), yielding 103 documents. Third, we randomly selected controls among the extracted documents having no diagnosis of AS by matching the following conditions: 1) no history of valvular surgery; 2) the same gender and an age within 1 year for each case (see Table 1). Two controls were retained for each case, totaling 309 documents (103 cases and 206 controls). Finally, we parsed out the cardiac exam from the Physical Examination section of each document to create an annotation corpus.

Table 1. Control document selection by matching on gender and age (cases are AS documents)

Age     Male cases  Male controls  Female cases  Female controls  Total
21-30   1           2              0             0                3
31-40   0           0              0             0                0
41-50   0           0              2             4                6
51-60   4           8              0             0                12
61-70   7           14             5             10               36
71-80   26          52             7             14               99
81-90   24          48             21            42               135
91+     2           4              4             8                18
Total   64          128            39            78               309

Annotation software and Annotators
A general-purpose text annotation tool, Knowtator [9], was used to map text content to our schemas. Knowtator is a Java plug-in for Protégé that is mainly used for creating gold-standard training and evaluation corpora for natural language processing (NLP) systems. The annotation schemas described in the section above were instantiated in Knowtator.
One author (GJ) performed the annotation task and the other author (CGC) verified the annotations for 10% of all documents. Differences were mutually adjudicated and the lessons learned were generalized to the remaining 90% of cases.

Coding for machine learning classification
We coded the annotated corpora for classification using a machine learning classification algorithm. The target category of the classification is binary, i.e. aortic stenosis (AS) or non-AS. In other words, the goal of the classification is to predict whether a document with a heart murmur description belongs to the AS category or not. The annotations of each document were used as the predictive features and coded as binary.
We used the Weka implementation of the C4.5 decision tree (J4.8) [10], a well-known supervised approach to classification.

Outcome measures and statistical analysis
For the annotation task, we compared the description completeness of the two models. While performing the annotation task, the annotators judged whether the heart murmur descriptions of each document could be described completely using the schema of a model. If they judged a document as “incomplete”, they indicated a reason for the judgment.
To evaluate the retrieval task, we used the standard evaluation metrics of precision, recall, f-score and accuracy. Precision is defined as the ratio of documents correctly assigned to the AS category (true positives) to the total number of hits (true positives and false positives). Recall is the ratio of documents correctly assigned to the AS category (true positives) to the number of target-category documents in the test set (true positives and false negatives). The f-score is the harmonic mean of precision and recall. Accuracy is the ratio of correctly assigned categories (true positives and true negatives) to the total number of instances in the test dataset.
For the S-Model, one dataset (SM) containing the annotations of both heart murmurs and anatomical concepts was prepared. For the T-Model, three datasets were prepared. The first (TM1) contains the annotations from Green’s original model. The other two datasets are extensions of TM1. We extended TM1 to create TM2 by completing the values for all eight semantic attributes whenever a description appearing in the clinical notes corpus did not have a corresponding value in TM1. For example, we added “upper sternal border”, “mid sternal border” and “lower sternal




border” into the schema because they appeared frequently in our corpus to describe auscultation areas, whereas the original model only contains “sternal border”.
Building on TM2, we created our third model (TM3) by adding a new semantic attribute, “has inferences to (specific murmurs or etiological mentions)”, to the root concept “heart murmur” and completing its corresponding values from the descriptions appearing in the corpus. We re-annotated all documents using each of the extended models.
Ten-fold cross-validation was performed 10 separate times on each of the four datasets, and the paired t-test was used to test the statistical significance of differences in the performance measures between the S-Model dataset and each of the three T-Model datasets.

Results
For annotations
In the S-Model, we made 995 annotations across all 309 documents, an average of 3.2 annotations per document. Of these, 728 belonged to 33 different sub-concepts of heart murmur (88610006). Of the heart murmur annotations, 509 (70.0%) had the value of the attribute “procedure site” filled and 6 (0.8%) had the value of the attribute “finding site” filled.
In the T-Model, we made 1377 annotations against the original T-Model (TM1), an average of 4.5 annotations per document. Among 335 discrete heart murmur annotations, 89.9% include timing, 79.7% include intensity and 69.0% include the point of maximum intensity (POMI) (see Fig. 3).

Fig. 3 The annotation distribution of the eight attributes for all 335 heart murmurs annotated in the original T-Model.
[Bar chart. Recoverable values: timing 89.9%, intensity 79.7%, point of maximum intensity 69.0%; the other five attributes (configuration, duration, pitch, quality and radiation) have the remaining plotted values of 15.8%, 14.9%, 11.3%, 4.2% and 1.5%.]

For comparison, the average number of annotations per document in the S-Model was lower than in the T-Model, indicating that the S-Model supports a more abstract way of describing heart murmur findings than the T-Model.
Considering description completeness, 88 documents (28%) in the S-Model were judged as “incomplete”, compared with 201 documents (65%) in the original T-Model. Thus, the S-Model exhibits more complete domain coverage than the original T-Model.
The reasons for the incompleteness of the four datasets from the two models are listed in Table 2. We found that the S-Model (SM) could describe most “auscultation area” mentions, whereas the original T-Model (TM1) could not. Neither SM nor TM1 could describe “radiation” well (for SM this is due to the lack of a semantic attribute for “Radiation”, whereas for TM1 it is due to the lack of appropriate values for the “Radiation” attribute). In addition, SM could describe all “ejection murmur” mentions and some “aortic valve related” etiological mentions; TM1 could not.
These results indicate that the strict template model, following Green, assumes that observers use strict descriptions and do not make inferences to specific murmurs or etiological mentions, whereas the SNOMED CT model partly accommodates the variability between inferences and strict descriptions by providing terms that cover both.

Table 2 Frequency of reasons for the incompleteness of the four datasets from the two models

Reason                        SM   TM1   TM2   TM3
Auscultation area              1    78     0     0
Radiation                     47    47     0     0
Configuration                  8     8     0     0
Quality                        7     5     0     0
Specific murmurs
  Ejection murmur              0   107   107     0
  Regurgitant murmur           3     3     3     0
  Flow murmur                  2     2     2     0
Etiological mentions
  Aortic valve related        19    25    25     0
  Mitral valve related         4     4     4     0
  Pulmonary valve related      1     1     1     0
  Septal defect                1     1     1     0

For TM2 and TM3, zero values in Table 2 indicate our synthetic completion of the values of each corresponding attribute in the T-Model. The description completeness of TM2 correspondingly rose to 57.6%, and that of TM3 to 100%. Table 3 provides examples (an AS case vs. a non-AS case) showing how annotations were made with all four schemas from the two models.
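The completeness figures above are proportions of the 309 annotated documents. A short Python sketch of the arithmetic, using only the counts reported in the text (the helper name is hypothetical):

```python
TOTAL_DOCUMENTS = 309  # size of the annotated corpus


def completeness(n_incomplete: int, total: int = TOTAL_DOCUMENTS) -> tuple[float, float]:
    """Return (percent incomplete, percent complete) for one schema."""
    pct_incomplete = 100.0 * n_incomplete / total
    return pct_incomplete, 100.0 - pct_incomplete


# Documents judged "incomplete", as reported for SM and TM1.
for name, n_incomplete in [("SM", 88), ("TM1", 201)]:
    bad, good = completeness(n_incomplete)
    print(f"{name}: {bad:.1f}% incomplete, {good:.1f}% complete")
```

This prints 28.5% and 65.0% incomplete for SM and TM1, which the text reports rounded as 28% and 65%.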




For classification
As described above, four datasets (SM, TM1, TM2 and TM3) from the two models were formed for evaluation. The results of the evaluation metrics for the four datasets are shown in Table 4. We found that the classification performance of SM was better than that of TM1 (i.e. Green’s original model), with statistical significance for recall and f-score (p<0.05, paired t-test). A probable reason is that TM1 did not contain a complete list of murmur characteristic values for many of its semantic attributes.
The performance of TM2 was better than that of TM1, but still lower than that of SM. This result indicates that the original T-Model, using strict physical descriptions, may not fully represent the descriptions of heart murmur findings in clinical notes, negatively impacting functional performance.
The classification performance of TM3 was the best among the datasets, and significantly better than SM for recall and accuracy (p<0.05, paired t-test). This result provides further evidence that inferences to specific murmurs and etiological mentions are an important part of the descriptions of heart murmur findings in real clinical notes, influencing the functional performance of the terminology model in this specific domain.
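The comparisons above rest on the four metrics defined under Outcome measures, computed with AS as the positive (target) category. A minimal Python sketch of those definitions; the function names and the example counts are hypothetical and do not come from the study.

```python
from dataclasses import dataclass


@dataclass
class Metrics:
    precision: float
    recall: float
    f_score: float
    accuracy: float


def confusion(y_true, y_pred, positive="AS"):
    """Count (tp, fp, fn, tn) for one test fold."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    return tp, fp, fn, tn


def evaluate(tp: int, fp: int, fn: int, tn: int) -> Metrics:
    """Precision, recall, f-score and accuracy for the AS category."""
    precision = tp / (tp + fp) if tp + fp else 0.0  # correct hits / all hits
    recall = tp / (tp + fn) if tp + fn else 0.0     # correct hits / all AS documents
    # Harmonic mean of precision and recall.
    f_score = (2 * precision * recall / (precision + recall)
               if precision + recall else 0.0)
    accuracy = (tp + tn) / (tp + fp + fn + tn)      # correct labels / all documents
    return Metrics(precision, recall, f_score, accuracy)


# Hypothetical test fold with 10 AS and 20 non-AS documents.
print(evaluate(tp=6, fp=2, fn=4, tn=18))
```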


Table 3 Examples (AS case vs. non-AS case) of annotations using the four schemas

AS case
Textual note:
  Heart: Loud 3 to 4/6 systolic ejection murmur heard best at the right upper sternal border. Absent of S2.
SM annotation:
  15157000:Cardiac murmur - intensity grade III (VI)
    procedure site: [117144008:upper parasternal region]
      laterality: [24028007:right]
  25311008:Cardiac murmur - intensity grade IV (VI)
    procedure site: [117144008:upper parasternal region]
      laterality: [24028007:right]
  77197001: Ejection murmur
    procedure site: [117144008:upper parasternal region]
      laterality: [24028007:right]
TM1 annotation:
  Heart murmur:
    has cardiac cycle timing value: systolic timing
    has murmur intensity value: intensity grade III/VI
    has murmur intensity value: intensity grade IV/VI
    has point of maximum intensity: sternal border (laterality: right)
TM2 annotation:
  Heart murmur:
    has cardiac cycle timing value: systolic timing
    has murmur intensity value: intensity grade III/VI
    has murmur intensity value: intensity grade IV/VI
    has point of maximum intensity: upper sternal border (laterality: right)
    has murmur quality value: loud
TM3 annotation:
  Heart murmur:
    has cardiac cycle timing value: systolic timing
    has murmur intensity value: intensity grade III/VI
    has murmur intensity value: intensity grade IV/VI
    has point of maximum intensity: upper sternal border (laterality: right)
    has murmur quality value: loud
    has inferences to: ejection murmur

Non-AS case
Textual note:
  Heart: Regular rate and rhythm with a 2/6 left upper sternal border systolic regurgitant murmur. P2 was slightly increased. There was an S4 but no S3. The apical impulse was not localizable.
SM annotation:
  36680007:Cardiac murmur - intensity grade II (VI)
    procedure site: upper parasternal region
      laterality: [7771000:left]
  31574009: Systolic murmur
    procedure site: [117144008:upper parasternal region]
      laterality: [7771000:left]
TM1 annotation:
  Heart murmur:
    has cardiac cycle timing value: systolic timing
    has murmur intensity value: intensity grade II/VI
    has point of maximum intensity: sternal border (laterality: left)
TM2 annotation:
  Heart murmur:
    has cardiac cycle timing value: systolic timing
    has murmur intensity value: intensity grade II/VI
    has point of maximum intensity: upper sternal border (laterality: left)
TM3 annotation:
  Heart murmur:
    has cardiac cycle timing value: systolic timing
    has murmur intensity value: intensity grade II/VI
    has point of maximum intensity: upper sternal border (laterality: left)
    has inferences to: regurgitant murmur
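The Methods state that each document's annotations were used as binary predictive features for the decision tree. A minimal Python sketch of that coding, applied to the TM3 annotations of the two Table 3 examples (laterality qualifiers dropped for brevity), followed by the information gain of each feature. Information gain is the splitting criterion of ID3; J4.8, Weka's C4.5 implementation, uses the related gain ratio, so this is an illustration of the idea rather than the study's Weka pipeline. The "attribute=value" feature naming and the function names are hypothetical.

```python
import math


def feature_strings(annotation):
    """Flatten (attribute, value) pairs into one binary feature name each."""
    return {f"{attribute}={value}" for attribute, value in annotation}


def binary_matrix(documents):
    """Code each document as a 0/1 vector over the sorted feature vocabulary."""
    vocabulary = sorted(set().union(*documents))
    rows = [[1 if feature in doc else 0 for feature in vocabulary] for doc in documents]
    return vocabulary, rows


def entropy(labels):
    n = len(labels)
    return -sum((labels.count(c) / n) * math.log2(labels.count(c) / n)
                for c in set(labels))


def information_gain(column, labels):
    """Entropy reduction obtained by splitting on one binary feature."""
    gain = entropy(labels)
    for v in (0, 1):
        subset = [y for x, y in zip(column, labels) if x == v]
        if subset:
            gain -= len(subset) / len(labels) * entropy(subset)
    return gain


as_case = feature_strings([
    ("has cardiac cycle timing value", "systolic timing"),
    ("has murmur intensity value", "intensity grade III/VI"),
    ("has murmur intensity value", "intensity grade IV/VI"),
    ("has point of maximum intensity", "upper sternal border"),
    ("has murmur quality value", "loud"),
    ("has inferences to", "ejection murmur"),
])
non_as_case = feature_strings([
    ("has cardiac cycle timing value", "systolic timing"),
    ("has murmur intensity value", "intensity grade II/VI"),
    ("has point of maximum intensity", "upper sternal border"),
    ("has inferences to", "regurgitant murmur"),
])
vocabulary, X = binary_matrix([as_case, non_as_case])
y = ["AS", "non-AS"]
for j, name in enumerate(vocabulary):
    column = [row[j] for row in X]
    print(name, column, round(information_gain(column, y), 3))
```

With only two documents, every feature that differs between them separates the classes perfectly (gain 1.0) and shared features carry no information (gain 0.0); on the full 309-document corpus the gains would of course differ.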




Table 4 The results of the evaluation metrics of the four datasets

Dataset  Precision       Recall          F-score         Accuracy
         (mean±sd)       (mean±sd)       (mean±sd)       (mean±sd)
SM       74.2% ±13.7%    59.4% ±15.6%    64.5% ±12.7%    79.0% ±6.1%
TM1      67.5% ±14.9%    *44.6% ±13.8%   *52.1% ±11.5%   73.6% ±5.4%
TM2      71.0% ±14.0%    53.2% ±18.9%    59.0% ±15.3%    76.9% ±6.8%
TM3      80.0% ±12.2%    *69.8% ±14.6%   73.5% ±10.4%    *83.6% ±5.8%
*p < 0.05 (paired t-test against SM)
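Table 4 summarizes ten repetitions of ten-fold cross-validation, with paired t-tests against SM. A minimal Python sketch of the resampling and of the paired t statistic follows. It assumes stratified folds and assumes that scores of two datasets are paired split by split; the text states neither, and the function names are hypothetical. Turning t into a p-value needs the Student t distribution (for example from a statistics library), which is left out here.

```python
import math
import random
import statistics


def stratified_folds(labels, k=10, seed=0):
    """Split document indices into k folds with roughly equal class ratios."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        members = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(members)
        for position, i in enumerate(members):
            folds[position % k].append(i)  # deal class members round-robin
    return folds


def repeated_cv_splits(labels, k=10, repeats=10):
    """Yield (repeat, fold, train, test) index lists for repeated k-fold CV."""
    for r in range(repeats):
        for f, test in enumerate(stratified_folds(labels, k, seed=r)):
            held_out = set(test)
            train = [i for i in range(len(labels)) if i not in held_out]
            yield r, f, train, test


def paired_t(scores_a, scores_b):
    """Paired t statistic and degrees of freedom for matched score lists.

    Raises ZeroDivisionError if all paired differences are identical.
    """
    if len(scores_a) != len(scores_b) or len(scores_a) < 2:
        raise ValueError("need two equally long lists with at least two scores")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    standard_error = statistics.stdev(diffs) / math.sqrt(n)
    return statistics.mean(diffs) / standard_error, n - 1


# 103 AS cases and 206 matched controls, as in the corpus described above.
labels = ["AS"] * 103 + ["non-AS"] * 206
splits = list(repeated_cv_splits(labels))
print(len(splits))  # 100 train/test splits
```

Under these assumptions, each dataset (SM, TM1, TM2, TM3) would be scored on the same 100 splits, the per-split scores summarized with `statistics.mean` and `statistics.stdev` in the form of Table 4, and SM's scores passed to `paired_t` together with each T-Model dataset's scores.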

Discussion
In this study, we developed an approach to compare and evaluate the domain coverage (indicated by description completeness) of two semantic terminology models and their effects on the classification of real clinical notes. The description completeness of the S-Model was better than that of the original T-Model with its original value set, and the S-Model correspondingly performed better on classification. The extensions of the T-Model that improved its description completeness also improved its performance on classifying clinical notes. The results show that the domain coverage of a terminology model was directly correlated with its performance on classifying clinical notes; this is not surprising.
The effect of a terminology model on its functional performance in a specific domain thus depends mainly on its ability to represent the content of that domain. In other words, the key issue for a terminology model is how to achieve complete domain coverage. If two different terminology models represented the content of a domain with the same coverage, we would expect no difference in their performance on classifying clinical notes.
In the original T-Model, a heart murmur description can be fully post-coordinated using a semantic structure of eight semantic attributes. With the original value set, however, we found that its description completeness was suboptimal. In the paper from which the model was derived [7], the authors stated that “to adequately capture the full spectrum of cardiac murmur descriptions, our model needed a complete list of murmur characteristics”. Our first extension (TM2) therefore completed the term values for all eight attributes of the original T-Model, which increased description completeness from 35.0% to 57.6%.
Thus, adding values to each attribute within the semantic structure did improve the domain coverage of the model. Even with value completion, however, the original T-Model still could not fully describe the given corpus. We therefore consider that the domain coverage of a terminology model depends not only on the full value set of its semantic structure, but also on the coverage of the semantic structure itself.
Our second extension (TM3) of the T-Model added a semantic attribute together with its corresponding values. This overcame the structural limitation of the original T-Model and achieved a complete description of the given corpus. In other words, the extended structure allows a systematic examination of where content gaps exist (e.g., missing values for references to specific murmurs and for etiological mentions) and guides the “completion” of missing terms or content.
In the S-Model, most content is pre-coordinated; post-coordination is possible for only two semantic attributes, “procedure site” and “finding site”. We did not extend the SNOMED CT model in a similar fashion because it is an international standard, although we believe its performance would also improve if it were extended. Extending it would, however, be more complicated than extending the template model, because it involves both pre-coordination and post-coordination. We therefore consider the template model more suitable for achieving complete domain coverage. An important implication of these experiments is that a template-style terminology model more readily identifies gaps in coverage and facilitates their completion for classification tasks.
We used Knowtator as our annotation tool; it served our purpose well and offered three merits. First, Knowtator uses the Protégé ontology editing environment to build the annotation schema, and Protégé's frame-based knowledge representation system provides a flexible and expressive way to build schemas for both model types in this study. Second, Knowtator visualizes annotations, which makes the annotation and confirmation process simple and efficient. Third, the system's Java API supports annotation queries, which let us export our coded annotations to a classifier input format automatically.
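This discussion separates two kinds of gap: gaps in a model's value set, which TM2 addresses, and gaps in its semantic structure, which TM3 addresses. A small sketch can make the distinction concrete. It assumes that a description counts as complete only when every (attribute, value) pair it needs exists in the model; the study's exact completeness computation is not restated in this section. The toy models, the attribute `has etiology`, the corpus, and the resulting numbers are all hypothetical and are unrelated to the 35.0% and 57.6% figures reported above.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple

Model = Dict[str, Set[str]]          # semantic attribute -> allowed values
Description = List[Tuple[str, str]]  # (attribute, value) pairs one mention needs


def gap_type(model: Model, attr: str, value: str) -> str:
    """Classify one required pair as covered, a value gap, or a structural gap."""
    if attr not in model:
        return "structure"  # the attribute itself is missing from the model
    if value not in model[attr]:
        return "value"      # the attribute exists but its value set is incomplete
    return "ok"


def completeness(model: Model, descriptions: List[Description]) -> float:
    """Fraction of descriptions whose every required pair is covered by the model."""
    if not descriptions:
        raise ValueError("no descriptions given")
    full = sum(all(gap_type(model, a, v) == "ok" for a, v in d) for d in descriptions)
    return full / len(descriptions)


def gap_report(model: Model, descriptions: List[Description]) -> Counter:
    """Count uncovered pairs by gap type, i.e. where content is missing."""
    counts = Counter(gap_type(model, a, v) for d in descriptions for a, v in d)
    counts.pop("ok", None)
    return counts


TIMING = "has cardiac cycle timing value"
INTENSITY = "has murmur intensity value"
ETIOLOGY = "has etiology"  # hypothetical attribute added by a structural extension

corpus: List[Description] = [
    [(TIMING, "systolic"), (INTENSITY, "grade II/VI")],
    [(TIMING, "systolic"), (INTENSITY, "grade III/VI")],
    [(TIMING, "diastolic")],
    [(TIMING, "systolic"), (ETIOLOGY, "aortic stenosis")],
]

base: Model = {TIMING: {"systolic"}, INTENSITY: {"grade II/VI"}}
values_completed: Model = {
    TIMING: {"systolic", "diastolic"},
    INTENSITY: {"grade II/VI", "grade III/VI"},
}
structure_extended: Model = {**values_completed, ETIOLOGY: {"aortic stenosis"}}

for name, model in [("base", base), ("values completed", values_completed),
                    ("structure extended", structure_extended)]:
    print(name, completeness(model, corpus), dict(gap_report(model, corpus)))
```

With these toy inputs, completeness rises from 0.25 to 0.75 when values are completed. It reaches 1.0 only when the missing attribute is added. The gap report shows which kind of extension each uncovered pair would need.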


To improve baseline performance on all standard evaluation measures, we selected control clinical notes using strict criteria. This design did improve baseline performance (data not shown).
The evaluation in this study should be read comparatively across models; absolute measures of precision and recall are subject to factors beyond the scope of this study. The study has several limitations. First, the annotations depend entirely on what clinicians decide to document for each patient, whom they may or may not know to have AS at the time. Local documentation culture may also shape these notes, so the findings could differ on another corpus. Second, because the annotation task was labor-intensive, we collected only a relatively small corpus of clinical notes. We consider the annotated corpus valid because both authors have a background in clinical medicine. The ten-fold cross-validation used in this study may make efficient use of the limited data and help obtain a reliable performance estimate. An annotated corpus of this kind could also be used to train a machine-learning-based annotation algorithm and so build an automatic domain-specific annotation tool. In addition, because we did not intend to evaluate which classifier performed better, we used only the Weka implementation of the J4.8 decision tree algorithm.
In conclusion, the two models clearly differ in domain coverage and in classification performance when applied to real clinical notes. Our approach provides an effective framework for evaluating the coverage and functional performance of semantic terminology models in a specific domain and for identifying where they can be improved. Future work will focus on the scalability of the approach and on evaluating interoperability among different semantic terminology models.

Acknowledgements
This study was supported in part by NIH grant R01 LM07319. The authors would like to thank Philip V. Ogren, Serguei V.S. Pakhomov, Guergana K. Savova, Pauline Funk and James D. Buntrock for their support.

References
[1] Brown PJ, Sonksen P. Evaluation of the quality of information retrieval of clinical findings from a computerized patient database using a semantic terminological model. J Am Med Inform Assoc. 2000 Jul-Aug;7(4):392-403.
[2] Spackman KA, Campbell KE. Compositional concept representation using SNOMED: towards further convergence of clinical terminologies. Proc AMIA Symp. 1998:740-4.
[3] Yu AC. Methods in biomedical ontology. J Biomed Inform. 2006 Jun;39(3):252-66.
[4] Zhou L, Tao Y, Cimino JJ, Chen ES, Liu H, Lussier YA, Hripcsak G, Friedman C. Terminology model discovery using natural language processing and visualization techniques. J Biomed Inform. 2006 Dec;39(6):626-36.
[5] URL: http://icn.ch/icnp.htm; last visited December 29, 2006.
[6] Rosenbloom ST, Miller RA, Johnson KB, Elkin PL, Brown SH. Interface terminologies: facilitating direct entry of clinical data into electronic health record systems. J Am Med Inform Assoc. 2006 May-Jun;13(3):277-88.
[7] Green JM, Wilcke JR, Abbott J, Rees LP. Development and evaluation of methods for structured recording of heart murmur findings using SNOMED-CT post-coordination. J Am Med Inform Assoc. 2006 May-Jun;13(3):321-33. Epub 2006 Feb 24.
[8] URL: http://protege.stanford.edu/index.html; last visited December 29, 2006.
[9] URL: http://bionlp.sourceforge.net/Knowtator/; last visited December 29, 2006.
[10] URL: http://www.cs.waikato.ac.nz/ml/weka/; last visited December 29, 2006.

