Automatic Summarization for Terminology Recommendation: the case of the NCBO Ontology Recommender

Pablo López-García1,2, Stefan Schulz1, and Roman Kern2

1 Medizinische Universität Graz - Institut für Medizinische Informatik, Statistik und Dokumentation. Auenbruggerplatz 2, 8036 Graz (Austria)
2 Know-Center GmbH. Inffeldgasse 13/6, 8010 Graz (Austria)

Abstract. The National Center for Biomedical Ontology (NCBO) ontology recommender helps users choose a biomedical terminology by analyzing a submitted document. Submitting a single document might not be representative and result in poor recommendations, while submitting a large sample might be expensive, sometimes unfeasible. In this paper, we investigate the effectiveness of two well-researched automatic summarization techniques as an alternative: topic modeling using Latent Dirichlet Allocation and keyword extraction using TextRank. In our case study, both techniques proved to be extremely valuable, dramatically boosting performance without significantly affecting terminology recommendations (r = 0.83–0.98).

Keywords: biomedical terminology, automatic summarization, ontologies, TextRank, topic modeling

1 Introduction

Selecting one or more domain terminologies that are best suited for a given application has proved to be a hard task [14]. Especially in the biomedical field, terminology systems (vocabularies, classifications, nomenclatures, ontologies) vary widely in scope, size, architecture, granularity, and purpose [4]. For instance, SNOMED CT provides controlled terms for virtually every aspect of health care [2], while others are highly specialized, such as the Foundational Model of Anatomy (FMA), the National Cancer Institute (NCI) Thesaurus, the Gene Ontology, or the Medical Subject Headings (MeSH). To help users choose a suitable terminology in text annotation applications, the National Center for Biomedical Ontology (NCBO) [10] released the NCBO ontology recommender web service [7].
After analyzing the structure and terms of a document submitted by a user and the candidate terminologies in Bioportal, the recommender suggests a list of terminologies. It is expected that the terminology ranked first is the most appropriate for annotating that particular document and others with a similar context. Bioportal is an open repository of biomedical terminologies from the NCBO that currently reports nearly 6 million biomedical terms distributed in 370 terminologies, most of which are based on an ontological foundation [11].

Unfortunately, techniques that are widespread in the field of recommender systems, such as collaborative filtering [8], can rarely be applied when recommending biomedical terminologies. On the one hand, the content of biomedical terminologies is much harder to model than the content of books, movies, or songs (the prototypical target items of recommender systems). On the other hand, user feedback on biomedical terminologies is scarce. In Bioportal, for example, users can rate the usability, coverage, quality, formality, correctness, and documentation of terminologies, but in most cases the number of ratings is negligible (only one for SNOMED CT1).

There are several limitations, however, when using a document submitted by a user as context for making recommendations. Firstly, the submitted document might not accurately represent the context of the user's document collection, misleading the recommender. Secondly, getting recommendations is expensive: our experience shows that a single recommendation can take over 30 seconds when using a full clinical document as input. Thirdly, even if the performance of the system were substantially improved, intensive use with numerous submissions of full texts to the recommender web service might still degrade it.
Therefore, it would be desirable to minimize both the number and size of submitted documents, while maintaining their informational value.

Summarizing the context of a collection before submitting it to a recommender has proved to be a useful technique to improve efficiency without substantially influencing recommendations [1]. On the one hand, Hariri et al. showed that topic modeling a collection using Latent Dirichlet Allocation (LDA) [3] was useful for building a query-driven recommender for song recommendations [5]. Topic modeling finds clusters of related keywords in documents that usually make sense to humans; e.g., "paracetamol", "aspirin", and "ibuprofen" identified as a cluster in a collection of medical records would generally be associated with the topic of analgesics. Once a collection has been topic modeled, each document is represented as a weighted mixture of topics, from more to less prevalent. On the other hand, keyword extraction using TextRank [9], a graph-based ranking technique, provides an efficient and concise way of summarizing a document that might be used for the same purpose.

1.1 Objectives

The main objective of this paper is to study the effectiveness of (a) topic modeling using Latent Dirichlet Allocation and (b) keyword extraction using TextRank as summarization strategies in a context-based biomedical terminology recommender, the NCBO ontology recommender.

1 http://bioportal.bioontology.org/ontologies/SNOMEDCT, as of September 2014

2 Materials and Methods

The NCBO ontology recommender web service suggests the most appropriate biomedical terminology for annotating biomedical documents. The recommender analyzes a document submitted by the user and applies three criteria for making recommendations: coverage, connectivity, and size of the candidate terminology, taking into account all 370 terminologies from Bioportal [7].
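The LDA topic modeling described in the introduction can be illustrated with a minimal collapsed Gibbs sampler. This is a didactic sketch, not the tool used in the study, and the toy documents below are invented; it shows how each document ends up as a weighted mixture of topics from which a primary and secondary topic can be read off.

```python
import numpy as np

def lda_gibbs(docs, n_topics=10, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Minimal collapsed Gibbs sampler for LDA.
    docs: list of token lists. Returns (doc-topic mixtures, topic-word counts, vocab)."""
    rng = np.random.default_rng(seed)
    vocab = sorted({w for d in docs for w in d})
    w2i = {w: i for i, w in enumerate(vocab)}
    D, V = len(docs), len(vocab)
    z = [rng.integers(n_topics, size=len(d)) for d in docs]  # topic of each token
    ndk = np.zeros((D, n_topics))   # document-topic counts
    nkw = np.zeros((n_topics, V))   # topic-word counts
    nk = np.zeros(n_topics)         # tokens per topic
    for d, doc in enumerate(docs):
        for n, w in enumerate(doc):
            k = z[d][n]
            ndk[d, k] += 1; nkw[k, w2i[w]] += 1; nk[k] += 1
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for n, w in enumerate(doc):
                k, wi = z[d][n], w2i[w]
                ndk[d, k] -= 1; nkw[k, wi] -= 1; nk[k] -= 1   # remove token
                p = (ndk[d] + alpha) * (nkw[:, wi] + beta) / (nk + V * beta)
                k = rng.choice(n_topics, p=p / p.sum())       # resample its topic
                z[d][n] = k
                ndk[d, k] += 1; nkw[k, wi] += 1; nk[k] += 1
    doc_topic = (ndk + alpha) / (ndk.sum(1, keepdims=True) + n_topics * alpha)
    return doc_topic, nkw, vocab

docs = [["renal", "dialysis", "transplant"],
        ["aspirin", "ibuprofen", "paracetamol"],
        ["renal", "transplant", "dialysis", "hypertension"]]
doc_topic, topic_word, vocab = lda_gibbs(docs, n_topics=2, n_iter=50)
# Primary and secondary topic per document:
top2 = np.argsort(-doc_topic, axis=1)[:, :2]
# Top keywords per topic:
top_words = [[vocab[i] for i in np.argsort(-topic_word[k])[:3]] for k in range(2)]
```

With the analgesics example from the text, one would expect "paracetamol", "aspirin", and "ibuprofen" to land in the same topic, while the renal-disease vocabulary forms another.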
Recommendations are offered both via a web interface and via a REST API.

As a representative document collection, we selected discharge summaries from an Intensive Care Unit (ICU), reporting events of a hospitalization (e.g., admitting and discharge diagnoses, physical examinations, and past and follow-up medications). These texts address a number of topics of interest in biomedical informatics (e.g., anatomy, drugs, and diseases). The documents were obtained from the Multiparameter Intelligent Monitoring in Intensive Care (MIMIC-II) research database2, a collection of de-identified data from an ICU [13]. 26,657 discharge summary texts were extracted from the text field in the noteevents table of the MIMIC-II database. Table 1 shows an excerpt of a discharge summary.

ADMISSION DIAGNOSIS: End stage renal disease, admitted for transplant surgery.
HISTORY OF PRESENT ILLNESS: The patient is a 65 year-old woman with end stage renal disease, secondary to malignant hypertension. She was started on dialysis in (...)
PAST MEDICAL HISTORY: End stage renal disease, secondary to malignant hypertension on dialysis. History of anemia following gastric angiectasia (...)
ALLERGIES: No known drug allergies.
MEDICATIONS: Unknown.
SOCIAL HISTORY: Married, lives with her husband. She has a history of a half pack of cigarettes per day for 20 years. Occasional alcohol.
PHYSICAL EXAMINATION: The patient was afebrile. Vital signs were stable. Blood pressure was 124/58; heart rate 76; weight 160 pounds. Abdomen soft and nontender (...)
HOSPITAL COURSE: On [**3389-7-7**], the patient went to the operating room for living donor kidney transplant, performed by Dr. [**Last Name (STitle) 593**] and assisting by (...)
DIAGNOSES: End stage renal disease, status post renal transplant. Arterial thrombosis. Deep venous thrombosis. Resolving hypertension.

Table 1. Excerpt of a discharge summary from the MIMIC-II database.
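The per-document TextRank summarization used later in the paper (in an in-house improved variant that is not detailed) can be approximated by the original algorithm of Mihalcea and Tarau: rank words by PageRank over a word co-occurrence graph built with a sliding window. A self-contained sketch applied to a fragment of the excerpt above; the stopword list is an illustrative assumption, not the one used in the study.

```python
import re
from collections import defaultdict

# Illustrative stopword list (an assumption; any standard list would do).
STOPWORDS = {"the", "a", "an", "of", "to", "was", "on", "for", "and",
             "with", "in", "she", "her", "is", "were", "no", "per"}

def textrank_keywords(text, top_k=10, window=2, d=0.85, n_iter=50):
    """Plain TextRank: PageRank over a word co-occurrence graph."""
    words = [w for w in re.findall(r"[a-z]+", text.lower())
             if w not in STOPWORDS and len(w) > 2]
    neighbors = defaultdict(set)
    for i, w in enumerate(words):                 # co-occurrence within window
        for j in range(i + 1, min(i + window + 1, len(words))):
            if words[j] != w:
                neighbors[w].add(words[j]); neighbors[words[j]].add(w)
    score = {w: 1.0 for w in neighbors}
    for _ in range(n_iter):                       # power iteration of PageRank
        score = {w: (1 - d) + d * sum(score[u] / len(neighbors[u])
                                      for u in neighbors[w])
                 for w in neighbors}
    return [w for w, _ in sorted(score.items(), key=lambda kv: -kv[1])[:top_k]]

excerpt = ("End stage renal disease, admitted for transplant surgery. "
           "The patient is a 65 year-old woman with end stage renal disease, "
           "secondary to malignant hypertension. She was started on dialysis.")
keywords = textrank_keywords(excerpt, top_k=5)
print(keywords)
```

Frequently co-occurring content words (e.g., those in "end stage renal disease") accumulate high scores, which is why TextRank keywords overlap substantially with topic keywords in this domain.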
For topic modeling our collection of discharge summaries, we used a topic modeling tool3 based on LDA. We used all 26,657 documents from MIMIC-II, 200 iterations of Gibbs sampling, and 10 topics, which we termed Topic A, Topic B, ..., Topic J. For each document, we were only interested in the two most prevalent topics and their associated keywords (termed primary topic and secondary topic, respectively). As a per-document summarization strategy, we used an in-house improvement of TextRank. Table 2 shows the keywords obtained using both topic modeling and TextRank summarization when applied to the free text from Table 1.

2 http://mimic.physionet.org/database.html
3 http://code.google.com/p/topic-modeling-tool/

Topic1:   blood day discharge history mg patient post postoperative rate status
Topic2:   continued failure fluid negative patient pneumonia pulmonary renal started tube
TextRank: arterial blood day disease extremity femoral good history lower normal q renal right transplant ultrasound

Table 2. Keywords for the document in Table 1 using topic modeling and TextRank. Topic1 and Topic2 represent the primary and secondary topics, respectively. Keywords shared across methods (e.g., blood, day, history, patient, renal) indicate matches between methods.

Our main goal was to evaluate how effective topic modeling using LDA and keyword extraction using TextRank were, in comparison to submitting full texts to the recommender. For that purpose, we considered the recommender as a black box and took a sample of 20 documents from the MIMIC-II database4. Figure 1 shows our approach for getting the recommendations in each case.

Fig. 1. Recommendations using topic modeling, TextRank, and full texts.

For topic modeling, we submitted each topic's keywords to the recommender and stored the top recommended terminology for each topic, storing an association between topics, keywords, and recommended terminologies in a look-up table for future use. For keyword extraction (TextRank), we submitted the keywords representing the summary of each document from the sample. As gold standard for comparison, we used the full text of each document. We applied a limit of 7,000 characters to every document submitted, as our preliminary experiments showed that the recommender was not able to process long documents. We recorded recommendation times, including pre-processing when applicable (e.g., time spent summarizing a document using TextRank).

4 The first 20 documents retrieved by our PostgreSQL installation.

3 Results

The keywords discovered using topic modeling (Table 3) suggest several contexts in the documents, such as medication administration (A), cardiology (C, I), and diagnostic tests (G). However, only two terminologies were recommended: SNOMED CT and EHDA. Surprisingly, EHDA, focused on developmental stage-specific anatomical structures of the human, was recommended for 7 of the 10 topics, including diagnostic tests (G).
Topic  Keywords                                                                          Terminology
A      bid daily day disp mg po refills sig tablet times                                 SNOMED CT
B      continued failure fluid negative patient pneumonia pulmonary renal started tube   SNOMED CT
C      aortic cm left mildly mitral normal regurgitation systolic valve ventricular      EHDA
D      bilaterally ct discharge head hemorrhage history intact left normal patient       EHDA
E      blood day discharge history mg patient post postoperative rate status             EHDA
F      admission discharge history home hospital medications mg normal pain patient      SNOMED CT
G      blood ct glucose hct neg plt pm pt rbc wbc                                        EHDA
H      chest contrast ct evidence fracture impression left pain small tube               EHDA
I      artery cardiac chest coronary disease heart left mg pain patient                  EHDA
J      admission age blood day discharge infant life normal respiratory weeks            EHDA

Table 3. Topics, keywords, and recommended terminologies using topic modeling.

Table 4 shows the primary and secondary topics and their weights in each document from the sample. Topic E was the most frequent overall, appearing in half of the documents.

#   Topic1  W1     Topic2  W2     W1+W2
1   E       0.382  B       0.251  0.663
2   E       0.419  B       0.374  0.793
3   J       0.907  E       0.055  0.962
4   E       0.297  B       0.271  0.568
5   A       0.667  F       0.211  0.878
6   D       0.500  E       0.179  0.679
7   F       0.619  D       0.157  0.776
8   D       0.409  H       0.273  0.682
9   F       0.229  G       0.217  0.446
10  H       0.486  D       0.159  0.645
11  F       0.541  G       0.220  0.761
12  E       0.557  C       0.109  0.666
13  C       0.274  A       0.274  0.548
14  I       0.243  J       0.162  0.405
15  E       0.318  D       0.171  0.489
16  E       0.581  B       0.203  0.784
17  I       0.327  G       0.261  0.588
18  H       0.750  G       0.083  0.833
19  E       0.384  F       0.282  0.666
20  I       0.467  E       0.298  0.765

Table 4. Topics and associated weights. The maximum and minimum weights (0.907 and 0.055) both occur in document 3.

Table 5 shows the recommended terminologies and scores when submitting the full texts and their TextRank versions. When using the full texts, 4 different terminologies were recommended, with SNOMED CT and EHDA recommended for 85% of the documents. Using TextRank, a terminology not identified using the full texts was suggested5.
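All three inputs (topic keywords, per-document TextRank keywords, and truncated full texts) were submitted through the recommender web service. A hedged sketch of such a call follows; the endpoint and parameter names reflect the public BioPortal REST API as we understand it and should be checked against its current documentation before use.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint of the NCBO/BioPortal recommender REST API (an assumption;
# verify against the current BioPortal documentation).
API_ROOT = "http://data.bioontology.org/recommender"

def build_recommender_url(text, apikey):
    """Build a GET URL submitting `text` (truncated to the 7,000-character
    limit used in this study) to the recommender."""
    return API_ROOT + "?" + urlencode({"input": text[:7000], "apikey": apikey})

def recommend(text, apikey):
    """Submit text and return the recommender's JSON response (network call)."""
    with urlopen(build_recommender_url(text, apikey)) as resp:
        return json.load(resp)

url = build_recommender_url("end stage renal disease transplant", "YOUR-API-KEY")
```

A look-up table as used for topic modeling then amounts to calling `recommend` once per topic's keyword string and caching the top-ranked terminology.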
5 Bone Dysplasia Ontology – http://bioportal.bioontology.org/ontologies/BDO

#   Full Text  Score    TextRank  Score
1   EHDA       4378.80  EHDA      1303.18
2   SNOMED     2539.19  SNOMED     114.75
3   SNOMED     1086.31  BDO        108.68
4   EHDA       5436.65  EHDA      1846.39
5   NCIT        183.62  SNOMED      37.12
6   RH-MESH     222.07  SNOMED      39.80
7   SNOMED     1983.63  SNOMED     146.16
8   EHDA       5926.30  EHDA       688.57
9   EHDA       2695.63  EHDA       612.06
10  EHDA       4054.92  EHDA      1096.61
11  SNOMED     1851.21  NCIT       124.93
12  EHDA       1453.65  RH-MESH     97.72
13  SNOMED     1306.00  EHDA       153.02
14  EHDA        423.34  NCIT        35.81
15  RH-MESH    2068.63  EHDA        81.61
16  SNOMED     1783.57  SNOMED      97.44
17  EHDA       1734.18  EHDA       285.63
18  EHDA       5949.76  EHDA       884.94
19  SNOMED     1838.72  EHDA       137.71
20  SNOMED     1979.53  EHDA      1150.17

Table 5. Recommended terminologies and associated scores for the sample.

Figure 2 shows the distribution of recommended terminologies. In all cases, EHDA best represented the sample, followed by SNOMED CT. The correlation between terminology distributions with respect to the gold standard (full texts) was very high (r = 0.83–0.98). Topic modeling the MIMIC-II database took 5 minutes 17 seconds (11 ms per document), and getting a recommendation took 7 seconds per topic. When submitting documents, a recommendation took 27 seconds per full text and 11 seconds per TextRank summary (including summarization time).

Fig. 2. Distribution of recommended terminologies for the sample using full text, TextRank, and topic modeling (-w = weighted).

4 Discussion

EHDA and SNOMED CT were recommended for the majority (85%) of documents in the sample, EHDA being preferred. Why EHDA was the most recommended terminology when submitting discharge summaries as context needs to be carefully studied, as discharge summaries contain a broad range of topics (discharge diagnoses, physical examinations, past and follow-up medications, etc.) that are not covered by EHDA.
Even in the case of anatomy, FMA [12] seems more appropriate, as EHDA is focused mainly on tissue development [6]. Although assessing the validity of the recommender was not the goal of our study, the inexplicable prevalence of EHDA in the recommendations suggests possible shortcomings in the recommender that would inevitably limit the significance of our results.

When analyzing performance, our results suggest that it might not be feasible for users to submit a large number of documents as a representative context, as getting recommendations for a sample of 20 documents with full texts (limited to 7,000 characters) took nearly 10 minutes. The keywords obtained using topic modeling were fewer in number than the keywords obtained using TextRank. This should, in principle, make recommendations using topic modeling less correlated with the ones obtained using the full texts, but the opposite was true. This might be explained by the fact that all 26,657 documents from MIMIC-II were used when modeling the topics, providing a much more accurate context.

5 Conclusions and Future Work

In this study, we have proposed and evaluated two well-researched automatic summarization techniques for summarizing a large collection of clinical documents used as input to the NCBO ontology recommender: topic modeling the collection using LDA, and per-document TextRank keyword extraction. When comparing both approaches to our gold standard (full texts) in the evaluation, we found that recommendation times improved considerably. In all cases, the distributions of recommended terminologies were highly correlated with the gold standard distribution (r = 0.83–0.98). The high correlation shows that both TextRank and topic modeling are valuable techniques to summarize the context provided by the full texts and boost recommendation performance without seriously affecting the overall recommendation results.
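The distribution correlations reported above can be reproduced from per-terminology recommendation counts. The sketch below uses the full-text and TextRank counts tallied from Table 5; the paper does not spell out the exact vectors it correlated, so treat this as an illustration of the computation rather than the study's exact procedure.

```python
import numpy as np

# Recommendation counts over the 20-document sample, tallied from Table 5
# (order: SNOMED CT, EHDA, NCIT, RH-MESH, BDO).
full_text = np.array([8, 9, 1, 2, 0])   # gold standard distribution
textrank = np.array([5, 11, 2, 1, 1])   # distribution using TextRank summaries

def pearson_r(x, y):
    """Pearson correlation between two terminology count vectors."""
    xc, yc = x - x.mean(), y - y.mean()
    return float((xc * yc).sum() / np.sqrt((xc * xc).sum() * (yc * yc).sum()))

r = pearson_r(full_text, textrank)  # falls inside the reported 0.83-0.98 range
```

A high r here means the summarized input preserves the relative ranking of terminologies, even when individual documents flip between SNOMED CT and EHDA.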
As future work, we plan to: (i) use a larger sample of documents to investigate whether our results are consistent, (ii) select a collection of documents from another domain to generalize our results, and (iii) investigate potential quality issues in the recommender, given the prevalent but inexplicable recommendations of the EHDA terminology when submitting discharge summaries as input.

Acknowledgments

The authors thank H. Ziak, A. Rexha, G. Hammer, C. Martínez-Costa, M. Kreuzthaler, and G. A. Uribe Gómez for their contributions; the NCBO for providing the ontology recommender and Bioportal; and the MIT and the Beth Israel Deaconess Medical Center for providing the MIMIC-II database. This work was developed within the EEXCESS project, funded by the European Union FP7/2007-2013 under grant agreement number 600601.

Bibliography

[1] Adomavicius, G., Tuzhilin, A.: Context-Aware Recommender Systems. In: Recommender Systems Handbook, pp. 217–253. Springer (2011)
[2] Benson, T.: Principles of Health Interoperability. HL7 and SNOMED. Springer (2010)
[3] Blei, D.M., et al.: Latent Dirichlet Allocation. Journal of Machine Learning Research 3, 993–1022 (2003)
[4] Freitas, F., et al.: Survey of Current Terminologies and Ontologies in Biology and Medicine. RECIIS—Electronic Journal in Communication, Information and Innovation in Health 3(1), 7–18 (2009)
[5] Hariri, N., et al.: Query-Driven Context Aware Recommendation. In: 7th ACM Conference on Recommender Systems. pp. 9–16. ACM (2013)
[6] Hunter, A., et al.: An Ontology of Human Developmental Anatomy. Journal of Anatomy 203(4), 347–355 (2003)
[7] Jonquet, C., et al.: Building a Biomedical Ontology Recommender Web Service. Journal of Biomedical Semantics 1(Suppl 1), S1 (2010)
[8] Linden, G., et al.: Amazon.com Recommendations: Item-to-Item Collaborative Filtering. Internet Computing, IEEE 7(1), 76–80 (2003)
[9] Mihalcea, R., Tarau, P.: TextRank: Bringing Order into Texts. In: Conference on Empirical Methods in NLP. pp. 404–411.
ACL (2004)
[10] Musen, M., et al.: The National Center for Biomedical Ontology. Journal of the American Medical Informatics Association 19(2), 190–195 (2012)
[11] Noy, N.F., et al.: BioPortal: Ontologies and Integrated Data Resources at the Click of a Mouse. Nucleic Acids Research 37(suppl 2), W170–W173 (2009)
[12] Rosse, C., Mejino Jr, J.L.: A Reference Ontology for Biomedical Informatics: the Foundational Model of Anatomy. Journal of Biomedical Informatics 36(6), 478–500 (2003)
[13] Saeed, M., et al.: MIMIC-II: a Public-Access Intensive Care Unit Database. Critical Care Medicine 39(5), 952 (2011)
[14] Tan, H., Lambrix, P.: Selecting an Ontology for Biomedical Text Mining. In: Workshop on Current Trends in Biomedical NLP. pp. 55–62. ACL (2009)