A Joint Local-Global Approach for Medical Terminology
                         Assignment

                      Liqiang Nie                   Mohammad Akbari                                 Tao Li
                 National University of             National University of                   Zhejiang University
                      Singapore                          Singapore                       coylee917@gmail.com
             nieliqiang@gmail.com                   akbari@nus.edu.sg
                                                      Tat-Seng Chua
                                                    National University of
                                                         Singapore
                                                    chuats@nus.edu.sg

ABSTRACT
In community-based health services, the vocabulary gap
between health seekers and community-generated knowledge
has hindered data access. To bridge this gap, this paper
presents a scheme to label question-answer (QA) pairs by
jointly utilizing local mining and global learning approaches.
Local mining attempts to label an individual QA pair by
independently extracting medical concepts from the QA pair
itself and mapping them to authenticated terminologies.
However, it may suffer from information loss and lower
precision, which are caused by the absence of key medical
concepts and the presence of irrelevant medical concepts. Global
learning, on the other hand, works towards enhancing the              Figure 1: The illustration of a QA example from
local mining via collaboratively discovering missing key              community-based health services (HealthTap).
terminologies and keeping oﬀ the irrelevant terminologies
by analyzing the social neighbors. Practically, this                  health solutions in 2012 [4]. These metrics have reflected
unsupervised scheme holds potential for large-scale data.             the scope and scale of the online health seekers.
                                                                         To better serve the needs of health seekers, community-
Categories and Subject Descriptors                                    based health services have emerged as eﬀective platforms
                                                                      for health knowledge dissemination and exchange, such
J.3 [Life and Medical Sciences]: Health                               as HealthTap1 , HaoDF2 and WenZher[11].            They not
                                                                      only permit health seekers to freely post health-oriented
Keywords                                                              questions, but also encourage doctors to provide trustworthy
Community-based Health Services, Question Answers, Vo-                answers.    Figure 1 demonstrates one typical QA pair
cabulary Gap, Medical Terminology Assignment                          example. Over time, a tremendous number of QA pairs
                                                                      has been accumulated in their repositories, and in most
                                                                      circumstances, health seekers may directly locate good
1. BACKGROUND                                                         answers by searching from these archives, rather than
  The rise of digital technologies has transformed the                waiting for the experts’ responses or painfully browsing
patient-doctor relationships. Nowadays, when patients                 through a list of documents from the general search engines.
struggle with their health concerns, the majority usually
explore the Internet to research the problem before and
after they see their doctors. For example, 70% of Canadians           2.     CHALLENGES
turned to the Internet to look up health-related information in         In many cases, the community generated health
2009 [8] and 72% of American Internet users searched for              content may not be directly usable due to the vocabulary
                                                                      gap, since participants with diverse backgrounds do not
                                                                      necessarily share the same vocabulary. Take HealthTap
                                                                      as an example. The same question may be described in
                                                                      substantially diﬀerent ways by two individual health seekers.
                                                                      On the other hand, the answers provided by doctors may
                                                                      contain acronyms with multiple possible meanings, and non-
                                                                      standardized terms.
                                                                      1
Copyright is held by the author/owner(s).                                 https://www.healthtap.com/
                                                                      2
MedIR 2014, July 11, 2014, Gold Coast, Australia.                         www.haodf.com


[Figure 2 diagram. Left panel, Local Mining Approach: a tri-stage framework (Noun Phrase Extractor, Medical Concept Identifier, Medical Concept Normalizer) yields the local coding and a corpus-aware vocabulary. Right panel, Global Learning Approach: lexical similarities, terminology sharing network, inter-expert relationship and hierarchical terminology relationship. Example question: “What is the likeliness a pregnancy could occur on birth control?” Local coding: Unplanned Pregnancy, Contraception. Local + global coding: Unplanned Pregnancy, Contraception, Uses Contraceptive Sheath.]


Figure 2: The schematic illustration of the proposed automatic medical terminology assignment scheme. The
answer part is not displayed due to the space limitation.

   In this work, we deﬁne medical concepts as medical                               terminologies of speciﬁc dataset and narrow down the
domain-speciﬁc noun phrases, and medical terminologies as                           candidates is the tough issue we are facing. In addition, the
authenticated phrases by well-known organizations that are                          varieties of heterogeneous cues were often not adequately
used to accurately describe the human body and associated                           exploited simultaneously. Therefore, a robust integrated
components, conditions and processes in a science-based                             framework to draw the strengths from various resources and
manner.      Even though some health communities have                               models is still expected.
recently encouraged doctors to annotate their answers with
medical concepts, we cannot ensure that they are medical                            3.     METHOD
terminologies. Meanwhile, the tags adopted by doctors
                                                                                      To overcome these limitations, we propose a novel scheme
often vary greatly [3]. For example, “heart attack” and
                                                                                    that is able to code the QA pairs with corpus-aware
“myocardial disorder” are employed by different doctors to
                                                                                    terminologies. As illustrated in Figure 2, the proposed
refer to the same medical diagnosis. It was shown that the
                                                                                    scheme consists of two mutually reinforced components,
inconsistency of community generated health data greatly
                                                                                    namely, local mining and global learning.
hindered the cross-resource data exchange, management and
integrity [9]. Even worse, it was reported that users had en-                       3.1     Local Mining
countered big challenges in reusing the archived content due
                                                                                       Local mining aims to locally code the QA pairs by
to the incompatibility between their search terms and those
                                                                                    extracting the medical concepts from individual instance
accumulated medical records [21]. Therefore, automatic
                                                                                    and then mapping them to terminologies based on the
coding of the QA pairs with standardized terminologies is
                                                                                    external authenticated vocabularies. To accomplish this
highly desired. It leads to a consistent interoperable way
                                                                                    task, we establish a tri-stage framework, which includes
of indexing, storing and aggregating across specialties and
                                                                                    noun phrase extraction, medical concept detection and
sites. In addition, it facilitates QA pair retrieval via bridging
                                                                                    medical concept normalization.
the vocabulary gap between the queries and archives by
                                                                                       To extract all the noun phrases, we initially assign part-
coding the new queries with the standardized terminologies.
                                                                                    of-speech tags to each word in the given QA pair by Stanford
   It is worth mentioning that there already exist several
                                                                                    POS tagger3 . We then extract tag sequences that match a
eﬀorts dedicated to research on automatically mapping
                                                                                    ﬁxed pattern of part-of-speech tags as noun phrases from
medical records to terminologies [19, 2, 10, 7, 17]. Most of
                                                                                    the texts. This pattern is formulated as follows.
these eﬀorts, however, focused on hospital generated health
data or health provider released sources by utilizing either
isolated or loosely coupled rule-based and machine learning
                                                                                    $$(\mathit{Adjective} \mid \mathit{Noun})^{*}\,(\mathit{Noun}\ \mathit{Preposition})?\,(\mathit{Adjective} \mid \mathit{Noun})^{*}\,\mathit{Noun} \tag{1}$$
approaches. Compared to this kind of data, the emerging
community generated health data is more colloquial, in                              A sequence of tags matching this pattern ensures that the
terms of inconsistency, complexity and ambiguity, which                             corresponding words make up a noun phrase. For example,
pose challenges for data access and analytics. Further,                             the following complex sequence can be extracted as a noun
most of the previous work simply utilizes the external                              phrase: “ineﬀective treatment of terminal lung cancer”.
medical dictionary to code the medical records rather than                            Inspired by the eﬀorts in [18, 6], in order to diﬀerentiate
considering the corpus-aware terminologies. Their reliance                          the medical concepts from other general noun phrases,
on the external corpus independent knowledge may poten-                             we assume that concepts that are relevant to medical
tially bring in inappropriate terminologies. Constructing a                         domain occur frequently in medical domain and rarely in
corpus-aware terminology vocabulary to prune the irrelevant                         3
                                                                                        http://nlp.stanford.edu/software/tagger.shtml
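
The noun phrase pattern in Eqn. (1) can be illustrated with a small sketch. It assumes the QA pair has already been tagged with Penn Treebank tags (the tag set used by taggers such as the Stanford POS tagger); the tagged input below is written by hand, and the names `tag_class`, `NP_PATTERN` and `extract_noun_phrases` are hypothetical, not part of the original system. Treating every `IN` tag as a preposition is a simplification.

```python
import re

def tag_class(tag: str) -> str:
    """Collapse a Penn Treebank tag into A (adjective), N (noun), P (preposition) or O (other)."""
    if tag.startswith("JJ"):
        return "A"
    if tag.startswith("NN"):
        return "N"
    if tag == "IN":  # simplification: IN also covers subordinating conjunctions
        return "P"
    return "O"

# Eqn. (1): (Adjective|Noun)* (Noun Preposition)? (Adjective|Noun)* Noun
NP_PATTERN = re.compile(r"[AN]*(?:NP)?[AN]*N")

def extract_noun_phrases(tagged):
    """tagged: list of (word, tag) pairs. Returns non-overlapping matches, scanned left to right."""
    classes = "".join(tag_class(tag) for _, tag in tagged)
    return [
        " ".join(word for word, _ in tagged[m.start():m.end()])
        for m in NP_PATTERN.finditer(classes)
    ]

example = [
    ("ineffective", "JJ"), ("treatment", "NN"), ("of", "IN"),
    ("terminal", "JJ"), ("lung", "NN"), ("cancer", "NN"),
]
print(extract_noun_phrases(example))
# ['ineffective treatment of terminal lung cancer']
```

Encoding each tag as one character lets an ordinary regular expression stand in for the tag-sequence pattern; a production system would more likely match on the tagger's output objects directly.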


non-medical ones. Based on this assumption, we employ                       locally mined terminologies. The target of global learning
the concept entropy impurity (CEI) [6] to comparatively                     is to learn appropriate terminologies from the global ter-
measure the domain-relevance of a concept by comparing                      minology space T to annotate each q in Q. In this work,
the term frequencies between two diﬀerent corpora D1 and                    the global learning task is regarded as a multi-label learning
D2 . D1 is our medical-domain corpus and D2 is a general                    problem[16]. It is formulated as,
English Gigaword data of Linguistic Data Consortium4 .
   As aforementioned, we cannot ensure that all medical
concepts are standardized terminologies. Take “birth control”
                                                                            $$\arg\min_{F} \sum_{i=1}^{M} \Big\{ \Omega(f_i) + \lambda L(f_i) + \mu \sum_{j=1}^{M} R_{ij} \lVert f_i - f_j \rVert^{2} \Big\} \tag{2}$$
as an example. It is recognized as a medical concept by
our approach, but it is not an authenticated terminology.                   where M refers to the number of classes, i.e., the number
Instead, we should map it into “contraception”. Therefore,                  of medical terminologies to be assigned. Vector fi is
it is essential to normalize the detected medical concepts ac-              the ith column of F, representing the relevance scores of
cording to an appropriate external standardized dictionary                  each QA pair to the i-th terminology. Ω(f) and L(f)
and this normalization is the key to bridging the vocabulary                denote the regularizer on the hypergraph and the empirical
gap. In this work, we use SNOMED CT5 as our dictionary,                     loss, respectively. In addition, Rij is the inter-terminology
since it provides the core general terminologies for the                    relationship between terminology i and terminology j. They
electronic health record and formal logic-based hierarchical                are mined by exploiting the external well-structured ontol-
structure. The terminologies and their descriptions in                      ogy, which are able to alleviate the granularity mismatch
SNOMED CT are ﬁrst indexed6 . We then search each                           problems and reduce the irrelevant sibling terminologies. By
medical concept against the indexed SNOMED CT. For                          diﬀerentiating the above equation with respect to F, we can
the medical concepts with multiple matched results, e.g.,                   obtain a closed-form solution.
two results returned for “female”, we keep all the returned                    The philosophy to formulate these three objectives is as
terminology candidates for further selection. Enlightened                   follows. The ﬁrst objective aims to guarantee that the
by Google distance [1], we estimate the semantic similarity                 relevance probability function is continuous and smooth in
between the medical concept and the returned terminology                    semantic space. This means that the relevance probabilities
candidates via exploring their co-occurrence on Google. We                  of semantically similar QA pairs should be close to each
then select the most relevant terminology candidate as the                  other. The second objective is ensured by the empirical
normalized result.                                                          loss function, which forces the relevance probabilities to ap-
   Local mining, however, may suﬀer from various problems.                  proach the initial roughly estimated relevance scores. These
The ﬁrst problem is incompleteness. This is because some                    two implicit constraints are widely adopted in reranking-
key medical concepts may not be explicitly present in the QA                oriented approaches [12, 13, 14, 15]. The last encourages
pairs. The QA pair illustrated in Figure 2 shows an example                 the values of QA pairs, which are connected by hierarchical
of this situation, where the accurate terminology: “use                     structured terminologies, to be similar to each other.
contraceptive sheath” is absent from the QA pair. The                          When it comes to hypergraph construction, the N QA
second one is lower precision. This is because some                         pairs from Q are regarded as vertices and they are connected
irrelevant medical concepts are explicitly embedded in the QA               by three types of hyperedges. The first type takes each
pairs and are mistakenly detected and normalized by the                     vertex as a centroid and forms a hyperedge by circling
local approach. For instance, given the question, “What are                 around its k-nearest neighbors based on QA pair content
the risks getting pregnant and giving birth later in life ? ”,              similarities. This procedure was ﬁrst adopted in [5]. The
the terminology “ﬁnding of life event” as normalized from                   second type is based on terminology-sharing network. For
the irrelevant medical concept “life” is assigned as code. It               each terminology, it groups all the QA pairs sharing the
is less informative to capture the main intent.                             same terminology together. The third type actually takes
                                                                            the users’ social behaviours into consideration by rounding
3.2 Global Learning                                                         up all the questions answered by closely associated doctors.
   It is noteworthy that most previous eﬀorts, including our                The inter-doctor relationships are inferred from the doctors’
local approach, attempted to map the QA pairs directly                      historical data. Specifically, doctors who frequently
to the entries in external dictionaries without any pruning.                respond to the same kinds of questions probably share
This approach often presents problems since the external                    highly overlapping expertise, and thus the questions they
dictionaries usually cover relatively comprehensive termi-                  answered can be regarded as semantically similar to a certain
nologies and are far beyond the vocabulary scope of the given               extent. As a consequence, up to N + M + U hyperedges
corpus. It may result in the deterioration in coding perfor-                are constructed in our hypergraph, where U is the num-
mance in terms of eﬃciency and eﬀectiveness. The problem                    ber of involved doctors. Learning from this hypergraph,
is caused by the over-widened scope of vocabularies, which                  we are able to ﬁnd missing key concepts and propagate
may bring in unpredictable noises and make the precise                      precise terminologies among underlying connected records
terminology selection challenging. As a byproduct, a corpus-                over a large collection. Besides the semantic similarity
aware terminology vocabulary is naturally constructed by                    among QA pairs and terminology-sharing network, the inter-
our local mining approach, which can be used as terminology                 terminology and inter-expert relationships are seamlessly
space for further learning.                                                 integrated in the proposed model. It is noteworthy that
   Let Q = {q1 , q2 , ..., qN } and T = {t1 , t2 , ..., tM } respec-        a rich set of healthcare speciﬁc features are extracted and
tively denote a repository of QA pairs and their associated                 weighted for similarity estimation.
4
  http://www.ldc.upenn.edu/
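
The candidate selection step of Section 3.1 scores each terminology candidate by its co-occurrence with the medical concept. The paper does not spell out its exact formula, so the sketch below assumes the standard normalized Google distance of [1], where a smaller distance means a stronger association. All hit counts are made up for illustration, and the function names are hypothetical.

```python
import math

def normalized_google_distance(fx, fy, fxy, n):
    """Normalized Google distance from hit counts.

    fx, fy: number of pages containing each term
    fxy:    number of pages containing both terms
    n:      assumed total number of indexed pages
    """
    if fxy == 0:
        return math.inf  # the terms never co-occur
    lx, ly, lxy = math.log(fx), math.log(fy), math.log(fxy)
    return (max(lx, ly) - lxy) / (math.log(n) - min(lx, ly))

def pick_terminology(concept_count, candidates, n):
    """candidates maps a terminology to (its hit count, its co-occurrence count with the concept)."""
    return min(
        candidates,
        key=lambda t: normalized_google_distance(
            concept_count, candidates[t][0], candidates[t][1], n
        ),
    )

# Hypothetical counts for the concept "birth control" and two candidates.
candidates = {
    "contraception": (800_000, 400_000),
    "birth": (5_000_000, 500_000),
}
best = pick_terminology(1_000_000, candidates, n=10**10)
print(best)
```

With these illustrative counts the candidate that co-occurs with the concept in a larger share of its pages is selected; real counts would have to come from a search engine or a local index.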
5
  http://www.ihtsdo.org/snomed-ct/                                          4.     EXPERIMENTS
6
  http://viw2.vetmed.vt.edu/sct/menu.cfm                                         We crawled more than 109 thousand QA pairs from
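
The three hyperedge types described in Section 3.2 can be sketched as an incidence matrix with up to N + M + U columns. This is an illustration under stated assumptions, not the authors' implementation: the matrix is binary and unweighted, the similarity matrix is given, and each doctor hyperedge is assumed to contain the QA pairs answered by that doctor or by doctors listed as closely associated with them. The function name `build_incidence` and its parameters are hypothetical.

```python
import numpy as np

def build_incidence(sim, qa_terms, num_terms, qa_doctors, doctor_groups, k):
    """Return a binary incidence matrix H of shape (N, N + M + U).

    sim:           (N, N) QA pair content similarity matrix
    qa_terms:      list of N sets of terminology ids in 0..M-1
    num_terms:     M, the size of the terminology space
    qa_doctors:    list of N sets of ids of doctors who answered each QA pair
    doctor_groups: list of U sets; entry u holds doctor u and closely associated doctors
    k:             number of nearest neighbours per vertex
    """
    n = sim.shape[0]
    u = len(doctor_groups)
    H = np.zeros((n, n + num_terms + u), dtype=int)

    # Type 1: each QA pair together with its k most similar QA pairs.
    for v in range(n):
        neighbours = [j for j in np.argsort(-sim[v]) if j != v][:k]
        H[[v] + neighbours, v] = 1

    # Type 2: all QA pairs sharing the same terminology.
    for v, terms in enumerate(qa_terms):
        for t in terms:
            H[v, n + t] = 1

    # Type 3: all QA pairs answered by a doctor or by associated doctors.
    for e, group in enumerate(doctor_groups):
        for v, doctors in enumerate(qa_doctors):
            if doctors & group:
                H[v, n + num_terms + e] = 1
    return H

sim = np.array([
    [1.0, 0.9, 0.1, 0.2],
    [0.9, 1.0, 0.3, 0.1],
    [0.1, 0.3, 1.0, 0.8],
    [0.2, 0.1, 0.8, 1.0],
])
H = build_incidence(
    sim,
    qa_terms=[{0}, {0, 1}, {1}, set()],
    num_terms=2,
    qa_doctors=[{0}, {1}, {2}, {2}],
    doctor_groups=[{0, 1}, {0, 1}, {2}],
    k=1,
)
print(H.shape)  # (4, 9), i.e. N + M + U columns
```

Keeping the three hyperedge types in separate column blocks makes it easy to weight or ablate each type independently.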


Table 1: The comparative evaluation results of medical terminology assignment in terms of S@K and P @K.
   Approach          S@1       S@2       S@3       S@4       P@1      P@2      P@3      P@4
   LocalMining       72.0%     84.0%     91.0%     95.0%     72.0%    72.1%    69.7%    68.3%
   Local+Global      83.0%     92.0%     98.0%     100.0%    83.0%    81.5%    80.3%    78.8%

Table 2: Comparative illustration of the representative question samples with locally mined terminologies
and locally+globally recommended terminologies. Answers are not displayed due to limited space.
   QA pair:                         Is it safe to color my hair during pregnancy?
   Locally mined terminologies:     hair structure, dyed hair, feeling safe, patient currently pregnant, first trimester pregnancy...
   Local mining + global learning:  hair structure, patient currently pregnant, coal tar allergy, hair color change, disorder of endocrine system...

   QA pair:                         If I get an infection caused by gum disease, can that be transferred to my fetus?
   Locally mined terminologies:     infectious disease, gingival disease, entire fetus, inflammation, periodontal disease...
   Local mining + global learning:  infectious disease, prematurity of fetus, gingival disease, periodontal disease, low birth weight infant...

HealthTap, which involve 5,958 unique doctors. For ground               the unstructured medical content into user needs-aware
truth construction, we invited three professionals with                 ontology by the recommended medical terminologies.
master's degrees in medicine. The
labelers were trained with a short tutorial and a set of                6.     ACKNOWLEDGEMENTS
demonstrating examples. A majority voting scheme among
                                                                          This work was supported by NUS-Tsinghua Extreme
the three labelers can partially alleviate the subjectivity
                                                                        Search project under the grant number: R-252-300-001-490.
problem. The annotators were required to label only top
ﬁve recommended terminologies for each QA pair, and they
were labeled either as “positive” or “negative”. 100 QA pairs           7.     REFERENCES
                                                                         [1] R. L. Cilibrasi and P. M. B. Vitanyi. The google similarity distance. IEEE
were labeled as testing set.                                                 Transactions on Knowledge and Data Engineering, 2007.
                                                                         [2] C. Dozier, R. Kondadadi, K. Al-Kofahi, M. Chaudhary, and X. Guo. Fast
   We adopted two metrics that are able to characterize                      tagging of medical terms in legal text. In Proceedings of the International
                                                                             Conference on Artificial Intelligence and Law, 2007.
precisions from diﬀerent aspects. The ﬁrst is average S@K                [3] A. e-HIM Work Group on Computer-Assisted Coding. Delving into
over all testing QA pairs, which measures the probability                    computer-assisted coding. Journal of American Health Information Management
                                                                             Association, 2004.
of ﬁnding a relevant terminology among the top K recom-                  [4] S. Fox and M. Duggan. Health online 2013. Survey, Pew Research Center,
                                                                             2013.
mended ones. To be speciﬁc, for each testing QA pair, S@K                [5] Y. Huang, Q. Liu, S. Zhang, and D. Metaxas. Image retrieval via
is assigned to 1 if a relevant terminology is positioned in the              probabilistic hypergraph ranking. In IEEE Conference on Computer Vision and
                                                                             Pattern Recognition, 2010.
top K and 0 otherwise. The second one is average P @K that               [6] M.-Y. Kim and R. Goebel. Detection and normalization of medical terms
                                                                             using domain-specific term frequency and adaptive ranking. In Information
stands for the proportion of recommended terminologies                       Technology and Applications in Biomedicine, IEEE International Conference on,
that are relevant [20]. P@K is defined as $P@K = |C \cap R| / |C|$,
                                                                             2010.
                                                                         [7] L. S. Larkey and W. B. Croft. Automatic assignment of icd9 codes to
   where C is a set of the top K terminologies, and R is the                 discharge summaries. PhD Thesis, University of Massachusetts at Amherst, 1995.
                                                                         [8] M. Law. Online drug information in canada. Technical report, 2012.
manually labeled positive ones.                                          [9] G. Leroy and H. Chen. Meeting medical terminology needs-the
                                                                             ontology-enhanced medical concept mapper. IEEE Transactions on Information
   Table 1 displays the comparison. We can see that the                      Technology in Biomedicine, 2001.
local mining approach alone performs worse on every metric. This        [10] L. V. Lita, S. Yu, S. Niculescu, and J. Bi. Large scale diagnostic code
                                                                             classification for medical patient records. In Proceedings of the Conference on
is reasonable, because irrelevant concepts may be mapped                     Artificial Intelligence in Medicine, 1995.
to terminologies because of their presence in the QA pairs.             [11] L. Nie, T. Li, M. Akbari, J. Shen, and T.-S. Chua. Wenzher:
                                                                             Comprehensive vertical search for healthcare domain. In Proceedings of the
   Table 2 comparatively illustrates the representative QA                   International ACM SIGIR Conference, 2014.
                                                                        [12] L. Nie, M. Wang, Y. Gao, Z.-J. Zha, and T.-S. Chua. Beyond text qa:
pair samples with locally mined terminologies and                            Multimedia answer generation by harvesting web information. IEEE
                                                                             Transactions on Multimedia, 2013.
locally+globally recommended ones. Intuitively, the                     [13] L. Nie, M. Wang, Z.-J. Zha, and T.-S. Chua. Oracle in image search: A
terminologies are more comprehensive and reliable after enhancement          content-based approach to performance prediction. ACM Transactions on
                                                                             Information System, 2012.
with global learning.                                                   [14] L. Nie, M. Wang, Z.-J. Zha, G. Li, and T.-S. Chua. Multimedia answering:
                                                                             Enriching text qa with media information. In Proceedings of the International
                                                                             ACM SIGIR Conference, 2011.
                                                                        [15] L. Nie, S. Yan, M. Wang, R. Hong, and T.-S. Chua. Harvesting visual
5. CONCLUSIONS AND FUTURE WORK                                               concepts for image search with complex queries. In Proceedings of the
                                                                             International Conference on Multimedia, 2012.
   This paper presented a medical terminology assignment                [16] L. Nie, Y.-L. Zhao, X. Wang, J. Shen, and T.-S. Chua. Learning to
                                                                             recommend descriptive tags for questions in social forums. ACM
scheme to bridge the vocabulary gap between health seekers                   Transactions on Information System, 2014.
and community generated knowledge. A strong uniﬁed                      [17] H. Suominen, F. Ginter, S. Pyysalo, A. Airola, T. Pahikkala, S. Salanterä,
                                                                             and T. Salakoski. Machine learning to automate the assignment of
framework of local mining and global learning is proposed to                 diagnosis codes to free-text radiology reports: a method description. In
                                                                             Proceedings of the ICML Workshop on Machine Learning for Health-Care
tackle this research issue, instead of the conventional isolated             Applications, 2008.
utilization.    It proposes the concept entropy impurity                [18] P. Velardi, M. Missikoff, and R. Basili. Identification of relevant terms to
                                                                             support the construction of domain ontologies. In Proceedings of the workshop
approach to comparatively detect and normalize the medical                   on Human Language Technology and Knowledge Management, 2001.
                                                                        [19] L. Yves A., S. Lyudmila, and F. Carol. Automating icd-9-cm encoding
concepts locally, which naturally constructs a corpus-aware                  using medical language processing: A feasibility study. In Proceedings of the
                                                                             AMIA Annual Symposium, 2000.
terminology vocabulary with the help of external knowledge.             [20] Y.-L. Zhao, L. Nie, X. Wang, and T.-S. Chua. Personalized
In addition, it builds a novel global learning model to                      recommendations of locally interesting venues to tourists via cross region
                                                                             community matching. ACM Transactions on Intelligent Systems and Technology,
enhance the local coding results. This model seamlessly                      2013.
integrates various heterogeneous cues.                                  [21] G. Zuccon, B. Koopman, A. Nguyen, D. Vickers, and L. Butt. Exploiting
                                                                             medical hierarchies for concept-based information retrieval. In Proceedings
   In the future, we will investigate how to ﬂexibly organize                of the Seventeenth Australasian Document Computing Symposium, 2012.
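
The two evaluation metrics of Section 4 can be written down directly from their definitions. The sketch below assumes each testing QA pair comes with a ranked list of distinct recommended terminologies and a set of manually labeled positive ones; the function names and the toy data are hypothetical and do not reproduce the reported results.

```python
def s_at_k(ranked, relevant, k):
    """S@K: 1 if any relevant terminology appears in the top K, else 0."""
    return 1 if any(t in relevant for t in ranked[:k]) else 0

def p_at_k(ranked, relevant, k):
    """P@K = |C ∩ R| / |C|, where C is the set of top-K terminologies."""
    top = ranked[:k]
    if not top:
        return 0.0
    return sum(t in relevant for t in top) / len(top)

def average(metric, runs, k):
    """Average a metric over (ranked, relevant) pairs, one per testing QA pair."""
    return sum(metric(ranked, relevant, k) for ranked, relevant in runs) / len(runs)

# Toy example: two testing QA pairs with made-up labels.
runs = [
    (["a", "b", "c", "d"], {"b", "d"}),
    (["e", "f", "g", "h"], {"e"}),
]
print(average(s_at_k, runs, 1), average(p_at_k, runs, 2))
# 0.5 0.5
```

S@K only asks whether at least one relevant terminology is found, while P@K also penalizes irrelevant terminologies in the top K, which is why the two metrics are reported side by side.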

