A Joint Local-Global Approach for Medical Terminology Assignment

Liqiang Nie, National University of Singapore, nieliqiang@gmail.com
Mohammad Akbari, National University of Singapore, akbari@nus.edu.sg
Tao Li, Zhejiang University, coylee917@gmail.com
Tat-Seng Chua, National University of Singapore, chuats@nus.edu.sg

Copyright is held by the author/owner(s). MedIR 2014, July 11, 2014, Gold Coast, Australia.

ABSTRACT

In community-based health services, the vocabulary gap between health seekers and community generated knowledge has hindered data access. To bridge this gap, this paper presents a scheme to label question-answer (QA) pairs by jointly utilizing local mining and global learning approaches. Local mining attempts to label each individual QA pair by independently extracting medical concepts from the QA pair itself and mapping them to authenticated terminologies. However, it may suffer from information loss and lower precision, which are caused by the absence of key medical concepts and the presence of irrelevant medical concepts. Global learning, on the other hand, works towards enhancing the local mining by collaboratively discovering missing key terminologies and filtering out irrelevant terminologies through analysis of the social neighbors. Practically, this unsupervised scheme holds potential for large-scale data.

Categories and Subject Descriptors
J.3 [Life and Medical Sciences]: Health

Keywords
Community-based Health Services, Question Answers, Vocabulary Gap, Medical Terminology Assignment

1. BACKGROUND

The rise of digital technologies has transformed patient-doctor relationships. Nowadays, when patients struggle with their health concerns, the majority explore the Internet to research the problem before and after they see their doctors. For example, 70% of Canadians turned to the Internet to look up health-related information in 2009 [8] and 72% of American Internet users searched for health solutions in 2012 [4]. These figures reflect the scope and scale of online health seeking.

To better serve the needs of health seekers, community-based health services have emerged as effective platforms for health knowledge dissemination and exchange, such as HealthTap¹, HaoDF² and WenZher [11]. They not only permit health seekers to freely post health-oriented questions, but also encourage doctors to provide trustworthy answers. Figure 1 demonstrates one typical QA pair example. Over time, a tremendous number of QA pairs has accumulated in their repositories, and in most circumstances, health seekers may directly locate good answers by searching these archives, rather than waiting for the experts' responses or painfully browsing through a list of documents from general search engines.

Figure 1: The illustration of a QA example from community-based health services (HealthTap).

¹ https://www.healthtap.com/
² www.haodf.com

2. CHALLENGES

In many cases, the community generated health content may not be directly usable due to the vocabulary gap, since participants with diverse backgrounds do not necessarily share the same vocabulary. Take HealthTap as an example. The same question may be described in substantially different ways by two individual health seekers. On the other hand, the answers provided by doctors may contain acronyms with multiple possible meanings, as well as non-standardized terms.
In this work, we define medical concepts as domain-specific noun phrases, and medical terminologies as phrases authenticated by well-known organizations that are used to accurately describe the human body and associated components, conditions and processes in a science-based manner. Even though some health communities have recently encouraged doctors to annotate their answers with medical concepts, we cannot ensure that these concepts are medical terminologies. Meanwhile, the tags adopted by doctors often vary greatly [3]. For example, “heart attack” and “myocardial disorder” are employed by different doctors to refer to the same medical diagnosis. It was shown that the inconsistency of community generated health data greatly hindered cross-resource data exchange, management and integrity [9]. Even worse, it was reported that users had encountered big challenges in reusing the archived content due to the incompatibility between their search terms and those in the accumulated medical records [21]. Therefore, automatic coding of the QA pairs with standardized terminologies is highly desired. It leads to a consistent, interoperable way of indexing, storing and aggregating across specialties and sites. In addition, it facilitates QA pair retrieval by coding new queries with the standardized terminologies, which bridges the vocabulary gap between the queries and the archives.

It is worth mentioning that there already exist several efforts dedicated to automatically mapping medical records to terminologies [19, 2, 10, 7, 17]. Most of these efforts, however, focused on hospital generated health data or health provider released sources, utilizing either isolated or loosely coupled rule-based and machine learning approaches. Compared to this kind of data, the emerging community generated health data is more colloquial in terms of inconsistency, complexity and ambiguity, which poses challenges for data access and analytics. Further, most of the previous work simply utilizes an external medical dictionary to code the medical records rather than considering corpus-aware terminologies. This reliance on external, corpus-independent knowledge may bring in inappropriate terminologies. Constructing a corpus-aware terminology vocabulary, in order to prune the terminologies irrelevant to a specific dataset and narrow down the candidates, is the tough issue we are facing. In addition, the variety of heterogeneous cues was often not adequately exploited simultaneously. Therefore, a robust integrated framework that draws on the strengths of various resources and models is still needed.

3. METHOD

To overcome these limitations, we propose a novel scheme that is able to code the QA pairs with corpus-aware terminologies. As illustrated in Figure 2, the proposed scheme consists of two mutually reinforced components, namely, local mining and global learning.

Figure 2: The schematic illustration of the proposed automatic medical terminology assignment scheme. The answer part is not displayed due to the space limitation.

3.1 Local Mining

Local mining aims to locally code the QA pairs by extracting the medical concepts from each individual instance and then mapping them to terminologies based on external authenticated vocabularies. To accomplish this task, we establish a tri-stage framework, which includes noun phrase extraction, medical concept detection and medical concept normalization.

To extract all the noun phrases, we initially assign part-of-speech tags to each word in the given QA pair with the Stanford POS tagger³. We then extract the word sequences whose tags match a fixed pattern of part-of-speech tags as noun phrases. This pattern is formulated as follows.

(Adjective | Noun)* (Noun Preposition)? (Adjective | Noun)* Noun    (1)

A sequence of tags matching this pattern ensures that the corresponding words make up a noun phrase. For example, the following complex sequence can be extracted as a noun phrase: “ineffective treatment of terminal lung cancer”.

Inspired by the efforts in [18, 6], in order to differentiate the medical concepts from other general noun phrases, we assume that concepts relevant to the medical domain occur frequently in the medical domain and rarely in non-medical ones. Based on this assumption, we employ the concept entropy impurity (CEI) [6] to measure the domain-relevance of a concept by comparing its term frequencies between two different corpora D1 and D2. D1 is our medical-domain corpus and D2 is the general English Gigaword data of the Linguistic Data Consortium⁴.

As aforementioned, we cannot ensure that all medical concepts are standardized terminologies. Take “birth control” as an example. It is recognized as a medical concept by our approach, but it is not an authenticated terminology. Instead, we should map it to “contraception”. Therefore, it is essential to normalize the detected medical concepts according to an appropriate external standardized dictionary, and this normalization is the key to bridging the vocabulary gap. In this work, we use SNOMED CT⁵ as our dictionary, since it provides the core general terminologies for the electronic health record and a formal logic-based hierarchical structure. The terminologies and their descriptions in SNOMED CT are first indexed⁶. We then search each medical concept against the indexed SNOMED CT. For the medical concepts with multiple matched results, e.g., two results returned for “female”, we keep all the returned terminology candidates for further selection. Inspired by the Google distance [1], we estimate the semantic similarity between the medical concept and each returned terminology candidate by exploring their co-occurrence on Google. We then select the most relevant terminology candidate as the normalized result.

Local mining, however, may suffer from various problems. The first problem is incompleteness: some key medical concepts may not be explicitly present in the QA pairs. The QA pair illustrated in Figure 2 shows an example of this situation, where the accurate terminology “use contraceptive sheath” is absent from the QA pair. The second problem is lower precision: some irrelevant medical concepts are explicitly embedded in the QA pairs, and are mistakenly detected and normalized by the local approach. For instance, given the question “What are the risks getting pregnant and giving birth later in life?”, the terminology “finding of life event”, normalized from the irrelevant medical concept “life”, is assigned as a code. It contributes little to capturing the main intent.

³ http://nlp.stanford.edu/software/tagger.shtml
⁴ http://www.ldc.upenn.edu/
⁵ http://www.ihtsdo.org/snomed-ct/
⁶ http://viw2.vetmed.vt.edu/sct/menu.cfm

3.2 Global Learning

It is noteworthy that most previous efforts, including our local approach, attempted to map the QA pairs directly to the entries in external dictionaries without any pruning. This approach often presents problems, since the external dictionaries usually cover relatively comprehensive terminologies that are far beyond the vocabulary scope of the given corpus. It may result in a deterioration of coding performance in terms of both efficiency and effectiveness. The problem is caused by the over-widened scope of vocabularies, which may bring in unpredictable noise and make precise terminology selection challenging. As a byproduct, a corpus-aware terminology vocabulary is naturally constructed by our local mining approach, which can be used as the terminology space for further learning.

Let Q = {q1, q2, ..., qN} and T = {t1, t2, ..., tM} respectively denote a repository of QA pairs and their associated locally mined terminologies. The target of global learning is to learn appropriate terminologies from the global terminology space T to annotate each q in Q. In this work, the global learning task is regarded as a multi-label learning problem [16]. It is formulated as

\[
\arg\min_{F} \left\{ \sum_{i=1}^{M} \Omega(f_i) + \lambda L(f_i) + \mu \sum_{j=1}^{M} R_{ij} \left\| f_i - f_j \right\|^2 \right\}, \qquad (2)
\]

where M refers to the number of classes, i.e., the number of medical terminologies to be assigned. Vector f_i is the i-th column of F, representing the relevance scores of each QA pair to the i-th terminology. Ω(f) and L(f) denote the regularizer on the hypergraph and the empirical loss, respectively. In addition, R_ij is the inter-terminology relationship between terminology i and terminology j. These relationships are mined by exploiting the external well-structured ontology, and they are able to alleviate the granularity mismatch problems and reduce the irrelevant sibling terminologies. By differentiating the above equation with respect to F, we can obtain a closed-form solution.

The philosophy behind these three objectives is as follows. The first objective aims to guarantee that the relevance probability function is continuous and smooth in the semantic space. This means that the relevance probabilities of semantically similar QA pairs should be close to each other. The second objective is ensured by the empirical loss function, which forces the relevance probabilities to approach the initial roughly estimated relevance scores. These two implicit constraints are widely adopted in reranking-oriented approaches [12, 13, 14, 15]. The last objective encourages the values of QA pairs that are connected by hierarchically structured terminologies to be similar to each other.

When it comes to hypergraph construction, the N QA pairs from Q are regarded as vertices and they are connected by three types of hyperedges. The first type takes each vertex as a centroid and forms a hyperedge by circling around its k-nearest neighbors based on QA pair content similarities. This procedure was first adopted in [5]. The second type is based on the terminology-sharing network. For each terminology, it groups together all the QA pairs sharing that terminology. The third type takes the users' social behaviours into consideration by rounding up all the questions answered by closely associated doctors. The inter-doctor relationships are inferred from the doctors' historical data. Specifically, doctors who frequently respond to the same kinds of questions probably share highly overlapping expertise, and thus the questions they answered can be regarded as semantically similar to a certain extent. As a consequence, up to N + M + U hyperedges are constructed in our hypergraph, where U is the number of involved doctors. Learning from this hypergraph, we are able to find missing key concepts and propagate precise terminologies among underlying connected records over a large collection. Besides the semantic similarity among QA pairs and the terminology-sharing network, the inter-terminology and inter-expert relationships are seamlessly integrated in the proposed model. It is noteworthy that a rich set of healthcare specific features is extracted and weighted for similarity estimation.

4. EXPERIMENTS

We crawled more than 109 thousand QA pairs from HealthTap, which involve 5,958 unique doctors. For ground truth construction, we invited three professionals holding master's degrees in medicine. The labelers were trained with a short tutorial and a set of demonstrating examples. A majority voting scheme among the three labelers partially alleviates the subjectivity problem. The annotators were required to label only the top five recommended terminologies for each QA pair, each as either “positive” or “negative”. 100 QA pairs were labeled as the testing set.

We adopted two metrics that characterize precision from different aspects. The first is the average S@K over all testing QA pairs, which measures the probability of finding a relevant terminology among the top K recommended ones. To be specific, for each testing QA pair, S@K is set to 1 if a relevant terminology is positioned in the top K and 0 otherwise. The second is the average P@K, which stands for the proportion of recommended terminologies that are relevant [20]. P@K is defined as

\[
P@K = \frac{|C \cap R|}{|C|},
\]

where C is the set of the top K terminologies, and R is the set of manually labeled positive ones.

Table 1: The comparative evaluation results of medical terminology assignment in terms of S@K and P@K.

Approach       S@1     S@2     S@3     S@4      P@1     P@2     P@3     P@4
LocalMining    72.0%   84.0%   91.0%   95.0%    72.0%   72.1%   69.7%   68.3%
Local+Global   83.0%   92.0%   98.0%   100.0%   83.0%   81.5%   80.3%   78.8%

Table 2: Comparative illustration of the representative question samples with locally mined terminologies and locally+globally recommended terminologies. Answers are not displayed due to limited space.

QA pair: Is it safe to color my hair during pregnancy?
  Locally Mined Terminologies: hair structure, dyed hair, feeling safe, patient currently pregnant, first trimester pregnancy...
  Local Mining + Global Learning: hair structure, patient currently pregnant, coal tar allergy, hair color change, disorder of endocrine system...

QA pair: If I get an infection caused by gum disease, can that be transferred to my fetus?
  Locally Mined Terminologies: infectious disease, gingival disease, entire fetus, inflammation, periodontal disease...
  Local Mining + Global Learning: infectious disease, prematurity of fetus, gingival disease, periodontal disease, low birth weight infant...

Table 1 displays the comparison. We can see that the local mining approach alone performs worse than the joint approach on every metric. This is reasonable, because irrelevant concepts may be mapped to terminologies merely because of their presence in the QA pairs. Table 2 comparatively illustrates representative QA pair samples with locally mined terminologies and locally+globally recommended ones. Intuitively, the terminologies are more comprehensive and reliable after enhancement with global learning.

5. CONCLUSIONS AND FUTURE WORK

This paper presented a medical terminology assignment scheme to bridge the vocabulary gap between health seekers and community generated knowledge. A unified framework of local mining and global learning is proposed to tackle this research issue, instead of the conventional isolated utilization of either. It employs the concept entropy impurity approach to detect the medical concepts locally and normalizes them, which naturally constructs a corpus-aware terminology vocabulary with the help of external knowledge. In addition, it builds a novel global learning model to enhance the local coding results. This model seamlessly integrates various heterogeneous cues.

In the future, we will investigate how to flexibly organize the unstructured medical content into a user needs-aware ontology by means of the recommended medical terminologies.

6. ACKNOWLEDGEMENTS

This work was supported by NUS-Tsinghua Extreme Search project under the grant number: R-252-300-001-490.

7. REFERENCES

[1] R. L. Cilibrasi and P. M. B. Vitanyi. The google similarity distance. IEEE Transactions on Knowledge and Data Engineering, 2007.
[2] C. Dozier, R. Kondadadi, K. Al-Kofahi, M. Chaudhary, and X. Guo. Fast tagging of medical terms in legal text. In Proceedings of the International Conference on Artificial Intelligence and Law, 2007.
[3] A. e-HIM Work Group on Computer-Assisted Coding. Delving into computer-assisted coding. Journal of American Health Information Management Association, 2004.
[4] S. Fox and M. Duggan. Health online 2013. Survey, Pew Research Center, 2013.
[5] Y. Huang, Q. Liu, S. Zhang, and D. Metaxas. Image retrieval via probabilistic hypergraph ranking. In IEEE Conference on Computer Vision and Pattern Recognition, 2010.
[6] M.-Y. Kim and R. Goebel. Detection and normalization of medical terms using domain-specific term frequency and adaptive ranking. In Information Technology and Applications in Biomedicine, IEEE International Conference on, 2010.
[7] L. S. Larkey and W. B. Croft. Automatic assignment of icd9 codes to discharge summaries. PhD Thesis, University of Massachusetts at Amherst, 1995.
[8] M. Law. Online drug information in canada. Technical report, 2012.
[9] G. Leroy and H. Chen. Meeting medical terminology needs-the ontology-enhanced medical concept mapper. IEEE Transactions on Information Technology in Biomedicine, 2001.
[10] L. V. Lita, S. Yu, S. Niculescu, and J. Bi. Large scale diagnostic code classification for medical patient records. In Proceedings of the Conference on Artificial Intelligence in Medicine, 1995.
[11] L. Nie, T. Li, M. Akbari, J. Shen, and T.-S. Chua. Wenzher: Comprehensive vertical search for healthcare domain. In Proceedings of the International ACM SIGIR Conference, 2014.
[12] L. Nie, M. Wang, Y. Gao, Z.-J. Zha, and T.-S. Chua. Beyond text qa: Multimedia answer generation by harvesting web information. IEEE Transactions on Multimedia, 2013.
[13] L. Nie, M. Wang, Z.-J. Zha, and T.-S. Chua. Oracle in image search: A content-based approach to performance prediction. ACM Transactions on Information System, 2012.
[14] L. Nie, M. Wang, Z.-J. Zha, G. Li, and T.-S. Chua. Multimedia answering: Enriching text qa with media information. In Proceedings of the International ACM SIGIR Conference, 2011.
[15] L. Nie, S. Yan, M. Wang, R. Hong, and T.-S. Chua. Harvesting visual concepts for image search with complex queries. In Proceedings of the International Conference on Multimedia, 2012.
[16] L. Nie, Y.-L. Zhao, X. Wang, J. Shen, and T.-S. Chua. Learning to recommend descriptive tags for questions in social forums. ACM Transactions on Information System, 2014.
[17] H. Suominen, F. Ginter, S. Pyysalo, A. Airola, T. Pahikkala, S. Salanterä, and T. Salakoski. Machine learning to automate the assignment of diagnosis codes to free-text radiology reports: a method description. In Proceedings of the ICML Workshop on Machine Learning for Health-Care Applications, 2008.
[18] P. Velardi, M. Missikoff, and R. Basili. Identification of relevant terms to support the construction of domain ontologies. In Proceedings of the workshop on Human Language Technology and Knowledge Management, 2001.
[19] L. Yves A., S. Lyudmila, and F. Carol. Automating icd-9-cm encoding using medical language processing: A feasibility study. In Proceedings of the AMIA Annual Symposium, 2000.
[20] Y.-L. Zhao, L. Nie, X. Wang, and T.-S. Chua. Personalized recommendations of locally interesting venues to tourists via cross region community matching. ACM Transactions on Intelligent Systems and Technology, 2013.
[21] G. Zuccon, B. Koopman, A. Nguyen, D. Vickers, and L. Butt. Exploiting medical hierarchies for concept-based information retrieval. In Proceedings of the Seventeenth Australasian Document Computing Symposium, 2012.
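
As a supplementary illustration of the evaluation protocol in Section 4, the two metrics can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the evaluation code used in the paper: the function names (`s_at_k`, `p_at_k`, `average`) and the toy rankings and labels are hypothetical, and each ranking is assumed to contain no duplicate terminologies.

```python
from typing import Callable, List, Sequence, Set, Tuple

Judged = Tuple[Sequence[str], Set[str]]  # (ranked terminologies, labeled-positive set)


def s_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """S@K for one QA pair: 1 if any of the top-K terminologies is relevant, else 0."""
    return 1.0 if any(term in relevant for term in ranked[:k]) else 0.0


def p_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """P@K for one QA pair: |C ∩ R| / |C|, where C is the set of top-K terminologies."""
    top_k = set(ranked[:k])
    if not top_k:  # nothing was recommended; define the score as 0 (an assumption)
        return 0.0
    return len(top_k & relevant) / len(top_k)


def average(metric: Callable[[Sequence[str], Set[str], int], float],
            judged: List[Judged], k: int) -> float:
    """Average a per-pair metric over all testing QA pairs."""
    return sum(metric(ranked, relevant, k) for ranked, relevant in judged) / len(judged)


# Toy data: hypothetical rankings and labels, not taken from the paper's test set.
judged: List[Judged] = [
    (["contraception", "unplanned pregnancy", "finding of life event"],
     {"contraception", "unplanned pregnancy"}),
    (["hair structure", "dyed hair"], {"dyed hair"}),
]

if __name__ == "__main__":
    for k in (1, 2):
        print(k, average(s_at_k, judged, k), average(p_at_k, judged, k))
```

Because |C| is the number of terminologies actually returned in the top K, a QA pair with fewer than K recommendations is scored against the shorter list; dividing by K instead would be an equally plausible reading of the definition and gives the same value whenever K terminologies are available.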