=Paper=
{{Paper
|id=Vol-1180/CLEF2014wn-eHealth-ThakkarEt2014
|storemode=property
|title=Team IRLabDAIICT at ShARe/CLEF eHealth 2014 Task 3: User-centered Information Retrieval System for Clinical Documents
|pdfUrl=https://ceur-ws.org/Vol-1180/CLEF2014wn-eHealth-ThakkarEt2014.pdf
|volume=Vol-1180
|dblpUrl=https://dblp.org/rec/conf/clef/ThakkarISM14
}}
==Team IRLabDAIICT at ShARe/CLEF eHealth 2014 Task 3: User-centered Information Retrieval System for Clinical Documents==
Team IRLabDAIICT at ShARe/CLEF eHealth 2014 Task 3: User-centered Information Retrieval System for Clinical Documents

Harsh Thakkar, Ganesh Iyer, Kesha Shah, Prasenjit Majumder
IR Lab, DA-IICT, Gandhinagar, Gujarat, India
{harsh9t, lastlegion, kesha.shah1106, prasenjit.majumder}@gmail.com

Abstract. In this paper we, Team IRLabDAIICT, describe our participation in ShARe/CLEF eHealth 2014 Task 3: information retrieval for addressing questions related to patients' health based on clinical reports. We submitted six of the seven possible runs in this year's task. Our approach focuses on examining the relevance between documents and user-generated queries through query-analysis experiments. Our major challenge is to bridge the conceptual gap between user-generated queries (informal queries) and biomedical terminology (formal queries). To target this concept-matching problem we incorporate MeSH (Medical Subject Headings), a medical thesaurus that maps layman terms to their medical synonyms. We use blind relevance feedback for relevance feedback and the query-likelihood model for query expansion, which performed best in our experiments. The retrieval system is evaluated on several measures: mean average precision, P@5, P@10, NDCG@5, and NDCG@10, with P@10 and NDCG@10 as the primary and secondary evaluation measures. The experiments were conducted on the 43.6 GB ShARe/CLEF 2013 Task 3 document collection, harvested by the EU-FP7 Khresmoi project, together with a new 2014 set of realistic English queries from the general public based on discharge-summary contents. Our baseline run (run 1) obtained our highest result, 0.706, as declared by the ShARe/CLEF organizing committee, compared with our other five runs. As future work we propose to incorporate a machine-learning-based retrieval-algorithm prediction model.

1 Introduction

With the increase in awareness of health issues among people, expanding the horizons of research on medical document retrieval has become a dire need. Patients now want answers to their health problems at the touch of a finger, and discharge summaries obtained from physicians have attracted a lot of attention from patients. Thus, the concept of health information retrieval has become more popular[1]. The main challenge in this area is to answer patients' questions[2] in a format that is understandable to a layman (i.e. the user/patient): medical prescriptions and discharge summaries are written in professional medical terminology, which makes little sense to the end user. Taking this challenge as an opportunity, the ShARe/CLEF (Cross-Language Evaluation Forum) community initiated the eHealth tasks in 2013[3], with the goal of evaluating systems that support laypeople in searching for and understanding their health information, by attracting young researchers from organizations and universities across the computer science and biomedical domains and presenting a common platform for conducting this research. The ShARe/CLEF eHealth 2014 challenge[4] comprises three tasks (http://clefehealth2014.dcu.ie/). The specific use case considered is as follows: before leaving the hospital, a patient receives a discharge summary.
This summary describes the diagnosis and the treatment the patient received in the hospital. The first task of CLEF eHealth 2014[4] aims at delivering a visualization of the medical information extracted from the discharge summaries in a manner comprehensible to a layman, while the second task requires normalization and expansion of the abbreviations and acronyms present in the discharge summaries. The use case then postulates that, given the discharge summaries and the diagnosed disorders, patients often have questions regarding their health condition[5]. The goal of the third task[6, 7] is to provide valuable and relevant documents to patients by developing a user-centered[2] or context-based[8] health information retrieval system that satisfies their health-related information needs. To aid the evaluation of the systems participating in Task 3, the organizers provided potential user queries with relevance judgments obtained from medical professionals, together with an enormous dataset of health and biomedical documents. As is common in information retrieval (IR) evaluation, the test collection consists of documents, queries, and corresponding relevance judgments.

This paper describes our participation in Task 3 of the 2014 ShARe/CLEF eHealth[4] evaluation lab. The paper is organized as follows: Section 2 discusses the document collection provided in the corpus, its characteristics, the relevance assessment of the documents, and the guidelines for submitting results. Section 3 presents our proposed user-centered health information retrieval system. Section 4 discusses the experimental runs and analyzes the harvested results. Section 5 concludes the paper with the authors' comments and future work.

2 Corpus

The corpus provided by the ShARe/CLEF organizers[6] consists of a large distributed medical web crawl: the 2012 medical documents of the EU-FP7 Khresmoi project (http://khresmoi.eu/). The dataset contains 1.6 million English documents covering a wide set of medical topics. The collection was prepared from a variety of online sources, including Health on the Net Foundation (http://www.healthonnet.org) certified websites as well as well-known medical sites and databases (e.g. Genetics Home Reference, ClinicalTrials.gov, Diagnosia (http://www.diagnosia.com/)). The corpus is divided into eight archive (zip) files named by part number; it is 6.3 GB compressed and about 43.6 GB uncompressed. The collection is a processed version of the crawled documents, obtained by eliminating very small documents and correcting some errors in the mark-up (i.e. applying Jsoup functions) of the original crawl.

The CLEF eHealth 2014 Task 3 document collection is split into a group of .dat files, each containing the web crawl for one domain (where a domain is defined by its root URL). Each .dat file contains a collection of web pages and metadata, where the data for one web page is organized as follows:

1. a unique identifier (#UID) for the web page in this document collection,
2. the date of the crawl in the form YYYYMM (#DATE),
3. the URL (#URL) of the original web page, and
4. the raw HTML content (#CONTENT) of the web page.

This crawled dataset is the result of work by the Health on the Net Foundation (HON), Switzerland, and the University of Applied Sciences Western Switzerland (HES-SO).
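For illustration, records in such a marker-delimited file can be recovered with a small parser. The sketch below is ours, not the authors' code, and it assumes each field appears on a line beginning with its marker (#UID, #DATE, #URL, #CONTENT); the official collection's exact delimiter layout may differ.

```python
import re

def parse_dat(path):
    """Split one .dat crawl file into per-page records.

    Assumes each record opens with a #UID line followed by #DATE,
    #URL and #CONTENT markers (layout assumed, not verified against
    the official collection).
    """
    with open(path, encoding="utf-8", errors="replace") as f:
        text = f.read()
    records = []
    # Each record starts at a #UID marker; its content runs to the next one.
    for chunk in re.split(r"(?m)(?=^#UID)", text):
        if not chunk.strip():
            continue
        rec = {}
        for field in ("UID", "DATE", "URL"):
            m = re.search(rf"(?m)^#{field}\s+(\S+)", chunk)
            if m:
                rec[field.lower()] = m.group(1)
        m = re.search(r"(?ms)^#CONTENT\s*(.*)", chunk)
        rec["content"] = m.group(1).strip() if m else ""
        records.append(rec)
    return records
```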
2.1 Relevance assessment

The information provided by ShARe/CLEF on relevance assessment is as follows:

- The official training query and result set for eHealth Task 3 consists of 5 queries and a corresponding result set generated by manual relevance assessment (by medical professionals) on a shallow pool.
- Relevance assessments for these 5 training queries were formed from pooled sets generated using the Vector Space Model and Okapi BM25.
- Pooled sets were created by taking the top 30 ranked documents returned by the two retrieval models, with duplicates removed.
- Relevance is provided on a 2-point scale: non-relevant (0), relevant (1); and on a 4-point scale: non-relevant (0); on topic, but unreliable (1); somewhat relevant (2); highly relevant (3).

A sample query from the official 5 training queries[6] is shown below for reference:

QTRAIN2014.1
MRSA and wound infection
What is MRSA infection and is it dangerous?
This 60 year old lady has had coronary artery bypass grafting surgery and during recovery her wound has been infected. She wants to know how dangerous her infection is, where she got it and if she can be infected again with it.
Documents should contain information about sternal wound infection by MRSA. They should describe the causes and the complications.

2.2 Submission guidelines

The guidelines for submitting Task 3 results are as follows. Participants may submit up to seven ranked runs for the English (EN) queries:

1. Run 1 (mandatory) is a baseline: only the title and description of the query may be used, and no external resource (including discharge summaries[5], corpora, ontologies, etc.) may be used.
2. Runs 2-4 (optional): any experiment WITH the discharge summaries.
3. Runs 5-7 (optional): any experiment WITHOUT the discharge summaries.

One of runs 2-4 and one of runs 5-7 must use the IR technique of run 1 as a baseline; the idea is to allow analysis of the impact of discharge summaries and other techniques on the performance of the baseline run 1. The optional runs must be ranked in order of priority (for runs 2-4, run 2 has the highest priority; for runs 5-7, run 5 has the highest priority).

3 Retrieval system

3.1 System overview

Figure 1 shows the block diagram of our proposed retrieval system. We first pre-process the provided dataset by converting the collection to the standard TREC format accepted by Indri (http://www.lemurproject.org/indri.php), the Lemur project's software for building indexing and retrieval systems based on language models. The pre-processed data then goes through subsequent language-processing steps: stemming (using the Porter stemmer) and stop-word removal (using a mixture of a standard English list and a free medical dictionary comprising 4,000 medical words). The data is thus cleaned for the next (indexing) step.

Fig. 1. Block diagram of the system for eHealth Task 3.

3.2 Information retrieval process

The entire information/text retrieval process is divided into the following sub-parts:

- Indexing
- Query expansion
- Document retrieval for every query
- Experiments and results

Indexing: In the first phase, the cleaned and formatted document set is indexed using Indri, with tokenization, stop-word removal, and stemming. For stop-word removal we use the PubMed list of stop-words (http://www.ncbi.nlm.nih.gov/books/NBK3827/table/pubmedhelp.T43/), and the Stanford Porter stemmer is used for stemming during indexing.
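To make the pre-processing step concrete, the conversion to TREC format could look like the sketch below. The <DOC>/<DOCNO>/<TEXT> layout is the standard trectext layout Indri's index builder accepts; the record fields follow the parser sketched in Section 2, and the function itself is illustrative, not the authors' actual pipeline.

```python
def write_trectext(records, out_path):
    """Emit crawl records as TREC-text documents for indexing with Indri.

    A minimal sketch: record fields (uid, content) follow the .dat
    parser above; a real pipeline would also strip residual HTML.
    """
    with open(out_path, "w", encoding="utf-8") as out:
        for rec in records:
            out.write("<DOC>\n")
            out.write(f"<DOCNO>{rec['uid']}</DOCNO>\n")
            out.write(f"<TEXT>\n{rec['content']}\n</TEXT>\n")
            out.write("</DOC>\n")
```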
Query expansion: In the second phase, stop-word removal for query expansion uses the same PubMed list used during indexing. Spell-checking is also performed on the query terms; two dictionaries are used for checking and correcting spelling, one being the general English (US) dictionary used in Enchant and the other a dictionary specific to the medical domain (http://www.essex1.com/people/cates/free.htm). Blind relevance feedback is also used to re-rank the documents. We use MetaMap[9] to integrate MeSH (www.ncbi.nlm.nih.gov/mesh) for extracting the medical nomenclature of layman terms, used in runs 6 and 7 for query expansion. MeSH (Medical Subject Headings) is the NLM controlled-vocabulary thesaurus used for indexing articles for PubMed; we use it to map a query from a textual base (layman representation) to a concept base (medical terminology). Runs 6 and 7 are based on MeSH.

In runs 2 and 3 we use the discharge summaries provided with the ShARe/CLEF eHealth Task 2 dataset for query expansion. We obtain the discharge-summary file relevant to the user query from the previously indexed documents, then extract medical terms from tags such as {Major Surgical or Invasive Procedure, Past Medical History, Past Surgical History, Chief Complaint, Discharge Diagnosis} in the clinical analysis data of that discharge summary. We combine the words extracted from the user query and from the discharge summary in a 4:1 ratio (i.e. a weight of 0.8 is given to words extracted from the user query and 0.2 to those extracted from the discharge summary). We thus employ this as an alternative to using a medical thesaurus in our retrieval system.
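Such a weighted combination can be expressed directly in the Indri query language via its #weight and #combine operators. The helper below is a minimal sketch under that assumption; the function name and example terms are ours, not the authors'.

```python
def build_weighted_query(query_terms, expansion_terms,
                         w_query=0.8, w_exp=0.2):
    """Build an Indri #weight query mixing the original query terms
    with expansion terms (from discharge summaries or MeSH) at the
    4:1 ratio described above."""
    q = " ".join(query_terms)
    e = " ".join(expansion_terms)
    return f"#weight( {w_query} #combine({q}) {w_exp} #combine({e}) )"

# Example with hypothetical terms:
# build_weighted_query(["MRSA", "wound", "infection"],
#                      ["staphylococcus", "sternal"])
# -> '#weight( 0.8 #combine(MRSA wound infection) 0.2 #combine(staphylococcus sternal) )'
```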
Document retrieval: In the third phase, a score is computed for each document against each query. Scores were calculated by running the queries on three retrieval models: Okapi, tf-idf, and the query-likelihood model. Okapi and tf-idf are not language-model based, whereas query likelihood is. However, after evaluating the lab results on last year's test queries, we decided to discard the tf-idf runs because of their poor performance: tf-idf performed even worse than the Okapi model, so we have not compiled tf-idf statistics in this report. We thus submitted six runs to the task, based on Okapi, MeSH, and query likelihood (the baseline run).

Query-likelihood model: Query likelihood is a language-model-based approach[10]. From each document d in the collection we construct a language model M_d. Our goal is to rank documents by P(d|q), where the probability of a document is interpreted as the likelihood[11] that it is relevant to the query. Using Bayes' rule,

P(d|q) = \frac{P(q|d) \, P(d)}{P(q)}

Since the query probability P(q) is the same for all documents it can be ignored, and it is typical to assume that the document prior P(d) is uniform, so it is ignored as well; hence ranking by P(d|q) is equivalent to ranking by P(q|d). Documents are then ranked by the probability that the query is observed as a random sample from the document model. The multinomial unigram language model is commonly used for this:

P(q|M_d) = K_q \prod_{t \in V} P(t|M_d)^{tf_{t,q}}    (1)

where the multinomial coefficient is

K_q = \frac{L_q!}{tf_{t_1,q}! \, tf_{t_2,q}! \cdots tf_{t_M,q}!}    (2)

In practice the multinomial coefficient is usually removed from the calculation, which is repeated for all documents to create a ranking of the whole collection.

Okapi model: We make use of the well-known Okapi model[12, 11]. The weighted document score is calculated by the following formulae:

RSV_d = \sum_{t \in q} \log \frac{N}{df_t} \cdot \frac{(k_1 + 1) \, tf_d}{k_1 \big( (1 - b) + b \, (L_d / L_{ave}) \big) + tf_d} \cdot \frac{(k_3 + 1) \, tf_{tq}}{k_3 + tf_{tq}}    (3)

score(D, Q) = \sum_{i=1}^{n} IDF(q_i) \cdot \frac{f(q_i, D) \, (k_1 + 1)}{f(q_i, D) + k_1 \big( 1 - b + b \, \frac{|D|}{avgdl} \big)}    (4)

IDF(q_i) = \log \frac{N - n(q_i) + 0.5}{n(q_i) + 0.5}    (5)

Here tf is the term's frequency in the document, qtf is the term's frequency in the query, N is the total number of documents in the collection, df is the number of documents containing the term, dl is the document length (in bytes), and avdl is the average document length. The formula has three components. The first is the idf part, which reflects the discriminative power of each word. The second is the tf component, based on how often the term appears in the particular document; its value generally increases with frequency but approaches an asymptotic limit, so whether a term appears 100 times or 1,000 times, the function weights it almost the same. There is also a correction for document length: if a document is short, the tf of all its words is increased; if it is long, the tf of all its words is decreased, so the count of each word is measured relative to a document of average length in the collection. The third is the qtf component: if one word appears more often in the query than another, it should be weighted higher.
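For reference, Eqs. (4)-(5) translate into a few lines of Python. This is a minimal sketch: the parameter values k1 = 1.2 and b = 0.75 are common defaults, not values reported in the paper.

```python
import math

def bm25_score(query_terms, doc_tf, doc_len, avgdl, df, N,
               k1=1.2, b=0.75):
    """Score one document against a query with Okapi BM25, Eqs. (4)-(5).

    doc_tf maps term -> frequency in the document; df maps term ->
    document frequency in the collection; N is the collection size.
    """
    score = 0.0
    for t in query_terms:
        f = doc_tf.get(t, 0)
        if f == 0 or t not in df:
            continue
        idf = math.log((N - df[t] + 0.5) / (df[t] + 0.5))      # Eq. (5)
        norm = f + k1 * (1 - b + b * doc_len / avgdl)          # length correction
        score += idf * f * (k1 + 1) / norm                     # Eq. (4)
    return score
```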
3.3 Experiments and results

Submitted runs: We submitted six of the seven possible runs, summarized in the table below.

| Run | Query likelihood | MeSH | Okapi | Discharge summaries | Description |
| 1 | X | | | | mandatory baseline run |
| 2 | X | | | X | optional run WITH discharge summaries 1 |
| 3 | | | X | X | optional run WITH discharge summaries 2 |
| 5 | | | X | | optional run WITHOUT discharge summaries 1 |
| 6 | X | X | | | optional run WITHOUT discharge summaries 2 |
| 7 | | X | X | | optional run WITHOUT discharge summaries 3 |

1. RUN 1 is the system baseline run. In this run we use only the primitive blind relevance feedback mechanism for query feedback and the query-likelihood model for query expansion; no external libraries or resources are used.
2. RUN 2: Runs 2 and 3 are the runs WITH discharge summaries. In run 2 we use the query-likelihood model together with the discharge summaries: text extracted from the {Major Surgical or Invasive Procedure, Past Medical History, Past Surgical History, Chief Complaint, Discharge Diagnosis} tags is used for query expansion, and the medical words extracted from the discharge summaries are merged with the word set obtained by the query-likelihood model using a weighting function. The words obtained from the query are given the prominent weight (0.7) while the words extracted from the discharge summaries are given a lesser weight (0.3).
3. RUN 3 is a variant of run 2 in which the Okapi model is used for retrieval, again with discharge summaries for query expansion.
4. RUN 5: Runs 5 through 7 are the runs WITHOUT discharge summaries. Run 5 uses the Okapi model together with blind relevance feedback for retrieving documents.
5. RUN 6 uses the query-likelihood model for retrieving documents and MeSH for query expansion: medical concepts are identified using MetaMap and their synonyms are used for expansion, following the same weighting strategy as with the discharge summaries.
6. RUN 7 uses the Okapi model for retrieval, with MeSH for query expansion.

We discuss the official findings released by the ShARe/CLEF organizing committee, together with their analysis, in the following section.

4 Official results and discussion

Precision at 10 (P@10) and normalized discounted cumulative gain at 10 (NDCG@10) were selected as the primary and secondary evaluation measures for eHealth Task 3 2014. Figure 2 shows the results of the six submitted runs, and Figure 4 shows the variance in their performance. It is clear from Figure 2 that run 1 (the baseline) is the best run, yielding the highest values, followed by runs 2, 6, and 5 respectively, whereas run 7 is the worst-performing run, followed by run 3.

Fig. 2. Official P@10 and NDCG_cut@10 values for eHealth Task 3 2014, as released by CLEF.

We observe that the best-performing models are the query-likelihood model and its pairwise combinations with either discharge summaries or MeSH, whereas the three-way combination (query likelihood plus Okapi plus MeSH or discharge summaries) shows a drastic drop in system performance. This is caused by the accumulation of irrelevant words extracted from the different texts: the query length grows by leaps and bounds in the three-way combination compared with the pairwise one. Thus, for medical text retrieval, extensive use of keywords does not guarantee higher performance; instead it suppresses the more relevant words and increases the vagueness of the query. Comparing runs 2 and 3 (both with discharge summaries) also shows that the combination of the Okapi model and discharge summaries brings no improvement over the baseline run but rather a 12% drop in system performance, whereas the combination of query likelihood and discharge summaries preserves the system's performance relative to the baseline run.

Fig. 3. Query-wise evaluation of the baseline run (run 1) compared with the other participating teams.

Query-wise comparison of the baseline run: Figure 3 shows the query-wise performance of the teams participating in eHealth Task 3 2014; Figure 4 is computed from these per-run results. For the baseline run, our system outperforms the other systems on queries 2, 8, 9, 15, 17, 28, 32, and 36, while it lags on queries 11, 22, 24, 26, 30, 34, 38, 44, 48, and 50. Figure 4 shows the query-wise performance of the other five runs. From these figures it is clear that the query-likelihood model (runs 2 and 6) outperforms the Okapi model (runs 3 and 5).

Fig. 4. Comparison of performance variance between the six runs of eHealth Task 3 2014.
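For completeness, the two official measures can be computed as follows. This is a minimal sketch assuming binary relevance (grade > 0) for P@10 and the 4-point graded scale for NDCG@10; the organizers' exact evaluation configuration may differ in detail.

```python
import math

def precision_at_k(ranked_docs, qrels, k=10):
    """Fraction of the top-k documents judged relevant (grade > 0)."""
    top = ranked_docs[:k]
    return sum(1 for d in top if qrels.get(d, 0) > 0) / k

def ndcg_at_k(ranked_docs, qrels, k=10):
    """NDCG@k with graded judgments (0-3) and log2 rank discounting."""
    gains = [qrels.get(d, 0) for d in ranked_docs[:k]]
    dcg = sum(g / math.log2(i + 2) for i, g in enumerate(gains))
    ideal = sorted(qrels.values(), reverse=True)[:k]
    idcg = sum(g / math.log2(i + 2) for i, g in enumerate(ideal))
    return dcg / idcg if idcg > 0 else 0.0
```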
5 Conclusion & Future work

The nature of queries varies from very precise to extremely vague, and the model selected for a specific query type performs in accordance with the nature of that query. For this medical text retrieval task it is clear that the query-likelihood model works best among the models we tried, Okapi and tf-idf. We carried out experiments with the tf-idf model in lab tests, but its results were poorer than those of the Okapi model, so it was excluded from the submitted runs (as mentioned in Section 3.2). Hence it can be concluded that tf-idf is not a suitable model for medical document information retrieval.

Moreover, there is a need for a mechanism by which the deployment of these models could be predicted beforehand: by judging the nature of a query from its text, we could decide which model, or which combination of models, to use. Keeping the current constraints in mind, as future work we propose to develop a machine-learning-based retrieval-algorithm prediction model that predicts query performance from features extracted from the query. The features comprise factors such as the combination of keywords/terms used in the query, the query length, and query similarity scores. Incorporating such a mechanism promises to improve our system's evaluation score by 8-10%.
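A first cut at such a predictor could be a standard classifier over simple query features. The sketch below is purely illustrative of the proposed direction: the feature set, labels, and model choice are our assumptions, not work reported in the paper.

```python
from sklearn.ensemble import RandomForestClassifier

def query_features(query, similarity_score):
    """Toy feature vector: query length and a similarity score are
    mentioned in the text; the exact feature set here is hypothetical."""
    terms = query.split()
    avg_term_len = sum(len(t) for t in terms) / max(len(terms), 1)
    return [len(terms), avg_term_len, similarity_score]

# X would hold feature vectors for past queries and y the label of the
# best-performing model per query (e.g. 0 = query likelihood, 1 = Okapi),
# derived from previous evaluation results.
clf = RandomForestClassifier(n_estimators=100, random_state=0)
# clf.fit(X, y)
# choice = clf.predict([query_features("MRSA and wound infection", 0.42)])
```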
6 Acknowledgments

We would especially like to thank our faculty advisor, Prof. Prasenjit Majumder, for always being available to provide quality input on the system. We also convey our regards to the ShARe/CLEF team for organizing the eHealth task, enabling teams like ours to participate and contribute to the community to the best of our abilities.

References

1. Lopes, C.T.: Health information retrieval - a state of the art report. Technical report, Faculdade de Engenharia da Universidade do Porto (2013)
2. Burstein, F., Fisher, J., McKemmish, S., Manaszewicz, R., Malhotra, P.: User centred quality health information provision: benefits and challenges. In: Proceedings of the 38th Hawaii International Conference on System Sciences (2005)
3. Suominen, H., Salanterä, S., Velupillai, S., Chapman, W.W., Savova, G., Elhadad, N., Pradhan, S., South, B.R., Mowery, D.L., Jones, G.J., et al.: Overview of the ShARe/CLEF eHealth evaluation lab 2013. In: Information Access Evaluation. Multilinguality, Multimodality, and Visualization, Springer (2013) 212-231
4. Kelly, L., Goeuriot, L., Suominen, H., Schrek, T., Leroy, G., Mowery, D.L., Velupillai, S., Chapman, W.W., Martinez, D., Zuccon, G., Palotti, J.: Overview of the ShARe/CLEF eHealth evaluation lab 2014. In: Proceedings of CLEF 2014, Lecture Notes in Computer Science (LNCS), Springer (2014)
5. Zhu, D., Wu, S., James, M., Carterette, B., Liu, H.: Using discharge summaries to improve information retrieval in clinical domain. In: Proceedings of the ShARe/CLEF eHealth Evaluation Lab (2013)
6. Goeuriot, L., Kelly, L., Li, W., Palotti, J., Pecina, P., Zuccon, G., Hanbury, A., Jones, G., Mueller, H.: ShARe/CLEF eHealth evaluation lab 2014, task 3: User-centred health information retrieval. In: Proceedings of CLEF 2014 (2014)
7. Goeuriot, L., Jones, G., Kelly, L., Leveling, J., Hanbury, A., Müller, H., Salanterä, S., Suominen, H., Zuccon, G.: ShARe/CLEF eHealth evaluation lab 2013, task 3: Information retrieval to address patients' questions when reading clinical reports. In: Online Working Notes of CLEF (2013)
8. Zhong, X., Xia, Y., Xie, Z., Na, S., Hu, Q., Huang, Y.: Concept-based medical document retrieval: THCIB at CLEF eHealth lab 2013 task 3. In: Proceedings of the ShARe/CLEF eHealth Evaluation Lab (2013)
9. Aronson, A., Lang, F.M.: An overview of MetaMap: historical perspective and recent advances. Journal of the American Medical Informatics Association 17 (2010) 229-236
10. Ponte, J.M., Croft, W.B.: A language modeling approach to information retrieval. In: Proceedings of the 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, ACM (1998) 275-281
11. Manning, C., Raghavan, P., Schütze, H.: Introduction to Information Retrieval. Cambridge University Press, Cambridge (2008)
12. Robertson, S.: Understanding inverse document frequency: on theoretical arguments for IDF. Journal of Documentation 60 (2004) 503-520