LABERINTO at ShARe/CLEF eHealth Evaluation Lab 2014

Juan Manuel Córdoba Malagón and Manuel Jesús Maña López

LABERINTO, Laboratorio de Recuperación de Información y Minería de Texto y Datos
Universidad de Huelva
Carretera Palos de La Frontera s/n, 21819 Palos de la Frontera (Huelva), Spain
juanmanuel.cordoba@gmail.com, manuel.mana@dti.uhu.es

Abstract. This paper describes the participation of the LABERINTO team in the ShARe/CLEF eHealth Evaluation Lab 2014 task 3a. We performed four different experiments, consisting of a baseline and three variants of the baseline model. The first was the mandatory baseline system, using only the title and description fields of the query. Our baseline retrieval system used a Lucene index with traditional stopping and stemming; no external resources were used. We submitted three additional runs (without the discharge summaries): two from a Lucene-based system with MeSH query expansion and one that made use of the National Library of Medicine's MetaMap tool to perform term boosting.

Keywords: Lucene, Solr, MetaMap, MeSH, query expansion

1 Introduction

The goal of the ShARe/CLEF (Cross-Language Evaluation Forum) eHealth Evaluation Lab is to evaluate systems that support laypeople in searching for and understanding their health information [8]. It comprises three tasks: Visual-Interactive Search and Exploration of eHealth Data (Task 1), Information extraction from clinical text (Task 2) and User-centred health information retrieval (Task 3). The goal of Task 3 is to develop methods and resources for the evaluation of Information Retrieval (IR) from the patients' perspective. Towards this, the ShARe/CLEF eHealth Evaluation Lab 2014 task 3 is split into two parts: monolingual retrieval (task 3a) and multilingual retrieval (task 3b) [5]. In particular, the LABERINTO team has focused on task 3a.

The LABERINTO group contributed 4 runs to this year's challenge. Our methods are based on the Lucene retrieval engine. This is the first time that we participate in the ShARe/CLEF eHealth Evaluation Lab and, as a first approximation, our baseline submission uses Lucene's default standard analyzer to process the free-text title and description fields. The remaining submissions build upon this baseline approach. Specifically, we consider the contribution to retrieval effectiveness of boosting MetaMap-identified terms and of two versions of query expansion using MeSH.

These working notes are organised as follows. In Section 2 we describe some issues related to document preprocessing and indexing. Section 3 describes our approaches for task 3a. Section 4 lists the results of our work in comparison to the median and best values obtained across all systems. Finally, we draw conclusions and state future work in Section 5.

2 Document collection preprocessing and indexing

The goal of the third task is to provide valuable and relevant documents to patients, so as to satisfy their health-related information needs. To evaluate systems that tackle this task, the lab organizers provide potential patient queries and a document collection containing various health and biomedical documents for task participants to create their search systems. As is common in the evaluation of information retrieval (IR), the test collection consists of documents, queries, and corresponding relevance judgements [4]. Specifically, task 3a uses a collection of approximately one million medical documents made available by the EU-FP7 Khresmoi project [6] (http://www.khresmoi.eu/) and a set of English general-public queries that individuals may realistically pose (http://clefehealth2014.dcu.ie/task-3).

This collection consists of web pages covering a broad range of health topics, targeted at both the general public and healthcare professionals. The crawled documents are provided in the dataset in their raw HTML (Hyper Text Markup Language) format along with their uniform resource locators (URLs). In order to remove HTML tags, the raw web pages are preprocessed to extract the main content with the HTML parser Apache Tika. The Apache Tika toolkit detects and extracts metadata and structured text content from various document formats using existing parser libraries, including HTML. Tika is a project of the Apache Software Foundation and was formerly a subproject of Apache Lucene.
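As an illustration, the following minimal Java sketch shows how Tika's simple facade can strip the markup from one crawled page; the file name is hypothetical and the real pipeline would iterate over the whole collection.

import java.io.File;
import org.apache.tika.Tika;

public class CleanHtml {
    public static void main(String[] args) throws Exception {
        Tika tika = new Tika();
        // Parse the raw HTML and keep only the textual content,
        // discarding tags and markup.
        String text = tika.parseToString(new File("corpus/page-00001.html"));
        System.out.println(text);
    }
}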
After the data has been cleaned, we indexed all the documents. We used a very traditional IR system based on the Apache Lucene open-source toolkit, essentially a successor to the system used by the LABERINTO team for the 2011 TREC Medical Records track [2]. Lucene is a powerful Java library that lets you easily add document retrieval to any application. In recent years, Lucene has become exceptionally popular and is now the most widely used information retrieval library.

Documents and fields are Lucene's fundamental units of indexing and searching. A document is Lucene's atomic unit of indexing and searching. It is a container that holds one or more fields, which in turn contain the "real" content. Each field has a name to identify it, a text or binary value, and a series of detailed options that describe what Lucene should do with the field value when the document is added to the index. To index our collection sources, we must first translate them into Lucene documents and fields. Our indexing module turns every clean HTML file from the preprocessed collection into a single Lucene document.

What we end up with after running Lucene is a directory named index, which contains the files used by Lucene to associate terms with documents. To accomplish this, a Lucene index was created with a specific analyzer. An Analyzer takes a stream of text or tokens and produces the terms to be indexed. A single kind of Lucene index has been used for all developed models; in other words, all LABERINTO models for the ShARe/CLEF eHealth Evaluation Lab 2014 share the same Lucene index.
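The sketch below illustrates this indexing step, assuming a Lucene 4.x-era API; the field names, version constant and sample values are our own illustrative choices, not necessarily those of the submitted system.

import java.io.File;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

public class Indexer {
    public static void main(String[] args) throws Exception {
        StandardAnalyzer analyzer = new StandardAnalyzer(Version.LUCENE_47);
        IndexWriterConfig config = new IndexWriterConfig(Version.LUCENE_47, analyzer);
        try (IndexWriter writer = new IndexWriter(FSDirectory.open(new File("index")), config)) {
            // One Lucene document per cleaned HTML file.
            Document doc = new Document();
            // The URL is stored verbatim for result reporting.
            doc.add(new StringField("url", "http://example.org/page.html", Field.Store.YES));
            // The extracted text is analyzed and made searchable.
            doc.add(new TextField("content", "text extracted by Tika ...", Field.Store.NO));
            writer.addDocument(doc);
        }
    }
}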
3 Retrieval approaches

This section presents the different models developed for evaluation. Among the different test models developed, four have been selected for submission:

- Mandatory baseline run using only the title and description fields (UHU_EN_Run1.dat).
- Model with MetaMap boosted terms (UHU_EN_Run5.dat).
- Baseline with query expansion using MeSH (UHU_EN_Run6.dat).
- Baseline with query expansion using MeSH and adding the narrative field (UHU_EN_Run7.dat).

None of the proposed models uses the discharge summaries. The differences between the models are described in the following sections.

3.1 Baseline

This model has been designed to be the simplest approximation to the task. The baseline model is based on the bag-of-words method. In this model, the topic text is represented as an unordered collection of words, disregarding grammar and even word order. Therefore, the model matches the words in the topic with the words contained in the index. The usefulness of the model is twofold: it provides base results for comparison and, on the other hand, its code serves as a basis for implementing more complex models. In order to maintain simplicity, the baseline model matches the topic words only against the title and description fields. To develop this model we used Lucene's default StandardAnalyzer. The analyzer takes the raw text and produces the set of terms to be searched in the index. Our base analyzer discards stop words with little semantic value, such as "the", "a", "an" or "for". Cutting down on the number of terms indexed can save time and space in an index, but it can also limit accuracy.

3.2 MetaMap term boosting

Lucene scores the relevance of matching documents based on the terms found. The higher the boost factor, the more relevant the term will be. Boosting allows you to control the relevance of a document by boosting its terms (see http://lucene.apache.org/core/2_9_4/queryparsersyntax.html). In this model, we used the main medical concepts identified by MetaMap. Queries were built from the title and description fields. Next, the UMLS concepts in the queries are recognized using MetaMap [1].

Because our method does not need any special score mapping or disambiguation, MetaMap's default options were used. Only the medical term identification functionality was used; in other words, our method only needs to detect medical terms in order to know which part of the topic should be boosted. No overmatches are allowed and the highest-scoring mappings are selected. For these detected UMLS mappings, a boost factor is set. In particular, values from 1.25 to 2.25 (in steps of 0.25) were tested in an exploratory fashion, centred mainly on the training topics and the ImageCLEFmed 2013 database and topics [3]. From these experiments we selected a boost of 1.5 for submission. Since the set of training topics was so small, we believed that no further tuning was possible.
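To make the boosting concrete, here is a minimal sketch using Lucene's query parser syntax, again assuming a Lucene 4.x-era API; the example query, and the assumption that MetaMap flagged "cough" as a UMLS concept, are purely illustrative.

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.Query;
import org.apache.lucene.util.Version;

public class BoostedQuery {
    public static void main(String[] args) throws Exception {
        StandardAnalyzer analyzer = new StandardAnalyzer(Version.LUCENE_47);
        QueryParser parser = new QueryParser(Version.LUCENE_47, "content", analyzer);
        // "cough" was recognised by MetaMap as a medical concept, so it
        // receives the experimentally selected boost of 1.5; the other
        // words keep the default weight of 1.0.
        Query query = parser.parse("persistent cough^1.5");
        System.out.println(query);
    }
}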
3.3 Query expansion with MeSH

Term expansion is one possible retrieval technique that can benefit from the public accessibility of structured medical vocabularies. Applied at query time, it usually deals with the problem that real-world concepts are referred to using different terms. An information retrieval system can help users by automatically refining their queries, exploiting the semantic relationships between terms [7].

MeSH (Medical Subject Headings) is a controlled vocabulary produced and maintained by the U.S. National Library of Medicine [9]. There are currently over 26,000 descriptors or Main Headings and almost 180,000 alternative expressions (entry terms); thus, MeSH offers many possibilities for expanding the query via the MeSH tree structure and/or the entry terms [10].

In this model, an open-source implementation of SKOS-based term expansion for Solr is used (https://github.com/behas/lucene-skos). For every term included in both the title and description fields, term expansion through MeSH is performed with SKOS. In this approach, we expand the query terms with related terms from MeSH; duplicate related terms are removed. A default expansion term weight of 0.7 is used, following the previous work of Haslhofer, Martins and Magalhães [7].

3.4 Query expansion with MeSH adding the narrative field

As in the model above, this model uses query expansion with MeSH via SKOS. The only difference lies in performing the query expansion over the narrative field terms as well. In this case, for every term included in the title, description and narrative fields, term expansion through MeSH was performed, and the narrative field terms were added to the query. It is worth mentioning at this point that the narrative field does not contribute much to the query expansion itself. The major differences in retrieval for this model come from the added narrative field terms rather than from the MeSH concept expansion.

In relation to this issue, Table 1 summarises the MeSH concepts detected in the topics for query expansion, together with data on the synonyms extracted from the MeSH ontology in the performed expansion. For the MeSH concepts, the minimum, maximum and average number of concepts detected per topic has been collected, along with the standard deviation. The same data are given for the synonym entries used for expansion.

Table 1: Statistics of the performed query expansion.

                 Min   Max   Average   Standard deviation
  MeSH concepts    1     3      1.45                 0.88
  Synonyms         0    25      3.23                 4.24
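A minimal sketch of this expansion step is shown below. The synonym table and weight handling are illustrative stand-ins for the lucene-skos machinery: the MeSH entry terms are hard-coded here, and the 0.7 weight follows the setting described above.

import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class MeshExpansion {
    public static void main(String[] args) {
        // Hypothetical MeSH entry terms for a query word.
        Map<String, List<String>> mesh = new HashMap<>();
        mesh.put("heartburn", Arrays.asList("pyrosis", "acid reflux"));

        Set<String> seen = new HashSet<>();
        StringBuilder expanded = new StringBuilder();
        for (String term : "heartburn treatment".split("\\s+")) {
            expanded.append(term).append(' ');
            for (String synonym : mesh.getOrDefault(term, Collections.emptyList())) {
                if (seen.add(synonym)) { // duplicate related terms are removed
                    // Expanded terms are down-weighted with the 0.7 factor.
                    expanded.append('"').append(synonym).append("\"^0.7 ");
                }
            }
        }
        // The expanded string can be handed to the same QueryParser as before:
        // heartburn "pyrosis"^0.7 "acid reflux"^0.7 treatment
        System.out.println(expanded.toString().trim());
    }
}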
4 Results

Two main metrics were taken into account for the ShARe/CLEF eHealth 2014 task 3: Precision at 10 (P@10) as the primary measure, and Normalised Discounted Cumulative Gain at rank 10 (nDCG@10) as the secondary measure. The ShARe/CLEF eHealth 2014 task 3 built result pools from the participants' submissions considering the top 10 documents ranked by the baseline systems (run 1), the two highest-priority runs that used the discharge summaries (runs 2 and 3), and the two highest-priority runs that did not use the discharge summaries (runs 5 and 6); thus runs 4 and 7 were not sampled to form the assessment pool.

Table 2: Results of the submitted runs to ShARe/CLEF eHealth 2014 task 3a.

  Measure    RUN1     RUN5     RUN6     RUN7
  P@10       0.8000   0.5860   0.5140   0.5100
  nDCG@10    0.5530   0.5985   0.5163   0.5158
  MAP        0.2624   0.3152   0.2588   0.3009

Table 2 shows the results of our submitted runs. In this table, we have taken into account the two main metrics of the 2014 edition and, as a reference point, the mean average precision (MAP), a metric commonly used in Information Retrieval.

First, we find that two of our submitted runs outperform the baseline model: the MetaMap boost model (in nDCG@10 and MAP) and one of the query expansion models (taking MAP as the reference). This shows that these approaches can be more effective than a simple baseline model. In contrast, the concept expansion approach based only on the title and description fields decreases retrieval performance in all metrics. We think that improper query expansion settings may be the reason for this performance. Our query expansion models differ in the fields used: the RUN6 model uses the title and description fields, while the RUN7 model uses the title, description and narrative fields to select the terms to expand. The results show that refining the set of terms used for query expansion can prevent the query drift caused by blind expansion and yield improvements in retrieval effectiveness. Although RUN7 improves MAP, and taking into account the reference values used in the 2014 lab, our query expansion runs have shown that not all query expansions lead to retrieval improvements.

On the other hand, in RUN5 we consider positive the contribution of increasing the weights of the medical concepts in the query when scoring a document. Empirical results show that boosting medical concepts along with the original query concepts can improve retrieval effectiveness; which concepts to consider (with MetaMap or another tool) and how to weight them is, however, a challenging issue.

Plots comparing each of our runs against the median and best performance (P@10) across all systems submitted to CLEF, for each query topic, are shown in Figure 1. In particular, for each query the height of a bar represents the gain/loss of our system and of the best system (for that query) over the median system.

Fig. 1: Per-topic comparison between submitted runs and the other systems (best vs median). (a) Baseline model using only the title and description fields. (b) Model with MetaMap boosted terms. (c) Baseline with query expansion using MeSH. (d) Baseline with query expansion using MeSH + narrative field.

The per-topic comparison allows us to observe how performance varies considerably from model to model. Thus, we can see in Fig. 1a that the baseline has 12 queries that perform better than the median, 26 queries that perform worse, and another 12 queries that perform exactly at the median. In Fig. 1b, the MetaMap-based model has 14 queries that perform better than the median, 26 that perform worse, and another 10 at the median. This means that, comparing the baseline and our best method, the UMLS concept-based method can do somewhat better on some topics but still needs substantial improvement to surpass the baseline. As regards the query expansion models, we can see in Fig. 1c that RUN6 has only 8 queries that perform better than the median, 32 that perform worse, and another 10 at the median. In Fig. 1d, RUN7 has just 7 queries that perform better than the median, 33 that perform worse, and another 10 at the median. Though one of the query expansion systems performed better than the baseline, the per-topic analysis shows a generally poor performance for query expansion.
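For reference, the bar heights in Figure 1 can be derived as in the short sketch below; the per-topic P@10 values are hypothetical placeholders, since the real figures rely on the official evaluation output of all participating systems.

public class PerTopicGain {
    public static void main(String[] args) {
        // Hypothetical per-topic P@10 values for three of the 50 topics.
        double[] ours   = {0.8, 0.3, 0.5};
        double[] median = {0.6, 0.3, 0.7};
        double[] best   = {0.9, 0.8, 0.9};
        for (int topic = 0; topic < ours.length; topic++) {
            double gainOurs = ours[topic] - median[topic]; // bar height for our run
            double gainBest = best[topic] - median[topic]; // bar height for the best run
            System.out.printf("topic %d: ours %+.2f, best %+.2f%n",
                    topic + 1, gainOurs, gainBest);
        }
    }
}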
5 Conclusions and future work

We have presented different approaches to medical Information Retrieval from the patients' perspective. Our models were based mainly on concept identification with MetaMap and query expansion with MeSH. Both the concept boosting and the query expansion need to be improved and refined. Some hints for improvement, such as term selection for expansion or the tuning of the boosting parameters, have been presented. Despite the modest results, we think that this first participation provides a platform for further development of medical-concept-based and query expansion retrieval systems for dealing with medical data from the patients' perspective.

Acknowledgement

This work has been partially funded by the Andalusian Ministry of Economy, Innovation and Science (Bidamir project, TIC 07629).

References

1. Alan R. Aronson. Effective mapping of biomedical text to the UMLS Metathesaurus: the MetaMap program. In Proceedings of the AMIA Symposium, page 17. American Medical Informatics Association, 2001.
2. Juan Manuel Córdoba, Manuel J. Maña López, Noa P. Cruz Díaz, Jacinto Mata Vázquez, Fernando Aparicio, Manuel de Buenaga Rodríguez, Daniel Glez-Peña, and Florentino Fdez-Riverola. Medical-Miner at TREC 2011 Medical Records track. In TREC, 2011.
3. A. García Seco de Herrera, Jayashree Kalpathy-Cramer, D. Demner-Fushman, Sameer Antani, and Henning Müller. Overview of the ImageCLEF 2013 medical tasks. Working Notes of CLEF, 2013.
4. Lorraine Goeuriot, G. Jones, Liadh Kelly, Johannes Leveling, Allan Hanbury, Henning Müller, Sanna Salanterä, Hanna Suominen, and Guido Zuccon. ShARe/CLEF eHealth Evaluation Lab 2013, Task 3: Information retrieval to address patients' questions when reading clinical reports. Online Working Notes of CLEF, 2013.
5. Lorraine Goeuriot, Liadh Kelly, Wei Li, Joao Palotti, Pavel Pecina, Guido Zuccon, Allan Hanbury, Gareth Jones, and Henning Müller. ShARe/CLEF eHealth Evaluation Lab 2014, Task 3: User-centred health information retrieval. In Proceedings of CLEF 2014, 2014.
6. A. Hanbury and H. Müller. Khresmoi: multimodal multilingual medical information search. MIE Village of the Future, 2012.
7. Bernhard Haslhofer, Flávio Martins, and João Magalhães. Using SKOS vocabularies for improving web search. In Proceedings of the 22nd International Conference on World Wide Web Companion, pages 1253–1258. International World Wide Web Conferences Steering Committee, 2013.
8. Liadh Kelly, Lorraine Goeuriot, Hanna Suominen, Tobias Schreck, Gondy Leroy, Danielle L. Mowery, Sumithra Velupillai, Wendy W. Chapman, David Martinez, Guido Zuccon, and Joao Palotti. Overview of the ShARe/CLEF eHealth Evaluation Lab 2014. In Proceedings of CLEF 2014, Lecture Notes in Computer Science (LNCS). Springer, 2014.
9. Henry J. Lowe and G. Octo Barnett. Understanding and using the Medical Subject Headings (MeSH) vocabulary to perform literature searches. JAMA, 271(14):1103–1108, 1994.
10. Jacinto Mata, Mariano Crespo, and Manuel J. Maña. LABERINTO at ImageCLEF 2011 medical image retrieval task. Working Notes of CLEF 2011, 2011.