ub-botswana participation to CLEF eHealth IR challenge 2017: Task 3 (IRTask1 : ad-hoc search)

Edwin Thuma (thumae@mopipi.ub.bw), Nkwebi Motlogelwa (motlogel@mopipi.ub.bw), Tebo Leburu-Dingalo
Department of Computer Science, University of Botswana

Keywords: Explicit Relevance Feedback, Proximity Search, Pseudo Relevance Feedback

In this paper, we describe the methods deployed in the different runs submitted for our participation in the CLEF eHealth 2017 Task 3: Patient-Centered Information Retrieval, IRTask 1: ad-hoc search. Specifically, we deploy the DPH term weighting model with explicit relevance feedback, where the expansion terms are selected from documents that were previously identified as relevant by assessors for each query. As an improvement, we deploy proximity search using both the Full Dependence (FD) and Sequential Dependence (SD) variants of the Markov Random Fields and the Divergence From Randomness (DFR) based dependence models to re-rank documents that have query terms in close proximity. In another approach, we deploy pseudo relevance feedback, where the expansion terms are selected from the top 3 ranked documents after a first pass retrieval. In addition, we deploy proximity search using the SD variant of the DFR based dependence model.

Introduction

In this paper, we describe the methods used for our participation in the CLEF eHealth 2017 Task 3: Patient-Centered Information Retrieval, IRTask 1: ad-hoc search. A detailed task description is available in the overview paper of Task 3 [7]. This task is a continuation of the previous CLEF eHealth Information Retrieval (IR) tasks that ran in 2013 [3], 2014 [4], 2015 [6] and 2016 [5]. The CLEF eHealth task aims to evaluate the effectiveness of information retrieval systems when searching for health related content on the web, with the objective of fostering research and development of search engines tailored to health information seeking [6,5]. The CLEF eHealth Information Retrieval task was motivated by the problem of users of information retrieval systems formulating circumlocutory queries, using colloquial language instead of medical terms, as studied by Zuccon et al. [9] and Stanton et al. [8]. These studies found that modern search engines are ill-equipped to handle such queries; only 3 out of the top 10 results were highly useful for self diagnosis. In this paper, we attempt to tackle this problem by using explicit relevance feedback in order to improve the retrieval effectiveness. In addition, we deploy proximity search to further improve the retrieval effectiveness of our system. Moreover, we investigate whether pseudo relevance feedback, where the expansion terms are selected from the top 3 ranked documents after a first pass retrieval, can improve the retrieval effectiveness. This paper is structured as follows. Section 2 contains a background on the algorithms used. Section 3 describes the experimental environment. In Section 4, we describe the 5 runs submitted by team ub-botswana. Section 5 presents and discusses results on training data.

Background

In this section, we present a brief but essential background on the different algorithms used in our experimental investigation and evaluation. We start by describing the DPH term weighting model in Section 2.1. We then describe the Bose-Einstein 1 (Bo1) model for query expansion in Section 2.2.

DPH Term Weighting Model

For all our experimental investigation and evaluation, we used the parameter-free DPH term weighting model from the Divergence from Randomness (DFR) framework [2]. The DPH term weighting model calculates the score of a document d for a given query Q as follows:

$$\mathrm{score}_{DPH}(d, Q) = \sum_{t \in Q} qtf \cdot norm \cdot \left( tf \cdot \log\left( \left( tf \cdot \frac{avg_l}{l} \right) \cdot \left( \frac{N}{tf_c} \right) \right) + 0.5 \cdot \log\left( 2 \cdot \pi \cdot tf \cdot (1 - t_{MLE}) \right) \right) \qquad (1)$$

where $qtf$, $tf$ and $tf_c$ are the frequencies of the term $t$ in the query $Q$, in the document $d$ and in the collection $C$ respectively. $N$ is the number of documents in the collection $C$, $avg_l$ is the average length of documents in the collection $C$ and $l$ is the length of the document $d$. $t_{MLE} = \frac{tf}{l}$ and $norm = \frac{(1 - t_{MLE})^2}{tf + 1}$.
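As an illustration only, Equation (1) can be sketched in a few lines of Python. This is not the Terrier implementation; the function name `dph_score`, the dictionary-based inputs and the choice of base-2 logarithms are assumptions made for this example (Equation (1) does not state the base of the logarithm).

```python
import math


def dph_score(query_tf, doc_tf, doc_len, avg_doc_len, num_docs, coll_tf):
    """Sketch of Equation (1) for one document.

    query_tf: dict term -> qtf (frequency in the query)
    doc_tf:   dict term -> tf  (frequency in the document)
    coll_tf:  dict term -> tf_c (frequency in the collection)
    Base-2 logarithms are an assumption of this sketch.
    """
    score = 0.0
    for term, qtf in query_tf.items():
        tf = doc_tf.get(term, 0)
        # Terms absent from the document contribute nothing; the second
        # condition avoids log(0) when the document is made of this term only.
        if tf == 0 or tf >= doc_len:
            continue
        t_mle = tf / doc_len
        norm = (1.0 - t_mle) ** 2 / (tf + 1.0)
        score += qtf * norm * (
            tf * math.log2((tf * avg_doc_len / doc_len) * (num_docs / coll_tf[term]))
            + 0.5 * math.log2(2.0 * math.pi * tf * (1.0 - t_mle))
        )
    return score
```

For example, with toy statistics (a 10-token document in a 100-document collection), `dph_score({"flu": 1}, {"flu": 2}, 10, 10.0, 100, {"flu": 5})` scores the single query term "flu"; a document that does not contain any query term scores 0.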

Bose-Einstein 1 (Bo1) Model for Query Expansion

In our experimental investigation and evaluation, we used the Terrier-4.0 Divergence from Randomness (DFR) Bose-Einstein 1 (Bo1) model to select the most informative terms from the topmost documents after a first pass document ranking. The DFR Bo1 model calculates the information content of a term t in the top-ranked documents as follows [1]:

$$w(t) = tf_x \cdot \log_2 \frac{1 + P_n(t)}{P_n(t)} + \log_2 \left( 1 + P_n(t) \right) \qquad (2)$$

$$P_n(t) = \frac{tf_c}{N} \qquad (3)$$

where $P_n(t)$ is the probability of $t$ in the whole collection, $tf_x$ is the frequency of the query term in the top $x$ ranked documents, $tf_c$ is the frequency of the term $t$ in the collection, and $N$ is the number of documents in the collection.
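Equations (2) and (3) can be sketched in Python as follows. This is a minimal illustration rather than Terrier's code: the function names, the representation of feedback documents as token lists and the alphabetical tie-break are assumptions made for this example.

```python
import math
from collections import Counter


def bo1_weight(tf_x, tf_c, num_docs):
    """Equations (2) and (3): information content of one term."""
    p_n = tf_c / num_docs
    return tf_x * math.log2((1.0 + p_n) / p_n) + math.log2(1.0 + p_n)


def select_expansion_terms(feedback_docs, coll_tf, num_docs, k=10):
    """Return the k highest-weighted terms from the feedback documents.

    feedback_docs: list of token lists (the top x ranked documents)
    coll_tf:       dict term -> frequency in the whole collection
    """
    tf_x = Counter()
    for tokens in feedback_docs:
        tf_x.update(tokens)
    weights = {t: bo1_weight(f, coll_tf[t], num_docs) for t, f in tf_x.items()}
    # Sort by descending weight; ties are broken alphabetically (an assumption).
    return sorted(weights, key=lambda t: (-weights[t], t))[:k]
```

Calling `select_expansion_terms` with three feedback documents and `k=10` mirrors the setting of 10 terms drawn from 3 documents.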

FAQ Retrieval Platform: For all our experimental evaluation, we used Terrier-4.2, an open source Information Retrieval (IR) platform. All the documents (ClueWeb 12 B13) used in this study were pre-processed before indexing. This involved tokenising the text and stemming each token using the full Porter stemming algorithm. Stopword removal was enabled and we used the Terrier stopword list. The index was created using blocks to save positional information with each term. For pseudo relevance feedback, we used the Terrier-4.2 DFR Bose-Einstein 1 (Bo1) model for query expansion to select the 10 most informative terms from the top 3 ranked documents.

Description of the Different Runs

Term Weighting Model: For all our runs, we used the parameter-free DPH Divergence From Randomness term weighting model in the Terrier-4.2 IR platform to score and rank the documents in the ClueWeb 12 B13 document collection.

ub-botswana IRTask1 run1: We ranked the documents using DPH DFR term weighting. As an improvement, we deployed explicit relevance feedback, where we selected expansion terms from the top 3 documents that were explicitly marked relevant by assessors for each query. We used the Terrier-4.2 DFR Bose-Einstein 1 (Bo1) model for query expansion to select the 10 most informative terms from these documents. In addition, we deployed the Full Dependence (FD) variant of the Markov Random Fields for term dependence. Full Dependence assumes all query terms are in some way dependent on each other. In this work, we experimentally selected a window size of 15, which yielded the highest retrieval performance on the training data.

ub-botswana IRTask1 run2: We performed a first pass retrieval using the DPH DFR term weighting model. As an improvement, we deployed explicit relevance feedback, where we used the DFR Bo1 model for query expansion to select the expansion terms.

ub-botswana IRTask1 run3: We produced an initial ranking using DPH DFR term weighting. As an improvement, we deployed explicit relevance feedback and used the DFR Bo1 model for query expansion to select the expansion terms. In addition, we deployed the Sequential Dependence (SD) variant of the Divergence from Randomness based dependence model. Sequential Dependence only assumes a dependence between neighbouring query terms. In this work, we experimentally selected a window size of 15, which yielded the highest retrieval performance on the training data.

ub-botswana IRTask1 run4: We used the parameter-free DPH DFR term weighting model to produce an initial ranking. As an improvement, we deployed a simple pseudo-relevance feedback on the local collection. We used the Bo1 model for query expansion to select the expansion terms. We then performed a second pass retrieval on the local collection with the new expanded query.

ub-botswana IRTask1 run5: We used ub-botswana IRTask1 run4 as the baseline system. As an improvement, we deployed the Sequential Dependence (SD) variant of the Divergence from Randomness based term dependence model. Sequential Dependence only assumes a dependence between neighbouring query terms. In this work, we experimentally selected a window size of 15, which yielded the highest retrieval performance on the training data.
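The difference between the Full Dependence and Sequential Dependence assumptions used in the runs above can be sketched in Python. This is a simplified illustration, not the Markov Random Fields or DFR dependence scoring itself: the function names are hypothetical, Full Dependence is restricted here to unordered term pairs, and the way window counts would be folded into a document score is deliberately left out.

```python
from itertools import combinations


def dependent_pairs(query_terms, variant):
    """Return the query term pairs assumed to be dependent.

    "SD": only neighbouring query terms.
    "FD": every pair of query terms (a pairwise simplification).
    """
    if variant == "SD":
        return list(zip(query_terms, query_terms[1:]))
    if variant == "FD":
        return list(combinations(query_terms, 2))
    raise ValueError("variant must be 'SD' or 'FD'")


def pair_window_count(doc_tokens, pair, window=15):
    """Count the sliding windows of `window` tokens that contain both terms."""
    first, second = pair
    count = 0
    # Documents shorter than the window are treated as a single window.
    for start in range(max(1, len(doc_tokens) - window + 1)):
        span = doc_tokens[start:start + window]
        if first in span and second in span:
            count += 1
    return count
```

For a three-term query such as `["chest", "pain", "left"]`, the SD variant yields two pairs while the FD variant also includes the non-adjacent pair `("chest", "left")`, so FD rewards more co-occurrences at the cost of more pairs to score.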

Results and Discussion

These working notes were compiled and submitted before the relevance judgments were released. Below we present the results of our runs using the 2016 query relevance judgments. Please note that the official results will be different because new query relevance judgments will be released. Table 1 presents our results on the training data. From this table, we see a degradation in performance when we incorporate term dependence only in our ranking (ub-botswana IRTask1 run5). However, when we deploy pseudo relevance feedback (ub-botswana IRTask1 run4), we see an improvement in the retrieval performance in terms of precision at 5 (P@5), precision at 10 (P@10) and recall (rel ret). Moreover, a significant improvement in recall is obtained when explicit relevance feedback is deployed (ub-botswana IRTask1 run1, ub-botswana IRTask1 run2 and ub-botswana IRTask1 run3). In addition, we obtain mixed results when we incorporate proximity search after deploying explicit relevance feedback. For example, there was an improvement in the retrieval performance in terms of P@5 and P@10 when we deployed the FD variant of the Markov Random Fields for term dependence using a window size of 15 (ub-botswana IRTask1 run1). In contrast, we obtain a degradation in the retrieval performance in terms of P@5 and P@10 when we deploy the SD variant of the Divergence from Randomness based term dependence model using a window size of 15 (ub-botswana IRTask1 run3).

Table 1. Retrieval Results for all 5 Runs using 2016 qrel

Run ID                      P@5      P@10     rel ret
DPH Baseline                0.2973   0.2710   10104
ub-botswana IRTask1 run1    0.5093   0.4423   13661
ub-botswana IRTask1 run2    0.4513   0.4097   13661
ub-botswana IRTask1 run3    0.4433   0.4073   13661
ub-botswana IRTask1 run4    0.3160   0.2903   11129
ub-botswana IRTask1 run5    0.2873   0.2617   10104
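For readers unfamiliar with the measures in Table 1, the following Python sketch shows one way P@k and rel ret can be computed from a ranked run and a set of relevance judgments. It is an assumption-laden illustration: the function names and in-memory data layout are hypothetical, P@k is averaged over queries, rel ret is summed over queries, and the figures in Table 1 were not produced with this code.

```python
def precision_at_k(ranking, relevant, k):
    """Fraction of the top k ranked documents that are relevant."""
    return sum(1 for doc_id in ranking[:k] if doc_id in relevant) / k


def relevant_retrieved(ranking, relevant):
    """Number of relevant documents appearing anywhere in the ranking."""
    return sum(1 for doc_id in ranking if doc_id in relevant)


def evaluate_run(run, qrels, k):
    """Return (mean P@k over queries, total relevant retrieved).

    run:   dict query_id -> ranked list of document ids
    qrels: dict query_id -> set of relevant document ids
    """
    p_sum = 0.0
    rel_ret = 0
    for query_id, ranking in run.items():
        relevant = qrels.get(query_id, set())
        p_sum += precision_at_k(ranking, relevant, k)
        rel_ret += relevant_retrieved(ranking, relevant)
    return p_sum / len(run), rel_ret
```

With such a layout, comparing two runs amounts to calling `evaluate_run` on each with the same `qrels` and `k`.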

References

[1] Amati, G.: Probabilistic Models for Information Retrieval based on Divergence from Randomness. PhD Thesis, University of Glasgow, UK (June 2003)

[2] Amati, G., Ambrosi, E., Bianchi, M., Gaibisso, C., Gambosi, G.: FUB, IASI-CNR and University of Tor Vergata at TREC 2007 Blog Track. In: Proceedings of the 16th Text REtrieval Conference (TREC-2007). Gaithersburg, Md., USA (2007)

[3] Goeuriot, L., Jones, G.J., Kelly, L., Leveling, J., Hanbury, A., Müller, H., Salantera, S., Suominen, H., Zuccon, G.: ShARe/CLEF eHealth Evaluation Lab 2013, Task 3: Information Retrieval to Address Patients' Questions when Reading Clinical Reports. In: CLEF 2013 Online Working Notes. CEUR-WS, 8138 (2013)

[4] Goeuriot, L., Kelly, L., Li, W., Palotti, J., Pecina, P., Zuccon, G., Hanbury, A., Jones, G.J., Mueller, H.: ShARe/CLEF eHealth Evaluation Lab 2014, Task 3: User-Centred Health Information Retrieval. In: CLEF 2014 Online Working Notes. CEUR-WS (2014)

[5] Kelly, L., Goeuriot, L., Suominen, H., Névéol, A., Palotti, J., Zuccon, G.: Overview of the CLEF eHealth Evaluation Lab 2016. Springer International Publishing, Cham (2016)

[6] Palotti, J., Zuccon, G., Goeuriot, L., Kelly, L., Hanbury, A., Jones, G.J.F., Lupu, M., Pecina, P.: CLEF eHealth Evaluation Lab 2015 task 2: Retrieving Information about Medical Symptoms. In: CLEF 2015 Online Working Notes. CEUR-WS (2015)

[7] Palotti, J., Zuccon, G., Jimmy, Pecina, P., Lupu, M., Goeuriot, L., Kelly, L., Hanbury, A.: CLEF 2017 Task Overview: The IR Task at the eHealth Evaluation Lab. In: Working Notes of Conference and Labs of the Evaluation (CLEF) Forum. CEUR Workshop Proceedings (2017)

[8] Stanton, I., Ieong, S., Mishra, N.: Circumlocution in Diagnostic Medical Queries. In: Proceedings of the 37th international ACM SIGIR conference on Research & development in information retrieval. ACM (2014)

[9] Zuccon, G., Koopman, B., Palotti, J.: Diagnose This If You Can: On the Effectiveness of Search Engines in Finding Medical Self-Diagnosis Information. In: Advances in Information Retrieval (ECIR 2015). Springer (2015)