
ub-botswana participation to CLEF eHealth IR challenge 2017: Task 3 (IRTask 1: ad-hoc search)

Edwin Thuma

Nkwebi Motlogelwa

Tebo Leburu-Dingalo

Department of Computer Science, University of Botswana

In this paper, we describe the methods deployed in the different runs submitted for our participation in the CLEF eHealth 2017 Task 3: Patient-Centered Information Retrieval, IRTask 1: ad-hoc search. Specifically, we deploy the DPH term weighting model with explicit relevance feedback, where the expansion terms are selected from documents that assessors previously identified as relevant for each query. As an improvement, we deploy proximity search using both the Full Dependence (FD) and Sequential Dependence (SD) variants of the Markov Random Fields and the Divergence From Randomness (DFR) based dependence models to re-rank documents that have query terms in close proximity. In another approach, we deploy pseudo relevance feedback, where the expansion terms are selected from the top 3 ranked documents after a first pass retrieval. In addition, we deploy proximity search using the SD variant of the DFR based dependence model.

Keywords: Explicit Relevance Feedback, Proximity Search, Pseudo Relevance Feedback

In this paper, we describe the methods used for our participation in the CLEF eHealth 2017 Task 3: Patient-Centered Information Retrieval, IRTask 1: ad-hoc search. A detailed task description is available in the overview paper of Task 3 [7]. This task is a continuation of the previous CLEF eHealth Information Retrieval (IR) tasks that ran in 2013 [3], 2014 [4], 2015 [6] and 2016 [5]. The CLEF eHealth task aims to evaluate the effectiveness of information retrieval systems when searching for health related content on the web, with the objective of fostering research and development of search engines tailored to health information seeking [5, 6].

The CLEF eHealth Information Retrieval task was motivated by the problem of users of information retrieval systems formulating circumlocutory queries, using colloquial language instead of medical terms, as studied by Zuccon et al. [9] and Stanton et al. [8]. These studies found that modern search engines are ill-equipped to handle such queries; only 3 of the top 10 results were highly useful for self diagnosis. In this paper, we attempt to tackle this problem by using explicit relevance feedback to improve the retrieval effectiveness. In addition, we deploy proximity search to further improve the retrieval effectiveness of our system. Moreover, we investigate whether pseudo relevance feedback, where the expansion terms are selected from the top 3 ranked documents after a first pass retrieval, can improve the retrieval effectiveness.

This paper is structured as follows. Section 2 contains a background on the algorithms used. Section 3 describes the experimental environment. In Section 4, we describe the 5 runs submitted by team ub-botswana. Section 5 presents and discusses results on training data.

Background

In this section, we present a brief but essential background on the different algorithms used in our experimental investigation and evaluation. We start by describing the DPH term weighting model in Section 2.1. We then describe the Bose-Einstein 1 (Bo1) model for query expansion in Section 2.2.

DPH Term Weighting Model

For all our experimental investigation and evaluation, we used the parameter-free DPH term weighting model from the Divergence from Randomness (DFR) framework [2]. The DPH term weighting model calculates the score of a document $d$ for a given query $Q$ as follows:

$$\mathrm{score}_{DPH}(d, Q) = \sum_{t \in Q} qtf \cdot norm \cdot \left( tf \cdot \log\left( \left( tf \cdot \frac{avg\_l}{l} \right) \cdot \left( \frac{N}{tfc} \right) \right) + 0.5 \cdot \log\left( 2 \pi \cdot tf \cdot (1 - t_{MLE}) \right) \right) \tag{1}$$

where $qtf$, $tf$ and $tfc$ are the frequencies of the term $t$ in the query $Q$, in the document $d$ and in the collection $C$ respectively. $N$ is the number of documents in the collection $C$, $avg\_l$ is the average length of documents in the collection $C$ and $l$ is the length of the document $d$. $t_{MLE} = \frac{tf}{l}$ and $norm = \frac{(1 - t_{MLE})^2}{tf + 1}$.
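The DPH score can be sketched in a few lines of Python. This is a minimal illustration of Equation (1), not the Terrier implementation: the function name `dph_score` and the dictionary-based inputs are hypothetical, base-2 logarithms are an assumption (the equation does not state a base), and skipping terms with `tf >= l` is an added guard against taking the logarithm of zero.

```python
import math


def dph_score(query_tf, doc_tf, doc_len, avg_len, coll_tf, n_docs):
    """Score one document for a query following Equation (1).

    query_tf: dict term -> frequency in the query (qtf)
    doc_tf:   dict term -> frequency in the document (tf)
    doc_len:  length of the document (l)
    avg_len:  average document length in the collection (avg_l)
    coll_tf:  dict term -> frequency in the collection (tfc)
    n_docs:   number of documents in the collection (N)
    """
    score = 0.0
    for term, qtf in query_tf.items():
        tf = doc_tf.get(term, 0)
        # Terms absent from the document contribute nothing. The tf >= doc_len
        # case is skipped as an assumption, because 1 - t_mle would be zero.
        if tf == 0 or tf >= doc_len:
            continue
        t_mle = tf / doc_len
        norm = (1 - t_mle) ** 2 / (tf + 1)
        # Assumption: base-2 logarithms.
        information = tf * math.log2(
            (tf * avg_len / doc_len) * (n_docs / coll_tf[term])
        ) + 0.5 * math.log2(2 * math.pi * tf * (1 - t_mle))
        score += qtf * norm * information
    return score


if __name__ == "__main__":
    # Toy statistics, invented purely for illustration.
    print(dph_score({"headache": 1}, {"headache": 2}, 100, 100.0,
                    {"headache": 50}, 1000))
```

Because DPH is parameter-free, the sketch takes only collection and document statistics as input; there is nothing to tune.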

Bose-Einstein 1 (Bo1) Model for Query Expansion

In our experimental investigation and evaluation, we used the Terrier-4.0 Divergence from Randomness (DFR) Bose-Einstein 1 (Bo1) model to select the most informative terms from the topmost documents after a first pass document ranking. The DFR Bo1 model calculates the information content of a term $t$ in the top-ranked documents as follows [1]:

$$w(t) = tf_x \cdot \log_2 \frac{1 + P_n(t)}{P_n(t)} + \log_2 (1 + P_n(t)) \tag{2}$$

$$P_n(t) = \frac{tfc}{N} \tag{3}$$

where $P_n(t)$ is the probability of $t$ in the whole collection, $tf_x$ is the frequency of the query term in the top $x$ ranked documents, $tfc$ is the frequency of the term $t$ in the collection, and $N$ is the number of documents in the collection.
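Equations (2) and (3) translate directly into code. The sketch below is illustrative only: `bo1_weight` and `select_expansion_terms` are hypothetical names, the feedback documents are assumed to be already tokenised lists of terms, and ties are broken alphabetically as an arbitrary choice.

```python
import math
from collections import Counter


def bo1_weight(tf_x, coll_tf, n_docs):
    """Information content w(t) of a term, following Equations (2) and (3)."""
    p_n = coll_tf / n_docs  # Equation (3)
    return tf_x * math.log2((1 + p_n) / p_n) + math.log2(1 + p_n)


def select_expansion_terms(feedback_docs, coll_tf, n_docs, n_terms=10):
    """Rank the terms of the feedback documents by w(t) and keep the best ones.

    feedback_docs: list of token lists (the top x ranked documents)
    coll_tf:       dict term -> frequency in the collection (tfc)
    """
    tf_x = Counter(token for doc in feedback_docs for token in doc)
    weights = {
        term: bo1_weight(freq, coll_tf[term], n_docs)
        for term, freq in tf_x.items()
        if coll_tf.get(term, 0) > 0  # skip terms without collection statistics
    }
    # Highest weight first; alphabetical order breaks ties (an assumption).
    return sorted(weights, key=lambda term: (-weights[term], term))[:n_terms]


if __name__ == "__main__":
    # Toy feedback documents and statistics, invented for illustration.
    docs = [["migraine", "aura", "pain"], ["migraine", "pain", "the"]]
    stats = {"migraine": 10, "aura": 5, "pain": 200, "the": 900}
    print(select_expansion_terms(docs, stats, n_docs=1000, n_terms=2))
```

Terms that are frequent in the feedback documents but rare in the collection receive the highest weights, which is why a very common word such as "the" ranks last in the toy example.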

Experimental Setting

FAQ Retrieval Platform: For all our experimental evaluation, we used Terrier-4.2, an open source Information Retrieval (IR) platform. All the documents (ClueWeb 12 B13) used in this study were pre-processed before indexing; this involved tokenising the text and stemming each token using the full Porter stemming algorithm. Stopword removal was enabled, and we used Terrier's stopword list. The index was created using blocks to save positional information with each term. For pseudo relevance feedback, we used the Terrier-4.2 DFR Bose-Einstein 1 (Bo1) model for query expansion to select the 10 most informative terms from the top 3 ranked documents.
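To make the role of positional information concrete, the following sketch builds a small positional index after tokenisation and stopword removal. It is not how Terrier builds its block index: the names `preprocess` and `build_positional_index` are hypothetical, the stopword set is a tiny stand-in for Terrier's list, the `stem` argument defaults to a no-op placeholder rather than the Porter stemmer, and recording positions after stopword removal is an assumption of this sketch.

```python
import re
from collections import defaultdict

# Tiny illustrative stopword set; the paper uses Terrier's own list.
STOPWORDS = frozenset({"the", "a", "of", "and", "in", "to", "is"})


def preprocess(text, stem=lambda token: token):
    """Lowercase, tokenise, drop stopwords, then stem each token.

    `stem` is a placeholder: pass a Porter stemmer to mirror the paper's setup.
    """
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [stem(token) for token in tokens if token not in STOPWORDS]


def build_positional_index(docs):
    """Map each term to {doc_id: [positions]} so proximity can be checked later."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for position, term in enumerate(preprocess(text)):
            index[term].setdefault(doc_id, []).append(position)
    return index


if __name__ == "__main__":
    toy_docs = {"d1": "Pain in the lower back", "d2": "Back pain and headache"}
    for term, postings in sorted(build_positional_index(toy_docs).items()):
        print(term, postings)
```

Storing positions alongside each posting is what later allows a proximity model to check whether two query terms occur within a given window of each other.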

Description of the Different Runs

Term Weighting Model: For all our runs, we used the parameter-free DPH Divergence From Randomness term weighting model in the Terrier-4.2 IR platform to score and rank the documents in the ClueWeb 12 B13 document collection.

ub-botswana IRTask1 run1: We ranked the documents using DPH DFR term weighting. As an improvement, we deployed explicit relevance feedback, where we selected expansion terms from the top 3 documents that were explicitly marked relevant by assessors for each query. We used the Terrier-4.2 DFR Bose-Einstein 1 (Bo1) model for query expansion to select the 10 most informative terms from these documents. In addition, we deployed the Full Dependence (FD) variant of the Markov Random Fields for term dependence. Full Dependence assumes all query terms are in some way dependent on each other. In this work, we experimentally selected a window size of 15, which yielded the highest retrieval performance on the training data.

ub-botswana IRTask1 run2: We performed a first pass retrieval using the DPH DFR term weighting model. As an improvement, we deployed explicit relevance feedback, using the DFR Bo1 model for query expansion to select the expansion terms.

ub-botswana IRTask1 run3: We produced an initial ranking using DPH DFR term weighting. As an improvement, we deployed explicit relevance feedback and used the DFR Bo1 model for query expansion to select the expansion terms. In addition, we deployed the Sequential Dependence (SD) variant of the Divergence from Randomness based dependence model. Sequential Dependence only assumes a dependence between neighbouring query terms. In this work, we experimentally selected a window size of 15, which yielded the highest retrieval performance on the training data.

ub-botswana IRTask1 run4: We used the parameter-free DPH DFR term weighting model to produce an initial ranking. As an improvement, we deployed a simple pseudo-relevance feedback on the local collection. We used the Bo1 model for query expansion to select the expansion terms. We then performed a second pass retrieval on the local collection with the new expanded query.

ub-botswana IRTask1 run5: We used ub-botswana IRTask1 run4 as the baseline system. As an improvement, we deployed the Sequential Dependence (SD) variant of the Divergence from Randomness based term dependence model. Sequential Dependence only assumes a dependence between neighbouring query terms. In this work, we experimentally selected a window size of 15, which yielded the highest retrieval performance on the training data.
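The difference between the two dependence variants comes down to which query term pairs are checked for proximity. The sketch below illustrates only that counting step, under simplifying assumptions: FD is approximated by all pairs of query terms, a "match" is a pair of occurrences fewer than `window` tokens apart, and the function names are hypothetical. It does not reproduce how the Markov Random Fields or DFR dependence models turn these counts into a score.

```python
from itertools import combinations


def term_pairs(query_terms, variant):
    """Return the query term pairs considered by each dependence variant."""
    if variant == "SD":
        # Sequential Dependence: neighbouring query terms only.
        return list(zip(query_terms, query_terms[1:]))
    if variant == "FD":
        # Full Dependence, simplified here to every pair of query terms.
        return list(combinations(query_terms, 2))
    raise ValueError(f"unknown variant: {variant}")


def pair_matches(doc_tokens, term_a, term_b, window=15):
    """Count occurrence pairs of the two terms fewer than `window` tokens apart."""
    pos_a = [i for i, token in enumerate(doc_tokens) if token == term_a]
    pos_b = [i for i, token in enumerate(doc_tokens) if token == term_b]
    return sum(1 for a in pos_a for b in pos_b if a != b and abs(a - b) < window)


def proximity_counts(doc_tokens, query_terms, variant="SD", window=15):
    """Proximity evidence per term pair; a dependence model would score these."""
    return {
        pair: pair_matches(doc_tokens, pair[0], pair[1], window)
        for pair in term_pairs(query_terms, variant)
    }


if __name__ == "__main__":
    doc = "chest pain after heavy exercise and shortness of breath".split()
    query = ["chest", "pain", "breath"]
    print(proximity_counts(doc, query, variant="SD"))
    print(proximity_counts(doc, query, variant="FD"))
```

With three query terms, SD inspects two pairs while FD inspects three, which shows why FD becomes more expensive as queries grow longer.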

Results and Discussion

These working notes were compiled and submitted before the relevance judgments were released. Below we present the results of our runs using the 2016 query relevance judgments. Please note that the official results will be different because new query relevance judgments will be released.

Table 1 presents our results on the training data. From this table, we see a degradation in performance when we incorporate term dependence only in our ranking (ub-botswana IRTask1 run5). However, when we deploy pseudo relevance feedback (ub-botswana IRTask1 run4), we see an improvement in the retrieval performance in terms of precision at 5 (P@5), precision at 10 (P@10) and recall (rel ret). Moreover, a significant improvement in recall is obtained when explicit relevance feedback is deployed (ub-botswana IRTask1 run1, ub-botswana IRTask1 run2 and ub-botswana IRTask1 run3).

In addition, we obtain mixed results when we incorporate proximity search after deploying explicit relevance feedback. For example, there was an improvement in the retrieval performance in terms of P@5 and P@10 when we deployed the FD variant of the Markov Random Fields for term dependence using a window size of 15 (ub-botswana IRTask1 run1). In contrast, we obtain a degradation in the retrieval performance in terms of P@5 and P@10 when we deploy the SD variant of the Divergence from Randomness based term dependence model using a window size of 15 (ub-botswana IRTask1 run5).
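For readers unfamiliar with the measures discussed above, the following sketch shows how precision at a cutoff and the number of relevant documents retrieved can be computed for one query. The function names and the toy ranking are hypothetical; dividing by `k` even when fewer than `k` documents are returned is a convention assumed here, and the sketch is not the evaluation tool used to produce the reported results.

```python
def precision_at_k(ranking, relevant, k):
    """Fraction of the top-k ranked documents that are relevant (P@k)."""
    hits = sum(1 for doc_id in ranking[:k] if doc_id in relevant)
    # Assumption: divide by k even if the ranking holds fewer than k documents.
    return hits / k


def relevant_retrieved(ranking, relevant):
    """Number of distinct relevant documents anywhere in the ranking."""
    return sum(1 for doc_id in set(ranking) if doc_id in relevant)


if __name__ == "__main__":
    # Toy ranking and judgments, invented for illustration.
    ranking = ["d3", "d1", "d7", "d2", "d9"]
    relevant = {"d1", "d2", "d4"}
    print(precision_at_k(ranking, relevant, 5))
    print(relevant_retrieved(ranking, relevant))
```

P@5 and P@10 reward placing relevant documents at the very top of the ranking, whereas the relevant-retrieved count reflects recall over the whole result list, so a run can improve on one while degrading on the other.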

References

1. G. Amati. Probabilistic Models for Information Retrieval based on Divergence from Randomness. University of Glasgow, UK, PhD Thesis, pages 1–198, June 2003.

2. G. Amati, E. Ambrosi, M. Bianchi, C. Gaibisso, and G. Gambosi. FUB, IASI-CNR and University of Tor Vergata at TREC 2007 Blog Track. In Proceedings of the 16th Text REtrieval Conference (TREC-2007), pages 1–10, Gaithersburg, Md., USA, 2007. Text REtrieval Conference (TREC).

3. L. Goeuriot, G.J.F. Jones, L. Kelly, J. Leveling, A. Hanbury, H. Müller, S. Salanterä, H. Suominen, and G. Zuccon. ShARe/CLEF eHealth Evaluation Lab 2013, Task 3: Information Retrieval to Address Patients' Questions when Reading Clinical Reports. In CLEF 2013 Online Working Notes, volume 8138. CEUR-WS, 2013.

4. L. Goeuriot, L. Kelly, W. Li, J. Palotti, P. Pecina, G. Zuccon, A. Hanbury, G.J.F. Jones, and H. Mueller. ShARe/CLEF eHealth Evaluation Lab 2014, Task 3: User-Centred Health Information Retrieval. In CLEF 2014 Online Working Notes. CEUR-WS, 2014.

5. Liadh Kelly, Lorraine Goeuriot, Hanna Suominen, Aurélie Névéol, João Palotti, and Guido Zuccon. Overview of the CLEF eHealth Evaluation Lab 2016, pages 255–266. Springer International Publishing, Cham, 2016.

6. J. Palotti, G. Zuccon, L. Goeuriot, L. Kelly, A. Hanbury, G.J.F. Jones, M. Lupu, and P. Pecina. CLEF eHealth Evaluation Lab 2015 Task 2: Retrieving Information about Medical Symptoms. In CLEF 2015 Online Working Notes. CEUR-WS, 2015.

7. J. Palotti, G. Zuccon, Jimmy, P. Pecina, M. Lupu, L. Goeuriot, L. Kelly, and A. Hanbury. CLEF 2017 Task Overview: The IR Task at the eHealth Evaluation Lab. In Working Notes of Conference and Labs of the Evaluation (CLEF) Forum. CEUR Workshop Proceedings, 2017.

8. I. Stanton, S. Ieong, and N. Mishra. Circumlocution in Diagnostic Medical Queries. In Proceedings of the 37th International ACM SIGIR Conference on Research & Development in Information Retrieval, pages 133–142. ACM, 2014.

9. G. Zuccon, B. Koopman, and J. Palotti. Diagnose This If You Can: On the Effectiveness of Search Engines in Finding Medical Self-Diagnosis Information. In Advances in Information Retrieval (ECIR 2015), pages 562–567. Springer, 2015.