MIRACL at CLEF 2015: User-Centred Health Information Retrieval Task

Nesrine KSENTINI1, Mohamed TMAR1, Mohand BOUGHANEM2, and Faïez GARGOURI1

1 MIRACL Laboratory, University of Sfax, B.P. 3023 Sfax, Tunisia
2 IRIT Laboratory, University of Toulouse, France
ksentini.nesrine@ieee.org, mohamed.tmar@isimsf.rnu.tn, bougha@irit.fr, faiez.gargouri@isimsf.rnu.tn

Abstract. This paper presents our second participation in the user-centred health information retrieval task at CLEF eHealth 2015. The objective of this task is to retrieve relevant documents that answer the circumlocutory queries users pose when faced with the symptoms and signs of a disease. We submitted five runs: the first is the baseline system, and the other runs use the pseudo-relevance feedback technique to expand the original query in different ways. This year, result pools were built from the received submissions of all participants, considering the top 10 documents ranked by the three highest-priority runs (1, 2, 3). The obtained results are encouraging but can still be improved, with P@10 = 0.3212 for the baseline system and P@10 = 0.2939 as the best P@10 among the other runs.

Keywords: information retrieval, least square method, medical documents, query expansion, blind feedback, circumlocutory queries.

1 Introduction

With the proliferation of documents on the web, finding documents relevant to a user's need becomes a difficult task, especially when queries are not well expressed. This process of searching a corpus for documents that meet an information need is called Information Retrieval (IR). Many information retrieval systems have been developed to provide the functions necessary to find relevant information, and evaluating these systems is important to assess the quality of the returned results. To this end, international evaluation campaigns such as TREC (http://trec.nist.gov/) and CLEF have been set up in a competitive context to evaluate the retrieval systems of the various participants.

This paper presents our participation in the CLEF eHealth 2015 User-Centred Health Information Retrieval task [8]. The goal of this task is to evaluate the effectiveness of information retrieval systems when searching health content on the web, and thereby to promote the development of search engines and research adapted to health information. This task is the continuation of the previous CLEF eHealth Task 3, held in 2013 and 2014, and adopts the TREC-style evaluation process, with a collection of documents and queries shared between participants [3, 4].

This year, we study queries that differ from those of previous years. Queries are expressed by people who are not medical experts and who, confronted with a symptom or a sign, try to find out more about the disease they may have [10]. For example, when confronted with signs of jaundice, non-experts may use queries like "white part of eye turned green" to search for information allowing them to diagnose themselves and to understand their health condition. These queries are usually long, with ambiguous words used in place of the actual name of the disease. Recent research has shown that such queries are used by health consumers, and that current web search engines fail to support them effectively [11].

In our participation, we used the Vector Space Model, which computes a degree of similarity between documents and queries and ranks documents according to their relevance.
We submitted five runs: the first was the baseline system, and the others present the results obtained with query expansion techniques based on semantic relationships between terms.

2 Material and Methods

2.1 Database

Document Collection: The data set is composed of a crawl of about one million documents, made available to the CLEF eHealth task through the Khresmoi project [2, 8]. This collection comprises web pages covering a wide range of health subjects, addressed both to the public and to experts in the health sector. The web pages in this corpus are mainly medical and health-related websites certified by the Health on the Net Foundation, as well as other commonly used health and medical websites such as Diagnosia, DrugBank, and Trip Answers. The crawled documents are provided in the dataset in their raw HTML (HyperText Markup Language) format along with their URLs (Uniform Resource Locators).

Topics: The topic set contains 67 circumlocutory queries that users may pose when faced with the symptoms and signs of a medical condition [8]. These queries contain the following fields:

– Num: number of the query.
– Query: the description given by the user.

2.2 Methods

Our task was to index the collection and retrieve the top 1000 relevant medical documents for each topic. For the first run we used a traditional information retrieval (IR) system based on the Terrier platform [7]; for the other runs, we used the same platform with expanded queries. This platform, already used by the MIRACL team for the 2014 CLEF eHealth task [6], is an efficient, effective and flexible open-source search engine written in Java, easily deployable on large-scale collections of documents. In a first step, Terrier implements state-of-the-art indexing functionalities such as tokenization, stop-word removal, stemming, and storage of the information in a special structure called an inverted file. In a second step, it implements retrieval functionalities such as information retrieval models (Boolean, TF-IDF, BM25). It is an open-source, comprehensive and transparent platform for research and experimentation in text retrieval, developed at the School of Computing Science, University of Glasgow.

3 Retrieval Approaches

This section presents the different models developed for the evaluation:

– the mandatory baseline run;
– baseline runs with automatic query expansion, with different ways of adding related terms to the original query.

3.1 Mandatory baseline run

This model has been designed to be the simplest model for the task. It is based on the Vector Space Model (VSM) developed by Salton [9], which uses a vector representation for documents as well as queries, computes a degree of similarity between documents and queries, and ranks documents according to their relevance. The most widely used similarity measure for calculating the relevance of a document is the cosine of the angle between the query vector and the document vector.
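As an illustration of this baseline, the following is a minimal sketch of VSM ranking with TF-IDF weighting and cosine similarity. It is not the Terrier implementation used in our runs; the toy documents, tokenisation and weighting details are assumptions made for the example.

```python
import math
from collections import Counter

def idf_table(docs):
    """Inverse document frequency for every term of a tokenised collection."""
    n = len(docs)
    df = Counter(t for doc in docs for t in set(doc))
    return {t: math.log(n / df[t]) for t in df}

def tfidf(tokens, idf):
    """Sparse TF-IDF vector (term -> weight) for one token list."""
    return {t: tf * idf.get(t, 0.0) for t, tf in Counter(tokens).items()}

def cosine(u, v):
    """Cosine of the angle between two sparse vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

# Toy collection: rank documents by decreasing similarity to the query.
docs = [["red", "itchy", "eye", "allergy"], ["knee", "pain", "running"]]
idf = idf_table(docs)
doc_vecs = [tfidf(d, idf) for d in docs]
q_vec = tfidf(["red", "itchy", "eye"], idf)
ranking = sorted(range(len(docs)), key=lambda i: cosine(q_vec, doc_vecs[i]), reverse=True)
print(ranking)  # document 0 is ranked first
```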
3.2 Baseline runs with automatic query expansion

In this section, we present the submitted runs based on query expansion, using the same model (VSM) employed for the mandatory run. The idea is to use the blind relevance feedback technique (also called pseudo-relevance feedback) to automatically expand the original query without any user interaction [1]. In this method, based on local analysis, the top k documents returned by the baseline system are assumed to be pertinent (relevant). We then compute semantic relationships between the terms of these selected documents using the statistical least square method (LSM) [5]. For each term t_i, we estimate its relation to the other terms in a linear way (see equation 1):

\[ t_i \approx \alpha_1 t_1 + \alpha_2 t_2 + \cdots + \alpha_{i-1} t_{i-1} + \alpha_{i+1} t_{i+1} + \cdots + \alpha_n t_n + \epsilon \quad (1) \]

where the \alpha_j are the real-valued coefficients of the model, representing the degrees of relationship between terms, and \epsilon represents the minimum error associated with the relation. We add to the original query only those terms whose \alpha is above a certain threshold. We submitted four such runs, which differ in the number of selected documents (k) and in the fixed threshold.
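As an illustration, here is a minimal sketch of this expansion step under one plausible reading of [5]: each term is represented by its frequency profile over the top-k feedback documents, each profile is regressed on all the others by ordinary least squares (equation 1), and candidate terms related to at least one query term above the threshold are added (as in runs 2 and 3; runs 4 and 5 instead require the threshold to hold with respect to every query term). The matrix values, helper names and term representation below are assumptions made for illustration.

```python
import numpy as np

def term_relations(X):
    """For each term i, least-squares coefficients alpha expressing its
    document profile as a linear combination of the other terms (equation 1).
    X: (n_docs, n_terms) term-frequency matrix of the top-k feedback documents.
    Returns A with A[i, j] = alpha of term j in the model of term i, A[i, i] = 0."""
    n_terms = X.shape[1]
    A = np.zeros((n_terms, n_terms))
    for i in range(n_terms):
        others = [j for j in range(n_terms) if j != i]
        coef, *_ = np.linalg.lstsq(X[:, others], X[:, i], rcond=None)
        A[i, others] = coef
    return A

def expand_query(query_terms, A, threshold):
    """Indices of candidate terms whose alpha with at least one query term
    exceeds the threshold (the run 2/3 variant of the selection rule)."""
    related = set()
    for i in query_terms:
        related.update(j for j in np.nonzero(A[i] > threshold)[0] if j not in query_terms)
    return sorted(int(j) for j in related)

# Toy term-frequency matrix: 3 feedback documents x 4 vocabulary terms.
X = np.array([[2., 1., 0., 1.],
              [1., 1., 1., 0.],
              [0., 2., 1., 1.]])
A = term_relations(X)
print(expand_query({0}, A, threshold=0.0))  # indices of terms to add to the query
```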
4 Results

This year, the ShARe/CLEF eHealth 2015 Task 2 built result pools from the received submissions of all participants, considering the top 10 documents ranked by each baseline system (run 1) and by the two highest-priority runs (runs 2 and 3). This was because, unlike last year, the submissions differed greatly from one another and thus led to very large (and different) pools. Accordingly, the principal measures proposed by the ShARe/CLEF eHealth 2015 Task 2 to evaluate the systems are:

– precision at 10 (P@10);
– normalised discounted cumulative gain at rank 10 (nDCG@10, i.e. ndcg-cut-10).

Metrics such as MAP (Mean Average Precision) may not be reliable for this evaluation due to the limited pool depth. A minimal sketch of how the two official measures can be computed is given after the query examples below.

Table 1. Results obtained for the submitted runs

Run   P@10     nDCG@10   k     α
1     0.3212   0.2787    –     –
2     0.2424   0.1965    250   > 0.0
3     0.2515   0.1833    100   > 0.0
4     0.1894   0.1572    100   > 0.05
5     0.2939   0.2465    100   > 0.3

Table 1 shows the results obtained by the five submitted runs. The baseline system (run 1) obtained a P@10 of 0.3212 and an nDCG@10 of 0.2787. These values decrease slightly in the four other runs, which use the pseudo-relevance feedback technique to expand the original query. In the second run, we took the top 250 documents returned by the baseline system as relevant, calculated the relations between the terms appearing in these selected documents using the statistical least square method [5], and finally added only the terms that have a positive α with at least one term of the query. Although this method gives results slightly lower than those of the first run, it improves the results of some queries. Take as an example topic (query) 9:

<top>
<num>clef2015.test.9</num>
<query>red itchy eyes</query>
</top>

We took the top 250 documents returned by the baseline system as relevant; this set yields a list of 8685 terms. We then calculated the relations between the terms of the query and this set of terms, and expanded the original query by adding the terms that have positive values of α. We obtained the following new query:

<top>
<num>clef2015.test.9</num>
<query>red itchy eye see conjunct allerg nose drop gei zilink zaditor eyelid keratoconjunct lastacaft visin allergi relief antihistamin dose spray allergen tablet pollen claritin hydrochlorid olopatadin levocabastin bottl naphazolin liquid benzalkonium rxnorm formtitl inflamm flaki stye eyeexamin bedinghaus hive ige swollen pink bruis bump tear eyelash hay springtim slit dermat prescript rash sensit pharmaceut cream zyrtec blister section ketotifen western amerisourc fcab scream dismiss mart wal fbab mite flare urticaria ahref erysipela vital health zon tnss ectoin bhcsite mast overreact diphenhydramin azelastin cetirizin seborrh thrush hrcolor fluticason iannelli eczemadermat chkd prd beauti skindiseas outgrow respivert msfphover idiopath ledum vitamin tagid zpack eyebright emolli walgreen zyr butterbur hcl botrulelrulerrul allergicchild injector paradis euphrasia pheniramin nit mrhd enclos kmart ayala disk dolgencorp dryl personalhealthzon dbdc spoon teeter allina morri hydrochloridetablet bactrban</query>
</top>

The first three terms are the original query and the remaining terms are the added terms, in their stemmed (root) form. We notice that almost all added terms are strongly related to the terms of the initial query. For example, terms like conjunct, allerg, eyelid, inflamm, pink, keratoconjunct and dermat refer to the conjunctivitis disease, also known as pink eye: an inflammation of the conjunctiva (the outer layer of the eye and the inner surface of the eyelids) due to an allergic reaction. Terms like olopatadin, eyebright, zaditor, azelastin and levocabastin are names of medications for this disease. Other terms refer to forums, sites and medical libraries, such as:

– rxnorm: National Library of Medicine;
– bhcsite: Better Health Channel, which provides health and medical information;
– mast: Medical Academy for Science and Technology.

This expanded query achieves the best P@10 performance in this run (P@10 = 0.5), compared to the first run (P@10 = 0.1); see figures 1 and 2. The same process is used in run 3, but the relations are calculated over the top 100 documents only. In runs 4 and 5, we calculated the relations over the top 100 documents and added only the terms whose α is above a certain threshold with all the terms of the original query. In these latter runs, we tried to preserve the context of the initial query so as to avoid the query drift problem. We notice that the P@10 and nDCG@10 measures of run 5, which expands the query with strongly related terms (α > 0.3), are higher than those of runs 2 and 3. Some queries in this run were improved compared to the baseline system in terms of P@10, for example query number 5:

<top>
<num>clef2015.test.5</num>
<query>whistling noise and cough during sleeping + children</query>
</top>

In the first run, this query obtained a P@10 gain of −0.1 relative to the median; in run 5 this value becomes positive (0.1), thanks to the new terms added to the initial query, which are strongly related to the initial query terms. Indeed, the new query is the following:

<top>
<num>clef2015.test.5</num>
<query>whistl nois cough sleep children ear wheez maud depress sound plug scream vitalhealthzon deafness research allergen chkd decibel otiti himend istictac pertussi toclevel maincat audiogram mcug wim opic ufa</query>
</top>
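For reference, the sketch below shows how the two official measures can be computed for a single ranked list. It is illustrative only: the official scoring may use a different nDCG formulation, and the document ids and relevance grades are invented.

```python
import math

def p_at_10(ranked, qrels):
    """Precision at 10: fraction of the first 10 retrieved documents that are relevant."""
    return sum(1 for d in ranked[:10] if qrels.get(d, 0) > 0) / 10.0

def ndcg_at_10(ranked, qrels):
    """nDCG@10 with linear gains and a log2 rank discount (one common formulation)."""
    gains = [qrels.get(d, 0) for d in ranked[:10]]
    dcg = sum(g / math.log2(r + 2) for r, g in enumerate(gains))
    ideal = sorted(qrels.values(), reverse=True)[:10]
    idcg = sum(g / math.log2(r + 2) for r, g in enumerate(ideal))
    return dcg / idcg if idcg > 0 else 0.0

# Toy example: qrels maps document ids to graded relevance judgements.
qrels = {"d1": 2, "d3": 1}
ranked = ["d1", "d2", "d3", "d4"]
print(p_at_10(ranked, qrels), ndcg_at_10(ranked, qrels))
```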
The plots in figures 1 to 5 compare each of our runs against the median and best P@10 performance across all systems submitted to the eHealth task, for each query. For each query, the height of a bar represents the gain/loss of the submitted system, or of the best system for that query, over the median system. The height of a bar is given by:

\[ \text{grey bars:} \quad height(q) = P@10_{our}(q) - P@10_{median}(q) \]
\[ \text{white bars:} \quad height(q) = P@10_{best}(q) - P@10_{median}(q) \]

Fig. 1. Comparison of run 1 against the median and best performance across all systems
Fig. 2. Comparison of run 2 against the median and best performance across all systems
Fig. 3. Comparison of run 3 against the median and best performance across all systems
Fig. 4. Comparison of run 4 against the median and best performance across all systems
Fig. 5. Comparison of run 5 against the median and best performance across all systems

We can see that run 1 has one query achieving the best performance, while runs 2 and 3 increase this number to 8 and 3 queries respectively (such as query number 9 in run 2). Run 5 has just one query achieving the best performance, although 21 of its queries perform better than the median and 12 perform worse. In comparison, run 2 has 16 queries performing better than the median and 28 performing worse. We can conclude that the proposed method for expanding the original query while taking its context into account can improve the results of some queries, but the overall results of the system still need to be improved.

5 Conclusion and future works

In our second participation in the User-Centred Health Information Retrieval task at CLEF eHealth 2015, we evaluated our proposed query expansion method, based on semantic relationships between terms defined with a new statistical method, and obtained encouraging results when searching a large collection of medical documents to answer the circumlocutory queries that users may pose when faced with the symptoms and signs of a medical condition. In future work, we will try to improve these results by tuning the parameters k and α and by using dimensionality reduction techniques to cope with the problems of sparse, high-dimensional matrices.

References

1. C. Carpineto and G. Romano. A survey of automatic query expansion in information retrieval. ACM Computing Surveys (CSUR), 44(1):1, 2012.
2. L. Goeuriot, A. Hanbury, G. J. Jones, L. Kelly, S. Kriewel, I. Martinez Rodriguez, H. Müller, and M. Tinte. Supporting collaborative improvement of resources in the Khresmoi health information system. Springer, 2012.
3. L. Goeuriot, G. J. Jones, L. Kelly, J. Leveling, A. Hanbury, H. Müller, S. Salantera, H. Suominen, and G. Zuccon. ShARe/CLEF eHealth evaluation lab 2013, task 3: Information retrieval to address patients' questions when reading clinical reports. In CLEF 2013 Online Working Notes, volume 8138. CEUR-WS, 2013.
4. L. Goeuriot, L. Kelly, W. Li, J. Palotti, P. Pecina, G. Zuccon, A. Hanbury, G. J. Jones, and H. Müller. ShARe/CLEF eHealth evaluation lab 2014, task 3: User-centred health information retrieval. In CLEF 2014 Online Working Notes. CEUR-WS, 2014.
5. N. Ksentini, M. Tmar, and F. Gargouri. Detection of semantic relationships between terms with a new statistical method. In Proceedings of the 10th International Conference on Web Information Systems and Technologies, pages 340–343, 2014.
6. N. Ksentini, M. Tmar, and F. Gargouri. MIRACL at CLEF 2014: eHealth information retrieval task. In Proceedings of the ShARe/CLEF eHealth Evaluation Lab, 2014.
7. I. Ounis, G. Amati, V. Plachouras, B. He, C. Macdonald, and C. Lioma. Terrier: A high performance and scalable information retrieval platform. In Proceedings of the OSIR Workshop, pages 18–25. Citeseer, 2006.
8. J. Palotti, G. Zuccon, L. Goeuriot, L. Kelly, A. Hanbury, G. J. Jones, M. Lupu, and P. Pecina. CLEF eHealth evaluation lab 2015, task 2: Retrieving information about medical symptoms. In CLEF 2015 Online Working Notes. CEUR-WS, 2015.
9. G. Salton, A. Wong, and C.-S. Yang. A vector space model for automatic indexing. Communications of the ACM, 18(11):613–620, 1975.
10. I. Stanton, S. Ieong, and N. Mishra. Circumlocution in diagnostic medical queries. In Proceedings of the 37th International ACM SIGIR Conference on Research & Development in Information Retrieval, pages 133–142. ACM, 2014.
11. G. Zuccon, B. Koopman, and J. Palotti. Diagnose this if you can: On the effectiveness of search engines in finding medical self-diagnosis information. In Advances in Information Retrieval (ECIR 2015), pages 562–567. Springer, 2015.