<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Miracl at Clef 2018: Consumer Health Search Task</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Siwar ZAYANI</string-name>
          <email>zayani.siouar@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Nesrine KSENTINI</string-name>
          <email>ksentini.nesrine@ieee.org</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Mohamed TMAR</string-name>
          <email>mohamed.tmar@isimsf.rnu.tn</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Faiez GARGOURI</string-name>
          <email>faiez.gargouri@isims.usf.tn</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>MIRACL Laboratory, City ons Sfax, University of Sfax</institution>
          ,
          <addr-line>B.P.3023 Sfax</addr-line>
          <country country="TN">TUNISIA</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>This paper presents our participation in the Consumer Health Search Task at CLEF eHealth 2018, which is a continuation of the previous CLEF eHealth information retrieval (IR) tasks that ran between 2013 and 2017. This task focuses on improving access to medical information on the web. We submitted four runs: two baseline runs with different weighting models, using no additional information or external resources, and two runs that present the results of our proposed approach, which uses the MeSH ontology to perform query expansion in two different ways, by scope notes and by related terms.</p>
      </abstract>
      <kwd-group>
        <kwd>eHealth</kwd>
        <kwd>information retrieval</kwd>
        <kwd>query expansion</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>Nowadays, the volume of medical information on the web is growing at an
astonishing rate. Indeed, medical content is becoming easily available electronically in a
variety of forms, ranging from patient records, scientific publications and
health-related websites to medical-related topics.</p>
      <p>Given today's overload of medical information, it is increasingly difficult
to retrieve and digest valid and relevant information to make health-centered
decisions. In fact, clinicians and policy-makers need to easily retrieve and make
sense of medical content to support their decision making.</p>
      <p>Information retrieval (IR) systems have been commonly used as a means to
access health information available online in order to meet users' needs.
However, the reliability and quality of the returned results vary greatly between
the different information retrieval systems. Some systems aim for high recall
or coverage, that is, finding all relevant information for a user query, while others
seek to obtain high precision. Furthermore, web users in the health domain
also experience difficulties in expressing their information needs as queries.</p>
      <p>
        CLEF (Cross-Language Evaluation Forum) eHealth aims to bring together
researchers working on related information access topics and provide them with
datasets to work with and validate the outcomes. This, the sixth year of the
evaluation lab, offers the following three tasks [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]:
- Task 1: Multilingual Information Extraction
- Task 2: Technologically Assisted Reviews in Empirical Medicine
- Task 3: Patient-centred information retrieval
      </p>
      <p>
        The goal of the CLEF eHealth Evaluation Lab is to evaluate systems that
support patients in searching for and understanding their health information.
Our team, MIRACL, participated in Task 3 [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] in order to evaluate
our own retrieval system.
      </p>
    </sec>
    <sec id="sec-2">
      <title>Main objectives of experiments</title>
      <p>Searching for health information and advice is an important task performed by
individuals on the web. Indeed, most search engine users in recent years have
conducted a web search for information about a specific disease or health
problem.</p>
      <p>The growing importance of health IR has provided the motivation for a number
of evaluation campaigns focusing on health information. For example, TREC
(Text REtrieval Conference)<sup>1</sup> and CLEF<sup>2</sup> are
international evaluation campaigns that assess, in a competitive setting, the
research systems of the various participants.</p>
      <p>
        Our goal in participating this year in Task 3 ("Patient-centred information
retrieval") is, as in past years, to evaluate the effectiveness of our proposed
information retrieval system for searching health content on the web [
        <xref ref-type="bibr" rid="ref3 ref4">3, 4</xref>
        ].
      </p>
    </sec>
    <sec id="sec-3">
      <title>Retrieval approaches used</title>
      <p>This section presents the different search approaches developed for evaluation.
We submitted four runs:
* Two baseline runs
* Two runs with automatic query expansion using the MeSH ontology in two
different ways.</p>
      <sec id="sec-3-1">
        <title>Baseline runs</title>
        <p>
          For comparison, we created our own baseline experiments by implementing two
information retrieval baselines with the TF-IDF and Okapi BM25 [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ] methods.
        </p>
        <p><sup>1</sup> https://trec.nist.gov/</p>
        <p><sup>2</sup> http://www.clef-initiative.eu/</p>
        <p>TF-IDF is a weighting method often used in information retrieval and
especially in text mining. This statistical measure makes it possible to evaluate the
importance of a term contained in a document. The weight increases
proportionally to the number of occurrences of the term in the document (TF).
The inverse document frequency (IDF) measures the importance
of the term throughout the collection. In the TF-IDF scheme, it aims to give
greater weight to less frequent terms, which are considered more discriminating. It
is computed as the logarithm (in base 10) of the inverse of the proportion of
documents in the collection that contain the term (see equation 1).
          <disp-formula id="eq1">
            <label>(1)</label>
            <tex-math>\mathrm{idf}_{term_i} = \log_{10} \frac{|D|}{|\{d_j : term_i \in d_j\}|}</tex-math>
          </disp-formula>
        </p>
        <p>where |D| is the total number of documents in the collection and
|{d_j : term_i ∈ d_j}| is the number of documents in which term_i appears.</p>
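        <p>As an illustration of equation 1, the following minimal Python sketch computes the IDF of every term of a toy collection. The function name compute_idf and the three sample documents are assumptions made for this example only; they are not taken from our system.</p>
        <preformat>
```python
import math


def compute_idf(documents):
    """Return {term: log10(|D| / df(term))} for a list of tokenised documents."""
    total_docs = len(documents)
    doc_freq = {}
    for tokens in documents:
        # Count each term once per document, whatever its frequency inside it.
        for term in set(tokens):
            doc_freq[term] = doc_freq.get(term, 0) + 1
    return {term: math.log10(total_docs / df) for term, df in doc_freq.items()}


# Hypothetical toy collection (three tokenised documents).
docs = [
    ["diabetes", "insulin", "treatment"],
    ["diabetes", "diet"],
    ["asthma", "treatment"],
]
idf = compute_idf(docs)
print(idf["asthma"])    # term found in one document out of three
print(idf["diabetes"])  # term found in two documents out of three
```
        </preformat>
        <p>In this toy collection the rarer term (asthma) receives a higher IDF than the more frequent one (diabetes), which is the behaviour described above.</p>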
        <p>Okapi BM25 is a weighting method used in information retrieval. It is an
application of the probabilistic model of relevance. The method is more simply called
BM25, the term "Okapi" referring to the name of the retrieval system of
City University, London, where it was initially implemented.</p>
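        <p>Since the BM25 formula is not reproduced here, the sketch below uses a common textbook formulation to show how such a score can be computed. The parameter values k1 = 1.2 and b = 0.75, the non-negative IDF variant and the toy documents are all assumptions of this example; this is not necessarily the exact variant implemented by the platform we used.</p>
        <preformat>
```python
import math


def bm25_score(query, doc, docs, k1=1.2, b=0.75):
    """Score one tokenised document against a tokenised query with BM25."""
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    score = 0.0
    for term in set(query):
        df = sum(1 for d in docs if term in d)
        if df == 0:
            continue  # term absent from the collection
        # Non-negative IDF variant (an assumption; other variants exist).
        idf = math.log((n_docs - df + 0.5) / (df + 0.5) + 1.0)
        tf = doc.count(term)
        # Length normalisation: longer documents are penalised through b.
        norm = k1 * (1.0 - b + b * len(doc) / avg_len)
        score += idf * tf * (k1 + 1.0) / (tf + norm)
    return score


# Hypothetical toy collection and query.
docs = [
    ["diabetes", "insulin", "treatment"],
    ["diabetes", "diet"],
    ["asthma", "treatment"],
]
query = ["diabetes", "treatment"]
scores = [bm25_score(query, d, docs) for d in docs]
print(scores)
```
        </preformat>
        <p>The first document, which contains both query terms, obtains the highest score; the other two match a single term each.</p>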
        <p>With each method, we computed similarity scores between the user's query and
the documents in the collection and ranked the documents by score
in descending order. The top 1000 documents with the highest scores were
returned as relevant documents for each query.</p>
        <p>Each line in the result files (runs) contains the following fields:
qid Q0 docno rank score tag
where:
qid: the query number
Q0: the literal Q0
docno: the id of a document returned by our retrieval system for qid
rank: the rank of this response for this qid
score: a system-generated indication of the quality of the response
tag: the identifier of our system, for example MIRACL</p>
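        <p>The ranking and output steps described above can be sketched as follows. The function name format_run, the query id, the document ids and the choice of starting ranks at 1 are assumptions of this example.</p>
        <preformat>
```python
def format_run(qid, scores, tag="MIRACL", limit=1000):
    """Return run lines 'qid Q0 docno rank score tag' for one query."""
    # Sort documents by decreasing score and keep at most `limit` of them.
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    lines = []
    for rank, (docno, score) in enumerate(ranked[:limit], start=1):
        lines.append(f"{qid} Q0 {docno} {rank} {score:.4f} {tag}")
    return lines


# Hypothetical scores for one query (query id and document ids are made up).
example_scores = {"doc_a": 2.5, "doc_b": 7.25, "doc_c": 0.5}
run_lines = format_run("151001", example_scores)
print("\n".join(run_lines))
```
        </preformat>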
      </sec>
      <sec id="sec-3-2">
        <title>Runs with automatic query expansion</title>
        <p>In this sub-section, we describe the retrieval approach used in our
submitted runs, which is based on query expansion. The idea is to use an external
resource to improve the user's query by automatically expanding the original query
without any user interaction.</p>
        <p>As the provided collection of documents is medical, we use a domain
ontology: the MeSH ontology.</p>
        <p>
          MeSH (Medical Subject Headings) is a controlled vocabulary produced and
maintained by the U.S. National Library of Medicine [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ]. There are currently
over 26 thousand descriptors, or Main Headings, and almost 180,000 alternative
expressions (Entry Terms) [7].
        </p>
        <p>According to another definition, provided by [8], the MeSH thesaurus is a
controlled vocabulary used for indexing, cataloging, identifying and searching
biomedical databases. The MeSH thesaurus contains approximately 26 thousand terms
and is updated from time to time to reflect changes in medical terminology.</p>
        <p>MeSH organizes its terms and descriptors in a hierarchical structure [8]
that allows searching at various levels of specificity. It makes it possible to
retrieve documents in which the same concept is expressed with different terminology.
The hierarchy and the MeSH structure are illustrated in figure 1.
Each MeSH record consists of one or more Concepts, and each Concept consists
of one or more synonymous terms and a Scope Note (i.e., a text description of the
term).</p>
        <p>Each of the subordinate concepts also has a preferred term, as well as a
labeled (e.g. narrower) relationship to the preferred concept. Terms with the
same meaning are grouped in the same concept.</p>
        <p>In figures 2 and 3, we illustrate two real examples of descriptor records. This
descriptor record consists of two Concepts and five terms. Each record has a
Preferred Concept and each Concept has a Preferred Term, which is also said to
be the name of the Concept.</p>
        <p>In our case, we use the MeSH ontology to automatically expand users'
queries with two different methods:</p>
      </sec>
      <sec id="sec-3-3">
        <title>Query expansion method based on scope notes</title>
        <p>This method is based on concepts extracted from the MeSH ontology.</p>
        <p>For each term in the initial query, if it is a MeSH concept, we add its
scope note, which represents the medical definition of this term (adding only
the key terms).</p>
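        <p>A minimal sketch of this expansion step is given below. The dictionary SCOPE_NOTES is a hand-written stand-in for MeSH (its note text is invented for illustration), and reducing a scope note to its key terms by removing stop words is an assumption of this example rather than a description of our exact selection procedure.</p>
        <preformat>
```python
# Hypothetical, hand-written stand-in for MeSH scope notes (text is invented).
SCOPE_NOTES = {
    "asthma": "A bronchial disorder with airway obstruction and wheezing",
}
# Tiny illustrative stop-word list (an assumption, not the list we used).
STOP_WORDS = {"a", "an", "and", "of", "the", "with"}


def expand_with_scope_notes(query):
    """Append the key terms of the scope note of every query term found in MeSH."""
    terms = query.lower().split()
    expanded = list(terms)
    for term in terms:
        note = SCOPE_NOTES.get(term)
        if note is None:
            continue  # not a MeSH concept: the term is kept unchanged
        for word in note.lower().split():
            if word not in STOP_WORDS and word not in expanded:
                expanded.append(word)
    return " ".join(expanded)


print(expand_with_scope_notes("asthma treatment"))
```
        </preformat>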
      </sec>
      <sec id="sec-3-4">
        <title>Query expansion method based on related terms</title>
        <p>This method is based on selecting terms semantically related to all terms in
the initial query. There are two cases when adding related terms:
- If a term in the initial query is a MeSH concept, then we add the list of
synonymous terms related to this concept.
- If a term in the initial query is a MeSH term, then we add its parent
concepts.</p>
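        <p>The two cases above can be sketched as follows. The two lookup tables are small hypothetical excerpts standing in for MeSH, and the function name is chosen for this example only.</p>
        <preformat>
```python
# Hypothetical excerpts standing in for MeSH lookups (entries are illustrative).
CONCEPT_SYNONYMS = {"neoplasms": ["tumors", "cancer"]}
TERM_PARENTS = {"insulin": ["pancreatic hormones"]}


def expand_with_related_terms(query):
    """Add synonyms for MeSH concepts and parent concepts for MeSH terms."""
    terms = query.lower().split()
    expanded = list(terms)
    for term in terms:
        # Case 1: the term names a concept, so its synonymous terms are added.
        # Case 2: the term is a MeSH term, so its parent concepts are added.
        related = CONCEPT_SYNONYMS.get(term, []) + TERM_PARENTS.get(term, [])
        for candidate in related:
            if candidate not in expanded:
                expanded.append(candidate)
    return expanded


print(expand_with_related_terms("insulin neoplasms therapy"))
```
        </preformat>
        <p>Terms that are not found in either table (here, therapy) are left as they are, so the expanded query always contains the original one.</p>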
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Resources employed</title>
      <sec id="sec-4-1">
        <title>Datasets</title>
        <p>The document collection used in CLEF eHealth 2018 is composed of web pages
acquired from CommonCrawl.</p>
        <p>
          An initial list of websites was identified for acquisition. The list was built by
submitting the CLEF 2018 queries to the Microsoft Bing APIs (through the Azure
Cognitive Services) repeatedly over a period of a few weeks by the CLEF team,
and acquiring the URLs of the retrieved results. The list was further extended
by including a number of known reliable health websites and other known
unreliable health websites [
          <xref ref-type="bibr" rid="ref1 ref2">1, 2</xref>
          ].
        </p>
        <p>The structure of the provided collection is as follows: the corpus is divided into
folders by domain name. Each folder contains files, each of which corresponds
to a webpage from the domain. The document id for each webpage that is used
in the collection (e.g. for the qrels) is the filename.</p>
        <p>The full collection, named clefehealth2018, occupies about 480GB of space,
uncompressed. Given the size of this collection, the CLEF team made available to the
participants prepared indexes in different formats (ElasticSearch index,
Indri index, and Terrier index).</p>
        <p>
          Since we use the Terrier platform to implement our information retrieval system [11], [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ],
we exploit the provided Terrier index, which is around 42GB compressed, to
retrieve relevant documents. This platform, developed at the School of
Computing Science at the University of Glasgow, is an efficient, effective and flexible
open source search engine written in Java, easily deployable on large-scale collections
of documents.
        </p>
        <p>Indeed, Terrier implements state-of-the-art indexing functionalities in a first step,
such as tokenization, stop word removal, stemming and storage of
information in a special structure called an inverted file. In a second step, it implements
retrieval functionalities such as information retrieval models (Boolean, TF-IDF,
BM25) [11]. It is an open source, comprehensive and transparent platform
for research and experimentation in text retrieval.</p>
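        <p>To make the indexing steps concrete, the following Python sketch chains tokenization, stop word removal, a deliberately crude stemming step and the construction of an inverted file. It is only an illustration: it does not use or reproduce Terrier, and the stop-word list, the suffix rules and the sample documents are assumptions of this example.</p>
        <preformat>
```python
import re
from collections import defaultdict

STOP_WORDS = {"a", "and", "for", "in", "is", "of", "the", "to"}  # illustrative


def naive_stem(token):
    """Very crude suffix stripping, used only to illustrate the stemming step."""
    for suffix in ("ing", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def build_inverted_index(documents):
    """Map each stemmed term to {docno: term frequency}."""
    index = defaultdict(dict)
    for docno, text in documents.items():
        # Tokenization: lower-case alphanumeric runs.
        for token in re.findall(r"[a-z0-9]+", text.lower()):
            if token in STOP_WORDS:
                continue
            term = naive_stem(token)
            index[term][docno] = index[term].get(docno, 0) + 1
    return dict(index)


# Hypothetical two-document collection.
sample_docs = {
    "d1": "Treating diabetes in adults",
    "d2": "Diabetes treatments and diets",
}
index = build_inverted_index(sample_docs)
print(index["diabet"])
```
        </preformat>
        <p>Each entry of the resulting dictionary plays the role of a posting list: it records in which documents a term occurs and how often, which is the information a weighting model such as TF-IDF or BM25 needs at query time.</p>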
      </sec>
      <sec id="sec-4-2">
        <title>Queries</title>
        <p>The query set for CLEF eHealth 2018 consists of 50 queries issued by the general
public to the HON and TRIP search services [12].</p>
        <p>Queries are formatted one per line in the tab-separated query file: the first
string is the query id and the second string is the query text (see
figure 4). In the Ad-hoc Search task, in which we participated, only
the &lt;en&gt; &lt;/en&gt; part of the query file should be used.</p>
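        <p>Reading such a file can be sketched as below. The function name read_queries and the two sample lines are assumptions of this example, and the sketch only covers the tab-separated id and text layout; it does not handle any language tagging inside the file.</p>
        <preformat>
```python
import csv
import io


def read_queries(handle):
    """Return {query_id: query_text} from a tab-separated stream."""
    queries = {}
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) >= 2:  # skip blank or malformed lines
            queries[row[0].strip()] = row[1].strip()
    return queries


# Hypothetical two-line sample standing in for the real query file.
sample = io.StringIO("q1\twhat causes migraine\nq2\tdiabetes diet advice\n")
print(read_queries(sample))
```
        </preformat>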
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Conclusion and future works</title>
      <p>In our third participation in the CLEF eHealth competition, we evaluated
our new proposed query expansion method, based this time on an external
resource: the MeSH ontology.</p>
      <p>The aim of our participation is to test our retrieval system on a new large
collection of medical documents and to obtain results that are competitive with
those of the other participating teams.</p>
      <p>
        For future work, we will try to combine this proposed method with our previous
query expansion methods [13-16] described in our previous participations [
        <xref ref-type="bibr" rid="ref3 ref4">3, 4</xref>
        ].
      </p>
      <p>7. Mata, J., Crespo, M., and Maña, M. J. LABERINTO at ImageCLEF 2011 medical
image retrieval task. Working notes of CLEF, (2011).</p>
      <p>8. Rivas, A. R., Iglesias, E. L., and Borrajo, L. Study of query expansion techniques
and their application in the biomedical information retrieval. The Scientific World
Journal, (2014).</p>
      <p>9. https://www.nlm.nih.gov/pubs/techbull/ma00/ma00mesh.html</p>
      <p>10. https://www.nlm.nih.gov/mesh/conceptstructure.html</p>
      <p>11. Ounis, I., Amati, G., Plachouras, V., He, B., Macdonald, C., and Lioma, C. Terrier:
A high performance and scalable information retrieval platform. In Proceedings of
the OSIR Workshop, pages 18-25. Citeseer, (2006).</p>
      <p>12. Goeuriot, L., Hanbury, A., Hegarty, B., Hodmon, J., Kelly, L.,
Kriewel, S., Lupu, M., Markonis, D., Pecina, P., and Schneller, P.
D7.3 Meta-analysis of the second phase of empirical and user-centered evaluations.
Public Technical Report, Khresmoi Project, August 2014.</p>
      <p>13. Ksentini, N., Tmar, M., and Gargouri, F. Detection of Semantic Relationships
between Terms with a New Statistical Method. In WEBIST (2) (pp. 340-343). (2014).</p>
      <p>14. Ksentini, N., Tmar, M., and Gargouri, F. Controlled automatic query expansion
based on a new method arisen in machine learning for detection of semantic
relationships between terms. In 2015 15th International Conference on Intelligent
Systems Design and Applications (ISDA) (pp. 134-139). IEEE. (2015, December).</p>
      <p>15. Ksentini, N., Tmar, M., and Gargouri, F. Towards Automatic Improvement of
Patient Queries in Health Retrieval Systems. Applied Medical Informatics, 38(2),
73-80. (2016).</p>
      <p>16. Ksentini, N., Tmar, M., and Gargouri, F. The Impact of Term Statistical
Relationships on Rocchio's Model Parameters For Pseudo Relevance Feedback. International
Journal of Computer Information Systems and Industrial Management
Applications, 8, 135-44. (2016).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1. Suominen, H., Kelly, L., Goeuriot, L., Kanoulas, E., Azzopardi, L., Spijker, R., Li, D., Névéol, A., Ramadier, L., Robert, A., Zuccon, G., Palotti, J., and Jimmy.
          <article-title>Overview of the CLEF eHealth Evaluation Lab 2018</article-title>
          .
          <source>CLEF 2018 - 8th Conference and Labs of the Evaluation Forum, Lecture Notes in Computer Science (LNCS)</source>
          , Springer,
          <year>September 2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2. Jimmy, Zuccon, G., and Palotti, J.
          <article-title>Overview of the CLEF 2018 Consumer Health Search Task</article-title>
          .
          <source>CLEF 2018 Evaluation Labs and Workshop: Online Working Notes</source>
          , CEUR-WS,
          <year>September 2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Ksentini</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tmar</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Gargouri</surname>
          </string-name>
          , F.
          <source>Miracl at Clef</source>
          <year>2014</year>
          :
          <article-title>eHealth Information Retrieval Task</article-title>
          .
          <source>In CLEF (Working Notes)</source>
          (pp.
          <fpage>203</fpage>
          -
          <lpage>209</lpage>
          ). (
          <year>2014</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Ksentini</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tmar</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Boughanem</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Gargouri</surname>
          </string-name>
          , F. Miracl at Clef 2015:
          <article-title>User-Centred Health Information Retrieval Task</article-title>
          .
          <source>In CLEF (Working Notes)</source>
          .(
          <year>2015</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Robertson</surname>
            ,
            <given-names>S. E.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Spärck Jones</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          <article-title>Simple, proven approaches to text retrieval</article-title>
          (No. UCAM-CL-TR-356). University of Cambridge, Computer Laboratory. (
          <year>1994</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Lowe</surname>
            ,
            <given-names>H. J.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Barnett</surname>
            ,
            <given-names>G. O.</given-names>
          </string-name>
          <article-title>Understanding and using the medical subject headings (MeSH) vocabulary to perform literature searches</article-title>
          .
          <source>Jama</source>
          ,
          <volume>271</volume>
          (
          <issue>14</issue>
          ),
          <fpage>1103</fpage>
          -
          <lpage>1108</lpage>
          . (
          <year>1994</year>
          ).
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>