<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>KDEIR at CLEF eHealth 2016: Health Documents Re-ranking Based on Query Variations</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Md Zia Ullah</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Masaki Aono</string-name>
          <email>aono@tut.jp</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Computer Science and Engineering, Toyohashi University of Technology</institution>
          ,
          <addr-line>1-1 Hibarigaoka, Tempaku-Cho, Toyohashi, 441-8580, Aichi</addr-line>
          ,
          <country country="JP">Japan</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>In this paper, we describe our participation in CLEF eHealth 2016 Task 3: Patient-Centred Information Retrieval, which focuses on retrieving clinical web documents for user queries posted in health forums. We submitted three runs for the ad-hoc search subtask and two runs for the query variation search subtask. In ad-hoc search, the main challenge is to retrieve high-quality clinical documents for a user query. For this subtask, we apply an unsupervised, multiple-feature-based re-ranking method to the documents retrieved by a baseline system. In query variation search, the main challenge is to generate a ranked list of documents covering the different variations of the query. To tackle this problem, we first formulate a query and a set of information needs from the query variations. Then, we re-rank the documents retrieved for the formulated query with respect to the set of information needs.</p>
      </abstract>
      <kwd-group>
        <kwd>Health Informatics</kwd>
        <kwd>Query variation</kwd>
        <kwd>Re-ranking</kwd>
        <kwd>Diversity</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>Users often seek health-related information and submit their information needs to
Web search engines. Unfortunately, documents on the Web are often of low
quality and contain a great deal of spam. As a result, laypeople are frequently
unsuccessful in finding answers to their health-related questions. The CLEF eHealth
Evaluation Lab [1-5] has been organizing a task for the last several years to help
laypeople obtain health-related information. In 2016, task 3 comprises three
subtasks: ad-hoc search, query variation search, and multilingual search.
We participated in the ad-hoc search and query variation search subtasks. In particular,
we aim to evaluate the following research questions in both subtasks:
1. Is the spam identification method sufficiently reliable for the ad-hoc retrieval
task?
2. Is a multiple-feature-based ranking method sufficient to estimate the topical
relevance between queries and documents?
3. Does diversity-based ranking successfully handle query variations?</p>
    </sec>
    <sec id="sec-2">
      <title>Our Submitted Runs</title>
      <p>
        We submitted a total of five runs: three in the ad-hoc search subtask and two
in the query variation search subtask. We make use of the Clueweb12-B13 [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] corpus by indexing
with the Indri search engine [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. To emphasize high-quality documents, we
filter out spam documents from the corpus using the Waterloo spam
score [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ].
      </p>
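      <p>As a rough illustration, the spam filtering step can be sketched as follows. This is a minimal sketch with toy inputs; the function and variable names are ours, not from the task setup. The Waterloo score [8] is a percentile from 0 (spammiest) to 99, and our runs drop documents below percentile 70.</p>
      <preformat>
```python
SPAM_THRESHOLD = 70  # percentile threshold used in our runs

def filter_spam(ranked_docs, spam_scores, threshold=SPAM_THRESHOLD):
    """Keep only documents whose Waterloo spam percentile is at least `threshold`."""
    return [(doc_id, score) for doc_id, score in ranked_docs
            if spam_scores.get(doc_id, 0) >= threshold]

ranked = [("doc1", 3.2), ("doc2", 2.9), ("doc3", 2.1)]
percentiles = {"doc1": 85, "doc2": 40, "doc3": 91}
filtered = filter_spam(ranked, percentiles)  # doc2 falls below 70 and is removed
```
      </preformat>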
      <sec id="sec-2-1">
        <title>Ad-hoc Search</title>
        <p>To prepare the three runs in ad-hoc search, we apply some common procedures.
Given a query, we first tokenize it and format it for ad-hoc retrieval
with the Indri search engine. Second, we retrieve at most 1000 documents from
the Clueweb12-B13 corpus using a query likelihood model with Dirichlet
smoothing as the baseline retrieval. Third, documents with a spam score below
70 are filtered out of the retrieved documents. The three runs are described
as follows:</p>
        <p>Run 1 (KDEIM EN Run1): following the common procedures described above,
we re-rank the documents by fusing the PageRank and baseline (language
model) scores of the documents and take the top 200 documents.</p>
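        <p>A minimal sketch of the fusion in Run 1, assuming min-max normalisation and a linear combination; the weight alpha and the normalisation scheme are our assumptions for illustration, since the run description does not spell out the exact combination:</p>
        <preformat>
```python
def minmax(scores):
    """Min-max normalise a {doc_id: score} mapping to [0, 1]."""
    lo, hi = min(scores.values()), max(scores.values())
    span = (hi - lo) or 1.0
    return {d: (s - lo) / span for d, s in scores.items()}

def fuse(lm_scores, pagerank_scores, alpha=0.5):
    """Linearly combine normalised language-model and PageRank scores,
    then sort documents by the fused score, highest first."""
    lm, pr = minmax(lm_scores), minmax(pagerank_scores)
    fused = {d: alpha * lm[d] + (1 - alpha) * pr.get(d, 0.0) for d in lm}
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)
```
        </preformat>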
        <p>
          Run 2 (KDEIM EN Run2): following the common procedures stated above,
we extract multiple query-independent and query-dependent features including
reciprocal rank, topic cohesiveness [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ], average term length, vector space
similarity [10], coordinate-level matching, BM25 [11], PL2 [12], DFR [12], and
Kullback-Leibler divergence [13]. We re-rank the documents with the extracted features by employing
a bipartite-graph-based ranking approach and take the top 200 documents.
        </p>
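        <p>For concreteness, one of the query-dependent features listed above, BM25 [11], can be computed as follows. This is a sketch with toy corpus statistics; k1 and b are the usual defaults rather than tuned values from our runs.</p>
        <preformat>
```python
import math

def bm25(query_terms, doc_tf, doc_len, avg_len, df, n_docs, k1=1.2, b=0.75):
    """Okapi BM25 score of one document for a bag-of-words query.
    doc_tf: term frequencies in the document; df: document frequencies."""
    score = 0.0
    for w in query_terms:
        tf = doc_tf.get(w, 0)
        if tf == 0:
            continue
        idf = math.log((n_docs - df[w] + 0.5) / (df[w] + 0.5) + 1.0)
        norm = tf * (k1 + 1) / (tf + k1 * (1 - b + b * doc_len / avg_len))
        score += idf * norm
    return score
```
        </preformat>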
        <p>Run 3 (KDEIM EN Run3): following the common procedures stated above,
but unlike the previous two runs, we tokenize the query and format it
for expert retrieval, where we consider the document title, header, body, and anchor
text. Then, we re-rank the documents by combining the PageRank and baseline
(language model) scores of the documents, and take the top 200 documents.
</p>
      </sec>
      <sec id="sec-2-2">
        <title>Query Variation Search</title>
        <p>To prepare the two runs in query variation search, we apply some common
procedures. First, we tokenize all six query variations and formulate a vector space
model representation of the query from them. We also treat all the query
variations as the users' information needs (sub-queries). Second,
we retrieve at most 1000 documents from the Clueweb12-B13 corpus based on
the formulated query as the baseline retrieval. Third, documents with a spam score
below 70 are filtered out of the retrieved documents.</p>
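        <p>The query formulation step can be sketched as pooling the tokens of all variations into a single term-frequency vector, i.e. a simple centroid. The exact weighting used in our runs may differ in detail; the example variations below are illustrative.</p>
        <preformat>
```python
from collections import Counter

def formulate_query(variations):
    """Build one vector-space query vector from all query variations
    by accumulating lower-cased token counts."""
    vector = Counter()
    for v in variations:
        vector.update(v.lower().split())
    return vector

variants = ["bloated stomach", "stomach bloating causes", "bloated belly"]
query_vector = formulate_query(variants)  # terms shared by variations get higher weight
```
        </preformat>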
        <p>Run 1 (KDEIM EN Run1): following the common procedures described above,
we re-rank the documents using the PageRank and baseline (language
model) scores of the documents as relevance-based ranking. Treating the query variations
as sub-queries (i.e., information needs), we apply an explicit diversification
algorithm [14] and take the top 100 documents.</p>
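        <p>A hedged sketch of explicit diversification in the spirit of xQuAD [14]: greedily select the document that best trades off its relevance against its coverage of sub-queries not yet satisfied. All inputs here are toy values, and the interpolation weight lam is an assumption for illustration.</p>
        <preformat>
```python
def diversify(docs, rel, cov, subqueries, k, lam=0.5):
    """Greedy xQuAD-style selection of k documents.
    rel[d]: baseline relevance of document d;
    cov[d][s]: relevance of d to sub-query s."""
    candidates = list(docs)
    selected = []
    uncovered = {s: 1.0 for s in subqueries}  # prob. that s is not yet satisfied
    for _ in range(k):
        if not candidates:
            break
        def gain(d):
            div = sum(cov[d].get(s, 0.0) * uncovered[s] for s in subqueries)
            return (1 - lam) * rel[d] + lam * div
        best = max(candidates, key=gain)
        selected.append(best)
        candidates.remove(best)
        for s in subqueries:
            uncovered[s] *= 1.0 - cov[best].get(s, 0.0)
    return selected
```
        </preformat>
        <p>After a document covering one sub-query is picked, that sub-query's residual weight shrinks, so the next pick tends to cover a different information need.</p>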
        <p>
          Run 2 (KDEIM EN Run2): following the common procedures stated above,
we re-rank the documents using multiple query-independent and query-dependent
features including page rank, reciprocal rank, topic cohesiveness [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ], average
term length, vector space similarity [10], coordinate level matching, BM25 [11],
PL2 [12], DFR [12], and Kullback-Leibler divergence [13]. Then, we explicitly diversify the
documents based on the sub-queries and take the top 100 documents.
        </p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Conclusion</title>
      <p>In this paper, we described the participation of KDEIR in the CLEF eHealth 2016
Patient-Centred Information Retrieval task, presenting our approaches to
ad-hoc search and query variation search over clinical web documents.</p>
    </sec>
    <sec id="sec-4">
      <title>Acknowledgement</title>
      <p>This research was partially supported by the HORI FOUNDATION of JAPAN,
Grant-in-Aid C114.</p>
      <p>10. Salton, G., Buckley, C.: Term-weighting approaches in automatic text retrieval.
Information Processing &amp; Management 24(5) (1988) 513-523
11. Robertson, S., Zaragoza, H.: The probabilistic relevance framework: BM25 and
beyond. Now Publishers Inc (2009)
12. Amati, G.: Probability models for information retrieval based on divergence from
randomness. PhD thesis, University of Glasgow (2003)
13. Lafferty, J., Zhai, C.: Document language models, query models, and risk
minimization for information retrieval. In: Proceedings of the 24th annual international
ACM SIGIR conference on Research and Development in Information Retrieval,
ACM (2001) 111-119
14. Santos, R.L., Macdonald, C., Ounis, I.: Exploiting query reformulations for web
search result diversification. In: Proceedings of the 19th international conference
on World Wide Web, ACM (2010) 881-890</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Zuccon</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Palotti</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Goeuriot</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kelly</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lupu</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pecina</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mueller</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Budaher</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Deacon</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>The IR task at the CLEF eHealth evaluation lab 2016: User-centred health information retrieval</article-title>
          . In:
          <article-title>CLEF 2016 Evaluation Labs</article-title>
          and Workshop: Online Working Notes. CEUR-WS (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Kelly</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Goeuriot</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Suominen</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Neveol</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Palotti</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zuccon</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          :
          <article-title>Overview of the CLEF eHealth evaluation lab 2016</article-title>
          .
          <source>In: CLEF 2016 - 7th Conference and Labs of the Evaluation Forum, Lecture Notes in Computer Science (LNCS)</source>
          . Springer (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Suominen</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          , Salantera,
          <string-name>
            <given-names>S.</given-names>
            ,
            <surname>Velupillai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            ,
            <surname>Chapman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.W.</given-names>
            ,
            <surname>Savova</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            ,
            <surname>Elhadad</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            ,
            <surname>Pradhan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            ,
            <surname>South</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.R.</given-names>
            ,
            <surname>Mowery</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.L.</given-names>
            ,
            <surname>Jones</surname>
          </string-name>
          ,
          <string-name>
            <surname>G.J.</surname>
          </string-name>
          , et al.:
          <article-title>Overview of the ShARe/CLEF eHealth evaluation lab 2013</article-title>
          . In: Information Access Evaluation. Multilinguality, Multimodality, and Visualization. Springer (
          <year>2013</year>
          )
          <volume>212</volume>
          -
          <fpage>231</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Kelly</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Goeuriot</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Suominen</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schreck</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Leroy</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mowery</surname>
            ,
            <given-names>D.L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Velupillai</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chapman</surname>
            ,
            <given-names>W.W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Martinez</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zuccon</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          , et al.:
          <article-title>Overview of the ShARe/CLEF eHealth evaluation lab 2014</article-title>
          . Springer (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Goeuriot</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kelly</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Suominen</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hanlen</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Neveol</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Grouin</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Palotti</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zuccon</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          :
          <article-title>Overview of the CLEF eHealth evaluation lab 2015</article-title>
          .
          <article-title>In: Experimental IR Meets Multilinguality</article-title>
          , Multimodality, and Interaction. Springer (
          <year>2015</year>
          )
          <volume>429</volume>
          -
          <fpage>443</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Callan</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hoy</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yoo</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhao</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>Clueweb09 data set</article-title>
          (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Strohman</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Metzler</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Turtle</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Croft</surname>
            ,
            <given-names>W.B.</given-names>
          </string-name>
          :
          <article-title>Indri: A language model-based search engine for complex queries</article-title>
          .
          <source>In: Proceedings of the International Conference on Intelligent Analysis</source>
          ,
          Citeseer
          (
          <year>2005</year>
          ) 2-
          <fpage>6</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Cormack</surname>
            ,
            <given-names>G.V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Smucker</surname>
            ,
            <given-names>M.D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Clarke</surname>
            ,
            <given-names>C.L.</given-names>
          </string-name>
          :
          <article-title>Efficient and effective spam filtering and re-ranking for large web datasets</article-title>
          .
          <source>Information retrieval 14(5)</source>
          (
          <year>2011</year>
          )
          <volume>441</volume>
          -
          <fpage>465</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Bendersky</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Croft</surname>
            ,
            <given-names>W.B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Diao</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          :
          <article-title>Quality-biased ranking of web documents</article-title>
          .
          <source>In: Proceedings of the fourth ACM international conference on Web search and data mining</source>
          ,
          <source>ACM</source>
          (
          <year>2011</year>
          )
          <volume>95</volume>
          -
          <fpage>104</fpage>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>