<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>ECNU at 2017 eHealth Task 2: Technologically Assisted Reviews in Empirical Medicine</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Jiayi Chen</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Su Chen</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Yang Song</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Hongyu Liu</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Yueyao Wang</string-name>
          <email>yywangg@ica.stc.sh.cn</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Qinmin Hu</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Liang He</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Yan Yang</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Computer Science &amp; Technology, East China Normal University</institution>
          ,
          <addr-line>Shanghai, 200062</addr-line>
          ,
          <country country="CN">China</country>
          ;
          <institution>Shanghai Engineering Research Center of Intelligent Service Robot</institution>
          ,
          <addr-line>Shanghai</addr-line>
          ,
          <country country="CN">China</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>CLEF eHealth 2017 Task 2 requires participants to rank the retrieval results returned by a medical database. The purpose is to reduce the effort that experts devote to finding the truly relevant documents. We utilize a customized Learning-to-Rank model to re-rank the retrieval results. Additionally, we adopt word2vec to represent queries and documents and compute the relevance score by cosine similarity. We find that the combination of the two methods achieves better performance.</p>
      </abstract>
      <kwd-group>
        <kwd>Learning to Rank</kwd>
        <kwd>Word2vec</kwd>
        <kwd>Health Information Retrieval</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
    </sec>
    <sec id="sec-2">
      <title>Methods</title>
      <p>
        In this task, we first customize a Learning-to-Rank (L2R) model [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. Furthermore,
we apply word2vec to represent queries and documents.
      </p>
      <p>
        Learning-to-Rank Model
The Learning-to-Rank model has shown good performance [
        <xref ref-type="bibr" rid="ref3 ref5">3, 5</xref>
        ]. The
architecture of the L2R model is shown in Fig. 1.
There are three stages in the L2R model: query expansion, feature extraction,
and model training. In the L2R model, we combine each document and each query
into a query-document pair. The L2R model assigns a relevance score to each
query-document pair.
      </p>
      <p>
        Query Expansion: In the query expansion stage, we aim to improve retrieval
precision by expanding queries. We apply a model similar to the one proposed in the 2014
TREC Microblog track [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], 2015 TREC Clinical Decision Support track[
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], and
2015 CLEF eHealth Task 2[
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
      </p>
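      <p>
        <list list-type="bullet">
          <list-item><p>The query is submitted to Google, and the titles and snippets (if any) of the top-10 returned web pages are crawled.</p></list-item>
          <list-item><p>The MeSH database is applied to extract medical terms from the titles and snippets.</p></list-item>
        </list>
      </p>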
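      <p>The term-extraction step can be sketched as follows. This is a minimal illustration under stated assumptions, not the system used in the paper: the crawling step is omitted, the small <monospace>MESH_TERMS</monospace> set is a hypothetical stand-in for a lookup in the MeSH database, and appending the extracted terms to the query is one plausible way to expand it.</p>
      <preformat><![CDATA[
```python
import re

# Hypothetical stand-in for the MeSH vocabulary; a real system would load
# the terms from the MeSH database instead of hard-coding them.
MESH_TERMS = {"myocardial infarction", "aspirin", "stroke"}


def extract_mesh_terms(texts, vocabulary):
    """Return vocabulary terms that occur in the texts, in first-seen order."""
    found = []
    for text in texts:
        # Lowercase and keep only alphanumeric tokens so that punctuation
        # does not block a match.
        normalized = " ".join(re.findall(r"[a-z0-9]+", text.lower()))
        padded = f" {normalized} "
        for term in sorted(vocabulary):
            if f" {term} " in padded and term not in found:
                found.append(term)
    return found


def expand_query(query, texts, vocabulary):
    """Append extracted terms that are not already part of the query (assumed strategy)."""
    query_lower = query.lower()
    extra = [t for t in extract_mesh_terms(texts, vocabulary) if t not in query_lower]
    return " ".join([query] + extra)


# Made-up titles standing in for crawled web titles and snippets.
snippets = [
    "Aspirin after myocardial infarction: a review",
    "Low-dose aspirin and stroke prevention",
]
expanded = expand_query("aspirin therapy", snippets, MESH_TERMS)
print(expanded)
```
]]></preformat>
      <p>Matching on whole, space-delimited tokens keeps a short term from matching inside a longer word.</p>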
      <p>Feature Extraction: In this stage, we extract features for each
query-document pair. When a document is retrieved for a query, it is assigned
a weighting score and a rank, so we use the weighting score and the rank from
the first retrieval round as features. To take advantage of different retrieval
models, we adopt the BM25[6], PL2[7] and BB2[8] models to obtain the scores and
ranks of each query-document pair. Hence the feature vector has six dimensions:
one score and one rank from each of the three models.</p>
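      <p>As a rough illustration of the six-dimensional feature vector, the sketch below concatenates the score and the rank of a document under each of the three retrieval models. The function names, the ordering of the features, and the toy scores are assumptions made for this example; in practice the scores would come from the retrieval engine.</p>
      <preformat><![CDATA[
```python
from typing import Dict, List

MODELS = ("BM25", "PL2", "BB2")


def rank_documents(scores: Dict[str, float]) -> Dict[str, int]:
    """Map each document id to its 1-based rank under descending score."""
    ordered = sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))
    return {doc_id: position for position, doc_id in enumerate(ordered, start=1)}


def build_features(runs: Dict[str, Dict[str, float]], doc_id: str) -> List[float]:
    """Concatenate (score, rank) from each retrieval model: 3 x 2 = 6 values."""
    features: List[float] = []
    for model in MODELS:
        scores = runs[model]
        ranks = rank_documents(scores)
        features.extend([scores[doc_id], float(ranks[doc_id])])
    return features


# Toy scores for one query; the numbers are made up for illustration.
runs = {
    "BM25": {"d1": 12.3, "d2": 9.8},
    "PL2": {"d1": 4.1, "d2": 5.0},
    "BB2": {"d1": 7.7, "d2": 7.2},
}
vector = build_features(runs, "d1")
print(vector)
```
]]></preformat>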
      <p>Model Training: The L2R model judges the relevance of a query-document
pair with a random forest classifier. We choose the topics and documents
of the 2013 and 2014 tasks as the training data. The aforementioned feature
vectors represent the query-document pairs in this stage.</p>
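      <p>Word2vec Model
Assume that a document of <inline-formula><tex-math>n</tex-math></inline-formula> words is <inline-formula><tex-math>D = \{d_1, d_2, \ldots, d_n\}</tex-math></inline-formula>. We can represent each
word <inline-formula><tex-math>d_i</tex-math></inline-formula> in <inline-formula><tex-math>D</tex-math></inline-formula> as a vector <inline-formula><tex-math>\mathbf{d}_i</tex-math></inline-formula>. The vector <inline-formula><tex-math>\mathbf{D}</tex-math></inline-formula> of the whole document
is then the average of the word vectors <inline-formula><tex-math>\mathbf{d}_i</tex-math></inline-formula>:
        <disp-formula><tex-math>\mathbf{D} = \frac{1}{n} \sum_{1 \le i \le n} \mathbf{d}_i .</tex-math></disp-formula>
      </p>
      <p>Similarly, a query <inline-formula><tex-math>q</tex-math></inline-formula> can be represented as a vector <inline-formula><tex-math>\mathbf{q}</tex-math></inline-formula>, which lets us compute
the similarity between query <inline-formula><tex-math>q</tex-math></inline-formula> and document <inline-formula><tex-math>D</tex-math></inline-formula>. In this task, we use the cosine
of the angle between the two vectors as the similarity:
        <disp-formula><tex-math>\mathrm{sim}(D, q) = \cos(\mathbf{D}, \mathbf{q}) = \frac{\mathbf{D} \cdot \mathbf{q}}{\lVert \mathbf{D} \rVert \, \lVert \mathbf{q} \rVert} .</tex-math></disp-formula>
After the similarities between the query and the listed documents are computed, we
rank these documents in descending order of similarity.</p>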
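      <p>The averaging and cosine steps can be sketched in a few lines of plain Python. The two-dimensional toy embeddings and the function names are assumptions for illustration only; words missing from the embedding table contribute a zero vector, as described for run-1.</p>
      <preformat><![CDATA[
```python
import math
from typing import Dict, List, Sequence


def average_vector(words: Sequence[str], embeddings: Dict[str, List[float]], dim: int) -> List[float]:
    """Average the word vectors; unknown words contribute a zero vector."""
    total = [0.0] * dim
    for word in words:
        vector = embeddings.get(word, [0.0] * dim)
        total = [t + v for t, v in zip(total, vector)]
    count = len(words)
    return [t / count for t in total] if count else total


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between a and b; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Toy 2-dimensional embeddings, made up for this example.
embeddings = {"heart": [1.0, 0.0], "attack": [0.0, 1.0], "stroke": [1.0, 1.0]}
docs = {
    "d1": ["heart", "attack", "stroke"],
    "d2": ["heart", "heart"],
    "d3": ["unknown"],
}

query_vec = average_vector(["heart", "attack"], embeddings, 2)
scores = {doc_id: cosine(average_vector(words, embeddings, 2), query_vec)
          for doc_id, words in docs.items()}
# Rank documents in descending order of similarity.
ranked = sorted(docs, key=lambda doc_id: scores[doc_id], reverse=True)
print(ranked)
```
]]></preformat>
      <p>Returning 0.0 for an all-zero vector is a choice made in this sketch to avoid division by zero when no word of a document is in the vocabulary.</p>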
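      <p>Combination
We use <inline-formula><tex-math>S_L(D, q)</tex-math></inline-formula> to denote the score of document <inline-formula><tex-math>D</tex-math></inline-formula> for query <inline-formula><tex-math>q</tex-math></inline-formula> from the L2R
model, and <inline-formula><tex-math>S_W(D, q)</tex-math></inline-formula> to denote that from the Word Vector model. <inline-formula><tex-math>\alpha</tex-math></inline-formula> is the weight of
<inline-formula><tex-math>S_L(D, q)</tex-math></inline-formula> and <inline-formula><tex-math>\beta</tex-math></inline-formula> is the weight of <inline-formula><tex-math>S_W(D, q)</tex-math></inline-formula>. The final score is computed as below:
        <disp-formula><tex-math>S(D, q) = \frac{\alpha \, S_L(D, q) + \beta \, S_W(D, q)}{\alpha + \beta} .</tex-math></disp-formula>
      </p>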
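      <p>A minimal sketch of the weighted combination follows, writing <monospace>alpha</monospace> for the weight of the L2R score and <monospace>beta</monospace> for the weight of the Word Vector score (the parameter names are assumptions of this example). The default weights 0.8 and 0.2, and the fallback to the Word Vector score alone when a document has no L2R score, follow the description of run-3; representing a missing L2R score as <monospace>None</monospace> is an illustrative choice.</p>
      <preformat><![CDATA[
```python
from typing import Optional


def combine(s_l: Optional[float], s_w: float, alpha: float = 0.8, beta: float = 0.2) -> float:
    """Weighted average of the L2R score s_l and the Word Vector score s_w."""
    if s_l is None:
        # The document is missing from the L2R result: use alpha = 0, beta = 1.
        alpha, beta, s_l = 0.0, 1.0, 0.0
    return (alpha * s_l + beta * s_w) / (alpha + beta)


both = combine(0.5, 1.0)          # document scored by both models
word_only = combine(None, 0.3)    # document scored by the Word Vector model only
print(both, word_only)
```
]]></preformat>
      <p>Dividing by the sum of the weights keeps the final score on the same scale as the two input scores, provided both inputs share a scale.</p>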
    </sec>
    <sec id="sec-3">
      <title>Experiments</title>
      <p>Dataset
We are provided with a development set and a test set. The development set contains
twenty topics and the test set contains thirty topics. Each topic file contains
four parts:
        <list list-type="bullet">
          <list-item><p>The topic ID</p></list-item>
          <list-item><p>The title of the review, written by experts</p></list-item>
          <list-item><p>The Boolean query manually constructed by experts</p></list-item>
          <list-item><p>The set of PubMed Document Identifiers (PIDs) returned by MEDLINE</p></list-item>
        </list>
Since the query of a topic is a Boolean query, we remove three words near the
negation word "not" to avoid misrepresenting the intent of the query.</p>
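      <p>The text does not specify which three words near "not" are removed. The sketch below shows one possible reading and is an assumption, not the authors' procedure: it drops the negation word itself together with the three tokens that follow it, using simple whitespace tokenization. The function name and the example query are hypothetical.</p>
      <preformat><![CDATA[
```python
def strip_negated_terms(query: str, window: int = 3) -> str:
    """Drop each 'not' token and the `window` tokens after it (assumed reading)."""
    kept = []
    skip = 0
    for token in query.split():
        if skip > 0:
            # Still inside the window that follows a negation word.
            skip -= 1
            continue
        if token.lower() == "not":
            skip = window
            continue
        kept.append(token)
    return " ".join(kept)


# Made-up query for illustration; real topic queries are more complex.
cleaned = strip_negated_terms("heart attack NOT animal study in mice AND aspirin")
print(cleaned)
```
]]></preformat>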
      <p>Runs
We submit three runs, described below:
        <list list-type="simple">
          <list-item><p>run-1: result of the Word Vector model. We use the pre-trained word vectors
from Stanford University, trained with the GloVe model[9]. The vocabulary size is
2.2M and the dimension of each vector is 300. A word that does
not occur in the pre-trained vocabulary is represented by the zero vector.</p></list-item>
          <list-item><p>run-2: result of the L2R model. We adopt terrier-4.0.0 to run the BM25, BB2 and PL2
models. We select the top-1000 PIDs for each topic.</p></list-item>
          <list-item><p>run-3: result of the combination of the L2R model and the Word Vector model. The
weights are tuned on the training set. Finally, we choose α = 0.8 as the weight of the L2R score and β = 0.2 as the weight of the Word Vector score
in equation (4). However, a PID of a topic may not occur in the result of the L2R
model. In this case, we set α = 0 and β = 1. As in run-2, we choose the top-1000 PIDs
for each topic.</p></list-item>
        </list>
      </p>
      <p>The evaluation results of the three runs are shown in Table 1. These results are
provided by the organizers.</p>
    </sec>
    <sec id="sec-4">
      <title>Conclusions and Future Work</title>
      <p>In the 2017 CLEF eHealth Task 2, we, the ECNU ICA team, take advantage of
the Learning-to-Rank model. We also adopt word2vec to represent queries and
documents and compute their cosine similarity. Although the
combination of the two methods performs well, the performance of our word2vec model
needs to be improved. In the future, we will apply better methods that
avoid losing semantic information.</p>
      <p>Table 1: Evaluation results of run-1, run-2 and run-3.</p>
    </sec>
    <sec id="sec-5">
      <title>Acknowledgement</title>
      <p>This research is funded by the National Natural Science Foundation of China (No.
61602179) and the Science and Technology Commission of Shanghai Municipality
(No. 15PJ1401700).</p>
      <p>This work was supported by Xiaoi Research and by the Shanghai Municipal Commission
of Economy and Information under Grant Project No. 201602024.</p>
      <p>6. Robertson, S.E., Walker, S., Jones, S., Hancock-Beaulieu, M., Gatford, M.: Okapi at
TREC-3. Proceedings of the Third Text REtrieval Conference (TREC 1994). Gaithersburg,
USA.</p>
      <p>7. Amati, G., van Rijsbergen, C.J.: Probabilistic models for
information retrieval based on divergence from randomness. (2003).</p>
      <p>8. Amati, G., van Rijsbergen, C.J.: Probabilistic models for information retrieval based
on divergence from randomness. (2003).</p>
      <p>9. Pennington, J., Socher, R., Manning, C.D.: GloVe: Global Vectors for Word
Representation. Empirical Methods in Natural Language Processing (EMNLP) (2014).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>E.</given-names>
            <surname>Kanoulas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Azzopardi</surname>
          </string-name>
          , and
          <string-name>
            <given-names>R.</given-names>
            <surname>Spijker</surname>
          </string-name>
          .
          <article-title>Overview of the CLEF technologically assisted reviews in empirical medicine</article-title>
          . In
          <source>Working Notes of CLEF 2017 - Conference and Labs of the Evaluation Forum</source>
          , Dublin, Ireland, September 11-14,
          <year>2017</year>
          . CEUR Workshop Proceedings. CEUR-WS.org, 2017.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>H.</given-names>
            <surname>Suominen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Kelly</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Goeuriot</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Kanoulas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Neveol</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Zuccon</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J. R. M.</given-names>
            <surname>Palotti</surname>
          </string-name>
          .
          <article-title>Overview of the CLEF eHealth evaluation lab 2017</article-title>
          . In
          <source>Experimental IR Meets Multilinguality, Multimodality, and Interaction - 8th International Conference of the CLEF Association, CLEF 2017</source>
          , Dublin, Ireland, September 11-14,
          <year>2017</year>
          , Proceedings, Lecture Notes in Computer Science. Springer, 2017.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Yang</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>He</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hu</surname>
            ,
            <given-names>Q.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>He</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Haacke</surname>
            ,
            <given-names>E.M.</given-names>
          </string-name>
          :
          <article-title>ECNU at 2015 eHealth Task 2: User-centred Health Information Retrieval</article-title>
          .
          <source>Proceedings of the ShARe/CLEF eHealth Evaluation Lab</source>
          (
          <year>2015</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>Q.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hu</surname>
            ,
            <given-names>Q.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pei</surname>
            ,
            <given-names>Y.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yang</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>He</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>ECNU at TREC 2014: Microblog Track</article-title>
          . (
          <year>2014</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Yang</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>He</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hu</surname>
            ,
            <given-names>Q.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>He</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>ECNU at 2015 CDS Track: Two Re-ranking Methods in Medical Information Retrieval</article-title>
          .
          <source>Proceedings of the 2015 Text Retrieval Conference</source>
          (
          <year>2015</year>
          ).
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>