<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>SiS at CLEF 2017 eHealth TAR Task</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Vassil Kalphov</string-name>
          <email>vassil.kalphov.@uni.strath.ac.uk</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Georgios Georgiadis</string-name>
          <email>georgios.georgiadis.@uni.strath.ac.uk</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Leif Azzopardi</string-name>
          <email>Leif.Azzopardi@strath.ac.uk</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>University of Strathclyde</institution>, <addr-line>Glasgow</addr-line>, <country>UK</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>This paper presents Strathclyde iSchool's (SiS) participation in the Technologically Assisted Reviews in Empirical Medicine task. For the ranking task, we explored two ways in which assistance could be provided to reviewers during the assessment process: (i) topic models, where we use Latent Dirichlet Allocation to identify topics within the set of retrieved documents, ranking documents by the topic most likely to be relevant, and (ii) relevance feedback, where we use Rocchio's algorithm to update the query model for subsequent rounds of interaction. A third approach combines topic modelling and relevance feedback to quickly identify the relevant abstracts. For the thresholding task, we apply a score-based threshold, excluding documents whose BM25 score did not exceed it.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        CLEF 2017 introduced a new eHealth retrieval problem, that of providing
technological assistance to reviewers conducting systematic reviews, where the goals of the
task were to explore how Information Retrieval techniques could be used to: (i)
identify relevant material more quickly, in the ranking challenge, and (ii) identify
when reviewers could stop processing documents, in the thresholding challenge [
        <xref ref-type="bibr" rid="ref2 ref3">2,
3</xref>
        ]. During the review process, reviewers routinely examine hundreds to
thousands of abstracts to decide whether a document (and the evidence it contains) could
be included in the systematic review that they are conducting [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. Once they
have identified a subset of potentially relevant abstracts, they
examine each document's contents to decide whether it should be included
or excluded. The track focused on the first part: identifying potentially relevant
documents during the so-called screening phase.
      </p>
      <p>In this work we considered two different approaches: one which uses topic
modelling and the other which uses relevance feedback. In selecting these
approaches we thought that such techniques could be used in the following way.
For topic modelling, we envisaged that the downloaded abstracts could be
semantically clustered, and the different clusters presented to the reviewer;
the reviewer could then start the review process by selecting the cluster that they
felt was most likely to contain the relevant documents. Since we did not have
recourse to reviewers, we explored a number of different ways to automatically
select the best cluster. For relevance feedback, we envisaged that as the reviewer
starts to examine documents, the query could be updated to bring back the next
most relevant documents, so that all the relevant material would be found as
soon as possible. Obviously, if the aim is to reduce the workload of the
reviewers, then we need to be able to select a point where the reviewer can stop
assessing documents; however, this runs the risk of losing relevant documents.
To this end, we explored various heuristics to select the threshold such that we
minimize effort and maximise recall (but ideally obtain total recall).</p>
    </sec>
    <sec id="sec-2">
      <title>Experimental Set-up</title>
      <p>Data: Given the list of topic descriptions, the PubMed IDs were extracted, and
a script fetched the abstract and associated metadata from the PubMed API.
From the topics, we extracted the title of each topic and used that as the query.
Indexing and Retrieval System: We used Lucene 6.2 to create a separate
index for each topic (stop words were removed; no stemming was
applied). A Lucene Document was created where the following fields were indexed:
Title, Abstract, Author, and Publication Name. The baseline retrieval algorithm
we employed was fielded BM25 with standard parameter settings, i.e. b = 0.75,
and equal weights between fields (denoted as BM25).</p>
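The baseline scoring can be illustrated as follows. This is a minimal sketch of the Okapi BM25 formula with the stated b = 0.75 (k1 = 1.2 is an assumed default); Lucene's internal implementation differs in its IDF smoothing and length normalisation details.

```python
import math
from collections import Counter

K1, B = 1.2, 0.75  # b = 0.75 as in the paper; k1 = 1.2 is an assumed default

def bm25(query, docs, k1=K1, b=B):
    """Score each tokenised document in `docs` against the tokenised `query`."""
    n = len(docs)
    avg_len = sum(len(d) for d in docs) / n
    df = Counter(t for d in docs for t in set(d))  # document frequencies
    scores = []
    for d in docs:
        tf = Counter(d)
        s = 0.0
        for t in query:
            if df[t] == 0:
                continue
            idf = math.log(1 + (n - df[t] + 0.5) / (df[t] + 0.5))
            s += idf * tf[t] * (k1 + 1) / (
                tf[t] + k1 * (1 - b + b * len(d) / avg_len))
        scores.append(s)
    return scores
```

A fielded variant, as used in the runs, would compute this per field (Title, Abstract, Author, Publication Name) and combine the per-field scores with equal weights.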
      <p>
        Relevance Feedback: We implemented Rocchio's algorithm [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] in Lucene,
where feedback was used to provide relevance information. In each round of
feedback, 30 documents were examined and the query model updated, to
provide a re-ranking of the subsequent documents. This was performed on the first
10%, 20%, etc. of documents associated with the topic. Here, we only report
the 30% runs (AL30), as these generally performed the best and little change in
performance was observed on the training set when using more feedback.
Topic Modelling: We used the MALLET toolkit, and thus Latent Dirichlet
Allocation [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] to semantically cluster the documents within each topic. We set the
number of latent topics to 5. To rank the documents we selected one of
the latent topics z<sub>i</sub>, and ordered the documents by the probability p(z<sub>i</sub>|d<sub>j</sub>). In an
attempt to select the cluster that provides the best ranking, we took the BM25
ranking from above, and used the top 100 ranked documents as pseudo-relevance
feedback. Then we ranked the documents for each latent topic (ordered by p(z<sub>i</sub>|d<sub>j</sub>)) and
selected the topic with the highest overlap with BM25 (TMBM).
      </p>
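The topic-selection step can be sketched as follows; a minimal illustration assuming the document-topic probabilities p(z|d) have already been estimated (e.g. by MALLET), with hypothetical function names.

```python
def rank_by_topic(doc_topic, z):
    """Order document ids by p(z|d), descending.
    doc_topic[d][z] holds the estimated probability of topic z for document d."""
    return sorted(range(len(doc_topic)), key=lambda d: doc_topic[d][z], reverse=True)

def select_topic_by_overlap(doc_topic, bm25_ranking, k=100):
    """Pick the latent topic whose top-k ranking overlaps most with the
    top-k of the BM25 ranking, treated as pseudo-relevance feedback."""
    pseudo_rel = set(bm25_ranking[:k])
    best_topic, best_overlap = 0, -1
    for z in range(len(doc_topic[0])):
        top_k = set(rank_by_topic(doc_topic, z)[:k])
        overlap = len(top_k & pseudo_rel)
        if overlap > best_overlap:
            best_topic, best_overlap = z, overlap
    return best_topic
```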
      <p>Combined: Since our topic modelling approaches do not use feedback, we
decided to see whether we could start the process of relevance feedback from the
topic modelling run, and refine the query model accordingly. Thus, we selected
the best performing method, TMAL, given the training data, and used this as the
starting point for the active learning. Again we obtained feedback for
the first 10%, 20% and 30% of the documents per topic.</p>
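The feedback loop underlying the AL and TMAL runs relies on Rocchio's update. This is a sketch over term-weight dictionaries with the classic coefficient defaults (the paper does not state its coefficients, so alpha, beta and gamma here are assumptions):

```python
def rocchio(query_vec, rel_docs, nonrel_docs, alpha=1.0, beta=0.75, gamma=0.15):
    """Rocchio update: move the query vector toward the centroid of
    judged-relevant documents and away from non-relevant ones.
    All vectors are {term: weight} dicts; coefficients are assumed defaults."""
    new = {t: alpha * w for t, w in query_vec.items()}
    for docs, sign, coef in ((rel_docs, 1, beta), (nonrel_docs, -1, gamma)):
        if not docs:
            continue
        for d in docs:
            for t, w in d.items():
                new[t] = new.get(t, 0.0) + sign * coef * w / len(docs)
    # standard practice: drop terms whose weight fell to zero or below
    return {t: w for t, w in new.items() if w > 0}
```

Each round of 30 judged abstracts would feed `rel_docs`/`nonrel_docs`, and the updated query re-ranks the remaining documents.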
      <p>Thresholded Runs: To create thresholded runs, we took the BM25 run and
applied a simple score-based threshold. In Lucene, the scores for a query
under BM25 are 1.0 or greater, so we used thresholds of 1.0, 1.5, 2.0 and
2.5. This led to a reasonable reduction in the number of documents without
sacrificing much recall.</p>
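The thresholding itself reduces to a filter over the scored ranking, sketched below (the pair representation is an assumption for illustration):

```python
def threshold_run(ranking, theta):
    """Keep only documents whose retrieval score meets the threshold theta.
    `ranking` is a list of (doc_id, score) pairs, sorted by score descending."""
    return [(doc_id, score) for doc_id, score in ranking if score >= theta]
```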
    </sec>
    <sec id="sec-3">
      <title>Results and Discussion</title>
      <p>Tables 1 and 3 report the performance on the Ranking Task, while Tables 2
and 4 report the performance on the Threshold Task. Our best performing run
on the Ranking Task, in terms of normalized area under the gain curve, was
AL30, which used 30% of the documents as feedback. From our results, it is
clear that the topic modelling approach, as we have employed it, has not led
to significant improvements over the BM25 baseline. However, when
inspecting the individual topic-based rankings, the most probable topic was not
always the best performing topic, so we will direct more research into topic
selection. This is because, when formulating queries, reviewers could use topic
modelling to understand the space of documents retrieved, and then refine their
query further, and thus make savings a priori rather than having to trawl through
hundreds to thousands of documents. Another factor that could significantly
improve our results is that for these initial runs we used the title as the query,
as opposed to the Boolean query provided within the topics. It is quite possible
that the more complex and verbose Boolean queries could lead to a better initial
ranking, and so, when used in conjunction with relevance feedback, the relevant
items could be found sooner. We leave these directions for future work.</p>
      <table-wrap id="tab-threshold">
        <caption>
          <p>Thresholded runs: BM25 with score thresholds of 1.0, 1.5, 2.0 and 2.5 (T1&#x2013;T2.5).</p>
        </caption>
        <table>
          <thead>
            <tr><th>Run</th><th>BM25</th><th>T1</th><th>T1.5</th><th>T2</th><th>T2.5</th></tr>
          </thead>
          <tbody>
            <tr><td>NumRels</td><td>1857</td><td>1857</td><td>1857</td><td>1857</td><td>1857</td></tr>
            <tr><td>NumFeed</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td></tr>
            <tr><td>RelsFound</td><td>1857</td><td>1828</td><td>1809</td><td>1784</td><td>1758</td></tr>
            <tr><td>AP</td><td>0.17</td><td>0.17</td><td>0.17</td><td>0.17</td><td>0.17</td></tr>
            <tr><td>MinRel</td><td>2851</td><td>2503</td><td>2333</td><td>2068</td><td>1877</td></tr>
            <tr><td>WSS100</td><td>0.29</td><td>0.28</td><td>0.27</td><td>0.23</td><td>0.22</td></tr>
            <tr><td>Area</td><td>0.81</td><td>0.81</td><td>0.80</td><td>0.80</td><td>0.79</td></tr>
            <tr><td>NCG10</td><td>0.45</td><td>0.45</td><td>0.45</td><td>0.45</td><td>0.45</td></tr>
            <tr><td>NCG20</td><td>0.65</td><td>0.65</td><td>0.65</td><td>0.65</td><td>0.65</td></tr>
            <tr><td>NCG30</td><td>0.75</td><td>0.75</td><td>0.75</td><td>0.75</td><td>0.75</td></tr>
            <tr><td>TotalCost</td><td>3918</td><td>3435</td><td>3165</td><td>2824</td><td>2536</td></tr>
            <tr><td>TotalCostUniform</td><td>3918</td><td>3786</td><td>3865</td><td>3748</td><td>3902</td></tr>
            <tr><td>TotalCostWeighted</td><td>3918</td><td>3454</td><td>3280</td><td>3117</td><td>2905</td></tr>
            <tr><td>loss_er</td><td>0.54</td><td>0.54</td><td>0.38</td><td>0.33</td><td>0.27</td></tr>
            <tr><td>loss_r</td><td>0.00</td><td>0.001</td><td>0.005</td><td>0.01</td><td>0.014</td></tr>
            <tr><td>loss_e</td><td>0.54</td><td>0.54</td><td>0.38</td><td>0.32</td><td>0.26</td></tr>
          </tbody>
        </table>
      </table-wrap>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Blei</surname>
            ,
            <given-names>D.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ng</surname>
            ,
            <given-names>A.Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jordan</surname>
            ,
            <given-names>M.I.</given-names>
          </string-name>
          :
          <article-title>Latent Dirichlet allocation</article-title>
          .
          <source>Journal of Machine Learning Research</source>
          <volume>3</volume>
          ,
          <fpage>993</fpage>
          &#x2013;
          <lpage>1022</lpage>
          (
          <year>2003</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Goeuriot</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kelly</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Suominen</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Neveol</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Robert</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kanoulas</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Spijker</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Palotti</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zuccon</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          :
          <article-title>CLEF 2017 eHealth evaluation lab overview</article-title>
          .
          <source>In: CLEF 2017 - 8th Conference and Labs of the Evaluation Forum, Lecture Notes in Computer Science (LNCS)</source>
          . Springer (
          <year>September 2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Kanoulas</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Azzopardi</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Spijker</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          :
          <article-title>Technologically assisted reviews in empirical medicine</article-title>
          .
          <source>In: CLEF 2017 Evaluation Labs and Workshop: Online Working Notes</source>
          , CEUR-WS (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Rocchio</surname>
            ,
            <given-names>J.J.</given-names>
          </string-name>
          :
          <article-title>Relevance feedback in information retrieval</article-title>
          (
          <year>1971</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Shemilt</surname>
          </string-name>
          ,
          <string-name><surname>Khan</surname></string-name>
          ,
          <string-name><surname>Park</surname></string-name>
          ,
          <string-name><surname>Thomas</surname></string-name>
          :
          <article-title>Use of cost-effectiveness analysis to compare the efficiency of study identification methods in systematic reviews</article-title>
          .
          <source>Systematic reviews</source>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>