<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>A Distributed Effort Approach for Systematic Reviews. IMS Unipd at CLEF 2019 eHealth Task 2.</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Giorgio Maria Di Nunzio</string-name>
          <email>giorgiomaria.dinunzio@unipd.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Information Engineering</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Department of Mathematics, University of Padua</institution>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>This is the third participation of the Information Management Systems (IMS) group in the CLEF eHealth task on Technology Assisted Reviews in Empirical Medicine. This task focuses on medical systematic reviews, a problem that requires a recall close (if not equal) to 100%. Semi-automated approaches are essential to support this type of search when the amount of data exceeds the limits of users, i.e. in terms of attention or patience. We present a variation of the system we presented last year: in particular, we not only set the maximum number of documents that the physician is willing to read, but also distribute the effort across the topics proportionally to the number of documents in the pool. We compare the results of this approach with the "frozen" system we used in 2018 and a BM25 baseline.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        In this paper, we describe the participation of the Information Management
Systems (IMS) group at CLEF eHealth 2019 [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] Technology Assisted Review
Task [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. This task focuses on the problem of systematic reviews, that is, the
process of collecting articles that summarise all evidence (if possible) that has
been published regarding a certain medical topic. This task requires long search
sessions by experts in the field of medicine; for this reason, semi-automatic
approaches are essential to support this type of search when the amount of data
exceeds the limits of users, i.e. in terms of attention or patience.
      </p>
      <p>
        The objective of our participation was to compare the system that we used
in the previous year with a new strategy to distribute the effort of the user (the
physician or an expert in the field of medicine) across the topics. In particular,
- we re-use the stopping strategy to simulate the maximum number of
documents that a physician is willing to review in the two-dimensional approach
presented in [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ];
- we distribute the effort, in terms of number of documents to read,
proportionally to the size of the pool of documents for each topic;
- we estimate the 95% confidence interval of the proportion of relevant
documents present in the collection [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ].
      </p>
      <p>The source code of the experiments is available for reproducibility purposes.<sup>3</sup></p>
    </sec>
    <sec id="sec-2">
      <title>Approach</title>
      <p>
        In this paper, we continue to investigate the interaction with the two-dimensional
interpretation of the BM25 model applied to the problem of explicit relevance
feedback [
        <xref ref-type="bibr" rid="ref3 ref5 ref6 ref7 ref8 ref9">9, 3, 8, 5, 7, 6</xref>
        ].
      </p>
      <p>
        In particular, the two-dimensional representation of probabilities [
        <xref ref-type="bibr" rid="ref4 ref9">4, 9</xref>
        ] is an
intuitive way of presenting a two-class classification problem on a two-dimensional
space. Given two classes, for example relevant R and non-relevant NR, a
document d is assigned to category R if the following inequality holds:
        <disp-formula><label>(1)</label><tex-math>\underbrace{P(d \mid NR)}_{y} &lt; m \, \underbrace{P(d \mid R)}_{x} + q</tex-math></disp-formula>
where P(d|R) and P(d|NR) are the likelihoods of the object d given the two
categories, while m and q are two parameters that can be optimized to compensate
for either the unbalanced class issues or different misclassification costs.
      </p>
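      <p>As an illustration only, the decision rule of Equation (1) can be sketched in a few lines of Python. The function name is_relevant, the example likelihood values, and the defaults m = 1 and q = 0 are assumptions made for this sketch; they are not taken from the released code.</p>
      <preformat><![CDATA[
```python
def is_relevant(p_d_given_r, p_d_given_nr, m=1.0, q=0.0):
    """Return True when the point (x, y) lies below the line y = m*x + q."""
    x = p_d_given_r    # likelihood of d under the relevant class R
    y = p_d_given_nr   # likelihood of d under the non-relevant class NR
    return y < m * x + q


# Hypothetical (P(d|R), P(d|NR)) pairs, for illustration only.
examples = [(0.30, 0.10), (0.05, 0.20), (0.10, 0.15)]

# Default line y = x: a document is relevant when P(d|NR) < P(d|R).
labels = [is_relevant(x, y) for x, y in examples]

# A steeper line (larger m) assigns more documents to R.
labels_steep = [is_relevant(x, y, m=2.0) for x, y in examples]
```
]]></preformat>
      <p>With m = 2 the third example point moves to the relevant side of the line, which shows how the slope can be used to favour recall over precision.</p>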
      <p>We focused on the following problems:
1. study the effectiveness of a classifier given a fixed number of documents that
a physician is willing to review;
2. design a sampling strategy to estimate the 95% confidence interval of the
number of relevant documents in the collection.</p>
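      <p>The text does not state which interval estimator is used for the second problem. The sketch below assumes a simple normal-approximation (Wald) interval computed on a uniform random sample and scaled to the collection size; the function name, the parameters, and the numbers in the usage lines are hypothetical.</p>
      <preformat><![CDATA[
```python
import math


def relevant_count_interval(sample_size, relevant_in_sample, collection_size, z=1.96):
    """Approximate 95% interval for the number of relevant documents.

    Assumption: documents were sampled uniformly at random, so the sample
    proportion estimates the proportion of relevant documents overall.
    """
    p_hat = relevant_in_sample / sample_size
    half_width = z * math.sqrt(p_hat * (1.0 - p_hat) / sample_size)
    low = max(0.0, p_hat - half_width)    # a proportion cannot be negative
    high = min(1.0, p_hat + half_width)   # nor exceed 1
    return low * collection_size, high * collection_size


# Hypothetical numbers: 5 relevant documents in a random sample of 100,
# drawn from a collection of 10,000 documents.
low, high = relevant_count_interval(
    sample_size=100, relevant_in_sample=5, collection_size=10000
)
```
]]></preformat>
      <p>The Wald interval is known to behave poorly when the proportion of relevant documents is very small, which is common in systematic reviews; a Wilson or exact interval may be preferable in that setting.</p>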
      <p>
        In the experiments, we used the same procedure as last year [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]:
- we set a number n of documents that the physician is willing to read and
a number s that tells the algorithm when (every s documents) to randomly
sample a document from the collection instead of presenting to the physician
the next most relevant document;
- for each topic, we run a BM25 retrieval model with optimized
hyper-parameters and we obtain the relevance feedback for the first abstract in the
ranking list;
- from the second document until document n/2 - 1, we continuously update the relevance
weights of the terms according to the explicit relevance feedback given by
the physician (simulated by the qrels available with the test collection);
- for the last half of the documents (n/2) that the physician is willing to read, we
use a Naïve Bayes classifier continuously updated with the explicit relevance
feedback [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ].
<sup>3</sup> https://github.com/gmdn/CLEF2019
      </p>
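      <p>The two-phase procedure above can be summarised as a schedule over the n documents shown to the physician. The sketch below is a simplification under stated assumptions: the function name review_schedule is hypothetical, the boundary between the two phases is placed at position n/2, and the ranking and classification steps themselves are not implemented.</p>
      <preformat><![CDATA[
```python
def review_schedule(n, s):
    """Return one (position, phase, source) tuple per document shown.

    phase:  "bm25_feedback" for the first half, "naive_bayes" for the second.
    source: "random_sample" at every s-th position, otherwise "top_ranked".
    """
    half = n // 2
    schedule = []
    for position in range(1, n + 1):
        phase = "bm25_feedback" if position <= half else "naive_bayes"
        source = "random_sample" if position % s == 0 else "top_ranked"
        schedule.append((position, phase, source))
    return schedule


# Hypothetical budget: 10 documents, with a random sample every 4 documents.
plan = review_schedule(n=10, s=4)
```
]]></preformat>
      <p>In this example, positions 4 and 8 are drawn at random from the collection, positions 1 to 5 fall in the BM25 relevance feedback phase, and positions 6 to 10 fall in the Naïve Bayes phase.</p>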
      <p>Instead of setting n equal for all topics, this year we tried a different approach
in order to let the user read more documents for those topics with more
documents in the pool. In Table 1, we show, for each topic, the number of
documents in the pool, the proportion of documents of the pool compared to
the total number of documents pooled, and the number of documents we will show
to the user (to be multiplied by 2).</p>
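      <p>A minimal sketch of the proportional allocation described above follows. The function name distribute_effort, the topic identifiers, the pool sizes, and the total budget are hypothetical; the rounding rule and the cap at the pool size are assumptions of this sketch.</p>
      <preformat><![CDATA[
```python
def distribute_effort(pool_sizes, total_budget):
    """Split a total reading budget across topics in proportion to pool size."""
    total_pooled = sum(pool_sizes.values())
    allocation = {}
    for topic, size in pool_sizes.items():
        # Proportional share of the budget, rounded to a whole document count.
        share = round(size * total_budget / total_pooled)
        # A topic can never be assigned more documents than its pool contains.
        allocation[topic] = min(size, share)
    return allocation


# Hypothetical pools: three topics with 6000, 3000 and 1000 pooled documents.
pools = {"topic_A": 6000, "topic_B": 3000, "topic_C": 1000}
allocation = distribute_effort(pools, total_budget=500)
```
]]></preformat>
      <p>Because each share is rounded independently, the allocated total may differ slightly from the requested budget for other inputs.</p>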
    </sec>
    <sec id="sec-3">
      <title>Experiments</title>
      <p>
        For all the experiments, we set the values of the BM25 hyper-parameters in the
same way we did in [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ].
      </p>
      <sec id="sec-3-1">
        <title>Official Runs</title>
        <p>We submitted runs for three different systems:
- a BM25 baseline with continuous active learning and a fixed threshold for
each topic,
- the "frozen" system of 2018 with different proportions of documents to be
read for the initial phase but with a fixed threshold for each topic,
- the new approach with a different threshold for each topic.</p>
        <p>In particular, for the frozen system, we used 10% or 50% of the initial pool of
documents per topic to build the classifier. The new distributed effort approach
uses 10% of the pool at the beginning of the training but, in general, it may
stop earlier than the other approach if the effort required for a topic is
low in terms of documents allowed.</p>
      </sec>
      <sec id="sec-3-2">
        <title>Unofficial Runs</title>
        <p>In order to compare the BM25 model with a similar proportion of documents
shown to the user, we added some BM25 runs and removed some others that
showed a different number of documents.</p>
      </sec>
      <sec id="sec-3-3">
        <title>Evaluation Measures</title>
        <p>In order to evaluate the performance of the systems, we chose the number of
documents shown to the user as one of the performance measures since, in our
case, it is also the point where we stop retrieving documents. In addition, we use
recall and averaged recall across topics to measure the accuracy of the retrieval.</p>
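        <p>The measures described above can be sketched as follows. The function names and the toy document identifiers are hypothetical; each topic is represented by the list of documents shown before stopping and the list of documents judged relevant in the qrels.</p>
        <preformat><![CDATA[
```python
def recall_at_stop(shown_ids, relevant_ids):
    """Fraction of the relevant documents that were shown before stopping."""
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    found = relevant.intersection(shown_ids)
    return len(found) / len(relevant)


def average_recall(runs):
    """Mean of the per-topic recall values (each topic weighs the same)."""
    values = [recall_at_stop(shown, relevant) for shown, relevant in runs.values()]
    return sum(values) / len(values)


# Hypothetical topics: (documents shown, documents judged relevant).
runs = {
    "topic_A": (["d1", "d2", "d3", "d4"], ["d2", "d4"]),
    "topic_B": (["d5", "d6"], ["d6", "d9"]),
}

total_shown = sum(len(shown) for shown, _ in runs.values())
mean_recall = average_recall(runs)
```
]]></preformat>
        <p>Here topic_A reaches a recall of 1.0 and topic_B a recall of 0.5, so the average recall is 0.75 at a cost of 6 documents shown.</p>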
      </sec>
      <sec id="sec-3-4">
        <title>Results</title>
        <p>In Figures 1 and 2, we show a topic by topic comparison of groups of runs: BM25,
distributed effort, and original 2018 with 10% or 50% of the initial pool selected. By
increasing the threshold of the number of documents shown to the user, we are
able to tune the performance of the system and reach an average recall close to
100% for all the systems under evaluation. Some topics are much more difficult
than others; for example, topic CD011558 requires the retrieval of most of the
pooled documents in order to achieve a reasonable recall (around 0.8).</p>
        <p>In Figure 3, we show the performance of the four groups of runs in terms of
average recall (across topics) given the number of documents shown to the user.
As the number of documents increases (from left to right), the four approaches
increase the average recall and go beyond 90%, in some cases (for example, the
two 2018 variants of the frozen system) with less than 4% of the total number
of documents.</p>
        <p>The distributed effort approach we proposed this year performed worse than
expected. It seems that by reducing the number of documents allowed per topic
too much, especially for topics with smaller pools, we obtain a suboptimal system
compared to the original one. In other words, it may be more convenient to set
a fixed cost per topic and use all the documents of the pool if necessary,
instead of saving some resources for topics with more documents in the pool.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Conclusions</title>
      <p>
        In this work, we presented a variation of the continuous active learning approach
used in [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] that uses a fixed stopping strategy to simulate the maximum number
of documents that a physician is willing to review and a sampling strategy that is
used to estimate the number of relevant documents in the collection. The results
of the distributed effort approach were worse than expected, compared to the
original approach presented in 2018. The performance of the new system is
still remarkable since it achieves an average recall of 90% by using only 10% of
the documents in the collection; however, the original system can achieve the
same results with half the number of documents shown to the user.
      </p>
      <p>We are currently analyzing the results provided by the organizers and adding
to the official runs a set of unofficial runs that will complete the picture of all
the possible settings. As future work, we will study a methodology to
dynamically vary the number of documents according to the estimate of the number of
relevant documents still missing.
</p>
      <p>Fig. 1: Results for BM25 and distributed effort runs. (a) Topic by topic BM25 results (runs: 2019_baseline_bm25_t200, 2019_baseline_bm25_t400, 2019_baseline_bm25_t600, 2019_baseline_bm25_t800, 2019_baseline_bm25_t1000, 2019_baseline_bm25_t2000); (b) Topic by topic distributed effort results. Horizontal axis: topic.</p>
      <p>Fig. 2: Results for original 2018 p10 and p50 runs. (a) Topic by topic original 2018 p10 results; (b) Topic by topic original 2018 p50 (legend entry recovered: 2018_stem_original_t100). Horizontal axis: topic.</p>
      <p>Fig. 3: [plot not recovered] Horizontal axis: documents shown (feedback).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>Evangelos</given-names>
            <surname>Kanoulas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Dan</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Leif</given-names>
            <surname>Azzopardi</surname>
          </string-name>
          , and Rene Spijker, editors.
          <article-title>CLEF 2019 Technology Assisted Reviews in Empirical Medicine Overview</article-title>
          .
          <article-title>CLEF 2019 Evaluation Labs</article-title>
          and Workshop: Online Working Notes.,
          <source>CEUR Workshop Proceedings. CEUR-WS.org</source>
          ,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>Liadh</given-names>
            <surname>Kelly</surname>
          </string-name>
          , Hanna Suominen, Lorraine Goeuriot, Mariana Neves, Evangelos Kanoulas,
          <string-name>
            <given-names>Dan</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Leif</given-names>
            <surname>Azzopardi</surname>
          </string-name>
          , Rene Spijker, Guido Zuccon, Jimmy, and Joao Palotti, editors.
          <source>Overview of the CLEF eHealth Evaluation Lab</source>
          <year>2019</year>
          .
          <source>CLEF 2019 - 10th Conference and Labs of the Evaluation Forum. Lecture Notes in Computer Science (LNCS)</source>
          , Springer,
          <year>September 2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>Giorgio</given-names>
            <surname>Maria Di Nunzio</surname>
          </string-name>
          .
          <article-title>A new decision to take for cost-sensitive naïve Bayes classifiers</article-title>
          .
          <source>Inf</source>
          . Process. Manage.,
          <volume>50</volume>
          (
          <issue>5</issue>
          ):
          <volume>653</volume>
          -
          <fpage>674</fpage>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>Giorgio</given-names>
            <surname>Maria Di Nunzio</surname>
          </string-name>
          .
          <article-title>Interactive text categorisation: The geometry of likelihood spaces</article-title>
          .
          <source>Studies in Computational Intelligence</source>
          ,
          <volume>668</volume>
          :
          <fpage>13</fpage>
          -
          <fpage>34</fpage>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>Giorgio</given-names>
            <surname>Maria Di Nunzio</surname>
          </string-name>
          .
          <article-title>A study of an automatic stopping strategy for technologically assisted medical reviews</article-title>
          .
          <source>In Advances in Information Retrieval - 40th European Conference on IR Research</source>
          , ECIR
          <year>2018</year>
          , Grenoble, France, March 26-29,
          <year>2018</year>
          , Proceedings, pages
          <volume>672</volume>
          -
          <fpage>677</fpage>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>Giorgio</given-names>
            <surname>Maria Di Nunzio</surname>
          </string-name>
          , Giacomo Ciuffreda, and Federica Vezzani.
          <article-title>Interactive sampling for systematic reviews</article-title>
          .
          <source>IMS unipd at CLEF</source>
          <year>2018</year>
          <article-title>ehealth task 2</article-title>
          . In Working Notes of CLEF 2018 -
          <article-title>Conference and Labs of the Evaluation Forum</article-title>
          , Avignon, France,
          <source>September 10-14</source>
          ,
          <year>2018</year>
          .,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <given-names>Giorgio</given-names>
            <surname>Maria Di Nunzio</surname>
          </string-name>
          , Maria Maistro, and
          <string-name>
            <given-names>Federica</given-names>
            <surname>Vezzani</surname>
          </string-name>
          .
          <article-title>A gamified approach to naïve Bayes classification: A case study for newswires and systematic medical reviews</article-title>
          .
          <source>In Companion of the The Web Conference 2018 on The Web Conference</source>
          <year>2018</year>
          ,
          <article-title>WWW 2018</article-title>
          , Lyon , France,
          <source>April 23-27</source>
          ,
          <year>2018</year>
          , pages
          <fpage>1139</fpage>
          -
          <fpage>1146</fpage>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <given-names>Giorgio</given-names>
            <surname>Maria Di Nunzio</surname>
          </string-name>
          , Maria Maistro, and Daniel Zilio.
          <article-title>Gamification for machine learning: The classification game</article-title>
          .
          <source>In Proceedings of the Third International Workshop on Gamification for Information Retrieval co-located with 39th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR</source>
          <year>2016</year>
          ), Pisa, Italy, July
          <volume>21</volume>
          ,
          <year>2016</year>
          ., pages
          <volume>45</volume>
          -
          <fpage>52</fpage>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <given-names>Giorgio</given-names>
            <surname>Maria Di Nunzio</surname>
          </string-name>
          , Maria Maistro, and
          <string-name>
            <given-names>Daniel</given-names>
            <surname>Zilio</surname>
          </string-name>
          .
          <article-title>The university of padua (IMS) at TREC 2016 total recall track</article-title>
          .
          <source>In Proceedings of The Twenty-Fifth Text REtrieval Conference</source>
          , TREC 2016, Gaithersburg, Maryland, USA, November
          <volume>15</volume>
          -
          <issue>18</issue>
          ,
          <year>2016</year>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>