<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
<article-title>Interactive Sampling for Systematic Reviews. IMS Unipd at CLEF 2018 eHealth Task 2.</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Giorgio Maria Di Nunzio</string-name>
          <email>giorgiomaria.dinunzio@unipd.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
<string-name>Giacomo Ciuffreda</string-name>
          <email>giacomo.ciuffreda@studenti.unipd.it</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Federica Vezzani</string-name>
          <email>federica.vezzani@phd.unipd.it</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Dept. of Information Engineering</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Dept. of Linguistic and Literary Studies</institution>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>University of Padua</institution>
        </aff>
      </contrib-group>
      <abstract>
<p>This is the second participation of the Information Management Systems (IMS) group in the CLEF eHealth Task of Technologically Assisted Reviews in Empirical Medicine. This task focuses on the problem of medical systematic reviews, a problem which requires a recall close (if not equal) to 100%. Semi-automated approaches are essential to support these types of searches when the amount of data exceeds the limits of users, i.e. in terms of attention or patience. We present a variation of the two-dimensional approach which 1) sets the maximum amount of documents that the physician is willing to read, and 2) uses a sampling strategy to estimate the 95% confidence interval of the number of relevant documents present in the collection.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
In this paper, we describe the participation of the Information Management
Systems (IMS) group at CLEF eHealth 2018 [10] Task 2 [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. This task focuses on
the problem of systematic reviews, that is, the process of collecting articles that
summarise all the evidence (if possible) that has been published regarding a certain
medical topic. This task requires long search sessions by experts in the field of
medicine; for this reason, semi-automatic approaches are essential to support
these types of searches when the amount of data exceeds the limits of users, i.e.
in terms of attention or patience.
      </p>
<p>The objective of our participation in this task was to: 1) study the effectiveness of a classifier given a fixed amount of documents that a physician is willing to review; 2) design a sampling strategy to estimate the 95% confidence interval of the number of relevant documents in the collection.</p>
      <p>
        tf
where wi is the weight of the i-th term, k1 and b are two parameters (some
default parameters are3 k1 = 1:2 and b = 0:75), tf is the term frequency in the
document, and wBIM is the Binary Independence Model weight of the i-th term:
i
where iR and iN R are the parameters of the Bernoulli random variable that
represent the presence (or absence) of the i-th term in the relevant (R) and
non-relevant (N R) documents. The estimate of each parameter is:
feedback [
        <xref ref-type="bibr" rid="ref2 ref4 ref6 ref7 ref8">8, 2, 7, 4, 6</xref>
        ]. In order to explain how the two-dimensional BM25 space
works, in the following sections we present a brief review of the BM25 model.
2.1
      </p>
      <p>
        BM25
The BM25 is a probabilistic retrieval model where the weight of a term in a
document is equal to [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]:
(1)
(2)
(3)
(4)
(5)
iR =
      </p>
      <p>ri +
R +</p>
      <p>R +</p>
      <p>R
iN R =</p>
      <p>N
ni
R +
ri +</p>
      <p>R</p>
      <p>N R
N R +</p>
      <p>N R
where R is the number of relevant documents, ri the number of relevant
documents in which the i-th term appears, N is the total number of documents and
ni is the total number of documents in which the i-th term appears. Parameters
and correspond to the hyper-parameter of the conjugate beta prior
distribution of the Bernoulli random variable. For R = R = 0:5 and R =N R= 0:5,
we obtain the de nition of the well-known Robertson - Sparck Jones weight
wRSJ . Given a document d, the probability of the document being relevant is
i
proportional to:</p>
      <p>P (Rjd) /</p>
      <p>X wBM25(tf )</p>
      <p>i
i2d
2.2</p>
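<p>As a minimal illustration of Equations (1)-(5), the weighting can be sketched as follows; function and variable names are our own, and the document-length normalization follows the standard BM25 formulation:</p>

```python
import math

def bim_weight(r_i, n_i, R, N, alpha_R=0.5, beta_R=0.5, alpha_NR=0.5, beta_NR=0.5):
    """Binary Independence Model weight, Eqs. (2)-(4), with Beta priors."""
    theta_R = (r_i + alpha_R) / (R + alpha_R + beta_R)                # Eq. (3)
    theta_NR = (n_i - r_i + alpha_NR) / (N - R + alpha_NR + beta_NR)  # Eq. (4)
    # Eq. (2): log-odds of term presence in relevant vs non-relevant documents
    return math.log(theta_R * (1 - theta_NR) / (theta_NR * (1 - theta_R)))

def bm25_weight(tf, dl, avgdl, w_bim, k1=1.2, b=0.75):
    """Eq. (1): term-frequency saturation multiplied by the BIM weight."""
    return tf / (k1 * ((1 - b) + b * dl / avgdl) + tf) * w_bim

def relevance_score(doc_tf, dl, avgdl, term_stats, R, N):
    """Eq. (5): P(R|d) is proportional to the sum of BM25 weights over d's terms.

    term_stats maps a term to the feedback counts (r_i, n_i).
    """
    return sum(
        bm25_weight(tf, dl, avgdl, bim_weight(*term_stats[t], R, N))
        for t, tf in doc_tf.items()
    )
```

<p>With α = β = 0.5 on both classes, bim_weight reduces to the Robertson-Sparck Jones weight mentioned above.</p>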
      <sec id="sec-1-1">
        <title>Two-Dimensional Model</title>
<p>
          The two-dimensional representation of probabilities [
          <xref ref-type="bibr" rid="ref3 ref8">3, 8</xref>
          ] is an intuitive way of
presenting a two-class classification problem on a two-dimensional space. Given
two classes, for example relevant R and non-relevant NR, a document d is
assigned to category R if the following inequality holds:
        </p>
<p>P(d|NR) &lt; m P(d|R) + q (6)</p>
        <p>where P(d|R) and P(d|NR) are the likelihoods of the object d given the two
categories (labelled x and y, respectively), while m and q are two parameters that can be optimized to compensate
for either the unbalanced class issues or different misclassification costs.</p>
<p>If we interpret the two likelihoods as two coordinates x and y of a
two-dimensional space, the problem of classification can be studied on a two-dimensional
plot. The decision of the classification is represented by the line y = mx + q
that splits the plane into two parts: all the points that fall 'below' this line are
classified as objects that belong to class R.</p>
<p>Two-dimensional BM25. In order to link the two-dimensional model to the
BM25 model, first we define the BIM weight as a difference of logarithms:</p>
        <p>w_i^BIM = log [ θ_iR (1 - θ_iNR) ] - log [ θ_iNR (1 - θ_iR) ]</p>
        <p>We now have all the elements to define the two coordinates x = P(d|R) and
y = P(d|NR) in the following way:</p>
        <p>x = Σ_{i∈d} tf / (k1 ((1 - b) + b dl/avgdl) + tf) * log [ θ_iR (1 - θ_iNR) ]</p>
        <p>y = Σ_{i∈d} tf / (k1 ((1 - b) + b dl/avgdl) + tf) * log [ θ_iNR (1 - θ_iR) ]</p>
        <p>where Σ_{i∈d} indicates (with an abuse of notation) the sum over all the terms of
document d.</p>
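<p>A sketch of how each document could be mapped to the (x, y) coordinates and classified with the decision line of Equation (6); the names are ours, and for brevity a single pair of prior parameters (α, β) is shared by the two classes:</p>

```python
import math

def coordinates(doc_tf, dl, avgdl, term_stats, R, N, k1=1.2, b=0.75,
                alpha=0.5, beta=0.5):
    """Map a document to (x, y) = (P(d|R), P(d|NR)): each coordinate is the
    sum of saturated log-likelihood contributions over the document's terms."""
    x = y = 0.0
    for t, tf in doc_tf.items():
        r_i, n_i = term_stats[t]  # feedback counts for term t
        theta_R = (r_i + alpha) / (R + alpha + beta)
        theta_NR = (n_i - r_i + alpha) / (N - R + alpha + beta)
        sat = tf / (k1 * ((1 - b) + b * dl / avgdl) + tf)  # BM25 saturation
        x += sat * math.log(theta_R * (1 - theta_NR))
        y += sat * math.log(theta_NR * (1 - theta_R))
    return x, y

def is_relevant(x, y, m=1.0, q=0.0):
    """Decision line of Eq. (6): points 'below' y = m*x + q are classified R."""
    return y < m * x + q
```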
<p>In Figure 1, we show an example of the visualization of a collection of
documents using the two-dimensional BM25 model. Relevant and non-relevant
documents which have already been judged by a user (in our case the physician) are
colored in green and red, respectively; documents that have not been judged are greyed out. The
two lines represent two possible decision lines (see Equation 6) to rank/classify
new documents as relevant.</p>
        <p>[Figure 1: two-dimensional BM25 plot of a collection; x axis from -200 to 0; legend: rel, not rel, not judged.]</p>
<p>
          In the experiments, we used the following procedure:
- we set a number n of documents that the physician is willing to read, and
a number s that tells the algorithm when (every s documents) to randomly
sample a document from the collection instead of presenting to the physician
the next most relevant document;
- for each topic, we run an optimized (hyper-parameters) BM25 retrieval
model and we obtain the relevance feedback for the first abstract in the
ranking list;
- from the second document until n/2 - 1, we continuously update the relevance
weights of the terms according to the explicit relevance feedback given by
the physician (simulated by the qrels available with the test collection);
- for the last half n/2 of the documents that the physician is willing to read, we
use a Naive Bayes classifier continuously updated with the explicit relevance
feedback [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ].
        </p>
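<p>The interactive procedure above can be sketched as the following loop (a simplified simulation; the ranking, the feedback updates, and the qrels interface are placeholders of our own):</p>

```python
import random

def review_session(ranked_docs, qrels, n, s=10):
    """Simulate a physician reviewing at most n documents.

    Every s-th document is drawn at random from the unjudged pool (this feeds
    the confidence-interval estimate); otherwise the next best-ranked document
    is shown. Judged documents provide the explicit relevance feedback.
    """
    judged, sampled, shown = {}, [], 0
    pool = list(ranked_docs)  # ordered by the current relevance score
    while pool and shown < n:
        if shown % s == s - 1:             # every s documents: random sample
            doc = random.choice(pool)
            sampled.append(doc)
        else:                              # otherwise: next ranked document
            doc = pool[0]
        pool.remove(doc)
        judged[doc] = qrels.get(doc, 0)    # feedback simulated by the qrels
        shown += 1
        # first half of the session: update the BM25 relevance weights;
        # second half: update a Naive Bayes classifier (not shown here)
    return judged, sampled
```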
      </sec>
    </sec>
    <sec id="sec-2">
      <title>Experiments</title>
<p>
        For all the experiments, we set the values of the BM25 hyper-parameters in the
following way:
- β_R = β_NR = 1.0
- α_R = α_NR = 0.01
These values are consistent with other experiments and indicate that a Beta prior
distribution that discounts the 'presence' of a term in favour of its 'absence' (high
β and low α) results in a better retrieval performance [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. The slope m of the
decision line is set to m = 1.0 and q = 0 for the first half n/2 of the documents; then,
m and q are continuously updated according to the relevance information [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ].
      </p>
      <sec id="sec-2-1">
<title>Official runs</title>
<p>We submitted three runs by varying the number of documents n that the
physician is willing to read per topic: n = 1000, n = 2000, n = 3000. We set the
parameter s = 10; this means that every ten documents we sample a random
document from the collection instead of showing the physician the next ranked
document. The three official runs are named as follows:
- ims unipd t500.task2, n = 1000
- ims unipd t1000.task2, n = 2000
- ims unipd t1500.task2, n = 3000</p>
<p>In Figure 2, we show the recall per topic for each official run. We see that there
are two topics in particular that are more difficult than the others: CD009263 and
CD012010, with a recall less than (or close to) 0.6 for all the runs. Seven topics can
be considered of medium difficulty (recall between 0.6 and 0.6 for at least one
of the experiments): CD008567, CD010213, CD010502, CD012165, CD012179,
CD012281, CD012599.</p>
<p>In Figure 3, we compare the results of our three runs with the summary of
all the other CLEF 2018 participants. This plot confirms that most of the high and
medium difficulty topics are also topics that, on average, were difficult for most
of the participants (barplots more stretched and median far from the value 1.00).</p>
        <p>Confidence intervals of the number of relevant documents. During the
experiments, every 10 documents we sample a random document from the collection
and show the document for relevance assessment in order to estimate the number
of relevant documents in the collection. In Tables 1, 2, and 3, we show a breakdown
of the number of documents per topic, how many documents were read (explicit
relevance feedback), the number of relevant documents, how many documents were
randomly sampled, the estimate of the number of relevant documents based on
the random sample as well as the 95% confidence interval (minimum and
maximum range), and the number of relevant documents found within the limit of the
threshold. In most cases, the estimate of the number of relevant documents (and
the 95% range) is much larger than the true number of relevant documents. The
analysis of the results shown in these tables is still under study, since we would
need a more sophisticated cost-benefit model to understand whether we want to
put more effort into the estimate of the number of relevant documents or into the
automatic classifier.</p>
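<p>The text does not spell out the exact estimator; a sketch of how the estimate and its 95% interval could be obtained from the random sample, assuming a simple normal-approximation binomial interval scaled to the collection size:</p>

```python
import math

def estimate_relevant(n_collection, sample_judgments):
    """Estimate the number of relevant documents in a collection of
    n_collection documents from a simple random sample of judgments
    (1 = relevant, 0 = not relevant), with a normal-approximation
    95% confidence interval scaled to the collection size."""
    k = len(sample_judgments)
    p = sum(sample_judgments) / k              # sample proportion of relevant
    half = 1.96 * math.sqrt(p * (1 - p) / k)   # 95% margin of error
    est = round(n_collection * p)
    lo = max(0, round(n_collection * (p - half)))
    hi = min(n_collection, round(n_collection * (p + half)))
    return est, (lo, hi)
```

<p>With small samples per topic the interval is wide, which is consistent with the observation above that the estimate often exceeds the true number of relevant documents.</p>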
      </sec>
      <sec id="sec-2-2">
<title>Unofficial runs</title>
<p>
          In addition to the three official runs, we prepared two unofficial runs in order to
study the feasibility of the query rewriting approach based on the work of [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ]. We
asked two experts in linguistics to rewrite the query, each with a different goal:
the first variant is written with the aim of creating a list of keywords resulting
from the semic analysis (the study of meaning in linguistic units) of the technical
terms contained in the initial query. The second variant is written with the aim
of reformulating the information need into a humanly readable sentence using
alternative terms such as synonyms, orthographic variants, related forms and/or
acronyms. The two experts worked independently from each other by
following a structured linguistic methodology and focusing on different terminological
aspects. We name these two experiments "keyword" and "readable".
        </p>
        <p>
          Linguistic Methodology: Terminological Record. The methodology
applied for the process of query rewriting is based on a linguistic and terminological
analysis of all the technical terms contained in the information needs provided
in the dataset. The approach is divided into the following steps:
1. Recognition of technical terms;
2. Extraction of technical terms;
3. Linguistic and semantic analysis;
4. Formulation of terminological records;
5. Query rewriting.
        </p>
<p>The core of our methodology is basically a new model of terminological record
used for the analysis of medical terminology [11]. This tool is a structured set
of terminological data referring to a specific concept, and it is used in order to
provide linguistic information about the concept itself and the term used for its
designation, both for its lexical and semantic framing. This terminological record
is composed of four general fields, which individually refer to formal features,
semantics, corpus and references. Each field is in turn divided into specific subfields
describing the term according to linguistic and notional criteria.</p>
<p>Focusing on the first two subfields, the section named "formal features"
contains lexical and morphosyntactic information such as genre, tonic accent, spelling,
etymology (derivation and composition), orthographic variants, acronyms/expansions
and related forms. From the semantic viewpoint, the subfield "semantics"
contains the definition of the term, its semic analysis, cases of phraseology
(collocations and colligations) and all the possible semantic variants.</p>
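<p>The four-field record described above could be modelled as a small data structure; this is an illustrative sketch of ours, whose field names follow the description in the text rather than an actual schema:</p>

```python
from dataclasses import dataclass, field

@dataclass
class TerminologicalRecord:
    """A terminological record: four general fields, each holding subfields."""
    term: str
    formal_features: dict = field(default_factory=dict)  # genre, spelling, variants...
    semantics: dict = field(default_factory=dict)        # definition, semic analysis...
    corpus: list = field(default_factory=list)           # attested usage examples
    references: list = field(default_factory=list)       # sources

# Example record for the term "cirrhosis" (semes taken from the text above)
rec = TerminologicalRecord(
    term="cirrhosis",
    semantics={"semic_analysis": ["chronic disease", "liver", "degeneration",
                                  "cells", "inflammation", "fibrous", "tissue"]},
)

def keyword_reformulation(records):
    """First variant: concatenate the semes of every analysed term."""
    return [seme for r in records for seme in r.semantics.get("semic_analysis", [])]
```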
<p>For example, for topic CD011602, the information need provided is:
Ultrasonography for diagnosis of alcoholic cirrhosis in people with
alcoholic liver disease.</p>
<p>We initially proceeded with the extraction of technical terms (both single-word
and multi-word terms) such as ultrasonography, diagnosis, alcoholic cirrhosis,
cirrhosis, alcoholic liver disease, liver, disease, and then we started to formulate
terminological records for each of them. The subfield named "formal features"
was useful for the human readable reformulation, whereas the "semantics" subfield
provided the information necessary for the keywords reformulation.</p>
        <p>First variant: keywords reformulation. In particular, semic analysis turns
out to be the most useful process for the keyword reformulation, and it aims to
decompose the meaning of the term analyzed. This process consists of breaking
down the sememe (i.e. the meaning) of a word into all its sense components, i.e.
the semes. So, for example, for the term cirrhosis the process of decomposition
of meaning produced the following list of keywords: /chronic disease/ /liver/
/degeneration/ /cells/ /human body/ /inflammation/ /fibrous/ /thickening/
/tissue/ /alcoholism/ /hepatitis/.</p>
<p>We repeat this kind of analysis for each technical term in the information need
and, considering the above mentioned example for topic CD011602, the keyword
reformulation is the following:
/technique/ /echoes/ /ultrasound pulses/ /ultrasound/ /pulse/
/delineate/ /areas/ /different density/ /body/ /human being/ /cells/
/examination/ /evaluation/ /diagnostic/ /diagnosing/ /diagnose/ /alcohol/
/chronic/ /disease/ /cirrhosis of the liver/ /liver/ /degeneration/ /cells/
/inflammation/ /fibrous/ /thickening/ /tissue/ /alcoholism/
/hepatitis/ /patient/ /large lobed glandular organ/ /abdomen/ /vertebrates/
/metabolic processes/ /disorder/ /structure/ /function/ /symptoms/
/affect/ /location/ /physical injury/.</p>
      </sec>
      <sec id="sec-2-3">
<title>Second variant: human readable reformulation</title>
        <p>The second type of query was written with the aim of reformulating the information need in a humanly
readable sentence. Thanks to the terminological records, we have been able to replace
original terms with validly attested synonyms, use orthographic alternatives
as variants of the medical terms provided in the original information need, as
well as to systematically replace acronyms with their expansions and expansions
with their acronyms. Considering the previous topic CD011602, we obtained the
following readable reformulation:</p>
<p>Diagnostic accuracy of medical ultrasound, known as diagnostic
sonography or ultrasonography, for the detection of alcoholic liver disease (ALD)
as the liver manifestations of alcohol overconsumption, including fatty
liver, alcoholic hepatitis, and chronic hepatitis with liver fibrosis or
cirrhosis.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Discussion</title>
<p>We are currently evaluating the results of these reformulations topic by topic,
comparing the number of relevant documents retrieved by each
reformulation in the top 10 retrieved documents (Table 5).</p>
<p>In this phase of the analysis, we noted that there are some topics for which
the two reformulations ("keywords" and/or "readable") retrieved, in the first
10 positions, more relevant documents than the original query. Table 6 shows
these topics and the number of documents retrieved depending on the type
of reformulation. We then proceeded with the manual analysis of such topics by
reading the abstracts of the relevant documents retrieved by the two variants,
and we started to analyse from a linguistic viewpoint which terms contained in
the two reformulations allowed the retrieval of such relevant documents.</p>
<p>As a first and approximate analysis, we noted that the terms that were most
frequently used in the two reformulations are those related to the diagnostic and
evaluative sphere, such as diagnosis and related forms such as diagnostic, diagnose and
diagnosing, as well as evaluation, examination, test and detection. Furthermore,
even the replacement of full multi-word terms with their acronyms, such as
DMSA for Dimercaptosuccinic Acid Scan, VUR for Vesicoureteral Reflux and
UTI for Urinary Tract Infection, has turned out to be a good approach, because
reduced lexical forms are one of the typical features of medical language and
abbreviations are used in order to rapidly transmit health information.</p>
    </sec>
    <sec id="sec-4">
      <title>Ongoing and Future Work</title>
<p>In this work, we presented a continuous active learning approach that uses a
fixed stopping strategy to simulate the maximum amount of documents that a
physician is willing to review, and a sampling strategy that is used to estimate
the number of relevant documents in the collection. We are currently performing
a failure analysis to understand the possible reasons for a recall below 90% and to
identify the linguistic aspects of a query rewriting approach that may help to
improve the performance of an interactive system.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>Evangelos</given-names>
            <surname>Kanoulas</surname>
          </string-name>
          , Rene Spijker,
          <string-name>
            <given-names>Dan</given-names>
            <surname>Li</surname>
          </string-name>
          , and Leif Azzopardi, editors.
          <article-title>CLEF 2018 Technology Assisted Reviews in Empirical Medicine Overview</article-title>
          .
          <article-title>CLEF 2018 Evaluation Labs</article-title>
          and Workshop: Online Working Notes,
          <source>CEUR Workshop Proceedings. CEUR-WS.org</source>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>Giorgio</given-names>
            <surname>Maria Di Nunzio</surname>
          </string-name>
          .
<article-title>A new decision to take for cost-sensitive naive bayes classifiers</article-title>
          .
          <source>Inf. Process. Manage.</source>
          ,
          <volume>50</volume>
          (
          <issue>5</issue>
          ):
          <fpage>653</fpage>
          -
          <lpage>674</lpage>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>Giorgio</given-names>
            <surname>Maria Di Nunzio</surname>
          </string-name>
          .
          <article-title>Interactive text categorisation: The geometry of likelihood spaces</article-title>
          .
          <source>Studies in Computational Intelligence</source>
          ,
<volume>668</volume>
          :
          <fpage>13</fpage>
          -
          <lpage>34</lpage>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>Giorgio</given-names>
            <surname>Maria Di Nunzio</surname>
          </string-name>
          .
          <article-title>A study of an automatic stopping strategy for technologically assisted medical reviews</article-title>
          .
          <source>In Advances in Information Retrieval - 40th European Conference on IR Research</source>
          , ECIR
          <year>2018</year>
          , Grenoble, France, March 26-29,
          <year>2018</year>
, Proceedings, pages
          <fpage>672</fpage>
          -
          <lpage>677</lpage>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>Giorgio</given-names>
            <surname>Maria Di Nunzio</surname>
          </string-name>
          , Federica Beghini, Federica Vezzani, and
          <string-name>
            <given-names>Genevieve</given-names>
            <surname>Henrot</surname>
          </string-name>
          .
          <article-title>An interactive two-dimensional approach to query aspects rewriting in systematic reviews. IMS unipd at CLEF ehealth task 2</article-title>
          .
          <source>In Working Notes of CLEF 2017 - Conference and Labs of the Evaluation Forum</source>
, Dublin, Ireland, September 11-14,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>Giorgio</given-names>
            <surname>Maria Di Nunzio</surname>
          </string-name>
          , Maria Maistro, and
          <string-name>
            <given-names>Federica</given-names>
            <surname>Vezzani</surname>
          </string-name>
          .
<article-title>A gamified approach to naive bayes classification: A case study for newswires and systematic medical reviews</article-title>
          .
          <source>In Companion of the The Web Conference 2018 on The Web Conference</source>
          <year>2018</year>
          ,
          <article-title>WWW 2018</article-title>
          , Lyon , France,
          <source>April 23-27</source>
          ,
          <year>2018</year>
, pages
          <fpage>1139</fpage>
          -
          <lpage>1146</lpage>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <given-names>Giorgio</given-names>
            <surname>Maria Di Nunzio</surname>
          </string-name>
          , Maria Maistro, and Daniel Zilio.
<article-title>Gamification for machine learning: The classification game</article-title>
          .
          <source>In Proceedings of the Third International Workshop on Gamification for Information Retrieval co-located with 39th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2016)</source>
          , Pisa, Italy, July 21,
          <year>2016</year>
          , pages
          <fpage>45</fpage>
          -
          <lpage>52</lpage>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <given-names>Giorgio</given-names>
            <surname>Maria Di Nunzio</surname>
          </string-name>
          , Maria Maistro, and
          <string-name>
            <given-names>Daniel</given-names>
            <surname>Zilio</surname>
          </string-name>
          .
          <article-title>The university of padua (IMS) at TREC 2016 total recall track</article-title>
          .
          <source>In Proceedings of The Twenty-Fifth Text REtrieval Conference</source>
, TREC 2016, Gaithersburg, Maryland, USA, November 15-18,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
<string-name>
            <given-names>Stephen E.</given-names>
            <surname>Robertson</surname>
          </string-name>
          and
          <string-name>
            <given-names>Hugo</given-names>
            <surname>Zaragoza</surname>
          </string-name>
          .
          <article-title>The probabilistic relevance framework: BM25 and beyond</article-title>
          .
          <source>Foundations and Trends in Information Retrieval</source>
          ,
<volume>3</volume>
          (
          <issue>4</issue>
          ):
          <fpage>333</fpage>
          -
          <lpage>389</lpage>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>10. Hanna Suominen, Liadh Kelly, Lorraine Goeuriot, Evangelos Kanoulas, Leif Azzopardi, Rene Spijker, Dan Li, Aurelie Neveol, Lionel Ramadier, Aude Robert, Joao Palotti, Jimmy, and Guido Zuccon, editors. <article-title>Overview of the CLEF eHealth Evaluation Lab 2018</article-title>. <source>CLEF 2018 - 8th Conference and Labs of the Evaluation Forum, Lecture Notes in Computer Science (LNCS)</source>. Springer, September <year>2018</year>.</mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>11. Federica Vezzani, Giorgio Maria Di Nunzio, and Genevieve Henrot. <article-title>TriMED: A multilingual terminological database</article-title>. <source>In Proceedings of the Eleventh International Conference on Language Resources and Evaluation, LREC 2018</source>, Miyazaki, Japan, May 7-12, <year>2018</year>.</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>