<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
<article-title>Identifying Diagnostic Test Accuracy Publications Using a Deep Model</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Gaurav Singh</string-name>
          <email>gaurav.singh.15@ucl.ac.uk</email>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Iain Marshall</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>James Thomas</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Byron Wallace</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
<institution>King's College London</institution>
          ,
          <country country="UK">UK</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Northeastern University</institution>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <abstract>
<p>In this work, we used a deep model architecture to identify DTA (diagnostic test accuracy) studies pertaining to a given review topic. We were provided with lists of relevant documents, selected on the basis of abstracts and full text, for different review topics. We extracted the abstract and title as features to describe these documents, and learned a deep neural network model that takes as input the abstract and title of a study together with the topic of the review, and outputs a binary classification of whether that study is a DTA study relevant to the review in question.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
<title>Model</title>
<p>The proposed model takes as input the title and abstract of the paper as sequences of words. These are fed into an embedding layer that outputs a matrix of word vectors corresponding to the given words, which is then passed through a 1-dimensional convolution layer of filter length 3. Similarly, the topic of the review in question is passed through the embedding layer and into a convolution layer of filter length 3. The representations generated by the three convolution layers are then merged and passed through a dense fully connected layer with dropout and a sigmoid activation function for the output. The loss function used at the output layer is binary cross-entropy.</p>
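      <p>As an illustration, the following is a minimal sketch of one plausible reading of this architecture in Keras. Only the filter length of 3, the dropout probability of 0.6, the sigmoid output, and the binary cross-entropy loss are specified above; the vocabulary size, embedding dimension, number of filters, sequence lengths, pooling step, and optimizer are our assumptions.</p>
      <preformat>
# A minimal sketch of the architecture described above; sizes marked
# "assumed" are illustrative and not reported in the text.
from tensorflow.keras import layers, Model

VOCAB, EMB_DIM, N_FILTERS = 50000, 128, 100         # assumed sizes
TITLE_LEN, ABSTRACT_LEN, TOPIC_LEN = 30, 300, 10    # assumed sequence lengths

def branch(seq_len, embed):
    # Embed a word-index sequence, convolve with filter length 3, pool.
    inp = layers.Input(shape=(seq_len,))
    x = embed(inp)                                   # matrix of word vectors
    x = layers.Conv1D(N_FILTERS, 3, activation="relu")(x)
    x = layers.GlobalMaxPooling1D()(x)               # pooling step is assumed
    return inp, x

embed = layers.Embedding(VOCAB, EMB_DIM)             # shared embedding layer
(t_in, t), (a_in, a), (q_in, q) = [
    branch(n, embed) for n in (TITLE_LEN, ABSTRACT_LEN, TOPIC_LEN)]

merged = layers.Concatenate()([t, a, q])             # merge the three branches
merged = layers.Dropout(0.6)(merged)                 # tuned dropout probability
out = layers.Dense(1, activation="sigmoid")(merged)  # binary relevance output

model = Model(inputs=[t_in, a_in, q_in], outputs=out)
model.compile(optimizer="adam", loss="binary_crossentropy")
      </preformat>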
      <sec id="sec-1-1">
        <title>Tuning</title>
        <p>All parameters were tuned on a held-out validation dataset. The dropout probability was tuned over a range of 10 equidistant values in the interval [0, 1]; the optimal value obtained was 0.6. The structure of the network was also tuned on the held-out validation dataset: we experimented with different filter lengths and different numbers of convolution layers.</p>
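        <p>A hedged sketch of the dropout sweep is given below. Here build_model and the data arrays are hypothetical placeholders, the 0.1 grid spacing is assumed so that the reported optimum of 0.6 lies on the grid, and F1 is an assumed selection metric.</p>
        <preformat>
# Sketch of the validation sweep over the dropout probability described
# above. build_model and the data arrays are hypothetical placeholders.
import numpy as np
from sklearn.metrics import f1_score

best_p, best_f1 = None, -1.0
for p in np.linspace(0.0, 0.9, 10):        # 10 equidistant values, step 0.1
    model = build_model(dropout=p)         # hypothetical constructor
    model.fit(X_train, y_train, epochs=5, verbose=0)
    y_hat = (model.predict(X_val).ravel() > 0.5).astype(int)
    f1 = f1_score(y_val, y_hat)            # scored on held-out validation data
    if f1 > best_f1:
        best_p, best_f1 = p, f1
        </preformat>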
      </sec>
    </sec>
    <sec id="sec-2">
      <title>Results</title>
<p>Figure 2 shows the performance of the model on the held-out dataset. The model performed substantially better than a random classifier. Table 1 reports the macro-averaged performance of the model in identifying relevant abstracts and relevant full-text documents, and Table 2 reports the corresponding micro-averaged performance, obtained using the script provided for evaluation.</p>
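      <p>For reference, the sketch below illustrates the difference between the two averaging schemes on toy labels; the official evaluation script is not reproduced here, and in the task itself the averages are taken over review topics.</p>
      <preformat>
# Toy illustration of macro- vs. micro-averaged F1 (cf. Tables 1 and 2).
from sklearn.metrics import f1_score

y_true = [1, 0, 1, 1, 0, 0, 1]   # toy relevance labels
y_pred = [1, 0, 0, 1, 0, 1, 1]   # toy model predictions

macro = f1_score(y_true, y_pred, average="macro")  # mean of per-class F1
micro = f1_score(y_true, y_pred, average="micro")  # F1 over pooled counts
print(f"macro-F1 = {macro:.3f}, micro-F1 = {micro:.3f}")
      </preformat>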
      <p>Fig. 2: The number of relevant documents identified based on abstracts versus the number of documents manually annotated (left), and the number of relevant documents identified based on full text versus the number of documents manually annotated (right), computed on the data held out during training.</p>
      <p>
        In previous work, we built a classifier which, when presented with an unknown citation (i.e. title/abstract), can predict whether or not it describes a Randomized Controlled Trial (RCT). Performance and technical details can be found in Wallace et al. [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. The performance of this classifier on studies retrieved in searches for systematic reviews is good, and it can reduce the manual screening burden by up to 80% while maintaining 100% recall. This is potentially very useful, but it is able to do this because: 1) it was built on a large, unbiased training dataset of 280,000 manually labelled citations; and 2) searches for systematic reviews of RCTs retrieve a large number of references which are not RCTs.
      </p>
      <p>The situation with DTA studies is different. We do not have the luxury of a large dataset on which to build a DTA classifier. The data provided for this exercise, for example, are the result of searches and screening decisions for DTA systematic reviews, rather than searches and screening decisions for DTA studies in general. This means that the negative class in the DTA dataset contains large numbers of DTA studies, because they were irrelevant to the specific DTA review in question. This makes it impossible to use this dataset to build a generic DTA classifier. We also built a DTA classifier from records obtained outside this dataset: approximately 1,500 records which were manually labelled as to whether or not they described a DTA study. The results obtained when using this classifier against the DTA training dataset for this task are shown in Figures 3 and 4. Other than a small boost at the bottom left of the graph in Figure 4, this classifier does not perform well, especially in comparison to the results of the deep model presented in the previous section.</p>
    </sec>
    <sec id="sec-3">
      <title>Acknowledgements</title>
      <p>JT and GS acknowledge support from Cochrane via the Transform project. BCW's contribution to this work was supported by the Agency for Healthcare Research and Quality, grant R03-HS025024, and by the National Institutes of Health/National Cancer Institute, grant UH2-CA203711. IJM acknowledges support from the UK Medical Research Council, through its Skills Development Fellowship program, grant MR/N015185/1.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>B. C.</given-names>
            <surname>Wallace</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Noel-Storr</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I. J.</given-names>
            <surname>Marshall</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. M.</given-names>
            <surname>Cohen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N. R.</given-names>
            <surname>Smalheiser</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>Thomas</surname>
          </string-name>
          .
<article-title>Identifying reports of randomized controlled trials (RCTs) via a hybrid machine learning and crowdsourcing approach</article-title>
          .
          <source>Journal of the American Medical Informatics Association</source>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>