<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Automating Biomedical Evidence Synthesis: Recent Work and Directions Forward</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Byron C. Wallace</string-name>
          <email>b.wallace@northeastern.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>College of Computer and Information Science Northeastern University</institution>
          ,
          <addr-line>Boston, MA</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Evidence-based medicine (EBM) looks to inform patient care with the totality of the available evidence. Systematic reviews, which statistically synthesize the entirety of the biomedical literature pertaining to a specific clinical question, are the cornerstone of EBM. These reviews are critical to modern healthcare, informing everything from national health policy to bedside decision-making. But conducting systematic reviews is extremely laborious, and hence expensive. Producing a single review requires thousands of expert hours, spent culling relevant structured evidence from the vast unstructured evidence base (i.e., natural language articles describing the conduct and results of trials). The exponential expansion of the biomedical literature base has exacerbated the situation: Health care practitioners can no longer keep up with the primary literature, and this hinders the practice of evidence-based care. The machine learning, natural language processing and information retrieval communities can lead the way in addressing this problem through the development of automation technologies that facilitate search and synthesis of evidence; but developing these will require meeting challenging technical problems. In this extended abstract, I discuss some of the progress made in recent years toward expediting unstructured biomedical evidence synthesis via automation techniques, and I highlight a few key challenges that remain.</p>
      </abstract>
      <kwd-group>
        <kwd>evidence-based medicine</kwd>
        <kwd>natural language processing</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>
        Researchers in machine learning (ML), natural language processing (NLP)
and information retrieval (IR) can play a key role in making unstructured
evidence more actionable, e.g., by facilitating search, extraction and ultimately
synthesis of findings reported in articles that describe the outcomes of
randomized controlled trials. Such approaches have the potential to afford healthcare
providers access to the "best currently available evidence at the push of a
button" [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. Considerable progress has been made toward this aim [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]
(not limited to work I have been involved with, of course, although this is what
I focus on here). For example, colleagues and I have developed RobotReviewer
[
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], a prototype that integrates machine learning technologies to produce
automated syntheses of the trials described in uploaded articles. However, despite
this progress, core technical challenges remain. In this abstract, as in the talk it
accompanies, I highlight some recent progress and select challenges that remain.
      </p>
      <p>
        Training models in low-supervision settings. Inducing models that can
automatically categorize and extract data from unstructured articles requires,
of course, supervision on various fields of interest. State-of-the-art NLP models
for relevant tasks such as information extraction tend to be highly parameterized
neural networks and hence data hungry. It is difficult and expensive to acquire
large volumes of training data in the biomedical domain: domain experts are
few, busy and expensive, and articles describing clinical trials tend to be dense
in jargon and hence difficult for lay annotators.
      </p>
      <p>
        To address this challenge, we have explored a few avenues. The first is a
paradigm of distant supervision [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], wherein 'found' data is repurposed,
typically via rules and heuristics, to provide noisy supervision for a target task.
In particular we have exploited the Cochrane Database of Systematic Reviews
(CDSR), a database of semi-structured data pertaining to individual articles,
to derive such noisy supervision over sentences [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. To mitigate noise, we have
introduced an approach we call Supervised Distant Supervision [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] which
harnesses a small amount of direct supervision to improve the quality of distantly
derived labels. This improved the performance of a distantly supervised model
for extracting clinically salient sentences in full-text articles [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ].
      </p>
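      <p>
        As a rough illustration of the distant supervision idea, the sketch below labels
article sentences by their token overlap with a free-text snippet taken from an existing
structured record. The overlap heuristic, the threshold value, and the names tokenize and
distant_labels are assumptions made for illustration only; they are not the rules used in
the cited work.
      </p>
      <preformat>
```python
import re


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def distant_labels(sentences, db_snippet, threshold=0.5):
    """Assign a noisy 0/1 label to each sentence.

    A sentence is labeled 1 when at least `threshold` of the snippet's
    tokens occur in it. Both the heuristic and the default threshold are
    illustrative assumptions, not a published rule.
    """
    target = tokenize(db_snippet)
    labels = []
    for sentence in sentences:
        overlap = len(tokenize(sentence).intersection(target))
        score = overlap / len(target) if target else 0.0
        labels.append(1 if score >= threshold else 0)
    return labels


# Toy inputs (invented for this example).
sentences = [
    "Participants were randomized using a computer-generated sequence.",
    "The study was funded by a national research council.",
]
snippet = "computer-generated randomization sequence"
labels = distant_labels(sentences, snippet)
```
      </preformat>
      <p>
        Labels derived this way are noisy by construction, which is the motivation for
correcting them with a small amount of direct supervision.
      </p>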
      <p>
        Semi-supervised methods constitute a complementary approach to
improving model performance in low-supervision settings. For instance, we were able to
exploit structured abstracts to derive syntactic patterns that can be fed as
additional inputs to sequence tagging models (e.g., LSTM-CRF) to yield improved
performance [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. And elsewhere, we have shown how to exploit existing
ontologies/controlled vocabularies (e.g., MeSH) to impose inductive biases in neural
models, in turn improving predictive accuracy [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ].
      </p>
      <p>
        Hybrid expert &amp; crowd annotation. Another means of addressing a paucity
of training data, of course, is to simply collect more data. As mentioned above,
relying on biomedical domain experts for this would be prohibitively costly. And
it is not obvious that lay workers (hired via crowdwork platforms like Amazon
Mechanical Turk) will be able to perform the task. However, we have shown
that redundant collection of annotations coupled with careful aggregation
strategies yields reasonable training signal [
        <xref ref-type="bibr" rid="ref5 ref6">6, 5</xref>
        ]. And we have recently made publicly
available a relatively large set (~5k) of richly annotated biomedical abstracts
of papers describing clinical trials to facilitate methodological work on NLP for
EBM [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ].
      </p>
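      <p>
        To make the idea of redundant annotation plus aggregation concrete, the sketch below
combines token-level labels from several workers by simple majority vote. Majority voting
is used here only as the most basic aggregation baseline and stands in for the more
careful strategies referred to above; the label names and the function majority_vote are
hypothetical.
      </p>
      <preformat>
```python
from collections import Counter


def majority_vote(annotations):
    """Aggregate per-token labels from several workers by majority vote.

    annotations: list of label sequences, one per worker, all covering the
    same tokens. Ties resolve to the tied label that was encountered first.
    """
    n_tokens = len(annotations[0])
    if any(len(seq) != n_tokens for seq in annotations):
        raise ValueError("all workers must label the same tokens")
    aggregated = []
    for position in range(n_tokens):
        votes = Counter(seq[position] for seq in annotations)
        aggregated.append(votes.most_common(1)[0][0])
    return aggregated


# Three hypothetical workers label the tokens of
# "aspirin reduced headache severity" as intervention (INT),
# outcome (OUT), or other (O).
worker_labels = [
    ["INT", "O", "O", "O"],
    ["INT", "O", "OUT", "OUT"],
    ["O", "O", "OUT", "OUT"],
]
aggregated = majority_vote(worker_labels)
```
      </preformat>
      <p>
        A vote of this kind treats all workers as equally reliable; weighting workers by
estimated accuracy is one natural refinement.
      </p>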
      <p>
        It is not obvious how best to jointly exploit small amounts of (pricey) expert
supervision and (cheap but noisy) crowd annotations at scale. We have explored
active learning approaches for this [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], but believe the general problem remains ripe for
exploration, especially with regard to also incorporating machine predictions in
the loop [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ].
      </p>
      <p>
        Joint extraction &amp; inference over lengthy documents. Ideally, we would
like to cull from article full-texts assertions that the underlying trial described
in a given article provides evidence in favor of a particular treatment for a
specified condition and outcome. This requires jointly extracting these fields and then
inferring what has been reported regarding them. In general, extracting
relationships between entities in scientific papers remains an exciting open challenge at
the fore of existing language technologies [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ].
      </p>
      <p>
        Concerning the particular domain of EBM, we are just beginning work on
assembling a corpus that will comprise pairs of 'evidence frames' (each specifying an
intervention, a comparator, and an outcome) and accompanying full-text articles.
The task, then, will be to predict whether the article provides evidence that the
given intervention is more effective than the comparator, with respect to the
outcome (or not). Going forward, we envision a model that can simultaneously
extract the interventions, comparators and outcomes studied (e.g., trained using
the corpus mentioned above [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]) and infer the reported directionality of the
findings. This is an audacious goal, but if realized would afford access to immediately
actionable evidence, automatically.
      </p>
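      <p>
        One possible way to represent an item of such a corpus is sketched below: a frame
holding the intervention, comparator and outcome, paired with the article text and a
binary label. The class names, field names and label strings are hypothetical and are
not taken from any released dataset.
      </p>
      <preformat>
```python
from dataclasses import dataclass

# Hypothetical label set for the binary prediction task described in the text.
LABELS = ("more_effective", "not_more_effective")


@dataclass(frozen=True)
class EvidenceFrame:
    """An intervention, a comparator, and an outcome."""
    intervention: str
    comparator: str
    outcome: str


@dataclass(frozen=True)
class EvidenceExample:
    """One corpus item: a frame, the paired full-text article, and a label."""
    frame: EvidenceFrame
    article_text: str
    label: str

    def __post_init__(self):
        # Reject labels outside the assumed label set.
        if self.label not in LABELS:
            raise ValueError(f"unknown label: {self.label!r}")


# Toy instance (invented for this example).
example = EvidenceExample(
    frame=EvidenceFrame("aspirin", "placebo", "headache severity"),
    article_text="(full text of the trial report would go here)",
    label="more_effective",
)
```
      </preformat>
      <p>
        A model for the task would take the frame and the article text as input and predict
the label.
      </p>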
      <p>Closing remarks. The above are just a sample of the challenges inherent to the
task of trying to automate biomedical evidence synthesis. In addition
to discussing work I have done with colleagues toward meeting these, my aim
in this talk and extended abstract is to call attention to the general problem
of evidence synthesis; I think researchers in IR and adjacent areas have the
potential to change the practice of evidence-based medicine by helping doctors
navigate the evidence, and ultimately figure out what works. This is a nice
general problem to work on because it is both socially important and technically
challenging.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Jonnalagadda</surname>
            ,
            <given-names>S.R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Goyal</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Huffman</surname>
            ,
            <given-names>M.D.</given-names>
          </string-name>
          :
          <article-title>Automating data extraction in systematic reviews: a systematic review</article-title>
          .
          <source>Systematic Reviews</source>
          <volume>4</volume>
          (
          <issue>1</issue>
          ),
          <fpage>78</fpage>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Marshall</surname>
            ,
            <given-names>I.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kuiper</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Banner</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wallace</surname>
            ,
            <given-names>B.C.</given-names>
          </string-name>
          :
          <article-title>Automating biomedical evidence synthesis: RobotReviewer</article-title>
          .
          <source>In: Proceedings of the conference. Association for Computational Linguistics. Meeting</source>
          . vol.
          <volume>2017</volume>
          , p.
          <fpage>7</fpage>
          .
          <publisher-name>NIH Public Access</publisher-name>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Marshall</surname>
            ,
            <given-names>I.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kuiper</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wallace</surname>
            ,
            <given-names>B.C.</given-names>
          </string-name>
          :
          <article-title>Automating risk of bias assessment for clinical trials</article-title>
          .
          <source>IEEE Journal of Biomedical and Health Informatics</source>
          <volume>19</volume>
          (
          <issue>4</issue>
          ),
          <fpage>1406</fpage>
          –
          <lpage>1412</lpage>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Mintz</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bills</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Snow</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jurafsky</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>Distant supervision for relation extraction without labeled data</article-title>
          .
          <source>In: Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP: Volume 2</source>
          . pp.
          <fpage>1003</fpage>
          –
          <lpage>1011</lpage>
          .
          <publisher-name>Association for Computational Linguistics</publisher-name>
          (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Mortensen</surname>
            ,
            <given-names>M.L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Adam</surname>
            ,
            <given-names>G.P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Trikalinos</surname>
            ,
            <given-names>T.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kraska</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wallace</surname>
            ,
            <given-names>B.C.</given-names>
          </string-name>
          :
          <article-title>An exploration of crowdsourcing citation screening for systematic reviews</article-title>
          .
          <source>Research Synthesis Methods</source>
          <volume>8</volume>
          (
          <issue>3</issue>
          ),
          <fpage>366</fpage>
          –
          <lpage>386</lpage>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Nguyen</surname>
            ,
            <given-names>A.T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wallace</surname>
            ,
            <given-names>B.C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>J.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nenkova</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lease</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Aggregating and predicting sequence labels from crowd annotations</article-title>
          .
          <source>In: Proceedings of the conference. Association for Computational Linguistics. Meeting</source>
          . vol.
          <volume>2017</volume>
          , p.
          <fpage>299</fpage>
          .
          <publisher-name>NIH Public Access</publisher-name>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Nguyen</surname>
            ,
            <given-names>A.T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wallace</surname>
            ,
            <given-names>B.C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lease</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Combining crowd and expert labels using decision theoretic active learning</article-title>
          .
          <source>In: Third AAAI Conference on Human Computation and Crowdsourcing</source>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Nye</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>J.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Patel</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yang</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Marshall</surname>
            ,
            <given-names>I.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nenkova</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wallace</surname>
            ,
            <given-names>B.C.</given-names>
          </string-name>
          :
          <article-title>A corpus with multi-level annotations of patients, interventions and outcomes to support language processing for medical literature</article-title>
          .
          <source>Association for Computational Linguistics (ACL)</source>
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Patel</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yang</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Marshall</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nenkova</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wallace</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>Syntactic patterns improve information extraction for medical search</article-title>
          .
          <source>arXiv preprint arXiv:1805.00097</source>
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Sackett</surname>
            ,
            <given-names>D.L.</given-names>
          </string-name>
          :
          <article-title>Evidence-based Medicine: How to Practice and Teach EBM</article-title>
          . WB Saunders Company (
          <year>1997</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Tsafnat</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dunn</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Glasziou</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Coiera</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          , et al.:
          <article-title>The automation of systematic reviews</article-title>
          .
          <source>BMJ</source>
          <volume>346</volume>
          (
          <issue>f139</issue>
          ),
          <fpage>1</fpage>
          –
          <lpage>2</lpage>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Verga</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Strubell</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shai</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>McCallum</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Attending to all mention pairs for full abstract biological relation extraction</article-title>
          .
          <source>arXiv preprint arXiv:1710.08312</source>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Wallace</surname>
            ,
            <given-names>B.C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kuiper</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sharma</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhu</surname>
            ,
            <given-names>M.B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Marshall</surname>
            ,
            <given-names>I.J.</given-names>
          </string-name>
          :
          <article-title>Extracting PICO sentences from clinical trial reports using supervised distant supervision</article-title>
          .
          <source>Journal of Machine Learning Research</source>
          <volume>17</volume>
          (
          <issue>132</issue>
          ),
          <fpage>1</fpage>
          –
          <lpage>25</lpage>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Wallace</surname>
            ,
            <given-names>B.C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Noel-Storr</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Marshall</surname>
            ,
            <given-names>I.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cohen</surname>
            ,
            <given-names>A.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Smalheiser</surname>
            ,
            <given-names>N.R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Thomas</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Identifying reports of randomized controlled trials (RCTs) via a hybrid machine learning and crowdsourcing approach</article-title>
          .
          <source>Journal of the American Medical Informatics Association</source>
          <volume>24</volume>
          (
          <issue>6</issue>
          ),
          <fpage>1165</fpage>
          –
          <lpage>1168</lpage>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lease</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wallace</surname>
            ,
            <given-names>B.C.</given-names>
          </string-name>
          :
          <article-title>Exploiting domain knowledge via grouped weight sharing with application to text categorization</article-title>
          .
          <source>arXiv preprint arXiv:1702.02535</source>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>