<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Practice-based Evidence in Medicine: Where Information Retrieval Meets Data Mining</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Karin M. Verspoor</string-name>
          <email>karin.verspoor@unimelb.edu.au</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Computing and Information Systems</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Health and Biomedical Informatics Centre, The University of Melbourne</institution>
          ,
          <addr-line>Melbourne, Victoria</addr-line>
          ,
          <country country="AU">Australia</country>
        </aff>
      </contrib-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. INTRODUCTION</title>
      <p>
        A new approach in medical practice is emerging thanks
to the increasing availability of large-scale clinical data in
electronic form. In practice-based evidence [
        <xref ref-type="bibr" rid="ref5 ref6">5, 6</xref>
        ], the
clinical record is mined to identify patterns of health
characteristics, such as diseases that co-occur, side-effects of
treatments, or more subtle combinations of patient attributes
that might explain a particular health outcome. This
approach contrasts with what has been the standard of care
in medicine, evidence-based practice, in which treatment
decisions are based on (quantitative) evidence derived from
targeted research studies, specifically, randomised controlled
trials. Advantages of consulting the clinical record for
evidence, rather than relying solely on structured research,
include avoiding the selection bias introduced by a trial's
inclusion criteria and enabling the monitoring of longer-term
outcomes and effects [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. The two approaches are, of course,
complementary: a hypothesis derived from large-scale data mining
could in turn form the starting point for the design of a
clinical trial to rigorously investigate that hypothesis.
      </p>
      <p>Information retrieval can play an important role in both
approaches to collecting medical evidence. However, the use
of information retrieval methods in collecting practice-based
evidence requires moving away from traditional
document-oriented retrieval as the end goal in itself, to viewing that
retrieval as an intermediate step towards knowledge
discovery and population-scale data mining. Furthermore, it may
require the development of more context-specific retrieval
strategies, designed to identify specific characteristics of
interest and support particular tasks in the medical context.
      </p>
    </sec>
    <sec id="sec-ir-ebp">
      <title>2. IR AND EVIDENCE-BASED PRACTICE</title>
      <p>
        In evidence-based medicine, collection and meta-analysis
of the published literature of clinical trials form the
foundation of systematic reviews (e.g., Cochrane Reviews [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]).
The production of such reviews has traditionally relied on
painstaking, exhaustive searches of the literature and
human synthesis of published experimental results. It has
been argued that automation is both necessary and possible
[
        <xref ref-type="bibr" rid="ref2 ref7">2, 7</xref>
        ]. There is a clear role for information retrieval in this
process, to identify publications relevant to a given review,
although further structuring of the information within the
documents retrieved is also needed [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
      </p>
      <p>
        A number of targeted search engines for the published
biomedical literature have been developed that aim to
improve search effectiveness for biomedical researchers [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ].
Several incorporate the results of information extraction, such
as named entity recognition for specific relevant entity types
(e.g., drugs and diseases), with the objective of enabling
concept-based indexing of the literature.
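The idea of concept-based indexing can be illustrated with a minimal sketch.
It assumes a hypothetical toy lexicon with placeholder concept identifiers
(not codes from any real terminology) and uses simple dictionary matching
in place of a trained named entity recogniser; the function names are
likewise illustrative. Synonymous mentions are normalised to a shared
concept before the index is built:
        <preformat preformat-type="code"><![CDATA[
```python
import re
from collections import defaultdict

# Hypothetical toy lexicon: surface form -> (concept id, entity type).
# The ids are placeholders, not codes from a real terminology.
LEXICON = {
    "aspirin": ("C_ASPIRIN", "drug"),
    "acetylsalicylic acid": ("C_ASPIRIN", "drug"),
    "myocardial infarction": ("C_MI", "disease"),
    "heart attack": ("C_MI", "disease"),
}


def find_concepts(text):
    """Return the set of concept ids whose surface forms occur in text."""
    lowered = text.lower()
    found = set()
    for surface, (concept_id, _entity_type) in LEXICON.items():
        # Word boundaries avoid matching inside longer tokens.
        if re.search(r"\b" + re.escape(surface) + r"\b", lowered):
            found.add(concept_id)
    return found


def build_concept_index(docs):
    """Map each concept id to the sorted list of document ids mentioning it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for concept_id in find_concepts(text):
            index[concept_id].add(doc_id)
    return {concept_id: sorted(ids) for concept_id, ids in index.items()}


# Illustrative titles only; these are not real publications.
DOCS = {
    "d1": "Aspirin after heart attack: a randomised trial.",
    "d2": "Acetylsalicylic acid dosing in adults.",
    "d3": "Outcomes following myocardial infarction.",
}
INDEX = build_concept_index(DOCS)
```
]]></preformat>
Under these assumptions, a query for the aspirin concept retrieves both
d1 and d2, although d2 never uses the word "aspirin"; a keyword index
would miss it.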
      </p>
    </sec>
    <sec id="sec-ir-pbe">
      <title>3. IR AND PRACTICE-BASED EVIDENCE</title>
      <p>Data mining of electronic health records for medical
evidence demands processing of the wealth of clinical data now
recorded in natural language text. This unstructured data
must be transformed into a structured representation before
the information it contains can be incorporated into broader
data mining. Many transformations can be cast as
information retrieval tasks: for instance, identifying patients
satisfying particular profiles (e.g., for recruitment into clinical
trials or registries), or retrieval of case histories
corresponding to specific treatment protocols. Development of general
approaches to such tasks will likely require a mix of
information retrieval and domain-specific information extraction.
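To make the patient-profile example concrete, the following is a minimal
sketch of cohort identification cast as retrieval. The record layout, the
profile fields, and the sample patients are all hypothetical, and the text
criteria use naive substring matching as a stand-in for the domain-specific
extraction a real system would need:
        <preformat preformat-type="code"><![CDATA[
```python
from dataclasses import dataclass, field


@dataclass
class Patient:
    patient_id: str
    age: int
    notes: list = field(default_factory=list)  # free-text clinical notes


@dataclass
class Profile:
    min_age: int
    max_age: int
    required_terms: tuple        # every term must appear in some note
    excluded_terms: tuple = ()   # no term may appear in any note


def mentions(patient, term):
    """True if any note contains the term (case-insensitive substring)."""
    term = term.lower()
    return any(term in note.lower() for note in patient.notes)


def matches(patient, profile):
    """Combine a structured criterion (age) with text-derived criteria."""
    if not (profile.min_age <= patient.age <= profile.max_age):
        return False
    if not all(mentions(patient, t) for t in profile.required_terms):
        return False
    # Naive matching ignores negation ("no history of ..."), one reason
    # information extraction is needed on top of plain retrieval.
    return not any(mentions(patient, t) for t in profile.excluded_terms)


def retrieve_cohort(patients, profile):
    """Return ids of patients satisfying the profile, in input order."""
    return [p.patient_id for p in patients if matches(p, profile)]


# Invented records for illustration only.
PATIENTS = [
    Patient("p1", 54, ["Type 2 diabetes, started metformin."]),
    Patient("p2", 34, ["Type 2 diabetes.", "Currently pregnant."]),
    Patient("p3", 17, ["Type 2 diabetes."]),
    Patient("p4", 70, ["Hypertension only."]),
]
TRIAL = Profile(min_age=18, max_age=75,
                required_terms=("type 2 diabetes",),
                excluded_terms=("pregnant",))
COHORT = retrieve_cohort(PATIENTS, TRIAL)
```
]]></preformat>
Here only p1 satisfies the profile: p2 meets an exclusion term, p3 falls
outside the age range, and p4 lacks the required condition.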
      </p>
    </sec>
    <sec id="sec-2">
      <title>4. CONCLUSION</title>
      <p>The boundaries between information retrieval,
information extraction, and data mining are blurring; bringing them
together, in an activity commonly referred to as text mining,
can result in heterogeneous methods that will enable sifting
through the entirety of the clinical record, including both its
unstructured and structured components. This in turn will
enable clinical decision making based on data derived from
large populations in the “laboratory” of the natural world.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <collab>Cochrane Collaboration</collab>
          . http://www.cochrane.org.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>G.</given-names>
            <surname>Tsafnat</surname>
          </string-name>
          et al.
          <article-title>The automation of systematic reviews</article-title>
          .
          <source>BMJ</source>
          ,
          <volume>346</volume>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>S.</given-names>
            <surname>Kim</surname>
          </string-name>
          et al.
          <article-title>Automatic classification of sentences to support evidence based medicine</article-title>
          .
          <source>BMC Bioinformatics</source>
          ,
          <volume>12</volume>
          (
          <issue>Suppl 2</issue>
          ):
          <fpage>S5</fpage>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Lu</surname>
          </string-name>
          .
          <article-title>PubMed and beyond: a survey of web tools for searching biomedical literature</article-title>
          .
          <source>Database</source>
          ,
          <elocation-id>baq036</elocation-id>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>T.</given-names>
            <surname>Pincus</surname>
          </string-name>
          and
          <string-name>
            <given-names>T.</given-names>
            <surname>Sokka</surname>
          </string-name>
          .
          <article-title>Evidence-based practice and practice-based evidence</article-title>
          .
          <source>Nat Clin Pract Rheum</source>
          ,
          <volume>2</volume>
          (
          <issue>3</issue>
          ):
          <fpage>114</fpage>
          -
          <lpage>115</lpage>
          ,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>N. H.</given-names>
            <surname>Shah</surname>
          </string-name>
          .
          <article-title>Mining the ultimate phenome repository</article-title>
          .
          <source>Nat Biotech</source>
          ,
          <volume>31</volume>
          (
          <issue>12</issue>
          ):
          <fpage>1095</fpage>
          -
          <lpage>1097</lpage>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>I.</given-names>
            <surname>Shemilt</surname>
          </string-name>
          et al.
          <article-title>Pinpointing needles in giant haystacks: Use of text mining to reduce impractical screening workload in extremely large scoping reviews</article-title>
          .
          <source>Research Synthesis Methods</source>
          ,
          <year>2013</year>
          . online preprint.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>