<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>The Big Mechanism Program: Changing How Science Is Done</article-title>
      </title-group>
      <contrib-group>
        <aff id="aff0">
          <label>0</label>
          <institution>Andrey Rzhetsky University of Chicago</institution>
          ,
          <addr-line>900 East 57th Street, Chicago, IL 60637</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Proceedings of the XVIII International Conference «Data Analytics and Management in Data Intensive Domains» (DAMDID/RCDL'2016)</institution>
          ,
          <addr-line>Ershovo</addr-line>
          ,
          <country country="RU">Russia</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>The talk will describe details of actively evolving research conducted by the UChicago consortium of the Big Mechanism program, funded by the US agency DARPA. The consortium's work focuses on: (1) probabilistic reasoning, using custom-designed ontologies, across cancer claims culled from the literature; (2) computational modelling of cancer mechanisms and pathways to automatically predict therapeutic clues; (3) automated hypothesis generation to strategically extend this knowledge; and (4) development of a 'Robot Scientist' that performs experiments to test hypotheses probabilistically and then feeds the results back into the system.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1 Introduction</title>
      <p>DARPA is funding the Big Mechanism program
(http://www.darpa.mil/program/big-mechanism) in
order to study large, explanatory models of complicated
systems in which interactions have important causal
effects. The program’s aim is to develop technology that
reads research abstracts and papers, extracts pieces
of causal mechanisms, assembles these pieces into more
complete causal models, and reasons over these models to
produce explanations. The program’s domain is cancer
biology, with an emphasis on signalling pathways; this is
just one example of the causal, explanatory models that we
hope will extend across multiple domains,
similar to what the IBM Watson team [1] is currently
attempting.</p>
    </sec>
    <sec id="sec-3">
      <title>2 The overall structure of the Big Mechanism program</title>
      <p>The program is currently organized into three
consortia, each of which adopts a different view of causal
models, different reading technologies, and different use
cases.</p>
      <p>The largest consortium, called FRIES, includes
groups at CMU, SRI, the University of Arizona, Oregon
Health &amp; Science University, and others. FRIES’s main
focus is to explain signalling pathway behaviours. For
instance, why is the expression of a gene ephemeral?
Technologically, FRIES focuses on information
extraction over deep reading, simulation, and even FPGA
acceleration of systems biology simulators.</p>
      <p>The second consortium (“UChicago”), in which the
author of this keynote acts as the PI, is composed of
researchers at the University of Chicago and the United
Kingdom’s National Centre for Text Mining at the
University of Manchester, along with participants from
Brunel University London, all of whom collaborate
on developing robotic platforms for experiment design
and analysis.</p>
      <p>The third consortium, called CURE, consists of two
groups from Harvard Medical School, IHMC in Florida,
and SIFT. Their focus is on deep reading, fine-grained
modelling, and simulation of the biochemistry underlying
cell signalling.</p>
      <p>This talk will provide an overview of the objectives
and results related mostly to the work of the second
consortium.</p>
    </sec>
    <sec id="sec-4">
      <title>3 UChicago consortium</title>
      <p>As the project is ongoing and far from completion,
we will cover the ideas that led the consortium to our
current system design, our biological and medical
motivations, and preliminary results.</p>
      <p>Motivation: Today, cancer-related text mining is
performed in linear pipelines (named entity recognition
to event extraction) without explicitly estimating
statement uncertainty or importance relative to a total
model of cancer. Moreover, reading is divorced from
reasoning and experimentation. Probabilistic reasoning
is rarely used. Similarly, the Robot Scientist approach
currently uses non-probabilistic logic, is disconnected
from text mining, and has not been applied to medicine. In
addition, a wealth of panomics data is increasingly
available, but existing methods treat each event
independently and disregard prior knowledge.</p>
      <p>
        Fundamental medical problem: We do not fully
understand how to stop cancer cells from growing faster
than normal tissue and spreading throughout the body
(metastasizing). Death from cancer typically occurs
when uncontrolled growth occurs in a place where it
cannot be surgically removed. Most traditional
anticancer drugs are highly toxic to patients. As a result,
single drug treatment is generally undesirable for the
following reasons: (1) It is generic and not targeted to the
patient and their cancer’s genotype(s); (2) Intervention is
required at multiple points along a cancer pathway; and
(3) Cancer evolves resistance. The Holy Grail of cancer
therapy is to find highly potent, non-toxic drug
combinations that are tailored to individual patients and
linked to the readout of gene and protein expression from
their specific cancer(s).
      </p>
      <p>The system developed by the consortium
incorporates three components, called Reading,
Assembly, and Explanation (see Figure 1). These
components integrate machine reading with probabilistic
modelling, the design of custom-made ontologies, and
automated experiments conducted by the Robot Scientist
(a robot that is driven by experiment-designing and
planning programs). For quality control and
benchmarking, an independent set of experiments is
conducted by humans.</p>
      <p>To illustrate how all these components come
together, the talk will present a use case: Automated,
optimal drug combination prediction for achieving
activation or silencing of target gene(s) in a breast cancer
cell line. In our initial setup, we are using a text-mined
network of about three hundred genes and proteins,
containing parts of networks in use cases 1 and 2. In the
first pass, we focused on activating the estrogen receptor
gene (ESR1) in a triple-negative breast cancer cell line
by administering a cocktail of two or more
FDA-approved drugs.</p>
      <p>The motivation for the use case is to practically apply
a growing model of cellular machinery, expanded through
machine reading and experimental validation, to manipulate
the state of the cancer cell, achieving silencing or
activation of target genes/proteins in the absence of drugs
specifically targeting these molecules. If successful,
computationally-derived drug cocktails could at least
partially reduce the need to develop new drugs, easing
the economic burden of discovering and testing new
medications. (Each new FDA-approved drug has an
estimated price tag of somewhere between 100 million
and 1 billion US dollars.)</p>
      <p>The system generates hypotheses of the form
“cocktail of drugs X1, …, Xn activates gene ESR1”, and
each hypothesis is tested experimentally in a
triple-negative breast cancer cell line. These experiments
are carried out either by human biologists or by the Robot Scientist.</p>
    </sec>
    <sec id="sec-5">
      <title>5 Conclusion</title>
      <p>The approach chosen by the team relies on the
assimilation of massive, pre-existing literature (similar to
IBM Watson) combined with iterative model updating
based on empirical data and newly designed experiments
(unlike IBM Watson). The project’s general
methodology is not domain-specific, so it should in principle
extend across scientific domains.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <surname>Best</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <article-title>IBM Watson: The Inside Story Of How The Jeopardy-Winning Supercomputer Was Born, And What It Wants To Do Next</article-title>
          .
          <source>TechRepublic</source>
          . N.p.,
          <year>2015</year>
          . Web. 13 May 2015.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <surname>Evans</surname>
            <given-names>J</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rzhetsky</surname>
            <given-names>A.</given-names>
          </string-name>
          <article-title>Machine science</article-title>
          .
          <source>Science</source>
          . Jul 23
          <year>2010</year>
          ;
          <volume>329</volume>
          (
          <issue>5990</issue>
          ):
          <fpage>399</fpage>
          -
          <lpage>400</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <surname>King</surname>
            <given-names>RD</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rowland</surname>
            <given-names>J</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Oliver</surname>
            <given-names>SG</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Young</surname>
            <given-names>M</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Aubrey</surname>
            <given-names>W</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Byrne</surname>
            <given-names>E</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Liakata</surname>
            <given-names>M</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Markham</surname>
            <given-names>M</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pir</surname>
            <given-names>P</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Soldatova</surname>
            <given-names>LN</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sparkes</surname>
            <given-names>A</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Whelan</surname>
            <given-names>KE</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Clare</surname>
            <given-names>C</given-names>
          </string-name>
          .
          <article-title>The Automation of Science</article-title>
          .
          <source>Science</source>
          .
          <year>2009</year>
          .
          <volume>324</volume>
          ,
          <fpage>85</fpage>
          -
          <lpage>89</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>