<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>The Pharmacology Workspace: A Platform for Drug Discovery</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Alasdair J. G. Gray</string-name>
          <xref ref-type="aff" rid="aff6">6</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sune Askjaer</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Christian Brenninkmeijer</string-name>
          <xref ref-type="aff" rid="aff6">6</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Kees Burger</string-name>
          <xref ref-type="aff" rid="aff4">4</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Christine Chichester</string-name>
          <xref ref-type="aff" rid="aff4">4</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>James Eales</string-name>
          <xref ref-type="aff" rid="aff6">6</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Chris T. Evelo</string-name>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Carole Goble</string-name>
          <xref ref-type="aff" rid="aff6">6</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Paul Groth</string-name>
          <xref ref-type="aff" rid="aff7">7</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Lee Harland</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Antonis Loizou</string-name>
          <xref ref-type="aff" rid="aff7">7</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Steve Pettifer</string-name>
          <xref ref-type="aff" rid="aff6">6</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Rishi Ramgolam</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Mark Thompson</string-name>
          <xref ref-type="aff" rid="aff4">4</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Andra Waagmeester</string-name>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Antony J. Williams</string-name>
          <xref ref-type="aff" rid="aff5">5</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Academic Concept Knowledge Limited</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Connected Discovery</institution>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>H. Lundbeck A/S</institution>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>Maastricht University</institution>
        </aff>
        <aff id="aff4">
          <label>4</label>
          <institution>Netherlands Bioinformatics Center</institution>
        </aff>
        <aff id="aff5">
          <label>5</label>
          <institution>Royal Society of Chemistry</institution>
        </aff>
        <aff id="aff6">
          <label>6</label>
          <institution>University of Manchester</institution>
        </aff>
        <aff id="aff7">
          <label>7</label>
          <institution>VU University Amsterdam</institution>
        </aff>
      </contrib-group>
      <abstract>
        <p>We present the Open PHACTS linked data platform, which is being developed to address a set of example drug discovery research questions and which supports several drug discovery applications. The platform retrieves data from many complementary, but overlapping, data sources to present an integrated view of the data. It exploits two entity resolution services: one transforms text into a concept and the other transforms chemical structures into a concept. The single concept URI provided by the resolution service is then expanded to a set of equivalent URIs used by the data sources. Availability: An alpha version is currently available to the Open PHACTS consortium. A first public release of the platform will be made in late 2012; see http://www.openphacts.org/.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>EXTENDED ABSTRACT</title>
      <p>The investigation and development of new drugs requires that
scientists involved in the process deal with multiple information
sources. These range from online databases of proteins (e.g. UniProt
and Enzyme) and chemicals (e.g. ChEMBL, ChemSpider, and
DrugBank), to models of biological pathways (e.g. Reactome,
WikiPathways, and KEGG) and scientific literature. These
information sources are often held in different formats and sourced
from a wide variety of organizations. Together they cover a
wide area of the scientific space of interest, but overlap in the
data they provide and also record different (or even inconsistent)
representations of the same data.</p>
      <p>A significant challenge for scientists is the labour-intensive
integration of datasets. The entities of interest must be identified
and mapped to each other to allow complementary information
from many data sources to be collated in a single record. For
example, ChemSpider contains data about chemical compounds and
where they can be sourced, while ChEMBL complements this with
data about the bioactivity of drug-like molecules and DrugBank
provides information on the clinical use of drugs which contain the
molecules. These data sources can be linked based on the chemical
structure of the compounds. However, differences in scientific or
technical approaches to molecular structure representation mean
that different data sources will not always be in agreement, often
varying in the charged state of the compound, e.g. “Simvastatin” on
ChemSpider<fn><label>1</label><p>http://www.chemspider.com/Chemical-Structure.49179.html accessed May 2012.</p></fn> and DrugBank<fn><label>2</label><p>http://www.drugbank.ca/drugs/DB00641 accessed May 2012.</p></fn>. Thus, for successful data integration
one must devise strategies that address inconsistencies within the
existing data.</p>
      <sec id="sec-1-1">
        <p>The linked data platform being developed in the Open PHACTS
project<sup>3</sup> aims to overcome these data integration challenges. There
are two key entry points into the system, both of which perform
resolution from user input to an identifier for a data concept.</p>
        <p>The first is through keyword search, as shown in Figure 1. In
the pharmacology domain, this is more than just text matching, as
a keyword can often match multiple, very distinct concepts.
For example, when typing “menthol”, does the user mean the
chemical menthol or the menthol receptor protein? The user
interface supports this disambiguation by providing different entry
points, e.g. compound by name or target by name (shown in
Figure 1). The Identifier Resolution Service (IRS) translates
user-entered entity names (in free text form), together with the context
information, into known entities within the system (i.e. that have a
defined URI). The IRS uses several dictionaries including a custom
dictionary of chemical names and synonyms from ChemSpider, as
well as MeSH, GO, and SwissProt. The IRS provides data for the
auto-complete text box including the preferred name for the entity
and a link to its definition. This supports the user in disambiguating
the entity that they mean. The identified entity URI can then be used
to retrieve further information from the linked data platform.</p>
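        <p>The context-sensitive lookup described above can be sketched as follows. This is a minimal illustration and not the IRS implementation: the in-memory dictionary, the example.org URIs, the Concept class, and the resolve function are all hypothetical, and simple prefix matching stands in for whatever matching the IRS actually performs.</p>
        <preformat><![CDATA[
```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Concept:
    uri: str             # hypothetical concept URI
    preferred_name: str  # name shown in the auto-complete box
    kind: str            # context: "compound" or "target"


# Hypothetical dictionary: lower-cased name or synonym -> candidate concepts.
DICTIONARY = {
    "menthol": [
        Concept("http://example.org/concept/compound/menthol",
                "Menthol", "compound"),
        Concept("http://example.org/concept/target/menthol-receptor",
                "Menthol receptor", "target"),
    ],
    "menthol receptor": [
        Concept("http://example.org/concept/target/menthol-receptor",
                "Menthol receptor", "target"),
    ],
}


def resolve(text, context):
    """Return concepts whose name starts with `text`, restricted to `context`."""
    prefix = text.strip().lower()
    seen = set()
    matches = []
    for name in sorted(DICTIONARY):
        if not name.startswith(prefix):
            continue
        for concept in DICTIONARY[name]:
            # Keep only concepts of the requested kind, each URI once.
            if concept.kind == context and concept.uri not in seen:
                seen.add(concept.uri)
                matches.append(concept)
    return matches


if __name__ == "__main__":
    for concept in resolve("menth", "target"):
        print(concept.preferred_name, concept.uri)
```
]]></preformat>
        <p>In this sketch the same partial keyword yields the receptor concept when the context is "target" and the chemical when the context is "compound", mirroring the separate entry points offered by the user interface.</p>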
        <p>The second entry point is through chemical structure search that
uses a tool for drawing chemical structures which are then converted
to a standardised chemical structure representation. This is then
processed by the ChemSpider structure search service to return a
ChemSpider URI for the chemical entity drawn. The service can
also be used for substructure and similarity searches.</p>
        <p>
          The linked data platform leverages the comprehensive work
already performed by the community in creating RDF-based
datasets, which are relevant for the Open PHACTS project. The
current platform uses the ChEMBL and ChEBI datasets provided
by the Chem2Bio2RDF project
          <xref ref-type="bibr" rid="ref1">(Chen et al., 2010)</xref>
          , the conversion
of DrugBank provided by the LODD project
          <xref ref-type="bibr" rid="ref3">(Samwald et al.,
2011)</xref>
          , and the conversion of the Enzyme database sourced from
UniProt
          <xref ref-type="bibr" rid="ref2">(Jain et al., 2009)</xref>
          . A significant challenge is ensuring
that the RDF versions of the datasets are kept up-to-date with
the originals from which they are derived. For example, the
Chem2Bio2RDF version of ChEMBL is version 8 whereas the
original dataset is now at version 13.
        </p>
        <fn-group>
          <fn>
            <label>3</label>
            <p>http://www.openphacts.org/ accessed May 2012.</p>
          </fn>
        </fn-group>
      </sec>
      <sec id="sec-1-2">
        <p>The data sources are integrated using parameterized SPARQL
queries that are called through an API exposed by the linked
data platform. The API call generates a query containing the URI
returned by the IRS. The query is then expanded at execution time
using an identity mapping service that equates the data entity URIs
from the various data sources. To provide adequate interaction
speeds, we have cached the datasets in the linked data platform.</p>
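        <p>The expansion step can be illustrated with a minimal sketch. It assumes a hard-coded list of equivalence sets in place of the identity mapping service, hypothetical example.org URIs, and a SPARQL 1.1 VALUES clause as one possible way to inject the expanded URIs; the platform's actual queries and mapping interface are not shown in this paper.</p>
        <preformat><![CDATA[
```python
# Hypothetical identity mappings: each set groups URIs for the same entity.
EQUIVALENCE_SETS = [
    {
        "http://example.org/concept/aspirin",
        "http://example.org/chemspider/aspirin",
        "http://example.org/chembl/aspirin",
        "http://example.org/drugbank/aspirin",
    },
]

# Parameterized query: the {uris} placeholder is filled at execution time.
# Doubled braces are literal braces in str.format templates.
QUERY_TEMPLATE = """SELECT ?property ?value
WHERE {{
  VALUES ?entity {{ {uris} }}
  ?entity ?property ?value .
}}"""


def expand(uri):
    """Return the sorted URIs equivalent to `uri`, including `uri` itself."""
    for group in EQUIVALENCE_SETS:
        if uri in group:
            return sorted(group)
    # Unknown URIs are passed through unchanged.
    return [uri]


def build_query(concept_uri):
    """Fill the template with every URI equivalent to `concept_uri`."""
    uris = " ".join("<{}>".format(u) for u in expand(concept_uri))
    return QUERY_TEMPLATE.format(uris=uris)


if __name__ == "__main__":
    print(build_query("http://example.org/concept/aspirin"))
```
]]></preformat>
        <p>Keeping the expansion outside the query template means that, in this sketch, the same parameterized query can be reused for any concept URI while the set of equivalent URIs is decided separately.</p>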
        <p>The result of a compound lookup with the search term
“Aspirin” is shown in Figure 2. Information about the chemical
structure is sourced from ChemSpider, details of its bioactivity are
obtained from ChEMBL, and information about the drugs in which
the compound is active is obtained from DrugBank. Currently, the
provenance of the data points is not shown in the user interface,
although this is planned for the public release.</p>
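        <p>One way to keep provenance alongside a collated record is sketched below. This is an assumption-laden illustration rather than the platform's design: the collate function, the field names, and the placeholder values are hypothetical, and the rule that the first source listed wins is chosen only for simplicity.</p>
        <preformat><![CDATA[
```python
def collate(records):
    """Merge per-source field dictionaries into one record.

    Each field maps to a (value, source) pair so that provenance is kept.
    If two sources supply the same field, the first one listed wins.
    """
    merged = {}
    for source, fields in records:
        for name, value in fields.items():
            merged.setdefault(name, (value, source))
    return merged


if __name__ == "__main__":
    # Placeholder values only; no real data is reproduced here.
    records = [
        ("ChemSpider", {"structure": "placeholder-structure"}),
        ("ChEMBL", {"bioactivity": "placeholder-activity"}),
        ("DrugBank", {"drugs": "placeholder-drug-list"}),
    ]
    for name, (value, source) in sorted(collate(records).items()):
        print(name, value, source)
```
]]></preformat>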
        <p>
          The linked data platform is being developed to answer a set of
pharmacology research questions that require data to be integrated
from a variety of data sources
          <xref ref-type="bibr" rid="ref4">(Williams et al., 2012)</xref>
          . The platform
hides the complexities of interacting with the linked data and
concepts by exposing an API that provides the core functionality
to support a wide variety of drug discovery applications being
developed within the Open PHACTS project, although only one has
been shown in this demonstration paper.
        </p>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>ACKNOWLEDGEMENTS</title>
      <p>The research leading to these results has received support from
the Innovative Medicines Initiative Joint Undertaking under grant
agreement number 115191, resources of which are composed
of financial contribution from the European Union’s Seventh
Framework Programme (FP7/2007-2013) and EFPIA companies’
in-kind contribution.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dong</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jiao</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhu</surname>
            ,
            <given-names>Q.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ding</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Wild</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          (
          <year>2010</year>
          ).
          <article-title>Chem2Bio2RDF: a semantic framework for linking and data mining chemogenomic and systems chemical biology data</article-title>
          .
          <source>BMC Bioinformatics</source>
          ,
          <volume>11</volume>
          (
          <issue>1</issue>
          ),
          <fpage>255</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <string-name>
            <surname>Jain</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bairoch</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Duvaud</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Phan</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Redaschi</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Suzek</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Martin</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>McGarvey</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Gasteiger</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          (
          <year>2009</year>
          ).
          <article-title>Infrastructure for the life sciences: design and implementation of the UniProt website</article-title>
          .
          <source>BMC Bioinformatics</source>
          ,
          <volume>10</volume>
          (
          <issue>1</issue>
          ),
          <fpage>136</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <string-name>
            <surname>Samwald</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jentzsch</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bouton</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kallesoe</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Willighagen</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hajagos</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Marshall</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Prud'hommeaux</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hassanzadeh</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pichler</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Stephens</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          (
          <year>2011</year>
          ).
          <article-title>Linked open drug data for pharmaceutical research and development</article-title>
          .
          <source>Journal of Cheminformatics</source>
          ,
          <volume>3</volume>
          (
          <issue>1</issue>
          ),
          <fpage>19</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <string-name>
            <surname>Williams</surname>
            ,
            <given-names>A. J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Harland</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Groth</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pettifer</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chichester</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Willighagen</surname>
            ,
            <given-names>E. L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Evelo</surname>
            ,
            <given-names>C. T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Blomberg</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ecker</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Goble</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Mons</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          (
          <year>2012</year>
          ).
          <article-title>Open PHACTS: Semantic interoperability for drug discovery</article-title>
          .
          <source>Drug Discovery Today</source>
          . To appear.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>