=Paper= {{Paper |id=None |storemode=property |title=The Pharmacology Workspace: A Platform for Drug Discovery |pdfUrl=https://ceur-ws.org/Vol-897/demo_4.pdf |volume=Vol-897 |dblpUrl=https://dblp.org/rec/conf/icbo/GrayABBCEEGGHLPRTWW12 }} ==The Pharmacology Workspace: A Platform for Drug Discovery== https://ceur-ws.org/Vol-897/demo_4.pdf
     The Pharmacology Workspace: A Platform for Drug Discovery
Alasdair J. G. Gray 1, Sune Askjaer 2, Christian Brenninkmeijer 1, Kees Burger 3, Christine Chichester 3, James Eales 1, Chris T. Evelo 4, Carole Goble 1, Paul Groth 5, Lee Harland 6, Antonis Loizou 5, Steve Pettifer 1, Rishi Ramgolam 7, Mark Thompson 3, Andra Waagmeester 4 and Antony J. Williams 8

1 University of Manchester
2 H. Lundbeck A/S
3 Netherlands Bioinformatics Center
4 Maastricht University
5 VU University Amsterdam
6 Connected Discovery
7 Academic Concept Knowledge Limited
8 Royal Society of Chemistry




ABSTRACT
We present the Open PHACTS linked data platform, which is being developed to address a set of example drug discovery research questions and which supports several drug discovery applications. The platform retrieves data from many complementary, but overlapping, data sources to present an integrated view of the data. The platform exploits two entity resolution services, which transform text and chemical structures, respectively, into a concept. The single concept URI provided by the resolution service is then expanded to a set of equivalent URIs used by the data sources.

Availability. An alpha version is currently available to the Open PHACTS consortium. A first public release of the platform will be made in late 2012; see http://www.openphacts.org/.

EXTENDED ABSTRACT
The investigation and development of new drugs requires that scientists involved in the process deal with multiple information sources. These range from online databases of proteins (e.g. UniProt and Enzyme) and chemicals (e.g. ChEMBL, ChemSpider, and DrugBank), to models of biological pathways (e.g. Reactome, WikiPathways, and KEGG) and scientific literature. These information sources are often held in different formats and sourced from a wide variety of organizations. Together they cover a wide area of the scientific space of interest, but overlap in the data they provide and also record different (or even inconsistent) representations of the same data.

A significant challenge for scientists is the labour-intensive integration of datasets. The entities of interest must be identified and mapped to each other to allow complementary information from many data sources to be collated in a single record. For example, ChemSpider contains data about chemical compounds and where they can be sourced, while ChEMBL complements this with data about the bioactivity of drug-like molecules and DrugBank provides information on the clinical use of drugs which contain the molecules. These data sources can be linked based on the chemical structure of the compounds. However, differences in scientific or technical approaches to molecular structure representation mean that different data sources will not always be in agreement, often varying in the charged state of the compound, e.g. “Simvastatin” on ChemSpider [1] and DrugBank [2]. Thus, for successful data integration one must devise strategies that address inconsistencies within the existing data.

The linked data platform being developed in the Open PHACTS project [3] aims to overcome these data integration challenges. There are two key entry points into the system, both of which perform resolution from user input to an identifier for a data concept.

The first is through keyword search, as shown in Figure 1. In the pharmacology domain, this is more than just text matching, as keywords can often match multiple, often very distinct, concepts. For example, when typing “menthol”, does the user mean the chemical menthol or the menthol receptor protein? The user interface supports this disambiguation by providing different entry points, e.g. compound by name or target by name (shown in Figure 1). The Identifier Resolution Service (IRS) translates user-entered entity names (in free text form), together with the context information, into known entities within the system (i.e. those that have a defined URI). The IRS uses several dictionaries, including a custom dictionary of chemical names and synonyms from ChemSpider, as well as MeSH, GO, and SwissProt. The IRS provides data for the auto-complete text box, including the preferred name for the entity and a link to its definition. This supports the user in disambiguating the entity that they mean. The identified entity URI can then be used to retrieve further information from the linked data platform.

Fig. 1. Screenshot showing a search with the identifier resolution service for the term “menthol”.

The second entry point is through chemical structure search, which uses a tool for drawing chemical structures that are then converted to a standardised chemical structure representation. This is then processed by the ChemSpider structure search service to return a ChemSpider URI for the chemical entity drawn. The service can also be used for substructure and similarity searches.

The linked data platform leverages the comprehensive work already performed by the community in creating RDF-based datasets that are relevant for the Open PHACTS project. The current platform uses the ChEMBL and ChEBI datasets provided by the Chem2Bio2RDF project (Chen et al., 2010), the conversion of DrugBank provided by the LODD project (Samwald et al., 2011), and the conversion of the Enzyme database sourced from UniProt (Jain et al., 2009). A significant challenge is ensuring that the RDF versions of the datasets are kept up-to-date with the originals from which they are derived. For example, the Chem2Bio2RDF version of ChEMBL is version 8 whereas the original dataset is now at version 13.

The data sources are integrated using parameterized SPARQL queries that are called through an API exposed by the linked data platform. The API call generates a query containing the URI returned by the IRS. The query is then expanded at execution time using an identity mapping service that equates the data entity URIs from the various data sources. To provide adequate interaction speeds, we have cached the datasets in the linked data platform.

The result of a compound lookup with the search term “Aspirin” is shown in Figure 2. Information about the chemical structure is sourced from ChemSpider, details of its bioactivity are obtained from ChEMBL, and information about the drugs in which the compound is active is obtained from DrugBank. Currently, the provenance of the data points is not shown in the user interface, although this is planned for the public release.

Fig. 2. Screenshot showing the integrated information returned for Aspirin.

The linked data platform is being developed to answer a set of pharmacology research questions that require data to be integrated from a variety of data sources (Williams et al., 2012). The platform hides the complexities of interacting with the linked data and concepts by exposing an API that provides the core functionality to support a wide variety of drug discovery applications being developed within the Open PHACTS project, although only one has been shown in this demonstration paper.

[1] http://www.chemspider.com/Chemical-Structure.49179.html accessed May 2012.
[2] http://www.drugbank.ca/drugs/DB00641 accessed May 2012.
[3] http://www.openphacts.org/ accessed May 2012.

ACKNOWLEDGEMENTS
The research leading to these results has received support from the Innovative Medicines Initiative Joint Undertaking under grant agreement number 115191, resources of which are composed of financial contribution from the European Union’s Seventh Framework Programme (FP7/2007-2013) and EFPIA companies’ in kind contribution.

REFERENCES
Chen, B., Dong, X., Jiao, D., Wang, H., Zhu, Q., Ding, Y., and Wild, D. (2010). Chem2Bio2RDF: a semantic framework for linking and data mining chemogenomic and systems chemical biology data. BMC Bioinformatics, 11(1), 255.
Jain, E., Bairoch, A., Duvaud, S., Phan, I., Redaschi, N., Suzek, B., Martin, M., McGarvey, P., and Gasteiger, E. (2009). Infrastructure for the life sciences: design and implementation of the UniProt website. BMC Bioinformatics, 10(1), 136+.
Samwald, M., Jentzsch, A., Bouton, C., Kallesoe, C., Willighagen, E., Hajagos, J., Marshall, M., Prud’hommeaux, E., Hassanzadeh, O., Pichler, E., and Stephens, S. (2011). Linked open drug data for pharmaceutical research and development. Journal of Cheminformatics, 3(1), 19+.
Williams, A. J., Harland, L., Groth, P., Pettifer, S., Chichester, C., Willighagen, E. L., Evelo, C. T., Blomberg, N., Ecker, G., Goble, C., and Mons, B. (2012). Open PHACTS: Semantic interoperability for drug discovery. Drug Discovery Today. To appear.
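
The extended abstract describes a query flow in which a single concept URI is expanded to the set of equivalent URIs used by the data sources and then substituted into a parameterized SPARQL query. The following is a minimal sketch of that idea, not the Open PHACTS implementation. The mapping table, the function names `expand_uri` and `build_query`, the query shape, and the use of a SPARQL 1.1 `VALUES` clause are all assumptions made for illustration; the two Simvastatin page URLs from the footnotes stand in for data-source URIs.

```python
from string import Template

# Hypothetical identity mappings: each set groups URIs that are assumed
# to denote the same compound in different data sources.
EQUIVALENCE_SETS = [
    {
        "http://www.chemspider.com/Chemical-Structure.49179.html",
        "http://www.drugbank.ca/drugs/DB00641",
    },
]

# Hypothetical parameterized query: $uris is filled in at execution time.
QUERY_TEMPLATE = Template(
    "SELECT ?property ?value WHERE {\n"
    "  VALUES ?compound { $uris }\n"
    "  ?compound ?property ?value .\n"
    "}"
)


def expand_uri(uri):
    """Return the sorted URIs equivalent to `uri`, including `uri` itself."""
    for group in EQUIVALENCE_SETS:
        if uri in group:
            return sorted(group)
    # No known mapping: the URI only stands for itself.
    return [uri]


def build_query(concept_uri):
    """Instantiate the query template with all URIs equivalent to `concept_uri`."""
    uris = " ".join(f"<{u}>" for u in expand_uri(concept_uri))
    return QUERY_TEMPLATE.substitute(uris=uris)


if __name__ == "__main__":
    print(build_query("http://www.drugbank.ca/drugs/DB00641"))
```

Keeping the mapping lookup separate from the query template means the same template can be reused for any resolved concept, and the equivalence data can change without touching the query text.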



