=Paper=
{{Paper
|id=None
|storemode=property
|title=The Pharmacology Workspace: A Platform for Drug Discovery
|pdfUrl=https://ceur-ws.org/Vol-897/demo_4.pdf
|volume=Vol-897
|dblpUrl=https://dblp.org/rec/conf/icbo/GrayABBCEEGGHLPRTWW12
}}
==The Pharmacology Workspace: A Platform for Drug Discovery==
Alasdair J. G. Gray 1, Sune Askjaer 2, Christian Brenninkmeijer 1, Kees Burger 3, Christine Chichester 3, James Eales 1, Chris T. Evelo 4, Carole Goble 1, Paul Groth 5, Lee Harland 6, Antonis Loizou 5, Steve Pettifer 1, Rishi Ramgolam 7, Mark Thompson 3, Andra Waagmeester 4 and Antony J. Williams 8

1 University of Manchester
2 H. Lundbeck A/S
3 Netherlands Bioinformatics Center
4 Maastricht University
5 VU University Amsterdam
6 Connected Discovery
7 Academic Concept Knowledge Limited
8 Royal Society of Chemistry

ABSTRACT
We present the Open PHACTS linked data platform, which is being developed to address a set of example drug discovery research questions and which supports several drug discovery applications. The platform retrieves data from many complementary, but overlapping, data sources to present an integrated view of the data. The platform exploits two entity resolution services, which transform text and chemical structures, respectively, into a concept. The single concept URI provided by the resolution service is then expanded to a set of equivalent URIs used by the data sources.

Availability. An alpha version is currently available to the Open PHACTS consortium. A first public release of the platform will be made in late 2012, see http://www.openphacts.org/.

EXTENDED ABSTRACT
The investigation and development of new drugs requires that scientists involved in the process deal with multiple information sources. These range from online databases of proteins (e.g. UniProt and Enzyme) and chemicals (e.g. ChEMBL, ChemSpider, and DrugBank), to models of biological pathways (e.g. Reactome, WikiPathways, and KEGG) and scientific literature. These information sources are often held in different formats and sourced from a wide variety of organizations. Together they cover a wide area of the scientific space of interest, but overlap in the data they provide and also record different (or even inconsistent) representations of the same data.

A significant challenge to scientists is the labour-intensive integration of datasets. The entities of interest must be identified and mapped to each other to allow complementary information from many data sources to be collated in a single record. For example, ChemSpider contains data about chemical compounds and where they can be sourced, while ChEMBL complements this with data about the bioactivity of drug-like molecules and DrugBank provides information on the clinical use of drugs which contain the molecules. These data sources can be linked based on the chemical structure of the compounds. However, differences in scientific or technical approaches to molecular structure representation mean that different data sources will not always be in agreement, often varying in the charged state of the compound, e.g. “Simvastatin” on ChemSpider (http://www.chemspider.com/Chemical-Structure.49179.html, accessed May 2012) and DrugBank (http://www.drugbank.ca/drugs/DB00641, accessed May 2012). Thus, for successful data integration one must devise strategies that address inconsistencies within the existing data.

The linked data platform being developed in the Open PHACTS project (http://www.openphacts.org/, accessed May 2012) aims to overcome these data integration challenges. There are two key entry points into the system, both of which perform resolution from user input to an identifier for a data concept.

The first is through keyword search, as shown in Figure 1. In the pharmacology domain, this is more than just text matching, as a keyword can often match multiple, very distinct concepts. For example, when typing “menthol”, does the user mean the chemical menthol or the menthol receptor protein? The user interface supports this disambiguation by providing different entry points, e.g. compound by name or target by name (shown in Figure 1). The Identifier Resolution Service (IRS) translates user-entered entity names (in free text form), together with the context information, into known entities within the system (i.e. those that have a defined URI). The IRS uses several dictionaries, including a custom dictionary of chemical names and synonyms from ChemSpider, as well as MeSH, GO, and SwissProt. The IRS provides data for the auto-complete text box, including the preferred name for the entity and a link to its definition. This supports the user in disambiguating the entity that they mean. The identified entity URI can then be used to retrieve further information from the linked data platform.

The second entry point is through chemical structure search, which uses a tool for drawing chemical structures that are then converted to a standardised chemical structure representation. This is then processed by the ChemSpider structure search service to return a ChemSpider URI for the chemical entity drawn. The service can also be used for substructure and similarity searches.

The linked data platform leverages the comprehensive work already performed by the community in creating RDF-based datasets that are relevant for the Open PHACTS project. The current platform uses the ChEMBL and ChEBI datasets provided by the Chem2Bio2RDF project (Chen et al., 2010), the conversion of DrugBank provided by the LODD project (Samwald et al., 2011), and the conversion of the Enzyme database sourced from UniProt (Jain et al., 2009). A significant challenge is ensuring that the RDF versions of the datasets are kept up-to-date with the originals from which they are derived. For example, the Chem2Bio2RDF version of ChEMBL is version 8 whereas the original dataset is now at version 13.

The data sources are integrated using parameterized SPARQL queries that are called through an API exposed by the linked
data platform. The API call generates a query containing the URI returned by the IRS. The query is then expanded at execution time using an identity mapping service that equates the data entity URIs from the various data sources. To provide adequate interaction speeds, we have cached the datasets in the linked data platform.

The result of a compound lookup with the search term “Aspirin” is shown in Figure 2. Information about the chemical structure is sourced from ChemSpider, details of its bioactivity are obtained from ChEMBL, and information about the drugs in which the compound is active is obtained from DrugBank. Currently, the provenance of the data points is not shown in the user interface, although this is planned for the public release.

Fig. 1. Screenshot showing a search with the identifier resolution service for the term “menthol”.

Fig. 2. Screenshot showing the integrated information returned for Aspirin.

The linked data platform is being developed to answer a set of pharmacology research questions that require data to be integrated from a variety of data sources (Williams et al., 2012). The platform hides the complexities of interacting with the linked data and concepts by exposing an API that provides the core functionality to support a wide variety of drug discovery applications being developed within the Open PHACTS project, although only one has been shown in this demonstration paper.

ACKNOWLEDGEMENTS
The research leading to these results has received support from the Innovative Medicines Initiative Joint Undertaking under grant agreement number 115191, resources of which are composed of financial contribution from the European Union’s Seventh Framework Programme (FP7/2007-2013) and EFPIA companies’ in kind contribution.

REFERENCES
Chen, B., Dong, X., Jiao, D., Wang, H., Zhu, Q., Ding, Y., and Wild, D. (2010). Chem2Bio2RDF: a semantic framework for linking and data mining chemogenomic and systems chemical biology data. BMC Bioinformatics, 11(1), 255.

Jain, E., Bairoch, A., Duvaud, S., Phan, I., Redaschi, N., Suzek, B., Martin, M., McGarvey, P., and Gasteiger, E. (2009). Infrastructure for the life sciences: design and implementation of the UniProt website. BMC Bioinformatics, 10(1), 136+.

Samwald, M., Jentzsch, A., Bouton, C., Kallesoe, C., Willighagen, E., Hajagos, J., Marshall, M., Prud’hommeaux, E., Hassanzadeh, O., Pichler, E., and Stephens, S. (2011). Linked open drug data for pharmaceutical research and development. Journal of Cheminformatics, 3(1), 19+.

Williams, A. J., Harland, L., Groth, P., Pettifer, S., Chichester, C., Willighagen, E. L., Evelo, C. T., Blomberg, N., Ecker, G., Goble, C., and Mons, B. (2012). Open PHACTS: Semantic interoperability for drug discovery. Drug Discovery Today. To appear.
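
The lookup flow described in the extended abstract (resolve a name in context to a concept URI, expand that URI to the equivalent URIs used by the data sources, then fill a parameterized SPARQL query) could be sketched roughly as follows. This is a minimal illustration only, not the platform's implementation. The paper does not give the actual queries, API, or identifiers, so every URI, dictionary entry, and function name below is a hypothetical placeholder, and the use of a SPARQL 1.1 `VALUES` clause for the expansion is an assumption.

```python
from string import Template

# Hypothetical dictionary for name resolution, keyed by (context, lower-cased name).
# All URIs below are illustrative placeholders, not identifiers used by the platform.
NAME_DICTIONARY = {
    ("compound", "menthol"): "http://example.org/concept/menthol",
    ("target", "menthol"): "http://example.org/concept/menthol-receptor",
    ("compound", "aspirin"): "http://example.org/concept/aspirin",
}

# Hypothetical identity mappings: each set groups the URIs that
# different data sources use for the same concept.
EQUIVALENCE_SETS = [
    {
        "http://example.org/concept/aspirin",
        "http://example.org/chemspider/aspirin",
        "http://example.org/chembl/aspirin",
        "http://example.org/drugbank/aspirin",
    },
]

# Parameterized query; the VALUES clause (SPARQL 1.1) is one possible
# way to expand a query over several equivalent URIs (an assumption).
QUERY_TEMPLATE = Template(
    "SELECT ?source ?property ?value WHERE {\n"
    "  VALUES ?source { $uris }\n"
    "  ?source ?property ?value .\n"
    "}"
)


def resolve_name(name, context):
    """Map a free-text name plus its context to a concept URI, or None."""
    return NAME_DICTIONARY.get((context, name.strip().lower()))


def expand_uri(uri):
    """Return all URIs recorded as equivalent to `uri`, sorted, including itself."""
    for group in EQUIVALENCE_SETS:
        if uri in group:
            return sorted(group)
    return [uri]


def build_query(uri):
    """Fill the query template with every URI equivalent to `uri`."""
    uris = expand_uri(uri)
    for u in uris:
        # Reject characters that could break out of the <...> IRI syntax.
        if any(ch in u for ch in '<>" {}|\\^`'):
            raise ValueError(f"unsafe URI: {u!r}")
    return QUERY_TEMPLATE.substitute(uris=" ".join(f"<{u}>" for u in uris))


if __name__ == "__main__":
    # "Aspirin" in the compound context resolves to one URI, which is then
    # expanded to the equivalent URIs before the query text is generated.
    print(build_query(resolve_name("Aspirin", "compound")))
```

In this sketch the context argument plays the role of the "compound by name" or "target by name" entry points: the same keyword "menthol" resolves to different placeholder URIs depending on the context supplied. Expansion happens when the query text is built, mirroring the description of queries being expanded at execution time.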