ICBO 2014 Proceedings


        A Nanopublication Framework for Biological Networks using Cytoscape.js
        James P. McCusker,¹,³ Rui Yan,¹ Kusum Solanki,² John Erickson,¹ Cynthia Chang,¹
        Michel Dumontier,⁴ Jonathan S. Dordick,² Deborah L. McGuinness¹


   Abstract—We leverage semantic technologies and Cytoscape.js to create a provenance-aware, probabilistic analysis platform for systems biology, and we evaluate its usefulness in discovering links between drugs and diseases. As part of a systematic approach to discovering new uses for existing drugs, we have developed Repurposing Drugs with Semantics (ReDrugS). ReDrugS is a data curation and publication framework that accepts data from nearly any database containing biological or chemical entity interactions and produces visualizations using Cytoscape.js. A semantic web service API, built with the SADI web service framework and nanopublications, supports search and traversal and provides composite probabilities for the resulting graph of biological entities. We show how the associations of a positive control, topiramate, allow us to independently reconstruct its known links to epilepsy and migraine, as well as potential consequences for bone health.

   ¹ Department of Computer Science, ² Department of Chemical & Biological Engineering, Rensselaer Polytechnic Institute, Troy, NY http://www.rpi.edu
   ³ 5AM Solutions, Inc, Rockville, MD http://5amsolutions.com
   ⁴ Stanford University, Stanford, CA http://stanford.edu

                      I. INTRODUCTION

   Drug repurposing can often lead to effective new treatments for diseases. The ReDrugS system we are developing can assist in this process. It integrates multiple systems biology, pharmacology, disease association, and gene expression databases into a coherent repository of individually supported assertions, and each assertion can be assigned its own probability. We have developed an initial database of drug/protein, protein/protein, and protein/biological process associations. This database is giving us a view into how drugs have the effects that they do.

                        II. METHODS

   We deployed an instance of the RPI semantic web toolsuite, Prizms, at http://redrugs.tw.rpi.edu. We also deployed an instance of the Comprehensive Knowledge Archive Network (CKAN) at http://data.melagrid.org to catalog the available datasets [1]. Cataloging is an ongoing process. Adding the initial datasets to the catalog started the Prizms conversion process. We then used the Prizms infrastructure to generate RDF for publication to our SPARQL endpoint. We used the BigData RDF store with named graph and text indexing support enabled.

A. Inferring Probabilities

   Two molecular biology and biochemistry experts, Michel Dumontier and Pascale Gaudet, scored each evidence type and/or technique associated with an interaction. The scores express confidence from 1 (low) to 3 (high). The confidence measure was based on comparative analyses of techniques [2], [3] and on the experts' experience in reviewing data of this kind. The confidence assignment takes a number of factors into account:
   - the degree of indirection in the assay;
   - the sensitivity and specificity of the approach;
   - the reproducibility of results under different conditions.
   The confidence scores of both experts were encoded as classes of evidence, where each experimental method class was assigned two superclasses, one for each expert. This ontology was created from a spreadsheet and expanded to full inferences using Pellet [4]. At the same time, SPARQL-based reasoning classifies each nanopublication assertion by its available evidence, and thereby assigns a class of confidence codes to it.

B. SADI Web Service Interface

   We developed four Semantic Automated Discovery and Integration (SADI) web services in Python¹ to support easy access to the nanopublications. We use SADI to provide a discoverable, consistent API that can be reused in other applications or consumed directly by analytical tools.

   ¹ For further information on developing web services in Python using SADI, see this tutorial: https://code.google.com/p/sadi/wiki/BuildingServicesInPython

   The services perform computational tasks that would otherwise be difficult to express as SPARQL queries. For each (source, interaction type, target) triple, the services return only one interaction. The underlying data, however, may hold more than one interaction for that interaction type, each with one or more probabilities. This happens because the same interaction may have been recorded in multiple databases, based on different experimental methods. To provide a single probability score for each triple, the interactions are combined. The rationale is that multiple experiments producing the same result reinforce each other. Their combined probability should therefore be higher than their mean would indicate:

$$P(x_{1\ldots n}) = \mathrm{CDF}\left(\sum_{i=1}^{n} \mathrm{CDF}^{-1}\big(P(x_i)\big)\right)$$

where P(x_i) is the probability of the i-th of the n interactions recorded for the triple.
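The combination rule can be sketched in Python. This is a minimal illustration under stated assumptions, not the ReDrugS service code. It assumes that CDF denotes the standard normal cumulative distribution function, since the text does not name the distribution. The helper names `combine_probabilities` and `combine_by_triple` and the sample records are hypothetical. Under this assumption, probabilities above 0.5 map to positive z-scores and reinforce one another, while values below 0.5 pull the combined score down.

```python
from collections import defaultdict
from statistics import NormalDist

# Assumption: "CDF" in the formula is the standard normal CDF.
_STD_NORMAL = NormalDist(mu=0.0, sigma=1.0)
# inv_cdf() rejects exactly 0 and 1, so inputs are clamped into (0, 1).
_EPS = 1e-9


def combine_probabilities(probs):
    """Return CDF(sum_i CDF^-1(P(x_i))) for one triple's probabilities."""
    if not probs:
        raise ValueError("at least one probability is required")
    z_total = 0.0
    for p in probs:
        p = min(max(p, _EPS), 1.0 - _EPS)
        z_total += _STD_NORMAL.inv_cdf(p)  # probability -> z-score
    return _STD_NORMAL.cdf(z_total)  # summed z-score -> probability


def combine_by_triple(records):
    """Collapse (source, interaction type, target, probability) records
    into one combined probability per (source, type, target) triple."""
    grouped = defaultdict(list)
    for source, kind, target, prob in records:
        grouped[(source, kind, target)].append(prob)
    return {triple: combine_probabilities(ps) for triple, ps in grouped.items()}


if __name__ == "__main__":
    # Hypothetical records: one interaction reported twice, plus another.
    records = [
        ("drug_A", "binds", "protein_B", 0.8),
        ("drug_A", "binds", "protein_B", 0.8),
        ("drug_A", "binds", "protein_C", 0.6),
    ]
    for triple, p in combine_by_triple(records).items():
        print(triple, round(p, 3))
```

With this assumption, two agreeing records at 0.8 combine to roughly 0.95, above their 0.8 mean. A single record keeps its own probability, and a record at exactly 0.5 maps to a z-score of 0 and leaves the result unchanged.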

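The confidence-class encoding described in Section II-A (one superclass per expert for each experimental method, then classification of assertions by their evidence) can be illustrated with a small, standard-library-only Python sketch. It is a stand-in built on assumptions, not the ReDrugS ontology. The spreadsheet columns, the expert labels `ExpertA`/`ExpertB`, the scores, and the class-naming scheme are all hypothetical. Plain set operations take the place of the Pellet and SPARQL reasoning that the text describes. The method IDs are PSI-MI identifiers: MI:0096 is pull down and MI:0018 is two hybrid.

```python
import csv
import io
from collections import defaultdict

# Hypothetical spreadsheet: one row per (experimental method, expert) pair,
# with a confidence score from 1 (low) to 3 (high). Scores are invented.
SPREADSHEET = """method,label,expert,score
MI_0096,pull down,ExpertA,2
MI_0096,pull down,ExpertB,2
MI_0018,two hybrid,ExpertA,1
MI_0018,two hybrid,ExpertB,2
"""


def build_superclasses(sheet_text):
    """Map each method to its confidence superclasses, one per expert."""
    superclasses = defaultdict(set)
    for row in csv.DictReader(io.StringIO(sheet_text)):
        score = int(row["score"])
        if score not in (1, 2, 3):
            raise ValueError(f"score out of range: {row}")
        superclasses[row["method"]].add(f"{row['expert']}_Confidence{score}")
    return dict(superclasses)


def classify_assertion(evidence_methods, superclasses):
    """Collect the confidence classes an assertion inherits from the
    experimental methods recorded as its evidence."""
    classes = set()
    for method in evidence_methods:
        classes |= superclasses.get(method, set())
    return classes


if __name__ == "__main__":
    table = build_superclasses(SPREADSHEET)
    print(sorted(classify_assertion(["MI_0096"], table)))
```

An assertion supported only by a pull-down experiment would here inherit `ExpertA_Confidence2` and `ExpertB_Confidence2`. An assertion with unscored evidence inherits nothing.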

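The client workflow described in the User Interface and Evaluation sections below can be sketched on the data side in Python. That workflow resolves a name, queries the services, adds the returned interactions to the graph, and then expands downstream. This is a hypothetical illustration, not the ReDrugS client, which is written in JavaScript. The records, probabilities, and helper names are invented. Only the output shape follows Cytoscape.js conventions: each element carries a `data` object with an `id`, and edges also carry `source` and `target`. A list in this shape can be passed to Cytoscape.js as the `elements` option or to `cy.add()`.

```python
import json

# Hypothetical (source, interaction type, target, combined probability)
# records standing in for the interactions the SADI services return as RDF.
INTERACTIONS = [
    ("topiramate", "interacts with", "CA2", 0.90),
    ("CA2", "direct interaction", "SLC4A8", 0.95),
]


def downstream(interactions, selected):
    """One hop downstream: interactions whose source is a selected node."""
    return [rec for rec in interactions if rec[0] in selected]


def upstream(interactions, selected):
    """One hop upstream: interactions whose target is a selected node."""
    return [rec for rec in interactions if rec[2] in selected]


def to_cytoscape_elements(interactions):
    """Build a Cytoscape.js elements list: one node per entity and one
    edge per (source, type, target) triple, each described by 'data'."""
    nodes, edges = {}, {}
    for source, kind, target, prob in interactions:
        for name in (source, target):
            nodes.setdefault(name, {"data": {"id": name}})
        edge_id = f"{source}|{kind}|{target}"
        edges.setdefault(edge_id, {"data": {
            "id": edge_id, "source": source, "target": target,
            "interaction": kind, "probability": prob,
        }})
    return list(nodes.values()) + list(edges.values())


if __name__ == "__main__":
    # Query entity first, then expand one hop downstream from its targets.
    shown = downstream(INTERACTIONS, {"topiramate"})
    shown += downstream(INTERACTIONS, {t for _, _, t, _ in shown})
    print(json.dumps(to_cytoscape_elements(shown), indent=2))
```

Keying edges by the full triple mirrors the services' one-interaction-per-triple behavior, so repeated expansions do not duplicate edges in the displayed graph.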
C. User Interface
   Users can search for biological entities and processes. Their search text is autocompleted to specific entities in the ReDrugS graph. Users can then add those entities and processes to the displayed graph, retrieve upstream and downstream connections, and link out to more details for every entity. Cytoscape.js is the main rendering and network visualization tool and provides node and edge rendering, layout, and network analysis capabilities.

                            III. EVALUATION
   To evaluate this knowledge base, we developed a demonstration web interface² based on Cytoscape.js³. As users type a biological entity name, the text is resolved to a list of entities from which they can select. The selected entity is then submitted to all three SADI services via a basic JavaScript SADI client.⁴ The resulting interactions and nodes are added to the Cytoscape.js graph, which can be laid out according to a number of algorithms. Users can also select nodes and populate their upstream or downstream connections. An example is shown in Fig. 1. This figure was obtained by entering "Topiramate" as a query in the search box, which returned all of the biological entities with which topiramate is directly associated. We then expanded the network downstream to see which biological entities are affected by topiramate's targets.

Fig. 1. The ReDrugS user interface allows users to build networks of drugs, proteins, and diseases based on provenance-driven data from iRefIndex, DrugBank, UniProt Gene Ontology Annotations, and Online Mendelian Inheritance in Man (OMIM). Users can select entities and add entities that affect or are affected by the selected entities. They can also search for entities by name (here, Topiramate was used).

              IV. DISCUSSION AND FUTURE WORK
   We can successfully navigate a protein-drug-disease interaction graph built as a consensus of 16 diverse sources. Using the provenance of more than three million individual assertions and experts' confidence in different experimental methods, we infer prior probabilities for those assertions. We can also find drug/disease associations that are not directly expressed by any one database.
   We plan to add further data sources, especially those that provide direct experimental results that predict protein (or gene)/disease associations, such as the Gene Expression Atlas (in progress) [5]. Further, we are very interested in integrating the newest version of the Connectivity Map dataset [6], as it provides gene expression signature similarities for a large number of chemical and genetic perturbations. Finally, as we develop new hypotheses about potential new drug effects, we plan to test them with reference samples, using a new three-dimensional cellular microarray to perform high-throughput drug screening [7].

                            V. CONCLUSION
   We have developed a framework for collecting, searching, analyzing, and visualizing important components of biological systems. We built it by converting existing databases into a common nanopublication structure. This structure uses the provenance of the database records to judge the quality of any given piece of information by the methods used to produce it. We use the Semantic Automated Discovery and Integration framework to provide simple access to data, and we visualize results using an existing interaction graph tool. The resulting application makes it easy to search for biological entities and see how they interact. We have already formed some hypotheses about the proteins through which drugs influence disease conditions. We plan to expand the loaded data with protein/disease associations as well as gene expression profiles, and we will use ReDrugS to produce prospective, testable hypotheses.

                       ACKNOWLEDGMENTS
   A special thanks to Pascale Gaudet, who, with Michel Dumontier, evaluated the experimental methods and evidence codes listed in the Protein/Protein Interaction Ontology and the Gene Ontology.

                          REFERENCES
[1] J. P. McCusker, T. Lebo, M. Krauthammer, and D. L. McGuinness, “Next Generation Cancer Data Discovery, Access, and Integration Using Prizms and Nanopublications,” in Data Integration in the Life Sciences. Springer, 2013, pp. 105–112.
[2] J. C. Obenauer and M. B. Yaffe, “Computational prediction of protein-protein interactions,” in Protein-Protein Interactions. Springer, 2004, pp. 445–467.
[3] E. Sprinzak, S. Sattath, and H. Margalit, “How reliable are experimental protein–protein interaction data?” Journal of Molecular Biology, vol. 327, no. 5, pp. 919–923, 2003.
[4] E. Sirin, B. Parsia, B. C. Grau, A. Kalyanpur, and Y. Katz, “Pellet: A practical OWL-DL reasoner,” Web Semantics: Science, Services and Agents on the World Wide Web, vol. 5, no. 2, pp. 51–53, 2007.
[5] R. Petryszak, T. Burdett, B. Fiorelli, N. A. Fonseca, M. Gonzalez-Porta, E. Hastings, W. Huber, S. Jupp, M. Keays, N. Kryvych et al., “Expression Atlas update–a database of gene and transcript expression from microarray- and sequencing-based functional genomics experiments,” Nucleic Acids Research, vol. 42, no. D1, pp. D926–D932, Jan. 2014. [Online]. Available: http://dx.doi.org/10.1093/nar/gkt1270
[6] J. Lamb, E. D. Crawford, D. Peck, J. W. Modell, I. C. Blat, M. J. Wrobel, J. Lerner, J.-P. Brunet, A. Subramanian, K. N. Ross et al., “The Connectivity Map: using gene-expression signatures to connect small molecules, genes, and disease,” Science, vol. 313, no. 5795, pp. 1929–1935, 2006.
[7] M.-Y. Lee, R. A. Kumar, S. M. Sukumaran, M. G. Hogg, D. S. Clark, and J. S. Dordick, “Three-dimensional cellular microarray for high-throughput toxicology assays,” Proceedings of the National Academy of Sciences, vol. 105, no. 1, pp. 59–63, 2008.

   ² http://lod.melagrid.org/redrugs
   ³ http://cytoscape.github.io/cytoscape.js
   ⁴ https://sadi.googlecode.com/svn/trunk/javascript/sadi.js


Poster: A Nanopublication Framework for Biological Networks using Cytoscape.js
James P. McCusker, Rui Yan, Kusum Solanki, John Erickson, Cynthia Chang, Michel Dumontier, Jonathan Dordick, and Deborah McGuinness
Rensselaer Polytechnic Institute, Troy, NY; 5AM Solutions, Inc., Rockville, MD; Stanford University, Stanford, CA
Availability: http://redrugs.tw.rpi.edu

[Poster figures, not reproduced. (1) An example nanopublication, NanoPub_501799. Its assertion records a direct interaction (MI_0407) with participant CA2 and target SLC4A8. Its provenance states that the assertion was generated by a pull-down activity (MI_0096), quoted from BioGRID (MI_0463), with primary source pubmed:14736710 ("Regulation of the human NBC3 Na+/HCO3- cotransporter by carbonic anhydrase II and PKA"). Assertions generated by pull down are classified as Confidence2, which is defined as having a probability value of 0.95. (2) A pipeline diagram. Source databases (e.g., iRefIndex, DIP, MPIDB, CORUM) are converted to nanopubs covering drug/protein, protein/protein, protein/biological process, and protein/disease associations. The nanopubs are loaded into a quad store, where a reasoner uses ontological resources (Protein/Protein Interaction Ontology, Semanticscience Integrated Ontology, Gene Ontology) to condense experimental-method evidence into probabilities. The results are exposed through the ReDrugS API.]

     different experimental methods, many species, and a wide
                                                                                          Different databases can provide                                                                                                                                                                                                                                              Assessment                         SADI-based API provides interaction network                                      Researchers

                                                                                          the same assertions. This might
                                                                                                                                                                                                                                                                                                                                                                                                           search and expansion based on consensus
                                                                                                                                                                                                                                                                                                                                                                     Confidence scores of                                probabilities
     diversity of inclusion criteria. Systems biology has been used                                                                                                                                                                                                                                                                                                 experimental methods.
                                                                                          be experimental replication! We
                                                                                                                                                                                                                                                                                                                                                                                                                                                                             pose questions
     in the past to generate hypotheses for drug effects, but has                                                                                                                                                                                                                                                                                                                                                                                                               view data
                                                                                                                                                                                                                                                                                                                                                                                                                                                                             explore network
                                                                                                                                                                                                                                                                                                                                                                                                                queries graph

                                                                                            model this with composite z-
                                                                                                                                                                                                                                                                                                                                                                                                                                                  queries graph
     become fragmented under the large number of disparate and
     disconnected databases. In our efforts to create a systematic
     approach to discovering new uses for existing drugs, we
                                                                                                       scores:                                                                                                                                                                                                                                                                               Analytical Tools                                              ReDrugS
                                                                                                                                                                                                                                                                                                                                                                                        Cytoscape, R, Python, etc.                                     Cytoscape.js App
     have developed Repurposing Drugs with Semantics
     (ReDrugS). ReDrugS is a data curation and publication                                                                                                                                                                                                                                                                                                                               Flow of data through the ReDrugS pipeline
     framework that can take data from nearly any database
     containing biological or chemical entity interactions and
                                                                                            P(x) =
     display it using Cytoscape.js. ReDrugS is able to infer
     probability of the assertions based on its provenance using                                        F(x): Cumulative Distribution Function
     experimental methods and data sources. A semantic web                                               (converts z-scores to probabilities)
     service API is provided that can search, traverse, and provide
     composite probabilities for the resulting graph of biological
     entities using the SADI web service framework and
     Nanopublications. We show how associations between a
     postive control, topiramate, allows us to independently
     reconstruct a positive control of epilepsy and migraine, and
     potential consequences on bone health. Future work will
     incorporate additional protein/disease associations, enabling
     hypothesis generation on indirect drug targets, and leading to
     testing the resulting hypotheses using high throughput drug
     screening.
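The poster states only that replicated assertions are modeled with composite z-scores and that F, the cumulative distribution function, converts the result back to a probability. The sketch below fills in the remaining steps under stated assumptions. It assumes that each source's probability is first mapped to a z-score with the inverse CDF. It also assumes that the z-scores are combined with Stouffer's method (sum of z-scores divided by √k), which is a standard way to form a composite z-score. The function names `composite_probability` and `consensus` and the toy records are hypothetical. They are not ReDrugS code.

```python
from collections import defaultdict
from statistics import NormalDist

# Standard normal distribution: .cdf stands in for F, .inv_cdf for its inverse.
STD_NORMAL = NormalDist(mu=0.0, sigma=1.0)


def composite_probability(probabilities):
    """Combine per-source probabilities for one assertion.

    Assumes Stouffer's method. Each probability must lie strictly
    between 0 and 1, otherwise inv_cdf raises StatisticsError.
    """
    # Probability -> z-score via the inverse CDF (an assumption of this sketch).
    zs = [STD_NORMAL.inv_cdf(p) for p in probabilities]
    # Stouffer's composite z-score: sum of z-scores divided by sqrt(k).
    z = sum(zs) / len(zs) ** 0.5
    # Composite z-score -> probability via F, the CDF.
    return STD_NORMAL.cdf(z)


def consensus(nanopubs):
    """Group (subject, predicate, object, source, probability) records by assertion."""
    by_assertion = defaultdict(list)
    for subject, predicate, obj, _source, p in nanopubs:
        by_assertion[(subject, predicate, obj)].append(p)
    return {a: composite_probability(ps) for a, ps in by_assertion.items()}


# Hypothetical toy records; identifiers and probabilities are placeholders.
toy = [
    ("geneA", "interacts_with", "geneB", "db1", 0.95),
    ("geneA", "interacts_with", "geneB", "db2", 0.95),
    ("geneA", "interacts_with", "geneC", "db1", 0.60),
]
result = consensus(toy)
for assertion, p in result.items():
    print(assertion, round(p, 3))
```

Two independent sources at 0.95 each raise the composite probability to about 0.99. The single-source assertion keeps its original 0.60. Under this combination, sources reporting probabilities below 0.5 pull the composite down rather than up.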


The ReDrugS framework allows users to explore protein/protein, protein/disease, and drug/protein interactions using full text search, network expansion, and statistical aggregation. The edges are rendered with their width mapped to the probability that the link exists. These statin and topiramate networks were built using a “disease finder” network expander.
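The caption describes two operations. First, a network is expanded from a starting entity. Second, each edge's width is mapped to the probability that the link exists. The sketch below shows both under stated assumptions. It assumes a breadth-first expander that keeps only edges above a probability threshold, and a linear probability-to-width mapping. `expand`, `edge_width`, the threshold, the hop limit, and the toy nodes are all hypothetical. This is not the "disease finder" implementation.

```python
from collections import deque

# Hypothetical toy network: undirected edges carrying a consensus probability.
EDGES = {
    ("drugA", "protein1"): 0.9,
    ("protein1", "protein2"): 0.7,
    ("protein2", "diseaseX"): 0.8,
    ("protein1", "diseaseY"): 0.3,
}


def edge_width(p, min_width=1.0, max_width=10.0):
    """Linearly map a probability in [0, 1] to an edge width."""
    return min_width + p * (max_width - min_width)


def expand(seed, edges, min_probability=0.5, max_hops=3):
    """Breadth-first expansion from seed, keeping edges at or above a threshold.

    Returns {sorted (node, node) pair: edge width}.
    """
    neighbours = {}
    for (a, b), p in edges.items():
        neighbours.setdefault(a, []).append((b, p))
        neighbours.setdefault(b, []).append((a, p))
    visited = {seed}
    kept = {}
    queue = deque([(seed, 0)])
    while queue:
        node, hops = queue.popleft()
        if hops == max_hops:
            continue  # do not expand beyond the hop limit
        for other, p in neighbours.get(node, []):
            if p < min_probability:
                continue  # drop low-confidence links
            kept[tuple(sorted((node, other)))] = edge_width(p)
            if other not in visited:
                visited.add(other)
                queue.append((other, hops + 1))
    return kept


network = expand("drugA", EDGES)
for edge, width in network.items():
    print(edge, round(width, 2))
```

In Cytoscape.js, a comparable linear mapping can be declared in the stylesheet, for example `'width': 'mapData(probability, 0, 1, 1, 10)'`. This assumes each edge carries a `probability` data field.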
