Role-based representation and inference of biochemical processes

Christian Bölling1,*, Michel Dumontier2, Michael Weidlich3 and Hermann-Georg Holzhütter1

1 Institute of Biochemistry, Charité Universitätsmedizin Berlin, Seestr. 73, 13347 Berlin, Germany
2 Department of Biology, Carleton University, 1125 Colonel By Drive, Ottawa, Canada K1S5B6
3 Department of Computer Science, Humboldt-Universität Berlin, Rudower Chaussee 25, 12489 Berlin, Germany

* To whom correspondence should be addressed: christian.boelling@charite.de
$ Courier typeface denotes OWL-classes, italics denote OWL-properties

ABSTRACT

We present a streamlined data model for representation of biochemical processes which consistently adopts a perspective on these processes as molecular events. Our model references a small number of established foundational relations, predominantly from RO, and employs BFO as upper ontology. It addresses some of the limitations in terms of interoperability, semantic compatibility and expressivity encountered in other approaches to modeling biochemical processes.

Using a role-based approach we demonstrate how, from this perspective, various metabolic and transport processes can be consistently represented across different levels of granularity and how relations between processes, such as the sequence of events, can be inferred.

1 INTRODUCTION

Computational approaches to study biochemistry require machine-accessible representations of biochemical knowledge. While various schemes have been specified for the representation of biochemical processes, their underlying conceptualizations differ with regard to biochemical scope, molecular detail, and provision of meta-data, and they adopt formal syntax and semantics to varying degrees. BioPAX (Demir et al., 2010) provides a basic ontology to exchange data on biochemical pathways and their interactions, with an emphasis that these represent bulk phenomena, as opposed to single molecular events. Although the BioPAX ontology is specified using the Web Ontology Language (OWL, Hitzler et al., 2009), the axioms are mostly there to constrain the types of relations allowed, as opposed to a more expressive description of pathways and the molecular participants found therein. Towards addressing these limitations, an OWL-based representation was put forward to describe types of biochemical pathways and reactions in terms of the molecular participants, their parts and the roles that they play (Dumontier, 2008). Here, we extend that preliminary work with a basic ontology of biochemical processes consisting of one or more biochemical reactions, and specifying roles that molecular entities play therein. Accompanied by relevant rules specified using the Semantic Web Rule Language (SWRL, Horrocks et al., 2004), our approach facilitates biologically-relevant inferences.

2 RESULTS

The perspective taken on biochemical processes in this work is that individual molecular entities, i.e. single molecules, interact with each other in various processes through which molecular structures of various complexity are formed and dynamic biochemical and physiological phenomena on the macroscopic scale are produced. In our OWL2 representation, individuals of classes describing biochemical processes represent singular molecular events, i.e. directed transitions of a chemical system from an initial to a terminal state involving individual molecules.

2.1 OWL-constructs for role-based representation of biochemical processes

As a matter of convenience, our representation uses the class and property distinctions identified by the Basic Formal Ontology (BFO, Grenon et al., 2004) and the OBO Relation Ontology (RO, Smith et al., 2005). Molecules are types of bfo:object, roles are types of bfo:role and biochemical processes are types of bfo:process.

We developed a basic ontology of roles that chemical participants hold in the context of biochemical processes (Fig. 1). The role ontology includes a role for catalysts (catalyst_role), reactants (reactant_role), substrates (substrate_role), products (product_role), and effectors such as activators (activator_role, enzymatic_activator_role) and inhibitors (inhibitor_role, enzymatic_inhibitor_role). Consistent with IUPAC and IUBMB terminology (IUPAC, 2011), reactants are participants that are present at the onset and products are participants that are present at the end of the process. Substrates are reactants that are converted to products by the activity of one or more enzymes. Enzymes are catalysts of mostly protein nature. Effectors are chemical entities that affect the functionality of enzymes with respect to the rate of reaction.

We further developed a simple ontology of biochemical processes which distinguishes between elementary reactions and overall reactions (Fig. 1). Elementary reactions pertain to fine-grained mechanistic aspects of biochemical processes and include association, dissociation and conversion events. Overall reactions comprise single- and multi-enzyme reactions and net reactions catalyzed by structurally independent enzymes, which as such reflect traditional biochemical pathways.

Fig. 1. Taxonomy of the ontology of molecular roles and biochemical processes.

The roles of chemical entities may be described in the context of the biochemical reactions in which they are realized. Stoichiometry may also be specified as a cardinality restriction on the realizes property between the process and the role. For example, hexokinase-like reactions, i.e. the conversion of glucose (glc) and ATP to glucose-6-phosphate (g6p) and ADP, are defined as:

    (realizes exactly 1 (product_role
        and (has_bearer some adp)))
    and (realizes exactly 1 (product_role
        and (has_bearer some g6p)))
    and (realizes exactly 1 (reactant_role
        and (has_bearer some atp)))
    and (realizes exactly 1 (reactant_role
        and (has_bearer some glc)))

Due to their status in BFO as specifically dependent continuants, these roles are borne only by single molecules; thus reaction stoichiometry is duly reflected in our representation.

2.2 Representation of biochemical processes at various levels of granularity

Both elementary reactions and overall reactions can be described in terms of their reactants and products, i.e. in terms of the molecular roles being realized in a biochemical process. For example, the association of ATP and the hexokinase enzyme is an elementary reaction of the hexokinase reaction, while the overall phosphorylation of glucose with ATP involves ATP, glucose, ADP and glucose-6-phosphate (Fig. 2). In the case of hexokinase, we observe that it plays the role of a reactant in the elementary reactions which are part of the hexokinase reaction and glycolysis, while it plays the enzyme role in those more "macro" reactions. Additional detail, such as the participation of catalysts or cofactors, can be represented with the corresponding role classes, making clear the nature of their participation.

Fig. 2. Biochemical processes at different levels of granularity using reactant and product roles. Boxes denote named OWL-individuals. Dotted arrows denote object properties as labelled. Solid and dashed arrows denote substrate and product roles, respectively, which connect processes in which they are realized with molecules by which they are borne.

2.3 Relations between processes: process parts and sequence

The relation of more complex processes to their constituent process parts can be represented by the part_of and preceded_by relations from RO as outlined in Dumontier (2008). In addition, directly_preceded_by, a sub-property of RO's preceded_by, connects instances of processes which are coupled by joint participants which bear product roles in the preceding and reactant roles in the succeeding process (Fig. 3). In contrast to the immediately_preceded_by relation defined in RO, this property relates processes which are not necessarily temporally adjacent.

Fig. 3. Sequence of biochemical processes. Boxes denote named OWL-individuals. Dotted arrows denote object properties as labelled. Other arrows denote participant roles connecting processes with molecules as indicated.

2.4 Location of processes and representation of transport reactions

The location of molecules can be represented using the RO relation located_in. The location of processes, i.e. where they occur, is specified using the occurs_in property.

Transport processes are also represented in terms of their reactants and products, formalizing the transported entities as individual instances of the corresponding chemical species connected to instances of reactant role and product role via the bears property and to instances of the corresponding locations via located_in (Fig. 4). For example, the antiport of 2-ketoglutarate (2kg) and malate (mal) across the mitochondrial membrane is defined as:

    (realizes exactly 1 (product_role
        and (has_bearer some (2kg
            and (located_in some mitochondrion)))))
    and (realizes exactly 1 (product_role
        and (has_bearer some (mal
            and (located_in some cytosol)))))
    and (realizes exactly 1 (reactant_role
        and (has_bearer some (2kg
            and (located_in some cytosol)))))
    and (realizes exactly 1 (reactant_role
        and (has_bearer some (mal
            and (located_in some mitochondrion)))))

Fig. 4. Representation of transport reactions. Boxes denote named OWL-individuals. Dotted arrows denote object properties as labelled. Other arrows denote participant roles connecting processes and molecules as indicated.

2.5 Inference of process and entity characteristics

Reasoning over the OWL representation of biochemical processes as described above enables the following:

Classification of processes: Processes can be classified according to specialization of roles and chemicals. For instance, a process involving a chemical as a reactant would subsume a process involving that chemical as a substrate. Given an ontology of chemicals (e.g. ChEBI), similar classifications of processes are enabled.

Location of molecules: For participants of localized processes, this can be inferred from the location of the process using SWRL:

    biochemical_process(?p), occurs_in(?p,?l),
    has_participant(?p,?o) -> located_in(?o,?l)

(the has_participant property can be inferred for any bearer of any role realized in a reaction).

Sequence of processes: Within the same location, this can be deduced by invoking the SWRL rule

    product_role(?r1), reactant_role(?r2),
    has_bearer(?r1,?o), has_bearer(?r2,?o),
    realizes(?p1,?r1), realizes(?p2,?r2),
    occurs_in(?p1,?l), occurs_in(?p2,?l)
    -> directly_precedes(?p1,?p2)

and the transitivity of the preceded_by relation.

3 DISCUSSION

In this representation biochemical processes can be consistently described on different levels of granularity, accounting for different roles of participating molecules on different levels. By including location and transport, even complex biochemical processes can be represented using a small set of basic relations. This provides a stable platform for interoperability with ontological descriptions of related biological entities (e.g. molecules, tissues, taxa), which could also be used to represent and interrelate GO biological processes via their participants. Our representation applies a consistent perspective on biochemical processes as microscopic chemical events. This provides, together with the formal semantics of OWL2, a clear semantic basis to represent complex processes and complex structure-function relationships and to interpret their asserted and inferred properties in terms of biochemical entities. For example, substrate channeling can be represented in a straightforward manner through molecule instances which bear product and substrate roles for the preceding and succeeding reaction, respectively. Thus, our representation is suited to overcome some of the limitations regarding interoperability, semantic compatibility and expressivity that have been identified in other models (Dumontier, 2008), which makes it a promising base for representation and analysis of the biochemistry of organs like the liver, i.e. of complex systems comprising interrelated processes on several scales.

ACKNOWLEDGEMENTS

CB and MW were supported by the German Federal Ministry of Education and Research (BMBF) within the Virtual Liver Network (grant numbers 0315756, 0315741).

REFERENCES

Demir, E. et al. (2010). The BioPAX community standard for pathway data sharing. Nat Biotechnol 28, 935–942.
Dumontier, M. (2008). Situational Modeling: Defining Molecular Roles in Biochemical Pathways and Reactions. Proceedings of the 5th OWLED Workshop on OWL: Experiences and Directions.
Grenon, P., Smith, B., Goldberg, L. (2004). Biodynamic ontology: applying BFO in the biomedical domain. Stud Health Tech Inf 102, 20–38.
Hitzler, P., Krötzsch, M., Parsia, B., Patel-Schneider, P.F., Rudolph, S. eds. (2009). OWL2 Web Ontology Language: Primer. Latest version available at http://www.w3.org/TR/owl2-primer/
Horrocks, I., Patel-Schneider, P.F., Boley, H., Tabet, S., Grosof, B., Dean, M. (2004). SWRL: A semantic web rule language combining OWL and RuleML. Latest version available at http://www.w3.org/Submission/SWRL/
IUPAC (2011). IUPAC Compendium of Chemical Terminology - the Gold Book. Latest version available at http://goldbook.iupac.org/
Smith, B., Ceusters, W., Klagges, B., Köhler, J., Kumar, A., Lomax, J., Mungall, C., Neuhaus, F., Rector, A. L., Rosse, C. (2005). Relations in biomedical ontologies. Genome Biol 6, R46.
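
The two SWRL rules given in Section 2.5 (location of molecules and sequence of processes) can be illustrated with a small sketch. This is not the authors' implementation and does not use an OWL or SWRL reasoner: it is plain Python that mimics the rule bodies over a hand-written set of role assertions. All individual names (glc_1, hexokinase_reaction_1, cytosol_1, and so on) are hypothetical, and the isomerase and phosphofructokinase steps following the hexokinase reaction are added here only to give the sequence rule something to chain over.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Role:
    """One role instance: realized in exactly one process, borne by one molecule."""
    kind: str      # "reactant_role" or "product_role"
    bearer: str    # has_bearer: hypothetical molecule individual
    process: str   # the process individual that realizes this role


def infer_participants(roles):
    """has_participant(p, o) for every bearer o of a role realized in p."""
    return {(r.process, r.bearer) for r in roles}


def infer_locations(roles, occurs_in):
    """occurs_in(p, l), has_participant(p, o) -> located_in(o, l)."""
    return {
        (o, occurs_in[p])
        for p, o in infer_participants(roles)
        if p in occurs_in
    }


def infer_directly_precedes(roles, occurs_in):
    """A molecule bearing a product role in p1 and a reactant role in p2,
    with both processes in the same location -> directly_precedes(p1, p2)."""
    products = [r for r in roles if r.kind == "product_role"]
    reactants = [r for r in roles if r.kind == "reactant_role"]
    return {
        (r1.process, r2.process)
        for r1, r2 in product(products, reactants)
        if r1.bearer == r2.bearer
        and r1.process in occurs_in
        and occurs_in[r1.process] == occurs_in.get(r2.process)
    }


def transitive_closure(pairs):
    """Close a binary relation under transitivity (as for preceded_by)."""
    closure = set(pairs)
    while True:
        new = {(a, d) for a, b in closure for c, d in closure if b == c}
        if new <= closure:
            return closure
        closure |= new


# Illustrative assertions only; all names are assumptions.
ROLES = [
    Role("reactant_role", "glc_1", "hexokinase_reaction_1"),
    Role("reactant_role", "atp_1", "hexokinase_reaction_1"),
    Role("product_role", "g6p_1", "hexokinase_reaction_1"),
    Role("product_role", "adp_1", "hexokinase_reaction_1"),
    Role("reactant_role", "g6p_1", "isomerase_reaction_1"),
    Role("product_role", "f6p_1", "isomerase_reaction_1"),
    Role("reactant_role", "f6p_1", "pfk_reaction_1"),
]
OCCURS_IN = {
    "hexokinase_reaction_1": "cytosol_1",
    "isomerase_reaction_1": "cytosol_1",
    "pfk_reaction_1": "cytosol_1",
}

if __name__ == "__main__":
    direct = infer_directly_precedes(ROLES, OCCURS_IN)
    print(sorted(direct))
    print(sorted(transitive_closure(direct)))
    print(sorted(infer_locations(ROLES, OCCURS_IN)))
```

Keeping one Role record per (process, molecule) pair mirrors the point made in Section 2.1 that a role is borne by a single molecule, so the same molecule individual appearing as a product in one process and as a reactant in another is what couples the two processes.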