
Completing signaling networks by abductive reasoning with perturbation experiments

Adrien Rougny

Yoshitaka Yamamoto

Hidetomo Nabeshima

Gauvain Bourgne

Anne Poupon

Katsumi Inoue

Christine Froidevaux

0 BIOS group, INRA, CNRS
1 LIP6, CNRS, Université Pierre et Marie Curie
2 Laboratoire de Recherche en Informatique, CNRS, Université Paris-Sud
3 National Institute of Informatics
4 University of Yamanashi

Introduction

Signaling networks model the flow of information occurring in cells after they have been stimulated by an extracellular signal (for instance, a hormone). With the growing availability of high-throughput data, such networks are becoming ever larger and more complex. Consequently, automatic methods have become necessary for their analysis. Methods relying on discrete formalisms, such as logical ones, seem well suited, as numerical parameters are often difficult to obtain.

One fundamental task of cell signaling biology is to test whether available experimental data can be explained by a given signaling network and, if not, to modify the network (by adding or removing an edge) or clean the data so that they can be explained. Methods to accomplish this task take as input a representation of signaling networks called an interaction graph (IG). In IGs, nodes are molecules (or activities that originate from molecules) and arcs are positive or negative influences that molecules have on each other.

These methods mainly differ in four aspects. (i) The semantics used to interpret IGs: in [11] and [14, 12], the authors interpret the arcs of an input IG under the path semantics, introduced in [13], or under the Sign Consistency Model [15], respectively, whereas the authors of [4, 10] interpret IGs as causality networks, which implies the use of a more constrained semantics. The authors of [14] also consider the problem within the Boolean network semantics. (ii) The experimental data they take into account: the methods of [11, 4] take as input steady-state shift experiments, whereas the methods of [14, 12, 10] take into account perturbation experiments. (iii) The modifications of the network or the cleaning of the data they propose in order to explain unexplained experimental results: the methods in [11, 12] propose possible modifications of the input network or of the data, whereas the methods in [4, 10] only allow completing the input network by adding edges. (iv) The formalism they use: graph theory [14], integer linear programming [12], answer set programming [11] and first-order logic [4, 10].

In this work, we propose a method to check whether a set of perturbation experiments can be explained by a signaling network represented in SBGN-AF [16] and, if not, to complete the network by adding edges.

SBGN-AF is a standard for representing signaling and gene regulation networks. It extends the classical IG representation with logical operators (AND and OR) that make it possible to specify logical functions within the graph. Taking SBGN-AF maps as input, we extend the path semantics of [13] by taking these operators into account in the definition of paths, which we formalize in first-order logic, based on the translation of SBGN-AF maps into predicates introduced in [8]. We also interpret perturbation experiments under a stronger assumption than in [14, 12, 10], resulting in a more constrained setting (cf. Section 3, Formalization of experimental observations). We perform both the explanation and the completion tasks within the same abductive framework, using the consequence finding method of SOLAR [7].

Paths semantics with logical operators

Positive and negative paths of an SBGN-AF map are built by transitive closure of the elementary arcs. We interpret a positive path from an activity A to an activity B as a possibility to explain an increase (resp. decrease) of B by an increase (resp. decrease) of A. Analogously, we interpret a negative path from A to B as a possibility to explain a decrease (resp. increase) of B by an increase (resp. decrease) of A. Positive and negative paths are denoted by stimulates(A, B) and inhibits(A, B), respectively. The following axioms allow building positive and negative paths using the influences and the logical operators of an SBGN-AF map. Axioms (1-6) are the main transitivity axioms, while axioms (7-9) and (10-12) express the semantics of the AND and the OR logical operators, respectively.
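As an illustration of the transitive closure described above, the following Python sketch computes signed paths from elementary signed arcs. It is only a sketch under stated assumptions: the axioms themselves are not listed here, so the code assumes the usual sign-composition rule implied by the interpretation of paths (two arcs of equal sign compose into a positive path, two arcs of opposite sign into a negative one), and it ignores the AND and OR operators entirely. The function name `signed_paths` and the node names `a` to `d` are hypothetical.

```python
from itertools import product


def signed_paths(arcs):
    """Signed transitive closure of elementary arcs.

    arcs: iterable of (source, target, sign), sign being +1 or -1.
    A triple (x, y, +1) plays the role of stimulates(x, y),
    a triple (x, y, -1) the role of inhibits(x, y).
    Assumption: signs multiply along a path; logical operators are ignored.
    """
    closure = set(arcs)
    changed = True
    while changed:
        changed = False
        # Try to chain every known path with every other known path.
        for (a, b, s1), (c, d, s2) in product(list(closure), repeat=2):
            if b == c and (a, d, s1 * s2) not in closure:
                closure.add((a, d, s1 * s2))
                changed = True
    return closure


# Hypothetical toy map: a stimulates b, b stimulates c, c inhibits d.
toy_arcs = [("a", "b", +1), ("b", "c", +1), ("c", "d", -1)]
toy_paths = signed_paths(toy_arcs)
```

In this toy map, the closure contains a positive path from `a` to `c` and a negative path from `a` to `d`, but no positive path from `a` to `d`.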

Formalization of experimental observations

We consider experimental observations that originate from perturbation experiments. Such experiments consist in comparing the rate of an activity $a_T$ between two batches of cells that have each received a particular treatment. In the control batch, cells are stimulated by a set of molecules, whose corresponding set of activities is denoted by $S$. In the experimental batch, cells are first treated with a number of inhibitors that suppress a set of activities denoted by $KO$. The cells are then stimulated as in the control batch. We introduce a variable $e$ that takes the value $\downarrow$ (resp. $\uparrow$) if and only if (iff) the rate of $a_T$ is lower (resp. higher) in the experimental batch than in the control batch. We denote such an experimental observation by the tuple $(S, KO, a_T, e)$.

For a given experimental observation $E = (S, KO, a_T, e)$, if $e = \downarrow$, then $a_T$ is more inhibited or less stimulated by the activities of $S$ in the experimental batch than in the control batch, due to the suppression of at least one activity of $KO$. In the cells of the experimental batch, all activities of $KO$ are suppressed and can no longer be performed by the cells. Consequently, the lower overall stimulation of $a_T$ can only be caused by the suppression of at least one positive path from an activity of $S$ to $a_T$. Thus, $e = \downarrow$ iff there exists at least one positive path outgoing from an activity of $S$, incoming to $a_T$, and passing through an activity of $KO$. Analogous reasoning applies to $e = \uparrow$: $e = \uparrow$ iff there exists at least one negative path from an activity of $S$ to $a_T$ passing through an activity of $KO$.

Here, we make the hypothesis that suppressing the activities of $KO$ has an effect on the pathways that link the activities of $S$ to $a_T$. This is not the case in [14, 12, 10], where the authors assume that suppressing the activities of $KO$ only affects the pathway between the activities of $KO$ and $a_T$, thus not taking into account the activities of $S$. As a result, our interpretation is more constrained. Therefore, some experiments that could be explained by a network under the interpretation of [14, 12, 10] can no longer be explained within our setting, resulting in the discovery of new possible arcs.
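The difference between the two interpretations can be illustrated with a small Python sketch. It rests on several assumptions: signs are taken to multiply along paths, logical operators are ignored, and the less constrained interpretation is read here as requiring only a suitably signed path from an activity of $KO$ to $a_T$. The function names, the string `"a_S"` for the virtual stimulus node, and the toy node names `s`, `x`, `k`, `t` are all hypothetical.

```python
DOWN, UP = "down", "up"


def signed_closure(arcs):
    """Signed transitive closure; signs multiply along a path (assumption)."""
    paths = set(arcs)
    while True:
        new = {(a, d, s1 * s2)
               for (a, b, s1) in paths
               for (c, d, s2) in paths
               if b == c} - paths
        if not new:
            return paths
        paths |= new


def wanted_sign(effect):
    # A lower rate after suppression points to a lost positive path,
    # a higher rate to a lost negative path.
    return +1 if effect == DOWN else -1


def explained_ko_only(arcs, ko_set, target, effect):
    """Less constrained reading: only paths from KO to the target matter."""
    paths = signed_closure(arcs)
    return any((ko, target, wanted_sign(effect)) in paths for ko in ko_set)


def explained_with_stimuli(arcs, stimuli, ko_set, target, effect):
    """More constrained reading: the path must start in S and cross KO."""
    extended = list(arcs) + [("a_S", s, +1) for s in stimuli]
    paths = signed_closure(extended)
    return any(("a_S", ko, s1) in paths and (ko, target, s2) in paths
               for ko in ko_set
               for s1 in (+1, -1)
               for s2 in (+1, -1)
               if s1 * s2 == wanted_sign(effect))


# Toy map: s -> x -> t and k -> t, all stimulations; k is not downstream of s.
toy = [("s", "x", +1), ("x", "t", +1), ("k", "t", +1)]
weak = explained_ko_only(toy, {"k"}, "t", DOWN)
strict = explained_with_stimuli(toy, {"s"}, {"k"}, "t", DOWN)
strict_completed = explained_with_stimuli(
    toy + [("x", "k", +1)], {"s"}, {"k"}, "t", DOWN)
```

In the toy map, the observation is explained under the less constrained reading but not under the more constrained one, because no path links the stimulus `s` to the suppressed activity `k`. Adding a stimulation arc from `x` to `k` is one completion that restores the explanation.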

To explicitly describe the role of $S$, we add one virtual activity node $a_S$ to the prior network, together with a stimulation arc from $a_S$ to each activity $a \in S$. According to our interpretation of perturbation experiments and our transitivity axioms, each experimental observation $E = (S, KO, a_T, e)$ is formalized as the following disjunction $O_E$:

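\[
O_E =
\begin{cases}
\displaystyle\bigvee_{a_{KO} \in KO} \Big( \big(\mathit{stimulates}(a_S, a_{KO}) \wedge \mathit{inhibits}(a_{KO}, a_T)\big) \vee \big(\mathit{inhibits}(a_S, a_{KO}) \wedge \mathit{stimulates}(a_{KO}, a_T)\big) \Big) & \text{if } e = \uparrow, \\[2ex]
\displaystyle\bigvee_{a_{KO} \in KO} \Big( \big(\mathit{stimulates}(a_S, a_{KO}) \wedge \mathit{stimulates}(a_{KO}, a_T)\big) \vee \big(\mathit{inhibits}(a_S, a_{KO}) \wedge \mathit{inhibits}(a_{KO}, a_T)\big) \Big) & \text{if } e = \downarrow.
\end{cases}
\tag{13}
\]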
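The disjunction (13) can be generated mechanically from an observation. The Python sketch below builds it as a list of conjunct pairs and evaluates it against a set of ground path facts. The tuple encoding of literals, the function names, and the example facts are assumptions made for illustration only; the values `pi3k`, `akt` and the decrease correspond to the first observation of Table 1.

```python
def observation_formula(ko_set, target, effect, stim_node="a_S"):
    """Build O_E as a list of (literal, literal) conjunctions.

    Each literal is a tuple (predicate, source, target); the list as a
    whole is read as a disjunction, as in formula (13).
    """
    if effect == "up":
        shapes = [("stimulates", "inhibits"), ("inhibits", "stimulates")]
    elif effect == "down":
        shapes = [("stimulates", "stimulates"), ("inhibits", "inhibits")]
    else:
        raise ValueError("effect must be 'up' or 'down'")
    return [((first, stim_node, ko), (second, ko, target))
            for ko in sorted(ko_set)
            for first, second in shapes]


def holds(formula, facts):
    """True if at least one conjunction has both literals among the facts."""
    return any(left in facts and right in facts for left, right in formula)


# Observation with KO = {pi3k}, target akt, and a decrease of the target.
formula = observation_formula({"pi3k"}, "akt", "down")

# Hypothetical path facts, standing in for what a map would entail.
facts = {("stimulates", "a_S", "pi3k"), ("stimulates", "pi3k", "akt")}
explained = holds(formula, facts)
```

With a singleton $KO$, the formula contains exactly two conjunctions, one per way of composing a positive path through the suppressed activity.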

Given an SBGN-AF map $N$ and an experimental observation $E = (S, KO, a_T, e)$, we want to check whether $E$ can be explained by $N$. If not, we want to find a minimal set of arcs that completes $N$ so as to explain $E$. Both tasks can be carried out within the same abductive setting, presented hereafter.

Abductive setting for the completion task

Let $N$ be an SBGN-AF map and $E = (S, KO, a_T, e)$ be an experimental observation. Let $B$ be the background theory formed of the translation of $N$ into predicates and axioms (1-12), and $O_E$ be the observation formalized from $E$. Then, solving both the explanation and the completion tasks consists in searching for all minimal hypotheses $H$ such that $B \cup H \models O_E$ and $B \cup H \not\models \bot$. If $B \models O_E$, then clearly $N$ explains $E$.
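To make the abductive setting concrete, the following Python sketch enumerates hypotheses by brute force on a toy map. It is not how SOLAR works and is only meant to illustrate the definition: entailment of $O_E$ is approximated by a signed transitive closure that ignores logical operators, the consistency condition is not checked, and minimality is taken with respect to set inclusion. All function and node names are hypothetical.

```python
from itertools import combinations


def signed_closure(arcs):
    """Signed transitive closure; signs multiply along a path (assumption)."""
    paths = set(arcs)
    while True:
        new = {(a, d, s1 * s2)
               for (a, b, s1) in paths
               for (c, d, s2) in paths
               if b == c} - paths
        if not new:
            return paths
        paths |= new


def explains(arcs, observation):
    """Check O_E: a suitably signed path from a_S to the target through KO."""
    stimuli, ko_set, target, effect = observation
    wanted = +1 if effect == "down" else -1
    paths = signed_closure(list(arcs) + [("a_S", s, +1) for s in stimuli])
    return any(("a_S", ko, s1) in paths and (ko, target, wanted * s1) in paths
               for ko in ko_set for s1 in (+1, -1))


def minimal_hypotheses(arcs, nodes, observation, max_size=2):
    """Inclusion-minimal sets of at most max_size added arcs explaining E."""
    if explains(arcs, observation):
        return [frozenset()]  # the map already explains the observation
    existing = set(arcs)
    candidates = [(a, b, s) for a in nodes for b in nodes if a != b
                  for s in (+1, -1) if (a, b, s) not in existing]
    found = []
    for size in range(1, max_size + 1):
        for combo in combinations(candidates, size):
            hypothesis = frozenset(combo)
            if any(smaller <= hypothesis for smaller in found):
                continue  # a subset already suffices, so this is not minimal
            if explains(list(arcs) + list(combo), observation):
                found.append(hypothesis)
    return found


# Toy map: s -> x -> t and k -> t; observation ({s}, {k}, t, decrease).
toy = [("s", "x", +1), ("x", "t", +1), ("k", "t", +1)]
toy_nodes = ["s", "x", "k", "t"]
toy_obs = ({"s"}, {"k"}, "t", "down")
hyps = minimal_hypotheses(toy, toy_nodes, toy_obs, max_size=2)
```

On the toy map, a single stimulation arc from `s` to `k` is one of the returned hypotheses, whereas a single inhibition arc from `s` to `k` is not, since it would have to be paired with an inhibition from `k` to `t`. The sketch does not remove hypotheses that create loops.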

For the computation of $H$, we can use the consequence finding system SOLAR [7], which allows defining a set of abducibles by means of a language bias $P$ describing the negations of desirable hypotheses, and searching for all the subsumption-minimal hypotheses belonging to $P$. In the completion task, every added influence is either a stimulation or an inhibition. Besides, we restrict the number of added influences to at most two for each observation in order to get more realistic hypotheses that could be tested experimentally. Then, $P$ is given in the form $\langle \{\neg\mathit{stimulates}(\_, \_), \neg\mathit{inhibits}(\_, \_)\}, \mathit{Length} \leq 2 \rangle$, where $\mathit{Length}$ is the number of literals (i.e., instances of $\neg\mathit{stimulates}(\_, \_)$ and $\neg\mathit{inhibits}(\_, \_)$) allowed in the hypothesis.

In general, SOLAR can produce a large number of hypotheses. To reduce it, we perform a first selection that operates directly at the generation step or during a post-filtering step: we discard hypotheses that create a loop in the prior map and those that contain constants mapped to logical operators or the constant $a_S$. We then use a greedy algorithm to select hypotheses in decreasing order of the number of experimental observations they can explain.
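The greedy selection is only described briefly, so the following Python sketch shows one possible reading of it, in the style of a greedy set cover: at each step, the hypothesis explaining the largest number of still-unexplained observations is selected. Whether the actual procedure stops once all observations are covered is an assumption, as are the function name and the toy coverage data.

```python
def greedy_select(coverage):
    """Greedy selection of hypotheses by number of newly explained observations.

    coverage: dict mapping a hypothesis identifier to the set of
    observation identifiers it explains.
    Returns a list of (hypothesis, newly explained observations), in
    selection order. Ties are broken by insertion order of the dict.
    """
    remaining = set()
    for observations in coverage.values():
        remaining |= observations
    selected = []
    while remaining:
        # Pick the hypothesis that explains the most uncovered observations.
        best = max(coverage, key=lambda h: len(coverage[h] & remaining))
        gained = coverage[best] & remaining
        selected.append((best, sorted(gained)))
        remaining -= gained
    return selected


# Hypothetical coverage data: hypothesis -> explained observation numbers.
toy_coverage = {"h1": {1, 2, 3}, "h2": {3, 4}, "h3": {4}, "h4": {5}}
ranking = greedy_select(toy_coverage)
```

On the toy data, `h1` is selected first because it explains three observations, followed by `h2` and `h4`; `h3` is never selected because `h2` already covers its only observation.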

Application: the FSHR-induced network

We applied our method to two pathways of the FSHR-induced signaling network, namely the G protein pathway and the PI3K pathway, taken from [1] (see Fig. 1). We built a dataset of 29 experimental observations by gathering and formalizing reliable experimental results from the literature related to the FSHR. For each experiment, only one activity suppressor was used. Consequently, for each experimental observation, the set $KO$ is a singleton.

Among the 29 experimental observations, 17 could be explained by the network, and the 12 remaining ones were used to complete the network. For each of them, we computed minimal hypotheses sufficient to explain it when added to the network. We ran SOLAR (ver. 2) on 12 machines (Intel Xeon E-1230 V2 (3.3GHz) and 8GB RAM) in parallel, with an execution time limit of 4 hours.

Each of the 12 observations could be explained by hypotheses containing a single influence, although more complex hypotheses were also generated. Consequently, we chose to focus on the hypotheses containing a single influence. Using our greedy algorithm, we ranked more than 250 hypotheses containing a single influence generated during the abduction phase, and selected 28 among them. Results are shown in Table 1.

No.  S                  KO          aT        e
1    {camp, epac}       pi3k        akt       $\downarrow$
2    {camp, epac}       pi3k        rps6      $\downarrow$
3    {camp, epac}       p38mapk     akt       $\downarrow$
4    {camp, epac}       pi3k        p70s6k    $\downarrow$
5    {camp, epac}       pi3k        rps6      $\downarrow$
6    {fsh fshr, epac}   pka         akt       $\uparrow$
7    {fsh fshr, epac}   p38mapk     akt       $\downarrow$
8    {camp, epac}       pka         p70s6k    $\uparrow$
9    {fsh fshr, epac}   p38mapk     erk12     $\downarrow$
10   {fsh fshr, epac}   camp epac   erk12     $\downarrow$
11   {fsh fshr, epac}   mek         p38mapk   $\downarrow$
12   {camp, epac}       pka         akt       $\uparrow$

Table 1: Application to the FSHR-induced network. Lines correspond to experimental observations, columns to selected hypotheses. A cell is green if the hypothesis explains the observation. Experimental observations that are explained by the network, as well as hypotheses (8-28), are omitted.

Figure 1: The FSHR-induced network, represented in SBGN-AF. The G protein pathway is represented in red and the PI3K pathway in blue.

Hypothesis (1) proposes that p38MAPK could activate PI3K. In [3], the authors hypothesize such a crosstalk in granulosa cells. Moreover, activation of Akt in Zn2+-treated cells has been shown to pass through PI3K downstream of p38MAPK [9]. This result shows that p38MAPK is able to trigger the PI3K pathway in Zn2+-treated cells, which reinforces our hypothesis for FSH-stimulated cells. Hypotheses (2-4) all suggest an inhibitory crosstalk between p38MAPK and the RAF/MEK/ERK pathway. In [5], the authors clearly state that p38MAPK inhibits the RAF/MEK/ERK pathway during muscle differentiation, which supports a potential influence of p38MAPK on the RAF/MEK/ERK pathway. Hypotheses (5-28) all suggest a crosstalk between the pathway downstream of MEK and the cAMP-EPAC pathway. A crosstalk between ERK and cAMP has indeed been evidenced in [6], even if it involves a feedback loop (excluded in our work).

According to our literature review, top-ranked hypotheses are more promising than low-ranked ones, which suggests that selecting hypotheses based on the number of observations they can explain is appropriate.

Interestingly, experimental results (1, 2, 4, 5, 8) would have been explained by the network under the less constrained interpretation of experimental results given in [14, 12, 10], and would thus not have led to any hypothesis.

Concluding remarks

We have proposed a logical formalization of SBGN-AF maps and transitivity axioms that make it possible to check, given an SBGN-AF map, whether some experimental observations can be explained by the map and, if not, to generate hypotheses that complete the map. Application to the FSHR-induced signaling network shows that the method leads to plausible hypotheses, some of which have already been demonstrated in other signaling systems, and is therefore promising.

References

1. Gloaguen et al.: Mapping the follicle-stimulating hormone-induced signaling networks. Frontiers in endocrinology 2 (2011)

2. Choi et al.: Gonadotropin-stimulated epidermal growth factor receptor expression in human ovarian surface epithelial cells: involvement of cyclic amp-dependent exchange protein activated by camp pathway. Endocrine-related cancer 16(1), pp. 179-188 (2009)

3. Gonzalez-Robayna et al.: Follicle-stimulating hormone (fsh) stimulates phosphorylation and activation of protein kinase b (pkb/akt) and serum and glucocorticoid-induced kinase (sgk): evidence for a kinase-independent signaling by fsh in granulosa cells. Molecular Endocrinology 14(8), pp. 1283-1300 (2000)

4. Inoue, K., Doncescu, A., Nabeshima, H.: Completing causal networks by meta-level abduction. Machine learning 91(2), pp. 239-277 (2013)

5. Lee et al.: Activation of p38 mapk induces cell cycle arrest via inhibition of raf/erk pathway during muscle differentiation. Biochemical and biophysical research communications 298(5), pp. 765-771 (2002)

6. Baillie et al.: Phorbol 12-myristate 13-acetate triggers the protein kinase A-mediated phosphorylation and activation of the PDE4D5 cAMP phosphodiesterase in human aortic smooth muscle cells through a route involving extracellular signal regulated kinase (ERK). Molecular Pharmacology 60(5), pp. 1100-1111 (2001)

7. Nabeshima, H., Iwanuma, K., Inoue, K.: Solar: a consequence finding system for advanced reasoning. In: Automated Reasoning with Analytic Tableaux and Related Methods, pp. 257-263. Springer (2003)

8. Rougny et al.: Analyzing sbgn-af networks using normal logic programs. Logical Modeling of Biological Systems, pp. 325-361 (2013)

9. Wu et al.: p38 and egf receptor kinase-mediated activation of the phosphatidylinositol 3-kinase/akt pathway is required for zn2+-induced cyclooxygenase-2 expression. AJP-Lung Cellular and Molecular Physiology 289(5), L883-L889 (2005)

10. Yamamoto et al.: Completing sbgn-af networks by logic-based hypothesis finding. In: Formal Methods in Macro-Biology, pp. 165-179. Springer (2014)

11. Gebser et al.: Repair and Prediction (under Inconsistency) in Large Biological Networks with Answer Set Programming. In: KR (2010, April)

12. Melas et al.: Detecting and removing inconsistencies between experimental data and signaling network topologies using integer linear programming on interaction graphs. PLoS computational biology 9(9), p. e1003204 (2013)

13. Klamt et al.: A methodology for the structural and functional analysis of signaling and regulatory networks. BMC bioinformatics 7(1), p. 56 (2006)

14. Samaga et al.: The logic of EGFR/ErbB signaling: theoretical properties and analysis of high-throughput data. PLoS Comput Biol 5(8), p. e1000438 (2009)

15. Siegel et al.: Qualitative analysis of the relation between DNA microarray data and behavioral models of regulation networks. Biosystems 84(2), pp. 153-174 (2006)

16. Mi et al.: Systems biology graphical notation: activity flow language level 1. Nature Precedings (2009)