Integrating prior knowledge in automatic network reconstruction Marie C.F. Favre1? , Wolfgang Marwan2 , Annegret K. Wagler1 {marie.favre,wagler}@isima.fr, wolfgang.marwan@ovgu.de 1 Laboratoire d’Informatique, de Modélisation et d’Optimisation des Systèmes (LIMOS, UMR CNRS 6158), Université Blaise Pascal, Clermont-Ferrand, France 2 Magdeburg Center for Systems Biology (MaCS), Otto-von-Guericke-Universität, Magdeburg, Germany Abstract. The reconstruction of models from experimental data is a challenging problem due to the inherited complexity of biological sys- tems. We developed an exact, exclusively data-driven approach to re- construct Petri nets from experimental time-series data. Our approach aims at reconstructing all such netwoks that fit the given experimen- tal data, to provide all possible alternatives of mechanisms behind the experimental observations, which typically results in a large set of solu- tion alternatives. To keep this solution set reasonably small while still guaranteeing its completeness, we firstly generate only Petri nets being minimal in the sense that all other networks fitting the data contain the reconstructed ones. We further aim at avoiding the generation of mini- mal solutions which are “technically correct” but would be ruled out later during a subsequent verification process to check whether the returned solutions are “biological meaningful” or even contradict well-established biological knowledge. For that, we propose to extent the considered input (beyond the information given with the experimental time-series data) for the reconstruction process and demonstrate with the help of a run- ning example the influence on the generated solution set. 1 Introduction Systems biology aims at the integrated experimental and theoretical analysis of molecular or cellular networks to achieve a holistic understanding of biological systems and processes. To gain the required insight into the underlying biological processes, experiments are performed and experimental data are interpreted in terms of models. Depending on the biological aim, the type and quality of the available data, different types of mathematical models are used and correspond- ing reconstruction methods have been developed. Our work is dedicated to Petri nets which turned out to coherently model both static interactions in terms of networks and dynamic processes in terms of state changes, see e.g. [7,10]. ? This work was funded by the French National Research Agency, the European Com- mission (Feder funds) and the Région Auvergne within the LabEx IMobS3 . M. Heiner (Ed.): BioPPN 2014, a satellite event of PETRI NETS 2014, CEUR Workshop Proceedings Vol. 1159, 2014. 46 M.C.F. Favre, W. Marwan and A.K. Wagler In fact, a network P = (P, T, A, w) reflects the involved components by places p ∈ P and their interactions by transitions t ∈ T , linked by weighted directed arcs (p, t), (t, p) ∈ A. Each place p ∈ P can be marked with an integral number xp of |P | tokens defining a system state x ∈ Z+ , i.e., we obtain X := {x ∈ Z|P | : xp ≥ 0} as set of potential states. A transition t ∈ T is enabled in a state x if xp ≥ w(p, t) for all p with (p, t) ∈ A, we denote the set of all such transitions by T (x). Switching t ∈ T (x) yields a successor state succ(x) = x0 with x0p = xp − w(p, t) for all (p, t) ∈ A and x0p = xp + w(t, p) for all (t, p) ∈ A. Dynamic processes are represented by sequences of such state changes. Our central question is to reconstruct models of this type from experimental time-series data by means of an exact, exclusively data-driven approach. This approach takes as input a set P of places and discrete time-series data X 0 ⊆ X given by sequences (x0 ; x1 , . . . , xk ) of experimentally observed system states. The goal is to determine all Petri nets (P, T, A, w) that are able to reproduce the data, i.e., that perform for each xj ∈ X 0 the experimentally observed state change to xj+1 ∈ X 0 in a simulation. Hence, in contrast to the normally used stochastic simulation, we require that for states where at least two transitions are enabled, the decision between the alternatives is not taken randomly, but a specific transition is selected. Thus, (standard) Petri nets have to be equipped with additional activation rules to force the switching of specific transitions (to reach xj+1 from xj ), and to prevent all others from switching. For that, different types of additional activation rules are possible. On the one hand, control-arcs are used to represent catalytic or inhibitory dependencies. An extended Petri net P = (P, T, (A ∪ AR ∪ AI ), w) is a Petri net which has, besides the (standard) arcs in A, two additional sets of so-called control-arcs: the set of read-arcs AR ⊂ P × T and the set of inhibitor-arcs AI ⊂ P × T ; we denote the set of all arcs by A = A ∪ AR ∪ AI . Here, a transition t ∈ T (x) coupled with a read-arc (resp. an inhibitor-arc) to a place p ∈ P can switch only if at least w(p, t) tokens (resp. less than w(p, t) tokens) are present in p; we denote by TA (x) the set of all such transitions. On the other hand, in [9,12,13] priority relations among the transitions of a network are employed to reflect the rate of the corresponding reactions, where the fastest reaction has highest priority and, thus, is taken. In Marwan et al. [9] it is proposed to model such priorities with the help of partial orders O on the transitions. We call (P, O) an (extended) Petri net with priorities, if P = (P, T, A, w) is an (extended) Petri net and O a priority relation on T . Priorities can prevent enabled transitions from switching: For each state x, a transition t ∈ TA (x) is allowed to switch only if there is no other enabled transition in TA (x) with higher priority; we denote by TA,O (x) the set of all such transitions. We call (P, O) X 0 -deterministic if TA,O (x) contains at most one element for each experimentally observed state x ∈ X 0 . Based on earlier results in [3,4,5,9,13], an integrative method to reconstruct all X 0 -deterministic extended Petri nets with priorities fitting given experimental time-series data X 0 was pro- posed in [6] (see Section 2). Proc. BioPPN 2014, a satellite event of PETRI NETS 2014 Automatic network reconstruction 47 Our approach aims at reconstructing all netwoks of the studied type that fit the given experimental data, to provide all possible alternatives of mechanisms behind the experimentally observed phenomena. Typically, this results in a large set of solution alternatives. To keep this solution set reasonably small while still guaranteeing its completeness, we generate only Petri nets being minimal in the sense that all other networks fitting the data contain the reconstructed ones. Here, we propose a method to insert only minimal sets of control-arcs during the reconstruction process (see Section 2). We further aim at avoiding the generation of minimal solutions which are “technically correct” but would be ruled out later during a subsequent verification process to check whether the returned solutions are “biological meaningful” or even contradict well-established biological knowledge. For that, we extent the considered input by integrating biological prior knowledge (beyond the information given with the experimental time-series data) into the reconstruction process and demonstrate with the help of a running example the influence on the generated solution set (see Section 3). We close with some concluding remarks and lines of future work. 2 Reconstructing extended Petri nets with priorities We describe the input, the main ideas, and the output of our approach from [6]. Input. A set of components P (standing for proteins, enzymes, genes, receptors or their conformational states, later represented by the set of places) is chosen which is expected to be crucial for the studied phenomenon. If P contains known P -invariants (subsets P 0 ⊆ P of places where the sum of the number of all tokens on all the places in P 0 is constant, e.g., different functional states of a cell or conformational states of a molecular complex), they are collected in a set IP . To perform an experiment, one first triggers the system in some state x0 (by external stimuli like the exposition to a pathogen), to generate an initial state x1 . Then the system’s response to the stimulation is observed and the resulting state changes are measured for all considered components at certain time points. This yields a sequence of (discrete or discretized) states xj ∈ Z|P | reflecting the time-dependent response of the system to the stimulation in x1 , which typically terminates in a terminal state xk where no further changes are observed. The corresponding experiment is X 0 (x1 , xk ) = (x0 ; x1 , . . . , xk ). Several experiments 0 starting from different initial states in a set Xini ⊆ X 0 , reporting the observed 0 state changes, and ending at different terminal states in a set Xterm ⊆ X 0 de- scribe the studied phenomenon, and yield experimental time-series data of the form X 0 = {X 0 (x1 , xk ) : x1 ∈ Xini 0 , xk ∈ Xterm 0 }. Thus, the input of the recon- 0 struction approach is given by (P, IP , X ). Example 1. As running example, we will consider experimental biological data from the light-induced sporulation of Physarum polycephalum as in [6,13]. In P. polycephalum plasmodia, the photoreceptor involved in the control of sporula- tion Spo is a protein which occurs in two stages PF R and PR . The develop- mental decision of starving P. polycephalum plasmodia to enter the sporulation Proc. BioPPN 2014, a satellite event of PETRI NETS 2014 48 M.C.F. Favre, W. Marwan and A.K. Wagler pathway is controlled by environmental factors like visible light [11]. If the dark- adapted form PF R absorbs far-red light F R, the receptor is converted into its red-absorbing form PR , which causes sporulation after several hours [8]. If PR is exposed to red light R, it is photo-converted back to the initial state PF R (pho- toreversion), which prevents sporulation if the red light pulse is given shortly after the far-red pulse, but not if the red pulse is delivered after more than an hour when the phytochrome photoreceptor has had sufficient time to cause the formation of a biochemical downstream signal G that subsequently causes the sporulation of the cell. The changes between the stages PF R and PR only require fractions of seconds and can be experimentally observed due to a change of color of the phytochrome protein. The experimental setting consists of P = {F R, R, PF R , PR , G, Spo}, X 0 (x1 , x4 ) = (x0 ; x1 , x2 , x3 , x4 ), Xini 0 = {x1 , x5 , x6 }, 0 5 0 2 5 0 0 IP = {PF R , PR }, X (x , x ) = (x ; x , x ), Xterm = {x4 , x0 , x8 } 0 6 8 3 6 7 8 X (x , x ) = (x ; x , x , x ), as input for the algorithm, we present all observed states schematically in Fig 1. 0 1 2 3 4  x  x  x  x  x  x  xF R 0 1 0 0 0  xR   0   0   0   0   0      FR   d1   d2   d3    xPF R   1   1   0   0   0               xPR   0   0   1   1   1               xG   0   0   0   1   1  xSpo 0 0 0 0 1 R R 5 6 7 8 x  x  x  x  0 0 0 0  1   1   0   0      d5   d6    0   0   1   1  d4          1   1   0   0           0   1   1   1  0 0 0 1 Fig. 1. Experimental time-series data X 0 for the light-induced sporulation of Physarum polycephalum. The experimental setting uses the set P = {F R, R, Pf r , Pr , G, Spo} of studied components, observed states are represented by vectors of the form x = (xF R , xR , xPF R , xPR , xG , xSpo )T having 0/1-entries only. Dashed arrows represent a stimulation or perturbation of the system, solid arrows the observed responses. For a successful reconstruction, the data X 0 need to satisfy two properties: re- producibility and monotonicity. The data X 0 are reproducible if for each xj ∈ X 0 there is a unique observed successor state succX 0 (xj ) = xj+1 ∈ X 0 . Reproducibil- ity is obviously necessary and can be ensured by preprocessing [15]. Whether a state xj ∈ X 0 and its observed successor succX 0 (xj ) = xj+1 ∈ X 0 are also con- secutive system states depends on the chosen time points to measure the states in X 0 . In fact, xj+1 may be obtained from xj by a switching sequence of some length, where the intermediate states are not reported in X 0 . The data X 0 are monotone if for each pair (xj , xj+1 ) ∈ X 0 , the values of the elements do not oscillate in the possible intermediate states between xj and xj+1 . It was shown in [4] that monotonicity has to be required, too (which is equivalent to demand that all essential responses are indeed reported in X 0 ). Proc. BioPPN 2014, a satellite event of PETRI NETS 2014 Automatic network reconstruction 49 Output. An extended Petri net with priorities (P, O) with P = (P, T, A, w) fits the given data X 0 when it is able to perform every observed state change from xj ∈ X 0 to xj+1 ∈ X 0 . For that, associate with P an incidence matrix M ∈ Z|P |×|T | whose rows correspond to the places p ∈ P and whose columns M·t to the update vector r t of the transitions t ∈ T :   −w(p, t) if (p, t) ∈ A, t rp = Mpt := +w(t, p) if (t, p) ∈ A,   0 otherwise. Reaching xj+1 from xj by a switching sequence using the transitions from a subset T 0 ⊆ T is equivalent to obtain xj+1 P from xj by adding the corresponding columns M·t of M for all t ∈ T , i.e., x + t∈T 0 M·t = xj+1 . 0 j T has to contain enough transitions to perform all experimentally observed switching sequences. The underlying standard network P = (P, T, A, w) is con- formal with X 0 if, for any two consecutive states xj , xj+1 ∈ X 0 , the linear equa- tion system xj+1 − xj = M λ has an integral solution λ ∈ N|T | such that λ represents a sequence (t1 , ..., tm ) of transition switches, i.e., there are intermedi- ate states xj = y 1 , y 2 , ..., y m+1 = xj+1 with y l +M·tl = y l+1 for 1 ≤ l ≤ m. The extended Petri net P = (P, T, A, w) is catalytic conformal with X 0 if tl ∈ TA (y l ) for each intermediate state y l , and the extended Petri net with priorities (P, O) is X 0 -deterministic if {tl } = TA,O (y l ) holds for all y l . The desired output consists of all minimal X 0 -deterministic extended Petri nets (P, O) (all having the same set P of places as part of the input). Example 2. We represent in Fig. 3 (page 54) several alternative X 0 -deterministic extended Petri nets fitting the experimental data X 0 from our running example. We now briefly sketch the reconstruction procedure. Representing the observed responses. As initial step, extract the observed changes of states from the experimental data. For that, define the set  D := dj = xj+1 − xj : xj+1 = succX 0 (xj ) ∈ X 0 . Generating the complete list of all X 0 -deterministic extended Petri nets P = (P, T, A, w) includes finding the corresponding standard networks and their in- cidence matrices M ∈ Z|P |×|T | . Due to monotonicity [4], it suffices to represent any dj ∈ D using sign-compatible update vectors from the following set only:     0 ≤ rp ≤ dp if djp > 0    dp ≤ rp ≤ 0 if djp < 0  Box(dj ) = r ∈ Z|P | : \ {0}.   P rp = 0 if djp = 0    0  p∈P 0 rp = 0 ∀P ∈ IP Next, we determine for any dj ∈ D, the set Λ(dj ) of all integral solutions of X dj = λt r t , λt ∈ Z+ , (1) r t ∈ Box(dj ) Proc. BioPPN 2014, a satellite event of PETRI NETS 2014 50 M.C.F. Favre, W. Marwan and A.K. Wagler and for each λ ∈ Λ(dj ), the (multi-)set R(dj , λ) = {r t ∈ Box(dj ) : λt 6= 0} of update vectors used for this solution λ. By construction, Box(dj ) and Λ(dj ) are always non-empty since dj itself is always a solution due to reproducibility [6]. Every permutation π = (r t1 , . . . , r tm ) of the elements of a set R(dj , λ) gives rise to a sequence of intermediate states xj = y 1 , y 2 , ..., y m , y m+1 = xj+1 with  σ = σπ,λ (xj , dj ) = (y 1 , r t1 ), (y 2 , r t2 ), . . . , (y m , r tm ) . Example 3. For the running example we obtain as sequences d3 = d6 x7 x8 4.1 r r 4.2 FR x0 x1 x10 d4 = d5 x3 r 4.2 r 4.1 r 1.2 r 1.1 x2 x11 x9 x0 r 4.2 r 4.1 d4 d1 x6 r 4.1 r 4.2 r 1.1 r 1.2 R d2 d3 x5 x2 x3 x4 R with x9 = (1, 0, 0, 1, 0, 0)T , x10 = (0, 1, 1, 0, 1, 0)T and x11 = (0, 1, 1, 0, 0, 0)T . To compose all possible standard networks, we have to select exactly one solution λ ∈ Λ(dj ) for each dj ∈ D and to take the union of the corresponding sets R(dj , λ) in order to yield the columns M·t = r t of an incidence matrix M of a conformal network. To ensure that the generated conformal networks can be made X 0 -deterministic, we proceed as follows. Detecting priority conflicts between sequences. By construction, every sequence σπ,λ (xj , dj ) induces a priority relation Oσ , since it implies which transition ti is supposed to have highest priority (and thus switches) for every intermediate state y i . To impose valid priority relations O among all transitions of the reconstructed networks, we have to take conflicts between priority relations Oσ induced by different sequences σ into account. Two sequences σ and σ 0 are in priority conflict 0 if there are update vectors r t 6= r t and intermediate states y, y 0 such that 0 t, t ∈ T (y) ∩ T (y ) and (y, r ) ∈ σ but (y 0 , r t ) ∈ σ 0 (since this implies t > t0 in 0 0 t Oσ but t > t in Oσ0 ). We have a weak (resp. strong) priority conflict if y 6= y 0 0 (resp. y = y 0 ) which can (resp. cannot) be resolved by adding control-arcs. We construct a priority conflict graph G = (VD ∪Vterm , ED ∪EW ∪ES ) whose nodes correspond to all possible sequences σπ,λ (xj , dj ) and whose edges reflect weak and strong priority conflicts (WPC and SPC for short): – VD contains the sequences σπ,λ (xj , dj ) for all xj ∈ X 0 \ Xterm 0 and dj = j j j j succX 0 (x ) − x , for all λ ∈ Λ(d ) and all permutations π of R(d , λ). – Vterm contains for all xk ∈ Xterm 0 the trivial sequence σ(xk , 0). – ED contains all edges between two sequences σ, σ 0 stemming from the same difference vector. – ES (resp. EW ) reflects all SPCs (resp. WPCs) between sequences σ, σ 0 stem- ming from distinct difference vectors. Proc. BioPPN 2014, a satellite event of PETRI NETS 2014 Automatic network reconstruction 51 In G, we generate all node subsets S selecting exactly one sequence σπ,λ (xj , dj ) per difference vector dj ∈ D such that no SPCs occur between the selected se- quences and the nodes in Vterm . Clearly, a node σ ∈ VD can never be selected for any solution S if it is in SPC with a terminal state sequence or with all sequences σ 0 stemming from one difference vector dj ∈ D. Removing all such nodes and their incident edges from G yields the reduced priority conflict graph G 0 . We can show that the sets of suitable selections S obtained in G and G 0 are equal and that there is always at least one feasible selection. Example 4. We obtain the following reduced priority conflict graph G 0 for the running example, where bold edges indicate SPCs and thin edges WPCs. Q1 σ(x0 , 0) Q3 σ1 (x1 , d1 ) σ1 (x7 , d3 ) σ1 (x3 , d3 ) σ1 (x6 , d4 ) σ1 (x5 , d4 ) σ1 (x2 , d2 ) σ3 (x1 , d1 ) σ3 (x6 , d4 ) σ3 (x5 , d4 ) σ(x8 , 0) σ(x4 , 0) Q3 From G 0 , we obtain as feasible subsets Si = S 0 ∪ Si0 with S 0 = {σ1 (x2 , d2 ), σ1 (x3 , d3 ), σ1 (x7 , d3 )} S10 = {σ1 (x1 , d1 ), σ1 (x5 , d4 ), σ1 (x6 , d4 )}, S50 = {σ3 (x1 , d1 ), σ1 (x5 , d4 ), σ1 (x6 , d4 )}, S20 = {σ1 (x1 , d1 ), σ1 (x5 , d4 ), σ3 (x6 , d4 )}, S60 = {σ3 (x1 , d1 ), σ1 (x5 , d4 ), σ3 (x6 , d4 )}, S30 = {σ1 (x1 , d1 ), σ3 (x5 , d4 ), σ1 (x6 , d4 )}, S70 = {σ3 (x1 , d1 ), σ3 (x5 , d4 ), σ1 (x6 , d4 )}, S40 = {σ1 (x1 , d1 ), σ3 (x5 , d4 ), σ3 (x6 , d4 )}, S80 = {σ3 (x1 , d1 ), σ3 (x5 , d4 ), σ3 (x6 , d4 )}. Constructing X 0 -deterministic Petri nets. Every selected subset S corresponds to a conformal standard network PS = (P, TS , AS , w): we obtain the columns of the incidence matrix MS of the network by taking the union of all sets R(dj , λ) corresponding to the sequences σ = σπ,λ (xj , dj ) selected by σ ∈ S. Example 5. We apply the method to the feasible sets S1 and S4 from Exp. 4 and obtain the standard networks presented in Fig. 2 with TS1 = D and in Fig. 3 with TS4 = {d1 , d2 , d3 , r 4.1 , r 4.2 }, respectively. If there are weak priority conflicts σσ 0 ∈ EW for nodes σ, σ 0 ∈ S ∪ Vterm , denoted by WPC(σ, σ 0 ), the constructed standard network PS needs to be made X 0 -deterministic by inserting appropriate control-arcs. By [6], a WPC(σ, σ 0 ) be- 0 tween two sequences σ and σ 0 involving update vectors r t 6= r t and intermediate 0 states y 6= y 0 with t, t0 ∈ T (y) ∩ T (y 0 ) such that (y, r t ) ∈ σ but (y 0 , r t ) ∈ σ 0 has to be resolved by adding appropriate control-arcs that either turn r t into a transition t which is disabled at y 0 (then t > t0 forces t to switch in y whereas only t0 is enabled at y 0 ), or vice versa. Let P (y, y 0 ) be the set of places where y and y 0 differ and CA(σ, σ 0 ) the set of all possible control-arcs that can resolve WPC(σ, σ 0 ). Then CA(σ, σ 0 ) partitions into two subsets CAt>t0 (σ, σ 0 ) disabling t at y 0 containing Proc. BioPPN 2014, a satellite event of PETRI NETS 2014 52 M.C.F. Favre, W. Marwan and A.K. Wagler – a read-arc (p, t) ∈ AR with weight w(p, t) > yp0 ∀p ∈ P (y, y 0 ) with yp > yp0 , – an inhibitor-arc (p, t) ∈ AI with w(p, t) < yp ∀p ∈ P (y, y 0 ) with yp < yp0 , and CAt yp ∀p ∈ P (y, y 0 ) with yp0 > yp , – an inhibitor-arc (p, t0 ) ∈ AI with w(p, t0 ) < yp0 ∀p ∈ P (y, y 0 ) with yp0 < yp . Remark 1. If one of y, y 0 is a terminal state, say y 0 , then CAt t0 = 0 holds automatically. Moreover, if y = y 0 then P (y, y 0 ) = ∅ follows which is the reason why SPCs cannot be resolved by adding control-arcs. Due to [6], inserting one control-arc from CA(σ, σ 0 ) resolves the WPC(σ, σ 0 ) in PS . Here, we further discuss mutual influences of control-arcs in the resulting extended Petri nets as well as the issue of only constructing minimal catalytic conformal networks. On the one hand, inserting a control-arc (p, t) ∈ CA(σ, σ 0 ) in PS might disable t at a state in another sequence σ 00 ∈ S \ σ, σ 0 where t is supposed to switch. In this case, (p, t) has to be removed from CA(σ, σ 0 ), resulting in a reduced set CAS (σ, σ 0 ). On the other hand, one control-arc may resolve several WPCs in PS if the corresponding sets CAS (σ, σ 0 ) intersect. Therefore, we propose the following consideration: Introduce one variable z(p,t) ∈ {0, 1} for each possible control-arc (p, t) ∈ CAS (σ, σ 0 ) for all WPCs in PS . Construct a 0/1-matrix AS whose columns correspond to all those variables (resp. control-arcs) and whose rows encode the incidence vectors of the sets CAS (σ, σ 0 ) for all WPCs in PS . Then any 0/1-solution z of the system AS z ≥ 1 encodes a suitable set of control-arcs resolving all WPCs in PS and, thus, a hitting set or cover of AS . By [14], we are only interested in finding minimal models fitting X 0 , where minimality is interpreted in the sense that all non-minimal models contain another one also fitting the data. We can show that using non- minimal covers of AS yields non-minimal extended Peri nets but that we need all of them for the sake of completeness, which corresponds to determining the so-called blocker b(AS ) of AS . Example 6. We list all WPCs between sequences in our running example: WPC1 between σ1 (x2 , d2 ) and σ(x0 , 0) due to d2 , 0 ∈ T (x0 ) ∩ T (x2 ) WPC2 between σ1 (x3 , d3 ) and σ(x0 , 0) due to d3 , 0 ∈ T (x0 ) ∩ T (x3 ) WPC3 between σ1 (x7 , d3 ) and σ(x0 , 0) due to d3 , 0 ∈ T (x0 ) ∩ T (x7 ) WPC4 between σ3 (x1 , d1 ) and σ1 (x7 , d3 ) due to d3 , r 1.2 ∈ T (x1 ) ∩ T (x7 ) WPC5 between σ3 (x1 , d1 ) and σ(x0 , 0) due to r 1.2 , 0 ∈ T (x0 ) ∩ T (x1 ) WPC6 between σ3 (x1 , d1 ) and σ(x8 , 0) due to r 1.2 , 0 ∈ T (x8 ) ∩ T (x1 ) WPC7 between σ1 (x2 , d2 ) and σ3 (x5 , d4 ) due to d2 , r 4.2 ∈ T (x2 ) ∩ T (x5 ) WPC8 between σ3 (x5 , d4 ) and σ1 (x3 , d3 ) due to d3 , r 4.2 ∈ T (x3 ) ∩ T (x5 ) WPC9 between σ3 (x5 , d4 ) and σ(x4 , 0) due to r 4.2 , 0 ∈ T (x4 ) ∩ T (x5 ) WPC10 between σ3 (x6 , d4 ) and σ1 (x3 , d3 ) due to d3 , r 4.2 ∈ T (x3 ) ∩ T (x6 ) WPC11 between σ3 (x6 , d4 ) and σ(x4 , 0) due to r 4.2 , 0 ∈ T (x4 ) ∩ T (x6 ) WPC12 between σ1 (x5 , d4 ) and σ3 (x6 , d4 ) due to d4 , r 4.2 ∈ T (x5 ) ∩ T (x6 ) WPC13 between σ3 (x5 , d4 ) and σ1 (x6 , d4 ) due to d4 , r 4.2 ∈ T (x5 ) ∩ T (x6 ) Proc. BioPPN 2014, a satellite event of PETRI NETS 2014 Automatic network reconstruction 53 We obtain the following control-arcs to resolve WPCs between sequences: CA(WPC1) = {(PF R , d2 ) ∈ AI , (PR , d2 ) ∈ AR }, CA(WPC2) = {(PF R , d3 ) ∈ AI , (PR , d3 ) ∈ AR , (G, d3 ) ∈ AR }, CA(WPC3) = {(G, d3 ) ∈ AR }, CA(WPC4) = {(F R, d3 ) ∈ AI , (G, d3 ) ∈ AR , (F R, r 1.2 ) ∈ AR , (G, r 1.2 ) ∈ AI }, CA(WPC5) = {(F R, r 1.2 ) ∈ AR }, CA(WPC6) = {(Spo, r 1.2 ) ∈ AI , (G, r 1.2 ) ∈ AI }, CA(WPC7) = {(R, r 4.2 ) ∈ AR , (R, d2 ) ∈ AI }, CA(WPC8) = {(R, r 4.2 ) ∈ AR , (G, r 4.2 ) ∈ AI , (R, d3 ) ∈ AI , (G, d3 ) ∈ AR }, CA(WPC9) = {(R, r 4.2 ) ∈ AR , (Spo, r 4.2 ) ∈ AI , (G, r 4.2 ) ∈ AI }, CA(WPC10) = {(R, d3 ) ∈ AI , (R, r 4.2 ) ∈ AR }, CA(WPC11) = {(R, r 4.2 ) ∈ AR , (Spo, r 4.2 ) ∈ AI }, CA(WPC12) = {(G, d4 ) ∈ AI , (G, r 4.2 ) ∈ AR }, CA(WPC13) = {(G, d4 ) ∈ AR , (G, r 4.2 ) ∈ AI }. The following reductions of sets of possible control-arcs are necessary: for all standard networks PSi , we have σ1 (x3 , d3 ) and σ1 (x7 , d3 ) selected simultane- ously, which both are in WPC with σ(x0 , 0), see WPC2 and WPC3. To resolve WPC2, we have CA(WPC2) = {(PF R , d3 ) ∈ AI , (PR , d3 ) ∈ AR , (G, d3 ) ∈ AR }. However, (PF R , d3 ) ∈ AI and (PR , d3 ) ∈ AR do not only disable d3 at x0 , but also d3 at x7 (due to x0PF R = x7PF R = 1 and x0PR = x7PR = 0). Since d3 is supposed to switch at x7 , we obtain CASi (WPC2) = {(G, d3 ) ∈ AR } as reduced set of possible control-arcs to resolve WPC2 in all networks P(Si ) . Similarly, (G, r 4.2 ) ∈ AI has to be excluded from CA(WPC8) and CA(WPC9) in PS4 and PS8 as otherwise r 4.2 would be disabled at x6 where it is supposed to switch by σ3 (x6 , d4 ) ∈ S4 , S8 . For the feasible set S4 , we obtain as matrix AS4 : (PF R , d2 ) ∈ AI (PR , d3 ) ∈ AR (G,d3 ) ∈ AR (R,r 4.2 ) ∈ AR (R,d2 ) ∈ AI (R,d3 ) ∈ AI (Spo,r 4.2 ) ∈ AI WPC1 X X WPC2 X WPC3 X WPC7 X X WPC8 X X X WPC9 X X WPC10 X X WPC11 X X The blocker b(AS4 ) contains four minimal covers of AS4 which correspond to the different sets of control-arcs in the four extended Petri nets depicted in Fig. 3, all arising from the standard network PS4 . For S1 , the matrix AS1 contains the first 3 rows and columns of AS4 , the blocker b(AS1 ) contains two minimal covers of AS1 which correspond to the control-arcs in the two extended Petri nets in Fig. 2 arising from PS1 . Note that b(AS ) is non-empty if and only if none of the sets CAS (σ, σ 0 ) is empty. We can show that there is at least one catalytic conformal network for any given X 0 . All catalytic conformal extended Petri nets PS,P 0 = (P, TS , AS,P 0 , w) based on PS can be made X 0 -deterministic n by taking all the priorities Oσ for o all σ ∈ S, where Oσ is defined by Oσ = ti > t : t ∈ TAS,P 0 (y i ) \ ti , 1 ≤ i ≤ m . By construction, there are no priority conflicts in the extended network PS,P 0 between Oσ and Oσ0 for any σ, σ 0 ∈ S, thus we obtain the studied partial order [ OS,P 0 = Oσ . σ∈S Proc. BioPPN 2014, a satellite event of PETRI NETS 2014 54 M.C.F. Favre, W. Marwan and A.K. Wagler This finally implies: Theorem 1. Every extended network PS,P 0 = (P, TS , AS,P 0 , w) together with the partial order OS,P 0 is an X 0 -deterministic extended Petri net, and there are no other minimal extended Petri nets with priorities fitting the given data X 0 . PF R PF R PR 4 1 4 1 4 d d d d d d1 R FR R FR R FR PR PR PF R d2 G d2 G d2 G 3 Spo 3 Spo 3 Spo d d d Standard network PS1 PS1a PS1b Fig. 2. Standard network PS1 = (P, TS1 , AS1 , w) from solution S1 and the two catalytic conformal extended Petri nets resulting from PS1 . PR PF R PR FR d1 r 4.2 R r 4.1 r 4.2 d1 FR FR d1 r 4.2 r 4.1 r 4.1 PF R PR PF R R R d2 G d2 G d2 G d3 Spo d3 Spo d3 Spo Standard network PS4 PS4a PS4b PF R PR FR 1 4.2 FR 1 d r d r 4.2 r 4.1 R r 4.1 PR PF R R G G d2 d2 Spo Spo d3 d3 PS4c PS4d Fig. 3. Standard network PS4 = (P, TS4 , AS4 , w) from solution S4 and the four cat- alytic conformal extended Petri nets resulting from PS4 . Example 7. For the two extended Petri nets based on PS1 in Fig. 2, no priorities are needed to obtain X 0 -deterministic extended Petri nets (PS1a , ∅) and (PS1b , ∅). For the four extended Petri nets in Fig. 3 based on PS4 , the priority relation O4 = {(r 4.2 > d2 )} is required for PS4a and PS4b , whereas O4 = {(d2 > r 4.2 ), (d3 > r 4.2 )} is required for PS4c and PS4d , to obtain X 0 -deterministic extended Petri nets. In total, the complete solution set contains 66 minimal X 0 -deterministic ex- tended Petri nets with priorities, see Table 1. Proc. BioPPN 2014, a satellite event of PETRI NETS 2014 Automatic network reconstruction 55 3 Integrating prior biological knowledge 3.1 Indecomposability of difference vectors In the step Representing the observed responses, the set of potential update vectors which might constitute the incidence matrices of the networks are con- sidered. Hereby, for each dj ∈ D, the set Box(dj ) contains only sign-compatible vectors due to monotonicity (which avoids homogeneous solutions in (1) accord- ing to minimality) and takes P -invariants into account (which avoids infeasible intermediate states according to prior biological knowledge). In some cases, one could restrict Box(dj ) further, e.g., if – dj exactly corresponds to a well-known biochemical reaction (including the correct stoichiometry) or to a well-known mechanism (that a certain trigger is detected by a suitable receptor), – experiments have shown that subsets of the input components of dj do not lead to the observed response, – dj is treated as black box-like reaction where only input and output matter, but not the intermediate mechanism due to the chosen level of detail. In such cases, we propose to exclude the corresponding response dj ∈ D from decomposition and, instead, just define Box(dj ) := {dj } in accordance with the existing knowledge. Example 8. Since the light-dependent reactions of the photoreceptor are so much faster than the subsequent processes that are considered in the reconstruction process, the difference vector describing the phytochrome photoconversion will not be decomposed into different reaction vectors. Thus, for the difference vectors d1 , d4 and d5 only the canonical sequences σ1 (x1 , d1 ), σ1 (x5 , d4 ) and σ1 (x6 , d4 ) remain in the priority conflict graph, and S1 remains as only possible selection. Accordingly, the total number of solutions reduces from 66 to the 2 presented in Fig. 2, see Table 1. 3.2 Treating equal difference vectors in the same way In the step Detecting priority conflicts between sequences, all observed responses dj ∈ D are treated independently from each other, so far. If, however, two difference vectors di , dj ∈ D are equal, we clearly have Box(di ) = Box(dj ) and, thus, Λ(di ) = Λ(dj ). Here, it is natural to require that both di , dj are decomposed in the same way (i.e., by the same λ ∈ Λ(di ) = Λ(dj )) and that the involved reactions are applied in the same order (i.e., by the same permutation π of the elements in the sets R(di , λ) = R(dj , λ)) to obtain the same transitions (with equal control-arcs and priorities) in both cases. Indeed, using the same – λ ∈ Λ(di ) = Λ(dj ) corresponds to the fact that the same subset of molecules involved in a reaction will not interact according to different mechanisms, Proc. BioPPN 2014, a satellite event of PETRI NETS 2014 56 M.C.F. Favre, W. Marwan and A.K. Wagler – permutation π of the elements in R(di , λ) = R(dj , λ) corresponds to the fact that the order in which the reactions are applied reflects the relative rates of the reactions in R(di , λ) = R(dj , λ), so the same relation between reaction rates shall lead to the same priorities within the resulting sequences. We call two sequences σπ,λ (xi , di ) and σπ,λ 0 (xj , dj ) twin sequences if di = dj and the same λ and π has been used. To force that twin sequences are always selected together, we propose to add strong priority conflicts between all other sequences stemming from a pair di , dj of equal difference vectors while creating the priority conflict graph, since no two sequences in strong priority conflict are selected for the same network. Example 9. In our running example, we have d3 = d6 and d4 = d5 . The latter vectors can be decomposed in different ways, among the resulting sequences we have σ1 (x5 , d4 ), σ1 (x6 , d5 ) and σ3 (x5 , d4 ), σ3 (x6 , d5 ) as pairs of twin sequences. Forcing to select these pairs together rules out the four selections S2 , S3 , S6 , S7 so that only S1 , S4 , S5 , S8 remain as possible selections. Accordingly, the total number of solutions reduces from 66 to 18, as reported in Table 1. 3.3 Knowledge on relative reaction rates In the step Constructing X 0 -deterministic Petri nets, to resolve a WPC(σ, σ 0 ) 0 between σ, σ 0 involving update vectors r t 6= r t and intermediate states y 6= y 0 , either transition t has to be disabled at y 0 or transition t0 at y, while the decision between t and t0 on the other state can be handled by a priority. Here, prior knowledge about the relative reaction rates of t and t0 (e.g. gained from the time-scales during the experiments) could help to decide whether t > t0 or t < t0 better reflects the reality, and than choose between control-arcs either from CAt>t0 (σ, σ 0 ) or from CAt0 >t (σ, σ 0 ). So far, this idea is already applied for WPCs involving a terminal state: if y 0 is a terminal state and σ 0 = σ(y 0 , 0) its trivial sequence, then t > 0 holds automatically at y, but t has to be disabled at y 0 using control-arcs from CAt>0 (σ, σ 0 ) whereas CAt<0 (σ, σ 0 ) is empty. This idea can be generalized to any WPC(σ, σ 0 ) where the time-scale of the corresponding experimental observations clearly differs in order to deduce the correct priority t > t0 or t < t0 . In such cases, we propose to reduce CA(σ, σ 0 ) accordingly either to CAt>t0 (σ, σ 0 ) or to CAt0 >t (σ, σ 0 ). Example 10. In our running example, we have WPC1, WPC2, WPC3, WPC5, WPC6, WPC9, WPC11 involving a terminal state; the according reductions of the sets CA(σ, σ 0 ) to CAt>0 (σ, σ 0 ) are already applied in Exp. 6. Moreover, WPC4, WPC7, WPC8, WPC10 involve reactions with clearly dif- ferent time-scales during the experimental observations: – d1 and d4 = d5 need only milliseconds to occur, – d2 needs about 1 hour to occur, and – d3 = d6 need at least 10 hours. Proc. BioPPN 2014, a satellite event of PETRI NETS 2014 Automatic network reconstruction 57 Accordingly, we can reduce the sets CA(σ, σ 0 ) as follows: – due to r 1.2 > d3 , for WPC4 only (F R, r 1.2 ) ∈ AR and (G, r 1.2 ) ∈ AI remain; – due to r 4.2 > d2 , for WPC7 only (R, r 4.2 ) ∈ AR remains; – due to r 4.2 > d3 , for WPC8 only (R, r 4.2 ) ∈ AR and (G, r 4.2 ) ∈ AI remain, while for WPC10 only (R, r 4.2 ) ∈ AR is left. At least one of these WPCs occurs in the standard networks coming from the selected sets S2 − S4 , S6 − S8 . Note that in the two remaining WPCs the time- scale of the involved responses is equal (see Exp. 6). Consequently, the number of extended Petri nets decreases from 66 to 36 as reported in Table 1. 4 Discussion The subject of this paper was an approach from [6] that aims at reconstructing all X 0 -deterministic extended Petri nets that fit given experimental data X 0 , to provide all possible alternatives of mechanisms behind the experimentally ob- served phenomena. This typically results in a large set of solution alternatives. To keep this solution set reasonably small while still guaranteeing its complete- ness, we firstly generate only Petri nets being minimal in the sense that all other networks fitting the data contain the reconstructed ones. In the presented approach, the minimality concept is applied twice: – monotone data: using only sign-compatible vectors in Box(dj ) avoids homo- geneous solutions during the decomposition of (dj ) (and superflous transi- tions in the networks), see [4]. – minimal hitting sets: we here proposed to use only minimal sets of control- arcs to resolve all weak priority conflicts in a standard network which avoids unnecessary control-arcs (and artificial dependencies), see Section 2. This ensures that the presented approach exactly generates all minimal extended Petri nets with priorities (Theorem 1). We further avoid generating minimal solutions which are “technically correct” but would be ruled out later during a subsequent verification process to check whether the returned solutions are “biological meaningful” or even contradict well-established biological knowledge as in [2]. For that, we extent the considered input by integrating prior biological knowledge beyond the information given with the experimental time-series data into the reconstruction process. We propose to integrate prior knowledge in the following way: – P-invariants: helps to obtain feasible intermediate states in all sequences which avoids the generation of solutions contradicting known facts, see [13]. – indecomposable difference vectors: help to keep already known subnetworks or mechanisms, see Section 3.1. – treating equal difference vectors in the same way: helps to keep consistency in the interpretation of experimental observations, see Section 3.2. – obeying terminal states and reaction rates: helps to chose meaningful prior- ities and to avoid artificial control-arcs, see Section 3.3. Proc. BioPPN 2014, a satellite event of PETRI NETS 2014 58 M.C.F. Favre, W. Marwan and A.K. Wagler So far, the algorithmic procedure due to [6] takes as input (P, IP , X 0 ) where the list of P -invariants and the monotonicity of X 0 are already used in the first step Decomposing difference vectors to determine Box(dj ). To integrate further knowledge in the reconstruction procedure to force the algorithm to make “the right decisions” in some intermediate steps, we suggest, based on the previous discussions, to extend the input to (P, IP , X 0 , Din , Deq , OD ) where – Din contains all indecomposable difference vectors (for that, carefully select them according to the before mentioned critera, e.g., to preserve known subnetworks or mechanisms or to take a certain level of detail into account); – Deq contains all pairs of equal difference vectors that shall be treated in the same way (for that, only chose equal difference vectors where also the time elapsed during the experimental observation was equal); – OD contains information about the reaction rates between difference vectors (for that, only impose priorities for sufficiently different rates according to clearly different time scales during the experiments). Example 11. The three examples from Section 3 can be interpreted as follows: – Exp. 8 shows the result taking (P, IP , X 0 , Din = {d1 , d4 , d5 }, ∅, ∅) as input; – Exp. 9 shows the result with (P, IP , X 0 , ∅, Deq = {d3 = d6 , d4 = d5 }, ∅); – Exp. 10 the result with (P, IP , X 0 , ∅, ∅, OD = {d1 , d4 , d5 > d2 > d3 , d6 }). Whereas the first setting reduces the number of solution alternatives to 2, the combination of the two latter scenarios reduces the number of solution alterna- tives to 12, see Table 1 below. S1 S2 S3 S4 S5 S6 S7 S8 TOTAL minimal 2 8 8 4 4 16 16 8 66 Din 6= ∅ 2 0 0 0 0 0 0 0 2 Deq 6= ∅ 2 0 0 4 4 0 0 8 18 OD 6= ∅ 2 4 4 2 4 8 8 4 36 Deq , OD 6= ∅ 2 0 0 2 4 0 0 4 12 Table 1. Number of solutions depending on different input settings. To conclude, we notice that providing indecomposable difference vectors has the largest impact on the solution set. However, even if no indecomposable dif- ference vectors can be identified, treating equal difference vectors in the same way and deducing relative reaction rates from the time-scale of the experimental observations leads to a substantial reduction of the solution set, keeping only “biological meaningful” network alternatives. The further goal is to provide an implementation for the presented reconstruc- tion method, including the option of integrating prior knowledge as additional input during the reconstruction. For that, we will use Answer Set Programming as done for the reconstruction of standard networks in [1]. Finally, we plan to apply the presented reconstruction approach to different biological experimen- tal data. We expect an important impact of Automatic Network Reconstruction Proc. BioPPN 2014, a satellite event of PETRI NETS 2014 Automatic network reconstruction 59 on the integrated experimental and theoretical analysis of biological systems towards their holistic understanding, since computational models derived from experimental data by our exact, exclusively data-driven approach have predic- tive ability due to completeness of the solution set guaranteed by mathematical proofs. References 1. M. Durzinsky, W. Marwan, M. Ostrowski, T. Schaub, and A. K. Wagler. Automatic network reconstruction using ASP. Th. and Pract. of Logic Progr., 11:749–66, 2011. 2. M. Durzinsky, W. Marwan, and A. K. Wagler. Reconstruction of extended Petri nets from time series data and its application to signal transduction and to gene regulatory networks. BMC Systems Biology, 5, 2011. 3. M. Durzinsky, W. Marwan, and A. K. Wagler. Reconstruction of extended Petri nets from time-series data by using logical control functions. Journal of Mathe- matical Biology, 66:203–223, 2013. 4. M. Durzinsky, A. K. Wagler, and R. Weismantel. A combinatorial approach to reconstruct Petri nets from experimental data. Lecture Notes in Comp. Sc. (special Issue CMSP 2008), 5307:328–346, 2008. 5. M. Durzinsky, A. K. Wagler, and R. Weismantel. An algorithmic framework for network reconstruction. J.Theor. Computer Science, 412(26):2800–2815, 2011. 6. M. Favre and A. K. Wagler. Reconstructing X 0 -deterministic extended Petri nets from experimental time-series data X 0 (extended abstract). CEUR Workshop Pro- ceedings (Special Issue BioPPN 2013), 988:45–59, 2013. 7. I. Koch and M. Heiner. Petri nets. In B. H. Junker, F. Schreiber, editor, Biological Network Analysis, Wiley Book Series on Bioinformatics, pages 139–179, 2007. 8. T. Lamparter and W. Marwan. Spectroscopic detection of a phytochrome-like pho- toreceptor in the myxomycete physarum polycephalum and the kinetic mechanism for the photocontrol of sporulation. Photochem Photobiol, 73:697–702, 2001. 9. W. Marwan, A. K. Wagler, and R. Weismantel. A mathematical approach to solve the network reconstruction problem. Math. Methods of Operations Research, 67(1):117–132, 2008. 10. J. W. Pinney, R. D. Westhead, and G. A. McConkey. Petri net representations in systems biology. Biochem. Soc. Tarns., 31:1513–1515, 2003. 11. C. Starostzik and W. Marwan. Functional mapping of the branched signal trans- duction pathway that controls sporulation in Physarum polycephalum. Photochem. Photobiol., 62:930–933, 1995. 12. L. M. Torres and A. K. Wagler. Encoding the dynamics of deterministic systems. Math. Methods of Operations Research, 73:281–300, 2011. 13. A. K. Wagler. Prediction of network structure. In I. Koch, F. Schreiber, and W. Reisig, editors, Modeling in Systems Biology, volume 16 of Computational Bi- ology, pages 309–338, 2010. 14. A. K. Wagler and J.-T. Wegener. On minimality and equivalence of Petri nets (extended abstract). Fundamenta Informaticae, 128:209–222, 2013. 15. A. K. Wagler and J.-T. Wegener. Preprocessing for network reconstruction: Fea- sibility test and handling infeasibilty (extended abstract). CEUR Workshop Pro- ceedings (Special Issue CS&P 2013), (1032):434–47, 2013. Proc. BioPPN 2014, a satellite event of PETRI NETS 2014