1. Introduction

Electronic Journal of Statistics 6 (2021) 1550-1599. doi:10.1214/12

10.1007/11905455\_1

Step-by-step Robustness for Biochemical Networks

Ruggero Lanotte

Desiree Manicardi

Simone Tini

0 0 University of Insubria , Como , Italy

2023

12477 13 15

We propose a notion of robustness for biochemical networks that, intuitively, measures the ability of the network to exhibit step-by-step limited variations on the concentration of a species of interest at varying of the initial concentration of other species. We provide a statistical technique allowing for estimating robustness and showcase it on the EnvZ/OmpR Osmoregulatory Signal System of E. coli.

eol>Robustness Biochemical networks Behavioural distances

1. Introduction

Living cells are complex systems whose morphological and functional organization have been thoroughly investigated in recent years, in the context of system biology [ 1 ]. The complex behavior of cells arises from the interactions of their huge amount of components, that interact with each other, through chemical reaction networks. Malfunctioning or corruption of these interactions may originate in severe diseases, such as cancer or diabetes. Therefore, it is relevant to study how the components of the cells interact with each other as a system in order to be able to predict how perturbations can modify their behavior. In particular, in some cases, it is interesting to predict in which case the nominal behavior of the cell can be maintained in presence of those perturbations, which leads to the notion of robustness. Notably, the ability to maintain the nominal behavior is not a qualitative property, but can be quantified, to formalize to which extent the original behavior can be maintained in function of the amount of perturbation introduced in the system.

The notion of robustness has been widely used in several contexts, from control theory [ 2 ] to security [ 3 ] and biology [ 4 ], and is commonly meant as the ability of a system to maintain its functionalities against external and internal perturbations. Among the notions of robustness introduced in biology, -robustness [ 5 ] verifies how by varying the initial concentration of some species, called conventionally the input species, the concentration of other species of interest, called the output species, varies at steady state.

In the present paper, we study a notion of robustness that is inspired by that of -robustness, but with two main diferences: (i) we work within the stochastic model [ 6 ], whereas -robustness has been proposed for the deterministic model [ 7 ]; (ii) coherently with the choice of the model, instead of evaluating the concentration level of output species at steady state, we evaluate it step-by-step, up to a ifnite horizon. Being the deterministic and the stochastic approach complementary [ 7 ], we can argue that also our results and those in [ 5 ] are. Clearly, robustness in our sense captures random efects and temporary efects that are typical of the stochastic model.

1.1. Our contribution

In order to study the robustness of system behavior in the stochastic model, we need: (i) a language allowing us to specify systems of interest; (ii) a semantic model, which must capture the probabilistic behavior that characterizes the stochastic approach; (iii) a formal notion of robustness, defined on the semantic model; (iv) an algorithm allowing us to estimate the robustness of a system starting from its specification.

As regards specification, we rely on process calculi (similar to those in [8, 9]), adopting in particular the species as processes [10] approach. Essentially, a solution with species is modeled by the parallel composition of processes, where each process represents one species and its concentration level. In this paper, the concentration level is expressed in terms of the number of molecules, but this can be generalized.

In order to model the behavior, we follow Gillespie’s approach [ 6 ], where computation steps represent single chemical reactions. Here, at each step there is a competition between all available reactions, giving rise to a probabilistic behavior. In this context, we believe that it is convenient to adopt the semantic model of evolution sequences proposed in [11, 12, 13, 14, 15, 16]: essentially, an evolution sequence is a sequence of probability measures over the attainable configurations.

By adopting evolution sequences, one can exploit the whole theory developed in [12, 13] to measure the diferences between system behaviors, which supports the study of robustness properties. In [12, 13] the focus is on cyber-physical systems and the starting idea is to provide a notion of distance between computation states that quantifies to which extent they perform diferently with respect to some targets that are fixed initially. In other words, to each computation state one assigns a penalty quantifying how bad the system is working with respect to the targets, and the distance between two computation states is given by the diference of the penalties that are assigned to them. Then, the notion of distance between computation states is lifted first to probability measures over computation states and, then, to evolution sequences. In the present paper, we customize this approach to support the study of a robustness property for biochemical networks inspired by the -robustness of [ 5 ]. Essentially, if we are interested in studying the diference between the behaviour of two systems by focusing on the quantity of a given set of species, we can assign to each system state a rank that depends on the available quantity of these species. Then, the distance between two states coincides with the diference between their ranks, and this can be lifted to probability measures and to evolution sequences. Since -robustness bases on how much the output species vary depending on the variation of the initial concentration of input species, we can consider two diferent ranks, which capture the input and the output species, respectively, and that allow us to define the input distance and the output distance between the nominal system and the perturbed one. Then we formalize a notion of robustness whose intuition is that small variations on input distance should give rise to smooth and limited variations on output distance.

Then, following [12, 13], we provide a randomized algorithm that permits estimating the evolution sequences of systems and thus for the evaluation of the distances between them. This allows us to estimate robustness of a nominal system, by sampling perturbed systems at a fixed maximal input distance from it and by estimating their output distance. In order to validate our proposal, we have provided a Python implementation, available at https://github.com/dmanicardi/spebnr, for the EnvZ/OmpR Osmoregulatory Signaling System of E. coli.

1.2. Related work

Robustness in biology has been extensively studied in recent years. A very general approach has been proposed in [17, 18], where the nominal behaviour of a system subject to perturbations is expressed in terms of a formula specified with a linear temporal logic equipped with a quantitative semantics, then the robustness of the system is quantified as the average satisfaction degree of that property over all admissible perturbations, possibly weighted by their probabilities. Several notions of robustness proposed in the literature are less general and focus on steady state behaviour. As an example, the notion of adaptability in [19] captures the idea that the behaviour at steady state of a system is insensitive with respect to the initial concentration of some species. Absolute concentration robustness [20, 21] with respect to a given species requires that the system admits at least one positive steady state and that the concentration of that species is the same in all of the positive steady states that the system might admit. The notion of -robustness has been proposed in [ 5 ] on the Continuous Petri Nets model. Intuitively, this notion captures the idea that by varying the initial concentration of input species under suitable constraints, namely by remaining within a so called interval marking of the net, then at steady state one observes a bounded variation of the concentration of output species, namely that concentration is in a ball of radius .

1.3. Organization of contents.

We devote Section 2 to the presentation of our model. In Section 3 we discuss the behavioral distances and our notion of robustness. Then, in Section 4 we provide the algorithm, based on statistical inference, for estimating the distances and the robustness. The study of EnvZ/OmpR Osmoregulatory Signaling System is described in Section 5. Finally, Section 6 concludes the paper with a discussion on future research directions.

2. The model

We assume a set of names for species and a set ℛ of chemical reactions. Then, we use a set of actions describing how a species can take part to a reaction. In particular, if is a reaction in ℛ, then the following actions are in : • ?, denoting that the species participates in reaction as a reactant, consuming molecules, • !, denoting that the species participates in reaction as a product, producing molecules, • ?!, denoting that the species participates in reaction as both a reactant and a product, like in the case of enzymatic reactions. Here, molecules are consumed and molecules are produced. Elements in will be denoted with , ′, 1, . . . . For an action ∈ , the associated reaction r( ) is defined by r(?) = r(!) = r(?!) = . For a set of actions ⊆ , we let r() denote r() = {r( ) | ∈ }.

Definition 1 (System). The set of systems over , ℛ and is defined by

::= [] | ‖ where: (i) ∈ , (ii) ⊆ , and (iii) is a natural.

Intuitively, the system [] represents that molecules of species are in the mixture and have a potential behaviour described by , and ‖ is the (commutative and associative) parallel composition operator, which allows us to model that several species coexist in the same mixture.

Given systems [] , for = 1, . . . , , we will say that the system 1[1]1 ‖ · · · ‖ [] is well formed if for all 1 ≤ < ≤ it holds that ̸= . We will always assume to work with well formed systems. Intuitively, well formedness ensures that each species is represented by precisely one parallel component.

Example 1. Escherichia Coli is a bacterium whose cell contains a few million proteins of diferent types [ 1 ]. The EnvZ/OmpR Osmoregulatory Signaling System regulates two of those proteins. In order to support our exposition, in Figure 1 we report the graphical representation of this regulatory network as given in [ 5 ] by exploiting the formalism of Continuous Petri Nets. The main components of this chemical network are EnvZ (histine kinase) and OmpR (response regulator), denoted, respectively, as X and Y. EnvZ phosphorylates OmpR (YP in the figure) and itself ( XP), by binding and breaking down ATP. In the picture we can see that this system is characterized by eight species, represented by places in the net, and eleven reactions, represented by transitions. More in detail, the reactants and the products of a reaction are those that are represented by the places that are the sources and the targets, respectively, of the transition represented by that reaction. For instance, the species XT is a reactant 1 X Y XPY

XDYP 10 YP 8 for the reactions 4 and 5, and is a product for 3. In this example, each reaction uses precisely one molecule for each of its reactants and produces precisely one molecule for each of its products. Below we give the system that represents the regulatory network with species X, Y, XD and YP having a number of molecules corresponding to 25, 150, 50 and 10, respectively, and the other species having no molecule.

X[{2?1, 3?1, 1!1, 4!1, 8!1}]25 ‖

Y[{6?1, 7!1, 11!1}]150 ‖ ‖

XD[{1?1, 9?1, 2!1, 10!1, 11!1}]50 XPY[{6!1}]0 ‖

XDYP[{9!1}]0 ‖ ‖

XT[{3!1}]0 ‖

XP[{5!1, 7!1}]0

(1) YP[{9?1, 8!1, 10!1}]10

2.1. Behavioral model

We define the system’s behaviour by means of two types of transitions. Those of the first type represent the potential behaviour of systems. They are of the form →−− − − (,) ′, which models that the reaction can take to ′. Here, is the weight of the transition, and is a real number that will allow us to calculate the rate of the reaction , which is needed in order to calculate the probability of all enabled reactions. The transitions of the second type describe how all enabled reactions for a system compete and take it to a probability distribution over systems. They are of the form =⇒ , where is a discrete distribution over systems, namely a mapping : →− [ 0, 1 ] with ∑︀∈ () = 1. From the transitions of the form =⇒ we will derive transitions of the form =⇒ ′ that model the evolution of distributions of system.

The transitions of the first type are derived by means of the inference rules in Table 1. The first rule represents that molecules of species n are consumed by taking part to reaction as reactant. The weight (︀ )︀ coincides with the number of ways molecules of can be taken out from the available , namely it is the number of sets of molecules of that can take part to the reaction . The second rule =⇒ trgt() = {(, , ) | ∈ } ∑︁ ( · ) ∈ ∑︀∈ ( · ) () represents that molecules of n are produced by taking part to reaction as product. The weight 1 reflects that the number of molecules of the species does not impact on the rate of the reaction. The third rule represents that molecules of n are consumed and molecules of n are produced by taking part to reaction both as reactant and as product. The fourth rule is applied when is not involved in reaction . The inferred transition allows for composing the behaviour of [] with that of other species. The last rule allows for combining the behaviour of systems running in parallel. Clearly, they have to agree upon the reaction to be taken and the resulting weight is the product of the weights of the composed transitions. Here, we extend classic product by # · = · # = for all reals , and # · # = #.

For a system we denote by trgt() the set of the triples {(, , ) →|−−− − − and ̸= #}. Notice that for = 1[1]1 ‖ · · · ‖ [] , we have (, , ) ∈ trgt() with > 1 if and only if contains diferent sets of reagents that can take part to reaction .

In order to define the transitions of the form =⇒ , we need some notation for distributions over systems. By () we denote the point distribution giving probability 1 to and probability 0 to all ′ diferent from . Then, for a countable set of indexes and distributions with ∈ , the distribution ∑︀∈ with all ≥ 0 and ∑︀∈ = 1 is the convex distribution defined by (∑︀∈ )() = ∑︀∈ · ().

The first rule in Table 2 represents the competition between all enabled reactions of . For each reaction , we consider the constant reaction associated with the reaction, where, intuitively, is the probability that a particular combination of reactants gives rise to reaction in an infinitesimal time interval . The probability that behaves as described by (, , ) is the ratio between the rate · of the reaction and the sum of the rates of all reactions ∑︀∈ ( · ). To explain this point, we recall that · is the parameter of the exponential distribution modeling the time elapsing between two consecutive occurrences of reaction and ∑︀∈ ( · ) is the parameter of the exponential distribution modeling the time elapsing between two consecutive occurrences of arbitrary ( · ) reactions. Summarizing, reaches the convex distribution of systems ∑︀∈ with = ∑︀∈ ( · ) and = ().

The second rule in Table 2 lifts the behaviour of systems to that of distributions over systems. We note that if there is a sequence of transitions 1 =⇒ 2 =⇒ . . . . . . , for 1 the point distribution (1[1]1 ‖ · · · ‖ [] ), then all systems in the support of any are of the form = 1[1]′1 ‖ · · · ‖ []′ , for suitable ′1, . . . , ′. Following [12, 13], the sequence 1 =⇒ 2 =⇒ . . . . . . is called an evolution sequence. In particular, it is the evolution sequence of if 1 = ().

In a transition =⇒ the probability weight assigned to each element in the support of depends on the rate of all reactions that are possible in . This is no more true in a transition =⇒ ′ since the rates of the chemical reactions from two diferent systems in the support of are unrelated.

2.2. Remarks on the semantic model

We have described the behaviour of a system by means of an evolution sequence, namely a sequence of distributions of systems. Clearly, computing exactly these distributions is undoable. However, by adapting to our context the work in [12, 13], we can easily obtain a randomized algorithm allowing us to estimate the evolution sequence of a system . This randomized algorithm, detailed in Section 4, works as follows: (i) we apply times a procedure allowing us to compute a ℎ-steps trajectory. Essentially, this coincides with applying times the classical Gillespie algorithm [ 6 ]. For = 1, . . . , , we obtain the trajectory 0 , 1 , . . . , ℎ , where all 0 coincide with . Clearly, for all = 1, . . . , ℎ it holds that the samples 1, . . . , obtained at step are independent and identically distributed; (ii) for each = 1, . . . , ℎ, we use 1, . . . , to derive the empirical distribution ^ defined simply by ^ (′) = |{|=′ and 1≤ ≤ }| ; (iii) by applying the weak law of large numbers to i.i.d. samples we infer that when goes to infinite then ^ converges weakly to , where are the distributions such that () =⇒ 1 =⇒ 2 . . . .

3. Behavioral distances over systems

In [12, 13] a notion of behavioral distance over cyber-physical systems was proposed, aiming to express how well the cyber components fulfill their tasks. In this section, we customize that theory for our model, clearly with a diferent target. In particular, given two systems that difer by the number of molecules of species, say 1 = 1[1]11 ‖ · · · ‖ []1 and 2 = 1[1]21 ‖ · · · ‖ []2 , we aim to quantify the diferences over their evolution sequences 1 =⇒ 11 =⇒ 21 . . . and 2 =⇒ 12 =⇒ 22 . . . . To this purpose, we proceed as follows: (i) first we introduce the concept of distance between systems, which deals with the diference between the number of molecules of the species; (ii) then we lift this notion to a notion of distance between distributions, so that we can quantify the distance between the distributions 1 and 2 that are reached by 1 and 2, respectively, at step , and, finally, (iii) we lift this distance to a distance between the two evolution sequences.

We start with a notion of distance between systems that focuses on a single specie. We assume to know the minimum and maximum level min() and max() that a given species may reach during the computation. Notice that these values can be estimated by applying our randomized algorithm quickly described above in Section 2.2 and detailed below in Section 4.

Definition 2 (-distance over systems). Assume systems 1 = 1[1]11 ‖ · · · ‖ []1 and 2 = 1[1]21 ‖ · · · ‖ []2 . Then the distance between 1 and 2 with respect to species , or |1 − 2 | -distance between 1 and 2 for short, is d (1, 2) = max()− min() .

Clearly, d (1, 2) is a value in the interval [ 0, 1 ]. Moreover, d is a 1-bounded pseudometric, namely a function d : × → [ 0, 1 ] such that for all systems 1, 2, 3 the following properties hold: (i) d (1, 1) = 0, (ii) d (1, 2) = d (2, 1), and (iii) d (1, 3) ≤ d (1, 2) + d (2, 3). Notice that pseudometric d is not a metric, since, in general, it does not hold that d (1, 2) = 0 implies 1 = 2.

The second step to obtain the evolution distance consists in lifting distances on systems to distances on distributions over systems. Among the several notions of lifting for pseudometric proposed in the literature (see [22] for a survey), following [12, 13] we opt for that known as Wasserstein distance or Kantorovich-Rubinstein pseudometric, since it preserves the properties of the ground pseudometric and is computationally tractable via statistical inference.

In order to recall formally that notion, we need to introduce the notion of matching, also known as measure coupling [23], for pairs distributions.

Definition 3 (Matching, [23]). Assume a set . A matching for a pair of distributions (, ′) over is a distribution over the product state space × with left marginal , i.e. ∑︀′∈ (, ′) = () for all ∈ , and right marginal ′, i.e. ∑︀∈ (, ′) = ′(′) for all ′ ∈ . We let Ω(, ′) denote the set of all matchings for (, ′).

According to [23], a matching in Ω(, ′) may be understood as a transportation schedule for the shipment of probability mass from to ′. Then, by exploiting that notion, we can introduce the Kantorovich lifting.

Definition 4 (Kantorovich metric, [24]). The Kantorovich lifting of a pseudometric over a set is the pseudometric K() over distributions over defined for all distributions , ′ by K()(, ′) =

min ∑︁ (, ′) · (, ′).

∈Ω(, ′) ,′∈

The reader familiar with probability theory, might have recognized the Kantorovich lifting K()(, ′) as the minimum expected value of the ground distance , over the supports of and ′, with respect to the distribution . According to [23], K()(, ′) gives the optimal cost for shipping probability mass from to ′ when (, ′) is the unit cost for shipping mass from to ′.

We notice that pseudometric K(d ) preserves the 1-boundedness property of d . From the distances with respect to all species K(d1 ), . . . , K(d ), we can derive a unique notion of distance by exploiting weights that quantify the relevance that we want to assign to each specie.

Definition 5 (Distance between system distributions). Assume weights 1, . . . , ≥ 0 such that ∑︀

=1 = 1. The distance dd between system distributions induced by distances d1 , . . . , d and weights 1, . . . , is defined by dd(, ′) = ∑︀

=1 · K(d )(, ′).

Clearly, being a convex combination of 1-bounded pseudometrics, the function dd is a 1-bounded pseudometric. We argue that dd can be computed eficiently, by following [ 25]. In detail, computing dd(, ′) requires computing the Kantorovich liftings K(d1 )(, ′), . . . , K(d )(, ′). Both and ′ are discrete distributions, let us assume that their cardinality is . If and ′ are estimated by our randomized algorithm, then will be a parameter of the algorithm. In computing each lifting K(d )(, ′), each system in the support of and ′ can be viewed as a real, obtained as the cardinality of the molecules of divided by max() − min(). Therefore, and ′ can be viewed as multisets of reals of cardinality . This property is exploited in [25] as follows. In ( log ) these two multisets can be ordered. Let 1 ≤ · · · ≤ and 1 ≤ · · · ≤ be the ordered sequences. Notice that the ground distance d between two reals in the support of and ′ is a real (the absolute value of their diference, namely d (, ) = | − |). Following [25], we have that K(d )(, ′) = ∑︀=1 d (, ). Therefore, computing dd can be done in ( · log ).

We now need to lift dd to a distance on evolution sequences. To this end, we observe that the evolution sequence of a system includes the distributions over systems induced after each computation step. Following [12, 13] we define the evolution distance as a sort of weighted infinity norm of the tuple of the Kantorovich distances between the distributions in the evolution sequences. Definition 6. [Evolution distance] For a pseudometric dd over distributions over and an interval [ 1, 2], the evolution distance over [ 1, 2] is the mapping E(dd)[ 1, 2] such that, given systems 1 and 2 and their evolution sequences 1 =⇒ 11 =⇒ 21 =⇒ . . . 1 . . . and 2 =⇒ 12 =⇒ 22 =⇒ . . . 2 . . . we have

E(dd)[ 1, 2](1, 2) =

max dd( 1, 2) 1≤ ≤ 2 Since dd is a 1-bounded pseudometric, we can easily derive the same property for E(dd)[ 1, 2].

We remark that, as argued in [12], the choice of defining the evolution metric as the pointwise maximal distance in time between the evolution sequences of systems is natural and reasonable when one aims to evaluate the highest distance on the behaviour of systems. However, in diferent contexts it would be more appropriate to use a diferent aggregation function over the tuple of Kantorovich distances instead of max. The definition of E(dd)[ 1, 2] in Definition 6 could be replaced by a definition parametric with respect to aggregation functions. In order to prevent a too heavy notation, we decided to opt for the present formulation.

3.1. Robustness

Technically, in order to formalize the notion of robustness we need two distances, one taking into account input species and the other taking into account the output species. These will be the input distance dd and the output distance dd over distributions, and will be defined as in Definition 5 by giving positive weights to only input and only output species, respectively. The definition of robustness is parametric with respect to these two distances, two thresholds 1 and 2 and two intervals 1 and 2: a system is robust with respect to these parameters if whenever we consider a system ′ whose input distance from in interval 1 remains below 1, then the output distance observed in interval 2 remains below 2.

Definition 7.

A system is (dd, dd, 1, 2, 1, 2)-robust if for all systems ′:

E(dd)1 (, ′) ≤ 1 implies E(dd)2 (, ′) ≤ 2

Clearly, we will use 1 = [0, 0] if the input distance is relevant only in the initial state of systems and we use 2 = [ 1, ℎ] if we want to observe the behaviour of systems up to a time horizon ℎ.

4. Estimating the evolution distance

In this section we follow [12, 13] and propose an approach to estimate the evolution distance via statistical techniques. First, we show how we can estimate the evolution sequence of a given system . The idea was quickly anticipated in Section 2.2. Then, we evaluate the distance between two systems 1 and 2 on their estimated evolution sequences.

Given a system 0 and an integer ℎ we can use the function Simulate(0, ℎ), defined in [ 12] and adapted to our purpose in Figure 2, to sample a sequence of systems of the form = (0, 1, . . . , ℎ) that represents ℎ-steps of a computation starting from 0. Each step of the sequence is computed by using function SimStep, also defined in Figure 2. Here, we assume that ℛ = {1, . . . , |ℛ|} and we let rand be a function that allows us to get a uniform random number in (0, 1]. Essentially, SimStep computes one step of Gillespie algorithm. Our version of SimStep is significantly diferent with respect to that provided in [12], which was proposed for cyber physical systems and had to deal with the interactions between the cyber and the physical components of the systems. Our version of Simulate returns also the minimum and maximum level of each species in all generated systems. In computing the arrays and with the minimal and maximal levels of species we exploit the function proj mapping a system to the level of species , namely proj(1[1]1 ‖ · · · ‖ [] ) = .

To compute the empirical evolution sequence of a system the function Estimate proposed in [12] and adapted to our purposes in Figure 3 can be used. The function Estimate(, ℎ, ) invokes times the function Simulate in Figure 2 in order to sample sequences of systems (0 , . . . , ℎ ), for = 1, . . . , , each modeling ℎ steps of a computation from = 0. Thus, a sequence of tuples of samples {0, . . . , ℎ} is computed, where each is the tuple (1, . . . , ) of systems observed at time in each of the sampled computations. Notice that, for each ∈ {0, . . . , ℎ}, the samples 1, . . . , are independent and identically distributed. Each can be used to estimate the distribution , , namely the th element of the evolution sequence () = ,0 =⇒ ,1 =⇒ . . . =⇒ ,ℎ. For any , with 0 ≤ ≤ ℎ, we let ^, be the distribution such that for any system ′ we have ^, (′) = | ∩ {′}| .

Finally, we call ^ = ^,0 . . . ^,ℎ as the empirical evolution sequence. Notice that by applying the weak law of large numbers to the i.i.d samples, we get that when goes to infinite ^, converge weakly to , .

The algorithms in Figures 2 and 3 have been implemented in Python and are available at https: //github.com/dmanicardi/spebnr. This implementation can be viewed as a customisation of the tool Spear (https://github.com/quasylab/spear) implementing the algorithms in [12].

We have seen that function Estimate allows us to collect independent samples at each time step from 0 to a deadline ℎ. We proceed with showing how these samples can be used to estimate the distance between two systems 1 and 2.

if ? ∈ or ?! ∈ 5: 6: 7: 8: 9: 10: 11: 12: 13: 14: end for = · ∏︀=1 , end for for = 1, . . . , |ℛ| do

{︃ = ∑︀|ℛ=|1 0 otherwise

if ̸= # ′ ← + end for ← rand() let s.t. ∑︀=−11 < ≤ ∑︀=1 for = 1, . . . , do ⎧⎪− if ? ∈ ⎪ ⎨⎪− + if ?! ∈ ⎪+ ⎪ ⎪⎩0 if ! ∈ otherwise 15: end for 16: return 1[1]′1 ‖ · · · ‖ []′ 17: end function

Following [25], to estimate the Kantorovich lifting K() of a distance between two (unknown) distributions over reals 1 and 2, one can use independent samples {11, . . . , 1 } taken from 1 and ℓ · independent samples {21, . . . , ℓ2· } taken from 2, with ℓ a natural suitable chosen. After that, one orders {11, . . . , 2

1 } and {21, . . . , ℓ· }, thus obtaining the two sequences of values and 1 ≤ . . . 1 ≤ ℓ· , respectively. The value K()( 1, 2) can be approximated as 1 ≤ · · · ≤ ℓ1 ∑︀ℓ=1 max{ − ⌈ ℓ ⌉, 0}.

In our case, since we need to estimate the Kantorovich lifting K(d ) for species between the distributions over systems 1, and 2, that are reached in steps from 1 and 2, respectively, we have to proceed as follows. We obtain the set of reals {11, . . . , 1 } by extracting samples from 1, , and, then, by taking from each sample the level of species and by dividing it by max() − min(). Analogously, we obtain the reals {21, . . . , ℓ2· } by extracting ℓ · samples from 2, , and, then, by taking from each sample the level of species and by dividing it by max() − min().

Clearly, having the estimation of K(d )( 1, , 2, ) for all species, we can derive an estimation for dd( 1, , 2, ) and, then, for E(dd)[,](1, 2). The whole procedure is realized by functions Distance and ComputeH in Figure 4, available at https://github.com/dmanicardi/spebnr. The former takes as input the two systems 1 and 2 to compare, the weights assigned to species = (1, . . . , ), the parameter ℎ giving the number of computation steps that are observed, the parameters and ℓ used to obtain the samplings of computation, and . Function Distance collects the samples 1,0, . . . , 1,ℎ and 2,0, . . . , 2,ℎ of possible computations of length ℎ from 1 and 2. Then, for each ∈ [, ], the distance at time is computed via the function ComputeH(1,, 2,, proj/(M() − m())), where m() and M() are the minimum and maximum level of species in all systems that are generated.

Due to the sorting of { ℎ | ℎ ∈ [1, . . . , ℓ ]} the complexity of function ComputeH is (ℓ log(ℓ )) (cf. [25]). We refer the interested reader to [26, Corollary 3.5, Equation (3.10)] for an estimation of the approximation error given by the evaluation of the Kantorovich distance over samples.

5. The EnvZ/OmpR Osmoregulatory Signaling System in E. Coli

In this section, we apply the theory presented in previous sections to the EnvZ/OmpR Osmoregulatory Signaling System in Escherichia Coli introduced in Example 1. This system was already analyzed in [ 5 ], within the deterministic model. As reported in [ 5 ], an interesting question is whether the initial concentrations of input species EnvZ and OmpR, which are represented in Figure 1 by X and Y, respectively, impact on the long-run concentration of output species phosphorylated OmpR, represented by YP. We can answer to that question by studying the robustness of systems with respect to an input distance dd and an output distance dd defined so that they capture the diferences over X and Y, and over YP, respectively.

The analysis in [ 5 ] concluded that the concentration of output species at steady state does not depend on the initial value of input species. In other words, at steady state an original system and its perturbed versions obtained by varying the initial concentration of input species X and Y do not exhibit any diference on the concentration of output species YP. Our analysis will allow us to study the step-by-step distances between the original system and the perturbed one, which are abstracted away in the steady state analysis.

Let us focus on system in Equation 1, whose initial species concentration are the same as in [ 5 ] and are reported in Table 3, together with the set ℛ of reactions and the reaction constants. We start our analysis by studying the distances between and two systems 1 and 2 that were obtained in [ 5 ] by perturbing its initial state as follows: system 1 starts with X = 10 and Y = 50, system 2 starts with X = 250 and Y = 1000. In [ 5 ] evidence is given that, at steady state, , 1 and 2 have the same level of YP. By applying function Simulate in Figure 2, we can simulate the evolution of these systems. For each system, we have applied = 100 times the function Simulate with parameter ℎ = 15000. In Figure 5 we report the step-by-step average value obtained for all relevant species, namely X, Y and YP, and we highlight the values obtained for YP.

By applying function Distance in Figure 4 we can estimate the evolution distance between the original system and the two perturbed ones. We define the input distance dd so that only the weights associated with X and Y are positive, in particular we decided that both weights are 0.5 to reflect that the two species have the same relevance. Then, we define the output distance dd so that only the weight associated with YP is positive. In Table 4 we report the results obtained by applying Distance with the following parameters: = 50 runs for the original system, ℓ · = 100 runs for (a) Evolution of X, Y, YP in (c) Evolution of X, Y, YP in 1 (e) Evolution of X, Y, YP in 2 (b) Evolution of YP in (d) Evolution of YP in 1 (f) Evolution of YP in 2 the perturbed systems, ℎ = 15000 steps for each run. For each perturbed system, in Table 4 we report: (i) the input distance between the perturbed system and , which is simply computed by focusing on the initial level of input species in and the perturbed systems, and the minimal and maximal level that are stored in m and M by Distance; (ii) the maximal output distance dd computed by Distance in interval [7500, 15000]; (iii) a pointer to a figure describing the step-by-step evolution of input and output distance between and the perturbed system. Notice that dd(, 2) is one order of magnitude higher than dd(, 1), whereas E(dd)[7500,15000](, 1) and E(dd)[7500,15000](, 2) are close each other and are quite low numbers. This suggests that even if one changes the initial level of X and Y in a light or a heavy way, the step-by-step variation of YP is light in all cases.

System

Initial X

Initial Y dd from

E(dd)[7500,15000] from 1 2

However, in our setting, a more systematic analysis for studying how the initial concentrations of X and Y in a system influence the concentration of YP in a temporal interval can be conducted as follows: we fix the maximal input distance 1 between system and its perturbed versions and, then, we estimate for which 2 the system is (dd, dd, 1 = [0, 0], , 1, 2)-robust. More precisely, in order to estimate 2, for a suitable we sample systems 1, . . . , satisfying E(dd)[0,0](, ) ≤ 1 and we fix 2 = max=1,..., E(dd) (, ).

We have considered five diferent values for 1: 0.3, 0.4, 0.5, 0.6, 0.7. For each choice for 1 we have sampled = 20 systems at input distance dd from bounded by 1. More precisely, we decided to sample these 20 systems so that they are at a dd distance from in the interval ( 1 − 0.1, 1]. Then, we have estimated the 2 for which is (dd, dd, [0, 0], , 1, 2)-robust, for = [7500, 15000]. In each experiment, we simulated runs consisting of ℎ = 15000 reactions, and we considered = 50 runs for the original system and ℓ · = 100 runs for the = 20 perturbed (a) step-by-step values for dd(, 1) and dd(, 1) (b) step-by-step values for dd(, 2) and dd(, 2) systems. For each of the five choices for 1, in Table 5 we report the value 2 for which we have estimated the (dd, dd, [0, 0], [7500, 15000], 1, 2)-robustness, a pointer to the picture showing the step-by-step evolution of dd and dd between and the perturbed system realizing 2 and, finally, the initial concentration levels of X and Y of that system. Summarizing, at varying of 1 in [0.3, 0.7] we get values for 2 that are close each other and are one order of magnitude lower than 1. This suggests that , that was classified as robust in the steady state analysis in [ 5 ], has a good level of robustness also in the step-by-step approach. The analysis in [ 5 ] allows one to conclude that at steady state the level of YP does not depend on the initial level of X and Y, our analysis allows to conclude that if one varies the initial level of YP then, step-by-step, the changes in YP are limited and smooth.

The Python code used for the analysis can be found at https://github.com/dmanicardi/spebnr. For each value for 1, the analysis took 150 minutes on a 1.70 GHz i7-1255U, with 16.00 GB RAM. 1

6. Conclusion and Future Work

We have proposed a notion of robustness for biochemical networks that, essentially, evaluates the ability of a network to exhibit step-by-step limited variations on the quantity of a so called output species at varying of the initial concentration of some so called input species.

Recently, Robustness Temporal Logic [14] (RobTL) has been proposed for the specification and analysis of distances between the behaviours of cyber-physical systems over a finite time horizon. Atomic propositions are defined by means of two simple languages: one to specify the efect of perturbations over an evolution sequence, and one to specify distance expressions between an evolution sequence (a) dd and dd, 1 = 0.3 (c) dd and dd, 1 = 0.5 (e) dd and dd, 1 = 0.7 (b) dd and dd, 1 = 0.4

(d) dd and dd, 1 = 0.6 and its perturbed version. In detail, atomic expressions are of the form Δ(exp, pert) ◁▷ , with ◁▷ ∈ {<, ≤ , =, ̸=, >, ≥} , and allow for comparing a threshold with the distance, specified by an expression exp, between an evolution sequence and its perturbed version, obtained by applying a perturbation specified by pert, starting from a given time step. Boolean and temporal operators allows for extending these evaluations to the entire evolution sequence. Then, the tool Stark [13] ofers: (i) languages for specifying systems, perturbations, distance expressions, and RobTL formulae; (ii) a module for the simulation of systems behaviours and their perturbed versions; (iii) a module for the evaluation of distances between behaviours; (iv) a statistical model checker for RobTL formulae. An interesting future work consists in extending RobTL in order to allow for specifying and analysing properties of biochemical networks, and, then, enriching the Stark tool accordingly in order to use it for the robustness analysis of biochemical networks.

Acknowledgments Bibliography

This publication is part of the project NODES which has received funding from the MUR – M4C2 1.5 of PNRR with grant agreement no. ECS00000036.

[1]

Uri , An introduction to systems biology: design principles of biological circuits, Chapman and Hall/CRC, 2006 .

[2]

Zhou ,

J. C.

Doyle , Essentials of robust control, Prentice-Hall, 1998 .

[3]

Chong ,

Lanotte ,

Merro ,

Tini ,

Xiang , Quantitative robustness analysis of sensor attacks on cyber-physical systems , in: Proceedings of the 26th ACM International Conference on Hybrid Systems: Computation and Control , HSCC 2023 , San Antonio, TX, USA, May 9- 12 , 2023 , ACM, 2023 , pp. 20 : 1 - 20 : 12 .

[4]

Kitano , Towards a theory of biological robustness , Molecular Systems Biology 3 ( 2007 ) 137 .

[5]

Nasti ,

Gori ,

Milazzo , Formalizing a notion of concentration robustness for biochemical networks , in: Federation of International Conferences on Software Technologies: Applications and Foundations , Springer, 2018 , pp. 81 - 97 .

[6]

D. T.

Gillespie , Exact stochastic simulation of coupled chemical reactions , The journal of physical chemistry 81 ( 1977 ) 2340 - 2361 .

[7]

D. J.

Barnes ,

D. F.

Chu , Introduction to modeling for biosciences, Springer, 2010 .