<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>Electronic Journal of Statistics 6 (2021) 1550-1599.
doi:10.1214/12</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <article-id pub-id-type="doi">10.1007/11905455\_1</article-id>
      <title-group>
        <article-title>Step-by-step Robustness for Biochemical Networks</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Ruggero Lanotte</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Desiree Manicardi</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Simone Tini</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>University of Insubria</institution>
          ,
          <addr-line>Como</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2023</year>
      </pub-date>
      <volume>12477</volume>
      <fpage>13</fpage>
      <lpage>15</lpage>
      <abstract>
        <p>We propose a notion of robustness for biochemical networks that, intuitively, measures the ability of the network to exhibit step-by-step limited variations on the concentration of a species of interest at varying of the initial concentration of other species. We provide a statistical technique allowing for estimating robustness and showcase it on the EnvZ/OmpR Osmoregulatory Signal System of E. coli.</p>
      </abstract>
      <kwd-group>
        <kwd>eol&gt;Robustness</kwd>
        <kwd>Biochemical networks</kwd>
        <kwd>Behavioural distances</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Living cells are complex systems whose morphological and functional organization have been thoroughly
investigated in recent years, in the context of system biology [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. The complex behavior of cells arises
from the interactions of their huge amount of components, that interact with each other, through
chemical reaction networks. Malfunctioning or corruption of these interactions may originate in severe
diseases, such as cancer or diabetes. Therefore, it is relevant to study how the components of the cells
interact with each other as a system in order to be able to predict how perturbations can modify their
behavior. In particular, in some cases, it is interesting to predict in which case the nominal behavior of
the cell can be maintained in presence of those perturbations, which leads to the notion of robustness.
Notably, the ability to maintain the nominal behavior is not a qualitative property, but can be quantified,
to formalize to which extent the original behavior can be maintained in function of the amount of
perturbation introduced in the system.
      </p>
      <p>
        The notion of robustness has been widely used in several contexts, from control theory [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] to security
[
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] and biology [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], and is commonly meant as the ability of a system to maintain its functionalities
against external and internal perturbations. Among the notions of robustness introduced in biology,
 -robustness [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] verifies how by varying the initial concentration of some species, called conventionally
the input species, the concentration of other species of interest, called the output species, varies at steady
state.
      </p>
      <p>
        In the present paper, we study a notion of robustness that is inspired by that of  -robustness, but
with two main diferences: (i) we work within the stochastic model [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], whereas  -robustness has
been proposed for the deterministic model [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]; (ii) coherently with the choice of the model, instead of
evaluating the concentration level of output species at steady state, we evaluate it step-by-step, up to a
ifnite horizon. Being the deterministic and the stochastic approach complementary [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], we can argue
that also our results and those in [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] are. Clearly, robustness in our sense captures random efects and
temporary efects that are typical of the stochastic model.
      </p>
      <sec id="sec-1-1">
        <title>1.1. Our contribution</title>
        <p>In order to study the robustness of system behavior in the stochastic model, we need: (i) a language
allowing us to specify systems of interest; (ii) a semantic model, which must capture the probabilistic
behavior that characterizes the stochastic approach; (iii) a formal notion of robustness, defined on the
semantic model; (iv) an algorithm allowing us to estimate the robustness of a system starting from its
specification.</p>
        <p>As regards specification, we rely on process calculi (similar to those in [8, 9]), adopting in particular
the species as processes [10] approach. Essentially, a solution with  species is modeled by the parallel
composition of  processes, where each process represents one species and its concentration level. In
this paper, the concentration level is expressed in terms of the number of molecules, but this can be
generalized.</p>
        <p>
          In order to model the behavior, we follow Gillespie’s approach [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ], where computation steps represent
single chemical reactions. Here, at each step there is a competition between all available reactions,
giving rise to a probabilistic behavior. In this context, we believe that it is convenient to adopt the
semantic model of evolution sequences proposed in [11, 12, 13, 14, 15, 16]: essentially, an evolution
sequence is a sequence of probability measures over the attainable configurations.
        </p>
        <p>
          By adopting evolution sequences, one can exploit the whole theory developed in [12, 13] to measure
the diferences between system behaviors, which supports the study of robustness properties. In [12, 13]
the focus is on cyber-physical systems and the starting idea is to provide a notion of distance between
computation states that quantifies to which extent they perform diferently with respect to some targets
that are fixed initially. In other words, to each computation state one assigns a penalty quantifying
how bad the system is working with respect to the targets, and the distance between two computation
states is given by the diference of the penalties that are assigned to them. Then, the notion of distance
between computation states is lifted first to probability measures over computation states and, then,
to evolution sequences. In the present paper, we customize this approach to support the study of a
robustness property for biochemical networks inspired by the  -robustness of [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ]. Essentially, if we are
interested in studying the diference between the behaviour of two systems by focusing on the quantity
of a given set of species, we can assign to each system state a rank that depends on the available quantity
of these species. Then, the distance between two states coincides with the diference between their
ranks, and this can be lifted to probability measures and to evolution sequences. Since  -robustness
bases on how much the output species vary depending on the variation of the initial concentration
of input species, we can consider two diferent ranks, which capture the input and the output species,
respectively, and that allow us to define the input distance and the output distance between the nominal
system and the perturbed one. Then we formalize a notion of robustness whose intuition is that small
variations on input distance should give rise to smooth and limited variations on output distance.
        </p>
        <p>Then, following [12, 13], we provide a randomized algorithm that permits estimating the evolution
sequences of systems and thus for the evaluation of the distances between them. This allows us to
estimate robustness of a nominal system, by sampling perturbed systems at a fixed maximal input
distance from it and by estimating their output distance. In order to validate our proposal, we have
provided a Python implementation, available at https://github.com/dmanicardi/spebnr, for the EnvZ/OmpR
Osmoregulatory Signaling System of E. coli.</p>
      </sec>
      <sec id="sec-1-2">
        <title>1.2. Related work</title>
        <p>
          Robustness in biology has been extensively studied in recent years. A very general approach has been
proposed in [17, 18], where the nominal behaviour of a system subject to perturbations is expressed
in terms of a formula specified with a linear temporal logic equipped with a quantitative semantics,
then the robustness of the system is quantified as the average satisfaction degree of that property over
all admissible perturbations, possibly weighted by their probabilities. Several notions of robustness
proposed in the literature are less general and focus on steady state behaviour. As an example, the notion
of adaptability in [19] captures the idea that the behaviour at steady state of a system is insensitive with
respect to the initial concentration of some species. Absolute concentration robustness [20, 21] with
respect to a given species requires that the system admits at least one positive steady state and that the
concentration of that species is the same in all of the positive steady states that the system might admit.
The notion of  -robustness has been proposed in [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ] on the Continuous Petri Nets model. Intuitively,
this notion captures the idea that by varying the initial concentration of input species under suitable
constraints, namely by remaining within a so called interval marking of the net, then at steady state
one observes a bounded variation of the concentration of output species, namely that concentration is
in a ball of radius  .
        </p>
      </sec>
      <sec id="sec-1-3">
        <title>1.3. Organization of contents.</title>
        <p>We devote Section 2 to the presentation of our model. In Section 3 we discuss the behavioral distances
and our notion of robustness. Then, in Section 4 we provide the algorithm, based on statistical inference,
for estimating the distances and the robustness. The study of EnvZ/OmpR Osmoregulatory Signaling
System is described in Section 5. Finally, Section 6 concludes the paper with a discussion on future
research directions.</p>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>2. The model</title>
      <p>We assume a set  of names for species and a set ℛ of chemical reactions. Then, we use a set of actions
 describing how a species can take part to a reaction. In particular, if  is a reaction in ℛ, then the
following actions are in :
• ?, denoting that the species participates in reaction  as a reactant, consuming  molecules,
• !, denoting that the species participates in reaction  as a product, producing  molecules,
• ?!, denoting that the species participates in reaction  as both a reactant and a product, like in
the case of enzymatic reactions. Here,  molecules are consumed and  molecules are produced.
Elements in  will be denoted with  ,  ′,  1, . . . . For an action  ∈ , the associated reaction
r( ) is defined by r(?) = r(!) = r(?!) = . For a set of actions  ⊆  , we let r() denote
r() = {r( ) |  ∈ }.</p>
      <p>Definition 1 (System). The set  of systems over  , ℛ and  is defined by</p>
      <p>::= [] |  ‖ 
where: (i)  ∈  , (ii)  ⊆  , and (iii)  is a natural.</p>
      <p>Intuitively, the system [] represents that  molecules of species  are in the mixture and have a
potential behaviour described by , and ‖ is the (commutative and associative) parallel composition
operator, which allows us to model that several species coexist in the same mixture.</p>
      <p>Given systems [] , for  = 1, . . . , , we will say that the system 1[1]1 ‖ · · · ‖ [] is
well formed if for all 1 ≤  &lt;  ≤  it holds that  ̸=  . We will always assume to work with well
formed systems. Intuitively, well formedness ensures that each species is represented by precisely one
parallel component.</p>
      <p>
        Example 1. Escherichia Coli is a bacterium whose cell contains a few million proteins of diferent
types [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. The EnvZ/OmpR Osmoregulatory Signaling System regulates two of those proteins. In order
to support our exposition, in Figure 1 we report the graphical representation of this regulatory network
as given in [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] by exploiting the formalism of Continuous Petri Nets. The main components of this
chemical network are EnvZ (histine kinase) and OmpR (response regulator), denoted, respectively,
as X and Y. EnvZ phosphorylates OmpR (YP in the figure) and itself ( XP), by binding and breaking
down ATP. In the picture we can see that this system is characterized by eight species, represented by
places in the net, and eleven reactions, represented by transitions. More in detail, the reactants and the
products of a reaction are those that are represented by the places that are the sources and the targets,
respectively, of the transition represented by that reaction. For instance, the species XT is a reactant
1
X
Y
XPY
      </p>
      <p>XDYP
10
YP
8
for the reactions 4 and 5, and is a product for 3. In this example, each reaction uses precisely one
molecule for each of its reactants and produces precisely one molecule for each of its products. Below
we give the system  that represents the regulatory network with species X, Y, XD and YP having a
number of molecules corresponding to 25, 150, 50 and 10, respectively, and the other species having no
molecule.</p>
      <p>=</p>
      <p>X[{2?1, 3?1, 1!1, 4!1, 8!1}]25
‖</p>
      <p>Y[{6?1, 7!1, 11!1}]150
‖
‖</p>
      <p>XD[{1?1, 9?1, 2!1, 10!1, 11!1}]50
XPY[{6!1}]0
‖</p>
      <p>XDYP[{9!1}]0
‖
‖</p>
      <p>XT[{3!1}]0
‖</p>
      <p>XP[{5!1, 7!1}]0</p>
      <p>(1)
YP[{9?1, 8!1, 10!1}]10</p>
      <sec id="sec-2-1">
        <title>2.1. Behavioral model</title>
        <p>
          We define the system’s behaviour by means of two types of transitions. Those of the first type represent
the potential behaviour of systems. They are of the form →−− − − (,) ′, which models that the reaction
 can take  to ′. Here,  is the weight of the transition, and is a real number that will allow us to
calculate the rate of the reaction , which is needed in order to calculate the probability of all enabled
reactions. The transitions of the second type describe how all enabled reactions for a system  compete
and take it to a probability distribution over systems. They are of the form  =⇒  , where  is a
discrete distribution over systems, namely a mapping  : →− [
          <xref ref-type="bibr" rid="ref1">0, 1</xref>
          ] with ∑︀∈  () = 1. From
the transitions of the form  =⇒  we will derive transitions of the form  =⇒  ′ that model the
evolution of distributions of system.
        </p>
        <p>The transitions of the first type are derived by means of the inference rules in Table 1. The first rule
represents that  molecules of species n are consumed by taking part to reaction  as reactant. The
weight (︀ )︀ coincides with the number of ways  molecules of  can be taken out from the available ,

namely it is the number of sets of molecules of  that can take part to the reaction . The second rule
 =⇒
trgt() = {(, , ) |  ∈ }
∑︁ ( · )
∈ ∑︀∈ ( · )
 ()
represents that  molecules of n are produced by taking part to reaction  as product. The weight 1
reflects that the number of molecules of the species does not impact on the rate of the reaction. The
third rule represents that  molecules of n are consumed and  molecules of n are produced by taking
part to reaction  both as reactant and as product. The fourth rule is applied when  is not involved
in reaction . The inferred transition allows for composing the behaviour of [] with that of other
species. The last rule allows for combining the behaviour of systems running in parallel. Clearly, they
have to agree upon the reaction to be taken and the resulting weight is the product of the weights of
the composed transitions. Here, we extend classic product by # ·  =  · # =  for all reals , and
# · # = #.</p>
        <p>For a system  we denote by trgt() the set of the triples {(, , ) →|−−− − −  and  ̸= #}.
Notice that for  = 1[1]1 ‖ · · · ‖ [] , we have (, , ) ∈ trgt() with  &gt; 1 if and only
if  contains  diferent sets of reagents that can take part to reaction .</p>
        <p>In order to define the transitions of the form  =⇒  , we need some notation for distributions
over systems. By  () we denote the point distribution giving probability 1 to  and probability 0
to all ′ diferent from . Then, for a countable set of indexes  and distributions   with  ∈ ,
the distribution ∑︀∈   with all  ≥ 0 and ∑︀∈  = 1 is the convex distribution defined by
(∑︀∈  )() = ∑︀∈  ·  ().</p>
        <p>The first rule in Table 2 represents the competition between all enabled reactions of . For each
reaction , we consider the constant reaction  associated with the reaction, where, intuitively,   is
the probability that a particular combination of reactants gives rise to reaction  in an infinitesimal
time interval . The probability that  behaves as described by (, , ) is the ratio between the
rate  ·  of the reaction  and the sum of the rates of all reactions ∑︀∈ ( ·  ). To explain
this point, we recall that  ·  is the parameter of the exponential distribution modeling the time
elapsing between two consecutive occurrences of reaction  and ∑︀∈ ( ·  ) is the parameter of the
exponential distribution modeling the time elapsing between two consecutive occurrences of arbitrary
( · )
reactions. Summarizing,  reaches the convex distribution of systems ∑︀∈   with  = ∑︀∈ ( · )
and   =  ().</p>
        <p>The second rule in Table 2 lifts the behaviour of systems to that of distributions over systems. We
note that if there is a sequence of transitions  1 =⇒  2 =⇒ . . .   . . . , for  1 the point distribution
 (1[1]1 ‖ · · · ‖ [] ), then all systems  in the support of any   are of the form  =
1[1]′1 ‖ · · · ‖ []′ , for suitable ′1, . . . , ′. Following [12, 13], the sequence  1 =⇒  2 =⇒
. . .   . . . is called an evolution sequence. In particular, it is the evolution sequence of  if  1 =  ().</p>
        <p>In a transition  =⇒  the probability weight assigned to each element in the support of  depends
on the rate of all reactions that are possible in . This is no more true in a transition  =⇒  ′ since the
rates of the chemical reactions from two diferent systems in the support of  are unrelated.</p>
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Remarks on the semantic model</title>
        <p>
          We have described the behaviour of a system by means of an evolution sequence, namely a sequence
of distributions of systems. Clearly, computing exactly these distributions is undoable. However, by
adapting to our context the work in [12, 13], we can easily obtain a randomized algorithm allowing us
to estimate the evolution sequence of a system . This randomized algorithm, detailed in Section 4,
works as follows: (i) we apply  times a procedure allowing us to compute a ℎ-steps trajectory.
Essentially, this coincides with applying  times the classical Gillespie algorithm [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ]. For  = 1, . . . ,  ,
we obtain the trajectory 0 , 1 , . . . , ℎ , where all 0 coincide with . Clearly, for all  = 1, . . . , ℎ
it holds that the samples 1, . . . ,  obtained at step  are independent and identically distributed;
(ii) for each  = 1, . . . , ℎ, we use 1, . . . ,  to derive the empirical distribution ^ defined simply
by ^ (′) = |{|=′ and 1≤ ≤ }| ; (iii) by applying the weak law of large numbers to i.i.d. samples we

infer that when  goes to infinite then ^ converges weakly to   , where   are the distributions such
that  () =⇒  1 =⇒  2 . . . .
        </p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Behavioral distances over systems</title>
      <p>In [12, 13] a notion of behavioral distance over cyber-physical systems was proposed, aiming to express
how well the cyber components fulfill their tasks. In this section, we customize that theory for our model,
clearly with a diferent target. In particular, given two systems that difer by the number of molecules of
species, say 1 = 1[1]11 ‖ · · · ‖ []1 and 2 = 1[1]21 ‖ · · · ‖ []2 , we aim to quantify
the diferences over their evolution sequences 1 =⇒  11 =⇒  21 . . . and 2 =⇒  12 =⇒  22 . . . . To
this purpose, we proceed as follows: (i) first we introduce the concept of distance between systems,
which deals with the diference between the number of molecules of the species; (ii) then we lift this
notion to a notion of distance between distributions, so that we can quantify the distance between the
distributions  1 and  2 that are reached by 1 and 2, respectively, at step , and, finally, (iii) we lift
this distance to a distance between the two evolution sequences.</p>
      <p>We start with a notion of distance between systems that focuses on a single specie. We assume to
know the minimum and maximum level min() and max() that a given species  may reach during
the computation. Notice that these values can be estimated by applying our randomized algorithm
quickly described above in Section 2.2 and detailed below in Section 4.</p>
      <p>Definition 2 (-distance over systems). Assume systems 1 = 1[1]11 ‖ · · · ‖ []1 and
2 = 1[1]21 ‖ · · · ‖ []2 . Then the distance between 1 and 2 with respect to species , or
|1 − 2 |
-distance between 1 and 2 for short, is d (1, 2) = max()− min() .</p>
      <p>
        Clearly, d (1, 2) is a value in the interval [
        <xref ref-type="bibr" rid="ref1">0, 1</xref>
        ]. Moreover, d is a 1-bounded pseudometric,
namely a function d :  ×  → [
        <xref ref-type="bibr" rid="ref1">0, 1</xref>
        ] such that for all systems 1, 2, 3 the following properties hold:
(i) d (1, 1) = 0, (ii) d (1, 2) = d (2, 1), and (iii) d (1, 3) ≤ d (1, 2) + d (2, 3).
Notice that pseudometric d is not a metric, since, in general, it does not hold that d (1, 2) = 0
implies 1 = 2.
      </p>
      <p>The second step to obtain the evolution distance consists in lifting distances on systems to distances
on distributions over systems. Among the several notions of lifting for pseudometric proposed in the
literature (see [22] for a survey), following [12, 13] we opt for that known as Wasserstein distance or
Kantorovich-Rubinstein pseudometric, since it preserves the properties of the ground pseudometric and
is computationally tractable via statistical inference.</p>
      <p>In order to recall formally that notion, we need to introduce the notion of matching, also known as
measure coupling [23], for pairs distributions.</p>
      <p>Definition 3 (Matching, [23]). Assume a set . A matching for a pair of distributions (,  ′) over 
is a distribution  over the product state space  ×  with left marginal  , i.e. ∑︀′∈ (, ′) =  ()
for all  ∈ , and right marginal  ′, i.e. ∑︀∈ (, ′) =  ′(′) for all ′ ∈ . We let Ω(,  ′) denote
the set of all matchings for (,  ′).</p>
      <p>According to [23], a matching in Ω(,  ′) may be understood as a transportation schedule for the
shipment of probability mass from  to  ′. Then, by exploiting that notion, we can introduce the
Kantorovich lifting.</p>
      <p>Definition 4 (Kantorovich metric, [24]). The Kantorovich lifting of a pseudometric  over a set  is
the pseudometric K() over distributions over  defined for all distributions ,  ′ by
K()(,  ′) =</p>
      <p>min ∑︁ (, ′) · (, ′).</p>
      <p>∈Ω(, ′) ,′∈</p>
      <p>The reader familiar with probability theory, might have recognized the Kantorovich lifting K()(,  ′)
as the minimum expected value of the ground distance , over the supports of  and  ′, with respect to
the distribution . According to [23], K()(,  ′) gives the optimal cost for shipping probability mass
from  to  ′ when (, ′) is the unit cost for shipping mass from  to ′.</p>
      <p>We notice that pseudometric K(d ) preserves the 1-boundedness property of d . From the distances
with respect to all species K(d1 ), . . . , K(d ), we can derive a unique notion of distance by exploiting
weights that quantify the relevance that we want to assign to each specie.</p>
      <p>Definition 5 (Distance between system distributions). Assume weights 1, . . . ,  ≥ 0 such that
∑︀</p>
      <p>=1  = 1. The distance dd between system distributions induced by distances d1 , . . . , d and
weights 1, . . . ,  is defined by dd(,  ′) = ∑︀</p>
      <p>=1  · K(d )(,  ′).</p>
      <p>Clearly, being a convex combination of 1-bounded pseudometrics, the function dd is a 1-bounded
pseudometric. We argue that dd can be computed eficiently, by following [ 25]. In detail, computing
dd(,  ′) requires computing the Kantorovich liftings K(d1 )(,  ′), . . . , K(d )(,  ′). Both  and
 ′ are discrete distributions, let us assume that their cardinality is  . If  and  ′ are estimated by
our randomized algorithm, then  will be a parameter of the algorithm. In computing each lifting
K(d )(,  ′), each system in the support of  and  ′ can be viewed as a real, obtained as the cardinality
of the molecules of  divided by max() − min(). Therefore,  and  ′ can be viewed as multisets of
reals of cardinality  . This property is exploited in [25] as follows. In ( log  ) these two multisets
can be ordered. Let 1 ≤ · · · ≤  and 1 ≤ · · · ≤  be the ordered sequences. Notice that the ground
distance d between two reals in the support of  and  ′ is a real (the absolute value of their diference,
namely d (,  ) = | −  |). Following [25], we have that K(d )(,  ′) = ∑︀=1 d (,  ).
Therefore, computing dd can be done in ( ·  log  ).</p>
      <p>We now need to lift dd to a distance on evolution sequences. To this end, we observe that the
evolution sequence of a system includes the distributions over systems induced after each computation
step. Following [12, 13] we define the evolution distance as a sort of weighted infinity norm of the tuple
of the Kantorovich distances between the distributions in the evolution sequences.
Definition 6. [Evolution distance] For a pseudometric dd over distributions over  and an interval
[ 1,  2], the evolution distance over [ 1,  2] is the mapping E(dd)[ 1, 2] such that, given systems 1 and
2 and their evolution sequences 1 =⇒  11 =⇒  21 =⇒ . . .  1 . . . and 2 =⇒  12 =⇒  22 =⇒
. . .  2 . . . we have</p>
      <p>E(dd)[ 1, 2](1, 2) =</p>
      <p>max dd( 1,  2)
 1≤ ≤  2
Since dd is a 1-bounded pseudometric, we can easily derive the same property for E(dd)[ 1, 2].</p>
      <p>We remark that, as argued in [12], the choice of defining the evolution metric as the pointwise
maximal distance in time between the evolution sequences of systems is natural and reasonable when
one aims to evaluate the highest distance on the behaviour of systems. However, in diferent contexts
it would be more appropriate to use a diferent aggregation function over the tuple of Kantorovich
distances instead of max. The definition of E(dd)[ 1, 2] in Definition 6 could be replaced by a definition
parametric with respect to aggregation functions. In order to prevent a too heavy notation, we decided
to opt for the present formulation.</p>
      <sec id="sec-3-1">
        <title>3.1. Robustness</title>
        <p>Technically, in order to formalize the notion of robustness we need two distances, one taking into
account input species and the other taking into account the output species. These will be the input
distance dd and the output distance dd over distributions, and will be defined as in Definition 5 by
giving positive weights to only input and only output species, respectively. The definition of robustness
is parametric with respect to these two distances, two thresholds  1 and  2 and two intervals 1 and
2: a system  is robust with respect to these parameters if whenever we consider a system ′ whose
input distance from  in interval 1 remains below  1, then the output distance observed in interval 2
remains below  2.</p>
        <p>Definition 7.</p>
        <p>A system  is (dd, dd, 1, 2,  1,  2)-robust if for all systems ′:</p>
        <p>E(dd)1 (, ′) ≤  1 implies E(dd)2 (, ′) ≤  2</p>
        <p>Clearly, we will use 1 = [0, 0] if the input distance is relevant only in the initial state of systems and
we use 2 = [ 1, ℎ] if we want to observe the behaviour of systems up to a time horizon ℎ.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Estimating the evolution distance</title>
      <p>In this section we follow [12, 13] and propose an approach to estimate the evolution distance via
statistical techniques. First, we show how we can estimate the evolution sequence of a given system .
The idea was quickly anticipated in Section 2.2. Then, we evaluate the distance between two systems
1 and 2 on their estimated evolution sequences.</p>
      <p>Given a system 0 and an integer ℎ we can use the function Simulate(0, ℎ), defined in [ 12] and
adapted to our purpose in Figure 2, to sample a sequence of systems of the form  = (0, 1, . . . , ℎ)
that represents ℎ-steps of a computation starting from 0. Each step of the sequence is computed by
using function SimStep, also defined in Figure 2. Here, we assume that ℛ = {1, . . . , |ℛ|} and we
let rand be a function that allows us to get a uniform random number in (0, 1]. Essentially, SimStep
computes one step of Gillespie algorithm. Our version of SimStep is significantly diferent with respect
to that provided in [12], which was proposed for cyber physical systems and had to deal with the
interactions between the cyber and the physical components of the systems. Our version of Simulate
returns also the minimum and maximum level of each species  in all generated systems. In computing
the arrays  and  with the minimal and maximal levels of species we exploit the function proj
mapping a system to the level of species , namely proj(1[1]1 ‖ · · · ‖ [] ) = .</p>
      <p>To compute the empirical evolution sequence of a system  the function Estimate proposed in
[12] and adapted to our purposes in Figure 3 can be used. The function Estimate(, ℎ,  ) invokes
 times the function Simulate in Figure 2 in order to sample  sequences of systems (0 , . . . , ℎ ),
for  = 1, . . . ,  , each modeling ℎ steps of a computation from  = 0. Thus, a sequence of tuples
of samples {0, . . . , ℎ} is computed, where each  is the tuple (1, . . . ,  ) of systems observed
at time  in each of the  sampled computations. Notice that, for each  ∈ {0, . . . , ℎ}, the samples
1, . . . ,  are independent and identically distributed. Each  can be used to estimate the distribution
 , , namely the th element of the evolution sequence  () =  ,0 =⇒  ,1 =⇒ . . . =⇒  ,ℎ. For

any , with 0 ≤  ≤ ℎ, we let ^, be the distribution such that for any system ′ we have
^, (′) = | ∩ {′}| .</p>
      <p>Finally, we call ^ = ^,0 . . . ^,ℎ as the empirical evolution sequence. Notice that by applying the
weak law of large numbers to the i.i.d samples, we get that when  goes to infinite ^, converge
weakly to  , .</p>
      <p>The algorithms in Figures 2 and 3 have been implemented in Python and are available at https:
//github.com/dmanicardi/spebnr. This implementation can be viewed as a customisation of the tool
Spear (https://github.com/quasylab/spear) implementing the algorithms in [12].</p>
      <p>We have seen that function Estimate allows us to collect independent samples at each time step from
0 to a deadline ℎ. We proceed with showing how these samples can be used to estimate the distance
between two systems 1 and 2.</p>
      <p>if ? ∈  or ?! ∈ 
5:
6:
7:
8:
9:
10:
11:
12:
13:
14:
end for
 =  · ∏︀=1 ,
end for
for  = 1, . . . , |ℛ| do</p>
      <p>{︃ 
 = ∑︀|ℛ=|1 
0 otherwise</p>
      <p>if  ̸= #
′ ←  +
end for
 ← rand()
let  s.t. ∑︀=−11  &lt;  ≤ ∑︀=1 
for  = 1, . . . ,  do
⎧⎪−  if ? ∈ 
⎪
⎨⎪−  +  if ?! ∈ 
⎪+
⎪
⎪⎩0
if ! ∈ 
otherwise
15: end for
16: return 1[1]′1 ‖ · · · ‖ []′
17: end function</p>
      <p>Following [25], to estimate the Kantorovich lifting K() of a distance  between two (unknown)
distributions over reals  1 and  2, one can use  independent samples {11, . . . , 
1 } taken from
 1 and ℓ ·  independent samples {21, . . . , ℓ2·  } taken from  2, with ℓ a natural suitable chosen.
After that, one orders {11, . . . ,  2</p>
      <p>1 } and {21, . . . , ℓ·  }, thus obtaining the two sequences of values
 and 1 ≤ . . . 1 ≤ ℓ·  , respectively. The value K()( 1,  2) can be approximated as
1 ≤ · · · ≤
ℓ1 ∑︀ℓ=1 max{ − ⌈ ℓ ⌉, 0}.</p>
      <p>In our case, since we need to estimate the Kantorovich lifting K(d ) for species  between the
distributions over systems  1, and  2, that are reached in  steps from 1 and 2, respectively, we
have to proceed as follows. We obtain the set of reals {11, . . . , 1 } by extracting  samples from  1, ,
and, then, by taking from each sample the level of species  and by dividing it by max() − min().
Analogously, we obtain the reals {21, . . . , ℓ2·  } by extracting ℓ ·  samples from  2, , and, then, by
taking from each sample the level of species  and by dividing it by max() − min().</p>
      <p>Clearly, having the estimation of K(d )( 1, ,  2, ) for all species, we can derive an estimation
for dd( 1, ,  2, ) and, then, for E(dd)[,](1, 2). The whole procedure is realized by functions
Distance and ComputeH in Figure 4, available at https://github.com/dmanicardi/spebnr. The former
takes as input the two systems 1 and 2 to compare, the weights assigned to species  = (1, . . . , ),
the parameter ℎ giving the number of computation steps that are observed, the parameters  and ℓ used
to obtain the samplings of computation,  and . Function Distance collects the samples 1,0, . . . , 1,ℎ
and 2,0, . . . , 2,ℎ of possible computations of length ℎ from 1 and 2. Then, for each  ∈ [, ], the
 distance at time  is computed via the function ComputeH(1,, 2,, proj/(M() − m())),
where m() and M() are the minimum and maximum level of species  in all systems that are generated.</p>
      <p>Due to the sorting of { ℎ | ℎ ∈ [1, . . . , ℓ ]} the complexity of function ComputeH is (ℓ log(ℓ ))
(cf. [25]). We refer the interested reader to [26, Corollary 3.5, Equation (3.10)] for an estimation of the
approximation error given by the evaluation of the Kantorovich distance over  samples.</p>
    </sec>
    <sec id="sec-5">
      <title>5. The EnvZ/OmpR Osmoregulatory Signaling System in E. Coli</title>
      <p>
        In this section, we apply the theory presented in previous sections to the EnvZ/OmpR Osmoregulatory
Signaling System in Escherichia Coli introduced in Example 1. This system was already analyzed
in [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], within the deterministic model. As reported in [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], an interesting question is whether the
initial concentrations of input species EnvZ and OmpR, which are represented in Figure 1 by X and Y,
respectively, impact on the long-run concentration of output species phosphorylated OmpR, represented
by YP. We can answer to that question by studying the robustness of systems with respect to an input
distance dd and an output distance dd defined so that they capture the diferences over X and Y, and
over YP, respectively.
      </p>
      <p>
        The analysis in [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] concluded that the concentration of output species at steady state does not depend
on the initial value of input species. In other words, at steady state an original system and its perturbed
versions obtained by varying the initial concentration of input species X and Y do not exhibit any
diference on the concentration of output species YP. Our analysis will allow us to study the step-by-step
distances between the original system and the perturbed one, which are abstracted away in the steady
state analysis.
      </p>
      <p>
        Let us focus on system  in Equation 1, whose initial species concentration are the same as in [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]
and are reported in Table 3, together with the set ℛ of reactions and the reaction constants. We start
our analysis by studying the distances between  and two systems 1 and 2 that were obtained in [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]
by perturbing its initial state as follows: system 1 starts with X = 10 and Y = 50, system 2 starts
with X = 250 and Y = 1000. In [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] evidence is given that, at steady state, , 1 and 2 have the same
level of YP. By applying function Simulate in Figure 2, we can simulate the evolution of these systems.
For each system, we have applied  = 100 times the function Simulate with parameter ℎ = 15000.
In Figure 5 we report the step-by-step average value obtained for all relevant species, namely X, Y and
YP, and we highlight the values obtained for YP.
      </p>
      <p>By applying function Distance in Figure 4 we can estimate the evolution distance between the
original system  and the two perturbed ones. We define the input distance dd so that only the
weights associated with X and Y are positive, in particular we decided that both weights are 0.5 to
reflect that the two species have the same relevance. Then, we define the output distance dd so that
only the weight associated with YP is positive. In Table 4 we report the results obtained by applying
Distance with the following parameters:  = 50 runs for the original system, ℓ ·  = 100 runs for
(a) Evolution of X, Y, YP in 
(c) Evolution of X, Y, YP in 1
(e) Evolution of X, Y, YP in 2
(b) Evolution of YP in 
(d) Evolution of YP in 1
(f) Evolution of YP in 2
the perturbed systems, ℎ = 15000 steps for each run. For each perturbed system, in Table 4 we report:
(i) the input distance between the perturbed system and , which is simply computed by focusing
on the initial level of input species in  and the perturbed systems, and the minimal and maximal
level that are stored in m and M by Distance; (ii) the maximal output distance dd computed by
Distance in interval [7500, 15000]; (iii) a pointer to a figure describing the step-by-step evolution of
input and output distance between  and the perturbed system. Notice that dd(, 2) is one order
of magnitude higher than dd(, 1), whereas E(dd)[7500,15000](, 1) and E(dd)[7500,15000](, 2)
are close each other and are quite low numbers. This suggests that even if one changes the initial level
of X and Y in a light or a heavy way, the step-by-step variation of YP is light in all cases.</p>
      <p>System</p>
      <p>Initial X</p>
      <p>Initial Y
dd from</p>
      <p>E(dd)[7500,15000] from 
1
2</p>
      <p>However, in our setting, a more systematic analysis for studying how the initial concentrations of X
and Y in a system  influence the concentration of YP in a temporal interval  can be conducted as
follows: we fix the maximal input distance  1 between system  and its perturbed versions and, then,
we estimate for which  2 the system  is (dd, dd, 1 = [0, 0], ,  1,  2)-robust. More precisely, in
order to estimate  2, for a suitable  we sample  systems 1, . . . ,  satisfying E(dd)[0,0](, ) ≤  1
and we fix  2 = max=1,..., E(dd) (, ).</p>
      <p>
        We have considered five diferent values for  1: 0.3, 0.4, 0.5, 0.6, 0.7. For each choice for  1
we have sampled  = 20 systems at input distance dd from  bounded by  1. More precisely,
we decided to sample these 20 systems so that they are at a dd distance from  in the interval
( 1 − 0.1,  1]. Then, we have estimated the  2 for which  is (dd, dd, [0, 0], ,  1,  2)-robust, for
 = [7500, 15000]. In each experiment, we simulated runs consisting of ℎ = 15000 reactions, and we
considered  = 50 runs for the original system  and ℓ ·  = 100 runs for the  = 20 perturbed
(a) step-by-step values for dd(, 1) and
dd(, 1)
(b) step-by-step values for dd(, 2) and
dd(, 2)
systems. For each of the five choices for  1, in Table 5 we report the value  2 for which we have
estimated the (dd, dd, [0, 0], [7500, 15000],  1,  2)-robustness, a pointer to the picture showing the
step-by-step evolution of dd and dd between  and the perturbed system realizing  2 and, finally,
the initial concentration levels of X and Y of that system. Summarizing, at varying of  1 in [0.3, 0.7] we
get values for  2 that are close each other and are one order of magnitude lower than  1. This suggests
that , that was classified as robust in the steady state analysis in [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], has a good level of robustness
also in the step-by-step approach. The analysis in [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] allows one to conclude that at steady state the
level of YP does not depend on the initial level of X and Y, our analysis allows to conclude that if one
varies the initial level of YP then, step-by-step, the changes in YP are limited and smooth.
      </p>
      <p>The Python code used for the analysis can be found at https://github.com/dmanicardi/spebnr. For
each value for  1, the analysis took 150 minutes on a 1.70 GHz i7-1255U, with 16.00 GB RAM.
 1</p>
    </sec>
    <sec id="sec-6">
      <title>6. Conclusion and Future Work</title>
      <p>We have proposed a notion of robustness for biochemical networks that, essentially, evaluates the ability
of a network to exhibit step-by-step limited variations on the quantity of a so called output species at
varying of the initial concentration of some so called input species.</p>
      <p>Recently, Robustness Temporal Logic [14] (RobTL) has been proposed for the specification and analysis
of distances between the behaviours of cyber-physical systems over a finite time horizon. Atomic
propositions are defined by means of two simple languages: one to specify the efect of perturbations
over an evolution sequence, and one to specify distance expressions between an evolution sequence
(a) dd and dd,  1 = 0.3
(c) dd and dd,  1 = 0.5
(e) dd and dd,  1 = 0.7
(b) dd and dd,  1 = 0.4</p>
      <p>(d) dd and dd,  1 = 0.6
and its perturbed version. In detail, atomic expressions are of the form Δ(exp, pert) ◁▷  , with
◁▷ ∈ {&lt;, ≤ , =, ̸=, &gt;, ≥} , and allow for comparing a threshold  with the distance, specified by an
expression exp, between an evolution sequence and its perturbed version, obtained by applying a
perturbation specified by pert, starting from a given time step. Boolean and temporal operators allows
for extending these evaluations to the entire evolution sequence. Then, the tool Stark [13] ofers:
(i) languages for specifying systems, perturbations, distance expressions, and RobTL formulae; (ii) a
module for the simulation of systems behaviours and their perturbed versions; (iii) a module for the
evaluation of distances between behaviours; (iv) a statistical model checker for RobTL formulae. An
interesting future work consists in extending RobTL in order to allow for specifying and analysing
properties of biochemical networks, and, then, enriching the Stark tool accordingly in order to use it
for the robustness analysis of biochemical networks.</p>
    </sec>
    <sec id="sec-7">
      <title>Acknowledgments Bibliography</title>
      <p>This publication is part of the project NODES which has received funding from the MUR – M4C2 1.5 of
PNRR with grant agreement no. ECS00000036.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>A.</given-names>
            <surname>Uri</surname>
          </string-name>
          ,
          <article-title>An introduction to systems biology: design principles of biological circuits, Chapman</article-title>
          and Hall/CRC,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>K.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. C.</given-names>
            <surname>Doyle</surname>
          </string-name>
          , Essentials of robust control, Prentice-Hall,
          <year>1998</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>S.</given-names>
            <surname>Chong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Lanotte</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Merro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Tini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Xiang</surname>
          </string-name>
          ,
          <article-title>Quantitative robustness analysis of sensor attacks on cyber-physical systems</article-title>
          ,
          <source>in: Proceedings of the 26th ACM International Conference on Hybrid Systems: Computation and Control</source>
          ,
          <string-name>
            <surname>HSCC</surname>
          </string-name>
          <year>2023</year>
          , San Antonio, TX, USA, May 9-
          <issue>12</issue>
          ,
          <year>2023</year>
          , ACM,
          <year>2023</year>
          , pp.
          <volume>20</volume>
          :
          <fpage>1</fpage>
          -
          <lpage>20</lpage>
          :
          <fpage>12</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>H.</given-names>
            <surname>Kitano</surname>
          </string-name>
          ,
          <article-title>Towards a theory of biological robustness</article-title>
          ,
          <source>Molecular Systems Biology</source>
          <volume>3</volume>
          (
          <year>2007</year>
          )
          <fpage>137</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>L.</given-names>
            <surname>Nasti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Gori</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Milazzo</surname>
          </string-name>
          ,
          <article-title>Formalizing a notion of concentration robustness for biochemical networks</article-title>
          ,
          <source>in: Federation of International Conferences on Software Technologies: Applications and Foundations</source>
          , Springer,
          <year>2018</year>
          , pp.
          <fpage>81</fpage>
          -
          <lpage>97</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>D. T.</given-names>
            <surname>Gillespie</surname>
          </string-name>
          ,
          <article-title>Exact stochastic simulation of coupled chemical reactions</article-title>
          ,
          <source>The journal of physical chemistry 81</source>
          (
          <year>1977</year>
          )
          <fpage>2340</fpage>
          -
          <lpage>2361</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>D. J.</given-names>
            <surname>Barnes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. F.</given-names>
            <surname>Chu</surname>
          </string-name>
          , Introduction to modeling for biosciences, Springer,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>