<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>PloS one 3 (2008) e1672. doi:10.1371/journal.pone.0001672.
[31] S. Klamt</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <article-id pub-id-type="doi">10.1007/978-3-319-16483-0_</article-id>
      <title-group>
        <article-title>Towards Flexible Criteria in the Revision of Boolean Networks</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Rafael Patronilo</string-name>
          <email>r.patronilo@campus.fct.unl.pt</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Matthias Knorr</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>João Leite</string-name>
          <email>jleite@fct.unl.pt</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>NOVA LINCS, NOVA University Lisbon</institution>
          ,
          <addr-line>2829-516 Caparica</addr-line>
          ,
          <country country="PT">Portugal</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2015</year>
      </pub-date>
      <volume>3</volume>
      <fpage>599</fpage>
      <lpage>612</lpage>
      <abstract>
        <p>Biological regulatory networks, and Boolean networks (BNs) in particular, are widely used to investigate the dynamics of complex cellular processes and chemical reactions. BNs model the interactions between compounds through regulatory functions that determine a compound's state based on the activating or inhibiting efects of others. This enables the analysis of system behavior under incomplete, imprecise, or noisy information through in silico experiments, supporting hypothesis testing and the prediction of experimental outcomes-e.g., in healthcare research. In this context, new experimental data may reveal inconsistencies in the current model, necessitating its revision to account for these observations. This revision process has traditionally been manual, laborious, and error-prone due to the vast space of possible modifications. To address this, ARBoLoM was recently introduced as a tool for the automated revision of BNs. It leverages Answer Set Programming (ASP) to eficiently verify consistency and, when needed, compute repairs based on time series of experimental results. The revision is guided by an ordered set of criteria for determining minimal modifications, where the first criterion is fixed using iterative deepening, and the remaining ones are handled via optimization in ASP. However, fixing the first criterion limits the flexibility of the tool, and it remains an open question whether this first fixed criterion reflects practical or biological relevance. In this paper, we address this limitation by extending ARBoLoM to allow for a fully flexible order of the criteria. This extension also enables us to concisely characterize the behavior of prior approaches to BN revision and confirm that our tool remains orders of magnitude faster while still producing correct revision results.</p>
      </abstract>
      <kwd-group>
        <kwd>eol&gt;Biological Regulatory Networks</kwd>
        <kwd>Revision</kwd>
        <kwd>Answer Set Programming</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Over the past twenty years, systems biology has grown into a dynamic and influential discipline. It
integrates concepts from engineering, mathematics, physics, and computer science to construct models
that capture the complexity of biological phenomena. By adopting a comprehensive, system-level
approach, researchers aim to uncover insights that were previously out of reach [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. A key aspect of
this area lies in the precise modeling of biological systems found in nature, with the overarching aim of
deepening our understanding of cellular processes. Such insights may pave the way for novel scientific
ifndings and conceptual frameworks regarding living systems.
      </p>
      <p>
        In this context, Biological Regulatory Networks (BRNs) [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] are widely used to simulate the interactions
between compounds in a computer environment or in silico. Among these, continuous models work
directly with the concentration of each compound while logical models [
        <xref ref-type="bibr" rid="ref3 ref4">3, 4</xref>
        ] represent each compound
as a Boolean variable, by measuring said concentration against a threshold. Though less concise, this is
often advantageous in the absence of experimental data granular enough to support continuous models.
      </p>
      <p>
        A prominent type of such logical models is Boolean logical models, also known as Boolean Networks
(BNs) [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. The interactions between compounds are represented as a directed graph, with an edge
indicating whether a compound is a regulator of another compound, that is, whether its presence
or absence influences the other compound. Each of these edges is labeled as a positive or negative
interaction.1 To allow more specific interactions to be described, each compound is usually accompanied
by a regulatory function, a logical formula determining the compound’s truth value according to the
truth values of the regulators afecting it. Such BNs have been successfully applied in as models of, e.g.,
gene regulation networks [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ].
      </p>
      <p>
        It can happen that new experimental data is inconsistent with an existing BN, indicating one or
more regulatory functions require correcting. This revision process has typically been done manually,
but it can prove laborious and error-prone for large BNs. Unlike the automatic creation of BRNs
[
        <xref ref-type="bibr" rid="ref10 ref6 ref7 ref8 ref9">6, 7, 8, 9, 10</xref>
        ], revision has only recently started to draw wider attention. Among the attempts at
automating this revision process, most approaches are either limited on the type of repairs [
        <xref ref-type="bibr" rid="ref11 ref12 ref13">11, 12, 13</xref>
        ] or
the complexity of the interactions described by the BN [
        <xref ref-type="bibr" rid="ref14 ref15">14, 15</xref>
        ]. ModRev [16] exceeds these limitations
and is additionally capable of handling incomplete experimental data. However, it can take a long time
to repair each BN, with the repair process often exceeding the predefined timeout of one hour and
consequentially failing. ARBoLoM [17] addresses this problem, combining a deepening search approach
with Answer Set Programming, in clingo [18], to find and optimize repairs. The tool proves orders of
magnitude faster than ModRev, succeeding at finding a repair for a regulatory function in a number of
real-life BNs often instantaneously and much before the previous timeout of one hour.
      </p>
      <p>There is, however, a notable shortcoming that afects both tools, albeit to diferent extents. For finding
optimal repairs, i.e., repairs that perform as few changes as possible to restore consistency, they employ
an ordered set of criteria to determine the best solution, which is not fully flexible, making it dificult
(or even impossible) for the user to apply either tool with some specific orders of the criteria. Notably,
it is not possible to assign the same order to the criteria on both tools, making their direct comparison
more challenging.</p>
      <p>ModRev distinguishes three types of repairs, in the following order of preference (which is fixed):</p>
      <sec id="sec-1-1">
        <title>1. Minimize changes to function regulators</title>
        <p>2. Minimize changes to regulator signs
3. Minimize the changes in the formula of the regulatory function</p>
        <p>ARBoLoM distinguishes between changes in the formula due to changes in the number of function
terms and changes in each term. ARBoLoM therefore distinguishes four types of criteria, in the following
default order of preference (where the order of all but the first one can be changed):</p>
      </sec>
      <sec id="sec-1-2">
        <title>1. Minimize the diference in the number of terms of the regulatory function 2. Minimize changes to function regulators 3. Minimize changes to regulator signs 4. Minimize changes in the format of each term.</title>
        <p>This lack of complete flexibility in specifying the order of the criteria is, arguably, a shortcoming,
as, to the best of our knowledge, no specific order has been identified as the preferred one to be used.
It would therefore be clearly preferable to allow the user to adjust the order to their needs without
limitation.</p>
        <p>In this paper, we amend this problem and extend ARBoLoM in such a way that the user gains the
desired flexibility on the choice of repair criteria, particularly on the order of preference for each type
of repair. We achieve that by removing the hard-coded fixation of the first criterion in the iterative
deepening search, introducing a parametrized solution which allows the user to indicate the order of
preference on the criteria. We also perform a detailed evaluation analysing whether (a) performance
is afected as such, (b) whether the order of criteria has an impact on the performance, and (c) how
ARBoLoM’s performance compares to that of ModRev when we adjust the optimization criteria to
exactly match the order used in ModRev.</p>
        <p>The remainder of this paper is structured as follows. We recall relevant material in Section 2 to make
the paper self-contained. Then, in Section 3 we introduce the novel version of ARBoLoM that allows
the flexible choice of the order of criteria for revision of BNs. We assess how well this new version
performs in Section 4, before we conclude in Section 5.</p>
        <p>+
 
 
_
+
+
_</p>
        <p> =  
+
    = ¬   ⋀  
    =     ⋁ (   ⋀ ¬  )</p>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>2. Background</title>
      <p>We briefly recall relevant notions and notation on Boolean Networks, Answer Set Programming, and
ARBoLoM as a tool of revision in Boolean Networks, following the presentation in [17].</p>
      <sec id="sec-2-1">
        <title>2.1. Boolean Networks</title>
        <p>A Boolean Regulatory Network, or Boolean Network (BN), represents a set of biological compounds
(including proteins or genes) and their interactions with each other modelling complex biological
processes. Such BN makes use of a regulatory graph, which is a directed graph  = (V, ), where
 = {1, ..., } is the set of vertices (nodes) representing the regulatory compounds, and  =
{(, , ) : ,  ∈ ,  ∈ {+, −}} is the set of signed edges representing the interactions between
compounds such that  does not contain (, , +) and (, , − ) for any ,  ∈  . Edges with  = +
are called positive interactions (or activation), i.e.,  activates , while edges with  = − are called
negative interactions (or inhibition), i.e.,  inhibits .Nodes with no incoming edges in  are called input
nodes. Such nodes represent external stimuli, and their values do not change over time.
Example 1. Fig. 1 shows regulatory graph  = (V, ) with  = {1, 2, 3, 4} and  = {(1, 2, − ),
(1, 3, +), (2, 1, +), (2, 3, +), (4, 2, +), (4, 3, − )}.</p>
        <p>Regulatory graphs alone do not specify how diferent compounds that afect the same node interact
with each other for that node’s activation. To that end, a regulatory function specifies, for each
compound, what combination of regulators produces an efect on a given compound. Then, a Boolean
logical model  of a regulatory network is defined as a tuple (V,  ) where  = {1, 2, ..., } is the
set of variables representing the regulatory compounds of the network such that  is assigned to a
value in {0, 1}, and  = {1 , 2 , ...,  } is the set of regulatory (Boolean) functions such that 
defines the value of  and where  =  if  is an input node. Regulatory functions of input nodes
may be omitted (cf. 4 in Fig. 1), but an explicit representation (4 = 4) is commonly used when
involving implementational aspects [19].</p>
        <p>Example 2. Fig. 1 presents a Boolean logical model with  from Ex. 1 and regulatory functions for  on
the right.</p>
        <p>
          Commonly, Blake canonical form (BCF), a special case of the disjunctive normal form, is used in
the literature [
          <xref ref-type="bibr" rid="ref12">12, 16, 17</xref>
          ] to represent the regulatory functions of each biological compound by the
disjunction of all its prime implicants [20]. A conjunction of literals is an implicant of a Boolean function
if, whenever the conjunction takes the value 1, so does the function, and an implicant is prime if no
subset of its literals forms itself an implicant [21]. BCF is unique up to reordering (of disjuncts), and since
no compound can occur both positively and negatively in a specific regulatory function by definition
of regulatory graphs, verification of prime implicants in such a function reduces to verifying that no
disjunct is contained in the other. E.g., 3 (Fig. 1) is in BCF and both disjuncts are prime implicants.
0100
0000
        </p>
        <p>The dynamics of BNs is captured in their state resulting from the interactions between compounds
over time. Formally, the network state of a BN with  compounds is a vector  = [1, 2, ...] where 
is the value of the variable representing the -th compound of the network. Clearly, for Boolean logical
models, the number of diferent states in a network is given by 2. E.g., if nodes 1 and 3 in Ex. 2 are
active, and the other two are not, then this state is represented by 1010.</p>
        <p>The evolution of states is captured in state transition graphs [22]. A State Transition Graph (STG) is
a directed graph   = (,  ) where  is the set of vertices representing all the diferent states of
the network, and  is the set of edges representing the viable transitions between states.</p>
        <p>Updates to the values of the nodes in a BN are applied commonly in two diferent ways: synchronous
and asynchronous [23, 24]. For synchronous update, at each time step, all compounds are updated
simultaneously. Each network state has exactly one successor. For asynchronous update, at each time
step, only one regulatory function may be applied. For a network of n compounds, each state can have
at most  possible state transitions (including a transition to itself - cf. Fig. 2).</p>
        <p>Boolean networks may enter so-called attractors during updates, i.e., sets of network states that form
a cycle. These cycles are linked to many important cellular processes [26]. Among them, stable states
are attractors that contain only a single state. An example for a stable state is 0000 on the left of Fig. 2.</p>
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Answer Set Programming</title>
        <p>Answer Set Programming (ASP) aims at solving dificult combinatorial (primarily NP-hard) search
problems [27]. It builds on the combination of a rich declarative knowledge representation language with
eficient solvers [ 28]. Here, we briefly recall essential notions and notations for the sake of readability
of the following material.</p>
        <p>We start with the syntax of basic normal logic programs  that are a finite set of rules  of the form:
0 ← 1, ..., ,  +1, ...,  .
(1)
where each  ( 0 ≤  ≤ , 0 ≤  ≤ ) is an atom. An atom is of the form (1, ..., ), with p a
predicate symbol of arity k, and 1, ...,  terms, built from variables and constants of the (implicit)
language of the program of r2. The atom to the left of the arrow (← ) is said to be the head of the rule,
whereas the (negated) atoms to the right of the arrow, corresponding to a conjunction, are the body
of the rule. We call literals both an atom  as well as its negation,   and we call 0 the head of r
(ℎ() = 0), and the set of all literals that occur in its body (also known as body literals) the body
of r (() = {1, ..., ,  +1, ...,  }). Rules with  =  = 0 are called facts and we
admit that 0 is ⊥ to represent constraints, and we commonly omit ← in the former case and ⊥ in the
latter. Facts allow us to express what necessarily must be true, while constraints provide us with the
possibility to express conditions that we do not want to hold. The ground instantiation of a program,
( ) is obtained by replacing all (first-order) variables with constants from the given language
2Actually, function symbols are also permitted, but care must be taken as unbounded recursion is not admitted in ASP.
observations
(ASP)
model
(ASP or Text)
process inputs
consistency
checking
clingo
consistent? no</p>
        <p>yes
inform user
repairing
clingo
repairs yes show repairs
found?</p>
        <p>no
inform user
terminate
of the program in all possible ways. A set of ground atoms  is called an answer set of ( ) if 
is the subset-minimal model of the reduct ( ) obtained as {0 ← 1, ...,  |  of the form (1)
in ( ) and {+1, . . . , } ∩  = ∅}.</p>
        <p>ASP also admits a number of commonly used language constructs, most of which can be expressed by
rules of form (1), but whose direct usage is much more convenient, and we indicate them briefly next.</p>
        <p>First, choices permit that a number of atoms may be part of an answer set. E.g., for
{(1); ℎ(1)} both (1) and ℎ(1) can be part of an answer set. As a
result, we obtain four diferent answer sets in this case, corresponding to the power set over the set of
possible elements. Then, constraints can be used to filter out certain models. We can also turn such a
choice into a cardinality constraint, providing lower and/or upper bounds on the number of atoms to
appear in each answer set. E.g., 1{(1); ℎ(1)}1 admits exactly two answer sets as in
each one exactly one of the two atoms can appear. Often, conditionals are employed, that allow us to
determine ranges. E.g., a choice {_(, ) : ()} would admit answer sets
with atoms over _/2 as long as its second argument is a compound.</p>
        <p>These constructs are also used in aggregates. E.g., #{ : _( 1, )} counts
all diferent values  that appear as second argument of the instances of _/2.
Optimization statements allow us not to just search for one (or more) solutions of a problem, but also
assign weights to solutions which can be used to order solutions and optimize in accordance with
these. E.g., #{1@2,  : _ℎ()} efectively sums the weights of all atoms over
_ℎ/1, where in this case, 1 is the weight of the atom and 2 the assigned priority (to be able
to combine several optimization criteria with priorities – the higher priority is considered first). Based
on that, the solver will aim to find the answer set with the lowest overall assigned weight.</p>
        <p>Finally, constants can be declared using # with a name and a default value. The latter can be
overwritten when running that program. Comparison operators are naturally allowed, and the infix
notation .. for numeric values or variables representing numeric values is shorthand for all the
values between and including  and  .</p>
      </sec>
      <sec id="sec-2-3">
        <title>2.3. ARBoLoM</title>
        <p>ARBoLoM [17] has been developed for revision of BNs given a set of experimental results. The main
idea is to check for consistency of the current model comparing its dynamic behaviour, i.e., what states
are observed on what inputs, with the experimental observations. If the model’s behaviour replicates
these, then the model is consistent, otherwise it is inconsistent. In the latter case, repair is applied
adjusting the functions whose results do not match the observations.</p>
        <p>As shown in Fig. 3, ARBoLoM is written in Python to handle the processing of data and integration
of the diferent components and employs ASP encodings for consistency checking and repair using the
state-of-the-art ASP tool clingo [18]. ARBoLoM expects a set of (possibly incomplete) experimental
observations and the specification of a BN. The former is assumed to be in a specific format, but any
diferent representation of states of activations of the relevant compounds can be converted to it easily.
The latter is expected to be either already encoded in ASP or in one of the standard formats, which is
converted automatically by the tool into the required ASP format.</p>
        <p>For checking consistency (and repair), three diferent modes of interaction between compounds are
considered, namely stable states (without any time component), and synchronous and asynchronous
updates, and three slightly varying ASP encodings exist. For the details, we refer to Aleixo et al. [17].3</p>
        <p>The functions that are inconsistent need to be repaired by modifying one by one each inconsistent
regulatory function. This requires again as input the representations of the network and the observations.
The considered modifications are adding and removing regulators, changing regulators from activator
to inhibitor and vice-versa, and changing the function’s format, i.e., changing the number of terms as
well as their composition. This results in a large variety of possible repaired functions. ARBoLoM aims
at finding repairs that are as close as possible to the original functions in the spirit of minimal change,
using the following order of priority:</p>
        <sec id="sec-2-3-1">
          <title>1. Minimize the changes to the number of function terms;</title>
          <p>2. Minimize the changes to the function’s regulators;
3. Minimize the changes to the signs of regulators;
4. Minimize the changes to the format of each term.</p>
          <p>While the order of criteria 2.–4. is easily adjustable (as we will see), criterion 1. is not, because ARBoLoM
employs iterative deepening search on the number of changes to the number of function terms first.
The central idea is to start searching for solutions with the same number of function terms as the
function to be repaired, and if no solution is found, simultaneously, we search for solutions with one
more and one less function term in parallel, determining the solution in accordance with the other
criteria if both solutions exist, picking the single one if one of the cases fails, and repeat the process
otherwise incrementing the distance w.r.t. number of terms present. The iteration is naturally limited
by the lower bound of 0 function terms and by an upper bound determined based on the number of
positive observations. This is possible because, in the limit, each positive observation is only covered
by a single term in the function, so if we cannot find a function with at most as many terms as existing
distinct observations (for the function to be repaired), then no such candidate can exist. This is further
improved by not counting identical transitions of such observations.</p>
          <p>The details for the respective ASP encodings can also be found in [17], but since, unlike the encodings
for consistency checking, we revise these, we will discuss them with more detail next. 4</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Flexible Revision of Boolean Networks</title>
      <p>Out of the four possible diferent repair operations distinguished in ARBoLoM, for the final three, their
order of priority can be easily changed. The minimization of these is handled in ASP with optimization
statements, each defined with a priority value corresponding to the order given in the end of the
previous section. Reordering these criteria amounts to changing the priority value in these statements.
For the first criterion however, which minimizes the change in the number of terms of the regulatory
function, the optimization is handled by the iterative deepening search guided within the python code
and therefore cannot be reordered in this way. This is a limitation, since a biologist might prefer a
diferent order of preference for these criteria, which for a particular use case is more realistic. For
example, if we know that a compound has been revealed to be relevant for a process in whose model
it currently does not occur, then it needs to be added to the model in the revision process, and this
requires adjusting the priority of criteria. But such modification cannot be achieved without substantial
modifications to the implementation of the tool.</p>
      <p>Therefore, we tackle this problem here and extend ARBoLoM to allow more flexibility on the choice
of repair criteria, particularly in the order of preference for each type of repair. The main idea is to
turn the criterion used for iterative deepening into one handled by ASP optimization. This allows
parameterized optimization of all criteria in accordance with the preference of the user. We should note
that this might afect overall performance, and we will analise this subsequently together with other
important aspects in Sec. 4.
3The complete encodings can be found at https://github.com/fpaleixo/arbolom/tree/main/encodings/consistency.
4Still, the original complete encodings can be found at https://github.com/fpaleixo/arbolom/tree/main/encodings/repairs.</p>
      <p>Listing 1: Simplified ASP encoding for repair with stable state observations.
also the necessary update aspects due to the dynamics, which are not essential for our modifications.
5 In the following, we briefly describe the encoding and the main modifications, starting by briefly
recalling the input format [17].</p>
      <p>The regulatory graph  = (, ) is encoded as a set of facts over /1 to represent all
elements in  and over /3 s.t. (, , ) represents that compound  regulates 
with the sign  (where  = 0 encodes an activator, and  = 1 an inhibitor). For regulatory functions,
we use predicates  /2 and /3 to represent the number of their terms and the constituting
terms themselves, respectively. E.g., function(v1, 1) represents that 1’s regulatory function has one
term, while term(v1, 1, v2) represents that term 1 of 1’s regulatory function contains 2. This notation
is a bit less compact, but facilitates specific modifications.</p>
      <p>The observations are encoded by means of predicates, /1 and /3. The
former is used to represent the id of an experiment, while observation(e, c, s) represents the observed
state  (0 for inactive, 1 for active) of compound  of experiment . Note that, for updates where the
network does change its state over time, an additional argument for discrete time  is required.</p>
      <p>The repair encoding itself first introduces two constants as parameters (Lines 1–2), one for the
compound whose function is to be repaired, the other for the upper bound of the number of terms
(called nodes here). Then, we incorporate additional knowledge on biologically mandatory known
regulators (possibly including the fixed signs, i.e., whether they are known to be activating or inhibiting)
(Lines 3–8), and for those not fixed whether they are activating or inhibiting (Line 9). Then, we determine
the number of available nodes (i.e., conjunctions of regulators) picking the larger number among the
current number of terms and the given parameter (Lines 10–11). For each such node, we generate
possible regulators (Line 12), and turn a node efective if it has at least one regulator associated to it
(Line 13). Then, we ensure that the fixed regulators indeed appear in some node (Line 14), and that
the function be in BCF (Line 15–17) verifying that the node is not fully contained in another node, by
assuring its number of compounds difers from the number of compounds in common with another node.
5All new encodings can be found at https://github.com/rafael-patronilo/arbolom/tree/terms-criterion-asp/encodings/repairs.</p>
      <p>Listing 2: New ASP minimization statements with parameterized priority.
We identify nodes as negative if one of its activators is inactive (Line 18) or one of its inhibitors active
(Line 19), and as positive if it is not negative (Line 20–21). Likewise, we determine which regulators
appear in the original function and in the modified one (Lines 22–23) and which compounds exhibit
changes in sign between the original and the modified function (Lines 24–25). Finally, we discard all
solutions for which an experimental observation of the considered compound difers from the value
computed by the modified function (Lines 26–27).</p>
      <p>Let us discuss the major changes applied to this encoding. First and foremost, it difers conceptionally.
Namely, it no longer includes the minimization statements directly, since these are formatted and added
by python code, allowing their priority to be set at runtime. We explain how this works in a moment.</p>
      <p>This principal modification causes a significant change in this encoding relative to the original one
of ARBoLoM, required for handling the minimization of the change in the number of terms. We have
removed the restriction which fixated the number of nodes of the functions to consider. This restriction
was required by the deepening search to restrict the search space to the number of nodes being currently
considered. As this minimization of the change in the number of nodes has been moved to the ASP
program, this is neither required nor desirable as we wish for the search space to contain all possible
functions regardless of the number of nodes. The constant max_node_number (formerly node_number)
now represents the upper bound for the number of the terms rather than an exact restriction. This upper
bound was already calculated in ARBoLoM to restrict the range of the deepening search as described in
Section 2.3. We still calculate this value in the same way, by counting unique positive observations of
the compound being repaired. We note that equivalent modifications were applied to the encodings for
synchronous and asynchronous updates.</p>
      <p>Regarding the minimization of changes, the tool admits now as additional input the parameters for
the priority of criteria. This can be used within the Python code to prepare a number of optimization
statements to be added to the ASP encoding. Listing 2 shows an abstract presentation of the minimization
statements added by the python code, each with the placeholder P which is substituted with the correct
priority for each statement according to the user’s selection. Note that this is indeed an abstracted
representation, and each of the lines can be associated with a diferent value, permitting in principle
a very flexible attribution of priorities. Our focus though is to assign these in accordance with the
characteristics of modifications considered in the previous sections and also used as separating comments
in the listing.</p>
      <p>In more detail, we have added two new minimization statements which count and minimize changes in
the number of terms for both addition (Line 2) and removal (Line 3). This accounts for the minimization
of changes previously handled by the iterative deepening process. Lines 5 and 6 account for changes
in the number of regulators and allows their minimization, while Line 8 captures minimization of
sign changes of the involved regulators. Finally, Lines 10 and 11 handle minimization in changes of
terms, by accounting for regulators that have been added to a node or removed from it. The resulting
minimized numbers of modifications are also extracted from the results and can be used for analysis
and comparison.</p>
      <p>In addition, we also modified the minimize statements in Lines 5, 6, 10 and 11 in the sense of
internalising former auxiliary predicates inside of these minimize statements. While this change has
Abbreviation</p>
      <p>MCC</p>
      <p>FY
TCR
SP
Th
no efect on the result of the optimization itself, this kind of encoding commonly improves processing
times in search with ASP and may thus impact positively overall performance of the repair program.</p>
    </sec>
    <sec id="sec-4">
      <title>4. Evaluation</title>
      <p>We now assess in a detailed manner how these modifications afect the overall performance, studying
in particular to what extent changing the order of preference itself impacts on performance and how
ARBoLoM with a preference order of criteria aligned with that of ModRev compares with ModRev.</p>
      <p>To that end, we have taken advantage of the benchmarks employed before for ARBoLoM [17] which
were generated based on the same methodology used in [29]. Essentially real BNs are considered and
corrupted, and sets of observations, generated from the original BN, are used to repair its corrupted
version. The BNs are corrupted by applying four operations, with varying probability :
• Altering each regulatory function with a probability  (0.1 to 1), changing the number of terms
and/or the format of the terms of the function.
• Altering the sign of each edge (regulator) in the function with a probability  (0.05 to 0.75).
• Removing a regulator from the function with probability  (0.01 to 0.15).</p>
      <p>• Adding a new regulator to the function with probability  (0.01 to 0.15)</p>
      <p>Based on that, 24 diferent corruption configurations were defined, varying both the available
corruption operations and respective probabilities. Then, five benchmarks were created using real
biological models: MCC [23], FY [30], TCR [31], SP [32], and Th [33] – some information on these is
summarized in Table 1. For each configuration 100 corrupted models were generated, resulting in a
total of 2400 models for each benchmark. Aside the corrupted models, each benchmark includes four
sets of observations generated using the original model, ranging from a set with 1 experiment with 3
timepoints to a set with 5 experiments with 20 timepoints.</p>
      <p>In addition, to test the tool on a larger network, a benchmark using a model that combines all the
ifve real models was created – its corresponding data can be inferred from Table 1 as these 5 models
are distinct for all but one compound shared between two BNs. Even though this network is thus a bit
disconnected, this still provides a challenge, as the corruptions are generated arbitrarily, and possible
ifxes need to inspect the entire search space. Given the size of this new network, in this benchmark only
10 corrupted models for each of the 24 configurations were generated and only one set of observations
with 5 experiments and 20 timepoints.</p>
      <p>Out of the six benchmarks the original ARBoLoM was tested on, so far we have evaluated the new
version of the tool on two of them: the TCR benchmark, which is the largest (by number of compounds)
among the 5 original networks and consequentially the model whose benchmark takes longer to repair
on average; and the benchmark combining all five models, which we will refer to as All5. We ran our
tests on a Linux machine with an AMD Ryzen 7 9700X CPU and 32GB of RAM.</p>
      <p>We tested five orders of priority which we consider significant, analyzing both the time the revision
process takes and the cost of each criterion in the optimal repairs, that is the number of changes of each
type applied in the repaired function. Table 2 shows the mean, min and max costs of each criterion
considering each of the five orders of priority. Note that the order A is equivalent to the order used
in the original ARBoLoM. Orders B, C and D are obtained by moving the criterion with the highest
priority in the previous order to the lowest priority, therefore cycling the highest priority between all
criteria. Order E is diferent. It is equivalent to the order used in ModRev [ 16], that is, it first prioritizes
changes to the regulators, second changes to the signs of the regulators and lastly changes to the format
of the regulatory function, be it term number or format of each term. This last order is therefore similar
to order B with the diference that the same priority is assigned to reducing the change in term number
and the change in the format of each term.</p>
      <p>We can see that the criterion with the highest priority always has the lowest average cost, indicating
that the prioritization works as intended. This is however not always the case for the other criteria. For
example, in the order A, the average cost of the change in signs is higher than the cost in the change to
term format, despite signs having a higher priority. This can be explained by the fact that the space of
possible repairs has already been limited by the first and second criteria, and therefore it may not be
possible to pick a set of repairs with fewer changes in signs than in the term format.</p>
      <p>Table 3 shows the times for the whole revision process, consistency checking and repairing for each
of the five orders of priority listed in Table 2. Note that we include the consistency checking time
merely for context when analyzing the total revision time. Consistency checking should not depend on
the order of the criteria as this is a parameter of the repair process, and the small diferences in the
respective times between diferent orders are therefore most likely related to circumstantial factors of
the testing environment. Variations in the repair time however can be related to the choice of criteria,
but even these do not show much variation. Looking at the maximum times, we see that the tool seems
to take under 5 seconds for any model, regardless of the interaction mode or order of priority in the
repair criteria, and, on average for all cases, revision is performed in less than 0.5 seconds. This shows
that for TCR, and therefore likely for the other four smaller real models as well, this new version of
ARBoLoM is as eficient as the original [ 17], with the advantage of flexibility in the order of the criteria.</p>
      <p>We were also interested in understanding how our new solution performs on a larger model. Table 4
shows the times for the benchmark All5 in asynchronous mode, using each of the five orders of priority
we have established. We note that times overall increase, but results are still obtained quickly with
average repair times varying between 1 and 4 seconds and maximum values between 7 and 33 seconds.
Notably, among the 5 orders, the one corresponding to that previously employed by ARBoLoM (order
A) turns out to be the slowest, followed by the one used by ModRev (order E). We deem that, for A, this
results from the fact that there is in principle a large number of nodes to be added, and the optimization
process takes more time due to that. For E, we conjecture that this is influenced by the two conjoined
criteria, as mixing them might make it more complicated to distinguish between diferent solutions.</p>
      <p>Still, this shows that the solution employed by ARBoLoM is clearly superior to that of ModRev, even
when using the same criteria, as revision always succeeds and with, on average, interactive response
times. ModRev, on the other hand, only succeeds in 79% of the cases and takes on average around 105
seconds on those instances it successfully revises, even if on a slightly slower system [17].</p>
      <p>We also include on the right of Table 4, for order A, the times for the former version of ARBoLoM
using deepening search. On average, deepening search proves to be over two times as fast and on the
worst case it proves four times and over 20 seconds faster. Despite the new version being notably slower,
in exchange we obtain the flexibility, and even for this large BN, we are able to provide repairs well
below a minute in asynchronous mode for all models, regardless of the order of priority for the criteria.</p>
      <p>We cannot provide the same reassurance for synchronous mode. Our experiments so far show that
synchronous mode poses a significant challenge, with the repair of multiple compounds timing out.
All these compounds have a high max number of nodes (terms) relative to the compounds which get
repaired within reasonable time. Recall that this upper bound for the number of terms in the repaired
function is calculated by counting the number of unique positive observations of the compound. This is
a limitation of the new version over the former version, which uses deepening search to progressively
try diferent numbers of terms and therefore is not susceptible to problems related to a large upper
limit. We are currently studying possible strategies to mitigate this issue. One easy solution would
be to use the deepening search strategy whenever it is possible, so we can at least provide the same
performance in those situations, while retaining the flexibility of criteria otherwise.</p>
      <p>Finally, as described in Section 3, we modified existing ASP minimization criteria, internalising
auxiliary predicates into the minimize statements, with the goal of improving performance. To understand
if and how much our changes to the existing ASP minimization criteria afect the performance, we
transposed these changes to the former version of ARBoLoM (with the deepening search) and test on
the benchmark All5 in synchronous mode, which, due to the overall expected longer revision times,
should allow us to give a better idea of the impact on performance. Table 5 shows the times for both
versions, both obtained in the same testing environment. Surprisingly, there is no significant diference
between both versions, and we will have to investigate this matter further.</p>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusions</title>
      <p>We have introduced a new version of ARBoLoM, a tool for the revision of Boolean Networks given a
set of experimental observations. The main improvement lies in the new capacity for the user to be
able to freely choose the preference order of application of optimization criteria, employed to ensure
that revision is done with minimal changes. Our assessment shows that this is indeed viable in the
sense that revision of a real BN, for several variants of preference orders, can be done very fast. This
includes the specific order employed by another tool, ModRev [ 16], validating the previous assessment
[17], now for identical optimization criteria, that ARBoLoM’s solution based on ASP is indeed superior
by a considerable margin.</p>
      <p>We have however also seen, that for larger models, dropping the iterative deepening search in general
has a negative impact on performance, in particular if the search space in terms of possible nodes to be
added increases. We plan to incorporate the previous solution of ARBoLoM for this particular choice of
criteria to take advantage of the improved performance. We also will do more exhaustive experiments
with a wider set of real BNs, aiming to confirm our positive results here, and to tackle open issues
including the ongoing study of the impact of diferent forms of encoding the optimizations.</p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgments</title>
      <p>We thank the anonymous reviewers for their valuable feedback. This work is supported by project
BIOREVISE (2023.13327.PEX) and by UID/04516/NOVA Laboratory for Computer Science and Informatics
(NOVA LINCS) with the financial support of FCT.IP.</p>
    </sec>
    <sec id="sec-7">
      <title>Declaration on Generative AI</title>
      <sec id="sec-7-1">
        <title>The author(s) have not employed any generative AI tools.</title>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>A.</given-names>
            <surname>Siegel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Radulescu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. L.</given-names>
            <surname>Borgne</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Veber</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Ouy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Lagarrigue</surname>
          </string-name>
          ,
          <article-title>Qualitative analysis of the relation between DNA microarray data and behavioral models of regulation networks</article-title>
          ,
          <source>Biosystems</source>
          <volume>84</volume>
          (
          <year>2006</year>
          )
          <fpage>153</fpage>
          -
          <lpage>174</lpage>
          . doi:
          <volume>10</volume>
          .1016/j.biosystems.
          <year>2005</year>
          .
          <volume>10</volume>
          .006.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>H.</given-names>
            <surname>Jong</surname>
          </string-name>
          ,
          <article-title>Modeling and simulation of genetic regulatory systems: A literature review</article-title>
          ,
          <source>Journal of computational biology : a journal of computational molecular cell biology</source>
          <volume>9</volume>
          (
          <year>2002</year>
          )
          <fpage>67</fpage>
          -
          <lpage>103</lpage>
          . doi:
          <volume>10</volume>
          .1089/10665270252833208.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>L.</given-names>
            <surname>Glass</surname>
          </string-name>
          ,
          <string-name>
            <surname>S. A</surname>
          </string-name>
          . Kaufman,
          <article-title>The logical analysis of continuous, non-linear biochemical control networks</article-title>
          ,
          <source>Journal of Theoretical Biology</source>
          <volume>39</volume>
          (
          <year>1973</year>
          )
          <fpage>103</fpage>
          -
          <lpage>129</lpage>
          . doi:https://doi.org/10.1016/
          <fpage>0022</fpage>
          -
          <lpage>5193</lpage>
          (
          <issue>73</issue>
          )
          <fpage>90208</fpage>
          -
          <lpage>7</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>R.</given-names>
            <surname>Thomas</surname>
          </string-name>
          ,
          <article-title>Boolean formalization of genetic control circuits</article-title>
          ,
          <source>Journal of Theoretical Biology</source>
          <volume>42</volume>
          (
          <year>1973</year>
          )
          <fpage>563</fpage>
          -
          <lpage>585</lpage>
          . doi:https://doi.org/10.1016/
          <fpage>0022</fpage>
          -
          <lpage>5193</lpage>
          (
          <issue>73</issue>
          )
          <fpage>90247</fpage>
          -
          <lpage>6</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>L.</given-names>
            <surname>Salinas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Gómez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Aracena</surname>
          </string-name>
          ,
          <article-title>Existence and non existence of limit cycles in boolean networks</article-title>
          ,
          <source>in: Automata and Complexity</source>
          , volume
          <volume>42</volume>
          , Springer,
          <year>2022</year>
          , pp.
          <fpage>233</fpage>
          -
          <lpage>252</lpage>
          . doi:
          <volume>10</volume>
          .1007/ 978-3-
          <fpage>030</fpage>
          -92551-2_
          <fpage>15</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>M.</given-names>
            <surname>Ostrowski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Paulevé</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Schaub</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Siegel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Guziolowski</surname>
          </string-name>
          ,
          <article-title>Boolean network identification from perturbation time series data combining dynamics abstraction and logic programming</article-title>
          ,
          <source>Biosystems</source>
          <volume>149</volume>
          (
          <year>2016</year>
          )
          <fpage>139</fpage>
          -
          <lpage>153</lpage>
          . doi:https://doi.org/10.1016/j.biosystems.
          <year>2016</year>
          .
          <volume>07</volume>
          .009,
          <article-title>selected papers from the Computational Methods in Systems Biology 2015 conference</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>C.</given-names>
            <surname>Réda</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Wilczyński</surname>
          </string-name>
          ,
          <article-title>Automated inference of gene regulatory networks using explicit regulatory modules</article-title>
          ,
          <source>Journal of Theoretical Biology</source>
          <volume>486</volume>
          (
          <year>2020</year>
          )
          <article-title>110091</article-title>
          . doi:
          <volume>10</volume>
          .1016/j.jtbi.
          <year>2019</year>
          .
          <volume>110091</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>A.</given-names>
            <surname>Mitsos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I. N.</given-names>
            <surname>Melas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Siminelakis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. D.</given-names>
            <surname>Chairakaki</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Saez-Rodriguez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. G.</given-names>
            <surname>Alexopoulos</surname>
          </string-name>
          ,
          <article-title>Identifying drug efects via pathway alterations using an integer linear programming optimization formulation on phosphoproteomic data</article-title>
          ,
          <source>PLoS Computational Biology</source>
          <volume>5</volume>
          (
          <year>2009</year>
          )
          <article-title>e1000591</article-title>
          . doi:
          <volume>10</volume>
          . 1371/journal.pcbi.
          <volume>1000591</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>T.</given-names>
            <surname>Schaub</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Siegel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Videla</surname>
          </string-name>
          ,
          <source>Reasoning on the Response of Logical Signaling Networks with ASP</source>
          ,
          <year>2014</year>
          , pp.
          <fpage>49</fpage>
          -
          <lpage>92</lpage>
          . doi:
          <volume>10</volume>
          .1002/9781119005223.ch2.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>S.</given-names>
            <surname>Videla</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Guziolowski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Eduati</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Thiele</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Grabe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Saez-Rodriguez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Siegel</surname>
          </string-name>
          ,
          <article-title>Revisiting the training of logic models of protein signaling networks with ASP</article-title>
          ,
          <source>in: Computational Methods in Systems Biology</source>
          , Springer Berlin Heidelberg,
          <year>2012</year>
          , pp.
          <fpage>342</fpage>
          -
          <lpage>361</lpage>
          . doi:
          <volume>10</volume>
          .1007/ 978-3-
          <fpage>642</fpage>
          -33636-2_
          <fpage>20</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>E.</given-names>
            <surname>Merhej</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Schockaert</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. D.</given-names>
            <surname>Cock</surname>
          </string-name>
          ,
          <article-title>Repairing inconsistent answer set programs using rules of thumb: A gene regulatory networks case study</article-title>
          ,
          <source>International Journal of Approximate Reasoning</source>
          <volume>83</volume>
          (
          <year>2017</year>
          )
          <fpage>243</fpage>
          -
          <lpage>264</lpage>
          . doi:
          <volume>10</volume>
          .1016/j.ijar.
          <year>2017</year>
          .
          <volume>01</volume>
          .012.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>A.</given-names>
            <surname>Lemos</surname>
          </string-name>
          ,
          <string-name>
            <surname>I. Lynce</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Monteiro</surname>
          </string-name>
          ,
          <article-title>Repairing boolean logical models from time-series data using answer set programming</article-title>
          ,
          <source>Algorithms for Molecular Biology</source>
          <volume>14</volume>
          (
          <year>2019</year>
          )
          <article-title>9</article-title>
          . doi:
          <volume>10</volume>
          .1186/ s13015-019-0145-8.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>X.</given-names>
            <surname>Chai</surname>
          </string-name>
          ,
          <article-title>Reachability Analysis and Revision of Dynamics of Biological Regulatory Networks</article-title>
          ,
          <source>Ph.D. thesis</source>
          , École centrale de Nantes,
          <year>2019</year>
          . URL: https://theses.hal.science/tel-02145438/.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>M.</given-names>
            <surname>Gebser</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Guziolowski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Ivanchev</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Schaub</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Siegel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Thiele</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Veber</surname>
          </string-name>
          ,
          <article-title>Repair and prediction (under inconsistency) in large biological networks with answer set programming</article-title>
          ,
          <source>Principles of Knowledge Representation and Reasoning: Proceedings of the 12th International Conference, KR</source>
          <year>2010</year>
          (
          <year>2010</year>
          ). URL: https://cdn.aaai.org/ocs/1334/1334-7439-1-PB.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>N.</given-names>
            <surname>Mobilia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Rocca</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Chorlton</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Fanchon</surname>
          </string-name>
          , L. Trilling,
          <article-title>Logical modeling and analysis of regulatory genetic networks in a non monotonic framework</article-title>
          , in: Bioinformatics and Biomedical Engineer-
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>