1. Introduction

PloS one 3 (2008) e1672. doi:10.1371/journal.pone.0001672. [31] S. Klamt

10.1007/978-3-319-16483-0_

Towards Flexible Criteria in the Revision of Boolean Networks

Rafael Patronilo

r.patronilo@campus.fct.unl.pt 0

Matthias Knorr

João Leite

jleite@fct.unl.pt 0 0 NOVA LINCS, NOVA University Lisbon , 2829-516 Caparica , Portugal

2015

3 599 612

Biological regulatory networks, and Boolean networks (BNs) in particular, are widely used to investigate the dynamics of complex cellular processes and chemical reactions. BNs model the interactions between compounds through regulatory functions that determine a compound's state based on the activating or inhibiting efects of others. This enables the analysis of system behavior under incomplete, imprecise, or noisy information through in silico experiments, supporting hypothesis testing and the prediction of experimental outcomes-e.g., in healthcare research. In this context, new experimental data may reveal inconsistencies in the current model, necessitating its revision to account for these observations. This revision process has traditionally been manual, laborious, and error-prone due to the vast space of possible modifications. To address this, ARBoLoM was recently introduced as a tool for the automated revision of BNs. It leverages Answer Set Programming (ASP) to eficiently verify consistency and, when needed, compute repairs based on time series of experimental results. The revision is guided by an ordered set of criteria for determining minimal modifications, where the first criterion is fixed using iterative deepening, and the remaining ones are handled via optimization in ASP. However, fixing the first criterion limits the flexibility of the tool, and it remains an open question whether this first fixed criterion reflects practical or biological relevance. In this paper, we address this limitation by extending ARBoLoM to allow for a fully flexible order of the criteria. This extension also enables us to concisely characterize the behavior of prior approaches to BN revision and confirm that our tool remains orders of magnitude faster while still producing correct revision results.

eol>Biological Regulatory Networks Revision Answer Set Programming

1. Introduction

Over the past twenty years, systems biology has grown into a dynamic and influential discipline. It integrates concepts from engineering, mathematics, physics, and computer science to construct models that capture the complexity of biological phenomena. By adopting a comprehensive, system-level approach, researchers aim to uncover insights that were previously out of reach [ 1 ]. A key aspect of this area lies in the precise modeling of biological systems found in nature, with the overarching aim of deepening our understanding of cellular processes. Such insights may pave the way for novel scientific ifndings and conceptual frameworks regarding living systems.

In this context, Biological Regulatory Networks (BRNs) [ 2 ] are widely used to simulate the interactions between compounds in a computer environment or in silico. Among these, continuous models work directly with the concentration of each compound while logical models [ 3, 4 ] represent each compound as a Boolean variable, by measuring said concentration against a threshold. Though less concise, this is often advantageous in the absence of experimental data granular enough to support continuous models.

A prominent type of such logical models is Boolean logical models, also known as Boolean Networks (BNs) [ 5 ]. The interactions between compounds are represented as a directed graph, with an edge indicating whether a compound is a regulator of another compound, that is, whether its presence or absence influences the other compound. Each of these edges is labeled as a positive or negative interaction.1 To allow more specific interactions to be described, each compound is usually accompanied by a regulatory function, a logical formula determining the compound’s truth value according to the truth values of the regulators afecting it. Such BNs have been successfully applied in as models of, e.g., gene regulation networks [ 5 ].

It can happen that new experimental data is inconsistent with an existing BN, indicating one or more regulatory functions require correcting. This revision process has typically been done manually, but it can prove laborious and error-prone for large BNs. Unlike the automatic creation of BRNs [ 6, 7, 8, 9, 10 ], revision has only recently started to draw wider attention. Among the attempts at automating this revision process, most approaches are either limited on the type of repairs [ 11, 12, 13 ] or the complexity of the interactions described by the BN [ 14, 15 ]. ModRev [16] exceeds these limitations and is additionally capable of handling incomplete experimental data. However, it can take a long time to repair each BN, with the repair process often exceeding the predefined timeout of one hour and consequentially failing. ARBoLoM [17] addresses this problem, combining a deepening search approach with Answer Set Programming, in clingo [18], to find and optimize repairs. The tool proves orders of magnitude faster than ModRev, succeeding at finding a repair for a regulatory function in a number of real-life BNs often instantaneously and much before the previous timeout of one hour.

There is, however, a notable shortcoming that afects both tools, albeit to diferent extents. For finding optimal repairs, i.e., repairs that perform as few changes as possible to restore consistency, they employ an ordered set of criteria to determine the best solution, which is not fully flexible, making it dificult (or even impossible) for the user to apply either tool with some specific orders of the criteria. Notably, it is not possible to assign the same order to the criteria on both tools, making their direct comparison more challenging.

ModRev distinguishes three types of repairs, in the following order of preference (which is fixed):

1. Minimize changes to function regulators

2. Minimize changes to regulator signs 3. Minimize the changes in the formula of the regulatory function

ARBoLoM distinguishes between changes in the formula due to changes in the number of function terms and changes in each term. ARBoLoM therefore distinguishes four types of criteria, in the following default order of preference (where the order of all but the first one can be changed):

1. Minimize the diference in the number of terms of the regulatory function 2. Minimize changes to function regulators 3. Minimize changes to regulator signs 4. Minimize changes in the format of each term.

This lack of complete flexibility in specifying the order of the criteria is, arguably, a shortcoming, as, to the best of our knowledge, no specific order has been identified as the preferred one to be used. It would therefore be clearly preferable to allow the user to adjust the order to their needs without limitation.

In this paper, we amend this problem and extend ARBoLoM in such a way that the user gains the desired flexibility on the choice of repair criteria, particularly on the order of preference for each type of repair. We achieve that by removing the hard-coded fixation of the first criterion in the iterative deepening search, introducing a parametrized solution which allows the user to indicate the order of preference on the criteria. We also perform a detailed evaluation analysing whether (a) performance is afected as such, (b) whether the order of criteria has an impact on the performance, and (c) how ARBoLoM’s performance compares to that of ModRev when we adjust the optimization criteria to exactly match the order used in ModRev.

The remainder of this paper is structured as follows. We recall relevant material in Section 2 to make the paper self-contained. Then, in Section 3 we introduce the novel version of ARBoLoM that allows the flexible choice of the order of criteria for revision of BNs. We assess how well this new version performs in Section 4, before we conclude in Section 5.

+ _ + + _

= + = ¬ ⋀ = ⋁ ( ⋀ ¬ )

2. Background

We briefly recall relevant notions and notation on Boolean Networks, Answer Set Programming, and ARBoLoM as a tool of revision in Boolean Networks, following the presentation in [17].

2.1. Boolean Networks

A Boolean Regulatory Network, or Boolean Network (BN), represents a set of biological compounds (including proteins or genes) and their interactions with each other modelling complex biological processes. Such BN makes use of a regulatory graph, which is a directed graph = (V, ), where = {1, ..., } is the set of vertices (nodes) representing the regulatory compounds, and = {(, , ) : , ∈ , ∈ {+, −}} is the set of signed edges representing the interactions between compounds such that does not contain (, , +) and (, , − ) for any , ∈ . Edges with = + are called positive interactions (or activation), i.e., activates , while edges with = − are called negative interactions (or inhibition), i.e., inhibits .Nodes with no incoming edges in are called input nodes. Such nodes represent external stimuli, and their values do not change over time. Example 1. Fig. 1 shows regulatory graph = (V, ) with = {1, 2, 3, 4} and = {(1, 2, − ), (1, 3, +), (2, 1, +), (2, 3, +), (4, 2, +), (4, 3, − )}.

Regulatory graphs alone do not specify how diferent compounds that afect the same node interact with each other for that node’s activation. To that end, a regulatory function specifies, for each compound, what combination of regulators produces an efect on a given compound. Then, a Boolean logical model of a regulatory network is defined as a tuple (V, ) where = {1, 2, ..., } is the set of variables representing the regulatory compounds of the network such that is assigned to a value in {0, 1}, and = {1 , 2 , ..., } is the set of regulatory (Boolean) functions such that defines the value of and where = if is an input node. Regulatory functions of input nodes may be omitted (cf. 4 in Fig. 1), but an explicit representation (4 = 4) is commonly used when involving implementational aspects [19].

Example 2. Fig. 1 presents a Boolean logical model with from Ex. 1 and regulatory functions for on the right.

Commonly, Blake canonical form (BCF), a special case of the disjunctive normal form, is used in the literature [ 12, 16, 17 ] to represent the regulatory functions of each biological compound by the disjunction of all its prime implicants [20]. A conjunction of literals is an implicant of a Boolean function if, whenever the conjunction takes the value 1, so does the function, and an implicant is prime if no subset of its literals forms itself an implicant [21]. BCF is unique up to reordering (of disjuncts), and since no compound can occur both positively and negatively in a specific regulatory function by definition of regulatory graphs, verification of prime implicants in such a function reduces to verifying that no disjunct is contained in the other. E.g., 3 (Fig. 1) is in BCF and both disjuncts are prime implicants. 0100 0000

The dynamics of BNs is captured in their state resulting from the interactions between compounds over time. Formally, the network state of a BN with compounds is a vector = [1, 2, ...] where is the value of the variable representing the -th compound of the network. Clearly, for Boolean logical models, the number of diferent states in a network is given by 2. E.g., if nodes 1 and 3 in Ex. 2 are active, and the other two are not, then this state is represented by 1010.

The evolution of states is captured in state transition graphs [22]. A State Transition Graph (STG) is a directed graph = (, ) where is the set of vertices representing all the diferent states of the network, and is the set of edges representing the viable transitions between states.

Updates to the values of the nodes in a BN are applied commonly in two diferent ways: synchronous and asynchronous [23, 24]. For synchronous update, at each time step, all compounds are updated simultaneously. Each network state has exactly one successor. For asynchronous update, at each time step, only one regulatory function may be applied. For a network of n compounds, each state can have at most possible state transitions (including a transition to itself - cf. Fig. 2).

Boolean networks may enter so-called attractors during updates, i.e., sets of network states that form a cycle. These cycles are linked to many important cellular processes [26]. Among them, stable states are attractors that contain only a single state. An example for a stable state is 0000 on the left of Fig. 2.

2.2. Answer Set Programming

Answer Set Programming (ASP) aims at solving dificult combinatorial (primarily NP-hard) search problems [27]. It builds on the combination of a rich declarative knowledge representation language with eficient solvers [ 28]. Here, we briefly recall essential notions and notations for the sake of readability of the following material.

We start with the syntax of basic normal logic programs that are a finite set of rules of the form: 0 ← 1, ..., , +1, ..., . (1) where each ( 0 ≤ ≤ , 0 ≤ ≤ ) is an atom. An atom is of the form (1, ..., ), with p a predicate symbol of arity k, and 1, ..., terms, built from variables and constants of the (implicit) language of the program of r2. The atom to the left of the arrow (← ) is said to be the head of the rule, whereas the (negated) atoms to the right of the arrow, corresponding to a conjunction, are the body of the rule. We call literals both an atom as well as its negation, and we call 0 the head of r (ℎ() = 0), and the set of all literals that occur in its body (also known as body literals) the body of r (() = {1, ..., , +1, ..., }). Rules with = = 0 are called facts and we admit that 0 is ⊥ to represent constraints, and we commonly omit ← in the former case and ⊥ in the latter. Facts allow us to express what necessarily must be true, while constraints provide us with the possibility to express conditions that we do not want to hold. The ground instantiation of a program, ( ) is obtained by replacing all (first-order) variables with constants from the given language 2Actually, function symbols are also permitted, but care must be taken as unbounded recursion is not admitted in ASP. observations (ASP) model (ASP or Text) process inputs consistency checking clingo consistent? no

yes inform user repairing clingo repairs yes show repairs found?

no inform user terminate of the program in all possible ways. A set of ground atoms is called an answer set of ( ) if is the subset-minimal model of the reduct ( ) obtained as {0 ← 1, ..., | of the form (1) in ( ) and {+1, . . . , } ∩ = ∅}.

ASP also admits a number of commonly used language constructs, most of which can be expressed by rules of form (1), but whose direct usage is much more convenient, and we indicate them briefly next.

First, choices permit that a number of atoms may be part of an answer set. E.g., for {(1); ℎ(1)} both (1) and ℎ(1) can be part of an answer set. As a result, we obtain four diferent answer sets in this case, corresponding to the power set over the set of possible elements. Then, constraints can be used to filter out certain models. We can also turn such a choice into a cardinality constraint, providing lower and/or upper bounds on the number of atoms to appear in each answer set. E.g., 1{(1); ℎ(1)}1 admits exactly two answer sets as in each one exactly one of the two atoms can appear. Often, conditionals are employed, that allow us to determine ranges. E.g., a choice {_(, ) : ()} would admit answer sets with atoms over _/2 as long as its second argument is a compound.

These constructs are also used in aggregates. E.g., #{ : _( 1, )} counts all diferent values that appear as second argument of the instances of _/2. Optimization statements allow us not to just search for one (or more) solutions of a problem, but also assign weights to solutions which can be used to order solutions and optimize in accordance with these. E.g., #{1@2, : _ℎ()} efectively sums the weights of all atoms over _ℎ/1, where in this case, 1 is the weight of the atom and 2 the assigned priority (to be able to combine several optimization criteria with priorities – the higher priority is considered first). Based on that, the solver will aim to find the answer set with the lowest overall assigned weight.

Finally, constants can be declared using # with a name and a default value. The latter can be overwritten when running that program. Comparison operators are naturally allowed, and the infix notation .. for numeric values or variables representing numeric values is shorthand for all the values between and including and .

2.3. ARBoLoM

ARBoLoM [17] has been developed for revision of BNs given a set of experimental results. The main idea is to check for consistency of the current model comparing its dynamic behaviour, i.e., what states are observed on what inputs, with the experimental observations. If the model’s behaviour replicates these, then the model is consistent, otherwise it is inconsistent. In the latter case, repair is applied adjusting the functions whose results do not match the observations.

As shown in Fig. 3, ARBoLoM is written in Python to handle the processing of data and integration of the diferent components and employs ASP encodings for consistency checking and repair using the state-of-the-art ASP tool clingo [18]. ARBoLoM expects a set of (possibly incomplete) experimental observations and the specification of a BN. The former is assumed to be in a specific format, but any diferent representation of states of activations of the relevant compounds can be converted to it easily. The latter is expected to be either already encoded in ASP or in one of the standard formats, which is converted automatically by the tool into the required ASP format.

For checking consistency (and repair), three diferent modes of interaction between compounds are considered, namely stable states (without any time component), and synchronous and asynchronous updates, and three slightly varying ASP encodings exist. For the details, we refer to Aleixo et al. [17].3

The functions that are inconsistent need to be repaired by modifying one by one each inconsistent regulatory function. This requires again as input the representations of the network and the observations. The considered modifications are adding and removing regulators, changing regulators from activator to inhibitor and vice-versa, and changing the function’s format, i.e., changing the number of terms as well as their composition. This results in a large variety of possible repaired functions. ARBoLoM aims at finding repairs that are as close as possible to the original functions in the spirit of minimal change, using the following order of priority:

1. Minimize the changes to the number of function terms;

2. Minimize the changes to the function’s regulators; 3. Minimize the changes to the signs of regulators; 4. Minimize the changes to the format of each term.

While the order of criteria 2.–4. is easily adjustable (as we will see), criterion 1. is not, because ARBoLoM employs iterative deepening search on the number of changes to the number of function terms first. The central idea is to start searching for solutions with the same number of function terms as the function to be repaired, and if no solution is found, simultaneously, we search for solutions with one more and one less function term in parallel, determining the solution in accordance with the other criteria if both solutions exist, picking the single one if one of the cases fails, and repeat the process otherwise incrementing the distance w.r.t. number of terms present. The iteration is naturally limited by the lower bound of 0 function terms and by an upper bound determined based on the number of positive observations. This is possible because, in the limit, each positive observation is only covered by a single term in the function, so if we cannot find a function with at most as many terms as existing distinct observations (for the function to be repaired), then no such candidate can exist. This is further improved by not counting identical transitions of such observations.

The details for the respective ASP encodings can also be found in [17], but since, unlike the encodings for consistency checking, we revise these, we will discuss them with more detail next. 4

3. Flexible Revision of Boolean Networks

Out of the four possible diferent repair operations distinguished in ARBoLoM, for the final three, their order of priority can be easily changed. The minimization of these is handled in ASP with optimization statements, each defined with a priority value corresponding to the order given in the end of the previous section. Reordering these criteria amounts to changing the priority value in these statements. For the first criterion however, which minimizes the change in the number of terms of the regulatory function, the optimization is handled by the iterative deepening search guided within the python code and therefore cannot be reordered in this way. This is a limitation, since a biologist might prefer a diferent order of preference for these criteria, which for a particular use case is more realistic. For example, if we know that a compound has been revealed to be relevant for a process in whose model it currently does not occur, then it needs to be added to the model in the revision process, and this requires adjusting the priority of criteria. But such modification cannot be achieved without substantial modifications to the implementation of the tool.

Therefore, we tackle this problem here and extend ARBoLoM to allow more flexibility on the choice of repair criteria, particularly in the order of preference for each type of repair. The main idea is to turn the criterion used for iterative deepening into one handled by ASP optimization. This allows parameterized optimization of all criteria in accordance with the preference of the user. We should note that this might afect overall performance, and we will analise this subsequently together with other important aspects in Sec. 4. 3The complete encodings can be found at https://github.com/fpaleixo/arbolom/tree/main/encodings/consistency. 4Still, the original complete encodings can be found at https://github.com/fpaleixo/arbolom/tree/main/encodings/repairs.

Listing 1: Simplified ASP encoding for repair with stable state observations. also the necessary update aspects due to the dynamics, which are not essential for our modifications. 5 In the following, we briefly describe the encoding and the main modifications, starting by briefly recalling the input format [17].

The regulatory graph = (, ) is encoded as a set of facts over /1 to represent all elements in and over /3 s.t. (, , ) represents that compound regulates with the sign (where = 0 encodes an activator, and = 1 an inhibitor). For regulatory functions, we use predicates /2 and /3 to represent the number of their terms and the constituting terms themselves, respectively. E.g., function(v1, 1) represents that 1’s regulatory function has one term, while term(v1, 1, v2) represents that term 1 of 1’s regulatory function contains 2. This notation is a bit less compact, but facilitates specific modifications.

The observations are encoded by means of predicates, /1 and /3. The former is used to represent the id of an experiment, while observation(e, c, s) represents the observed state (0 for inactive, 1 for active) of compound of experiment . Note that, for updates where the network does change its state over time, an additional argument for discrete time is required.

The repair encoding itself first introduces two constants as parameters (Lines 1–2), one for the compound whose function is to be repaired, the other for the upper bound of the number of terms (called nodes here). Then, we incorporate additional knowledge on biologically mandatory known regulators (possibly including the fixed signs, i.e., whether they are known to be activating or inhibiting) (Lines 3–8), and for those not fixed whether they are activating or inhibiting (Line 9). Then, we determine the number of available nodes (i.e., conjunctions of regulators) picking the larger number among the current number of terms and the given parameter (Lines 10–11). For each such node, we generate possible regulators (Line 12), and turn a node efective if it has at least one regulator associated to it (Line 13). Then, we ensure that the fixed regulators indeed appear in some node (Line 14), and that the function be in BCF (Line 15–17) verifying that the node is not fully contained in another node, by assuring its number of compounds difers from the number of compounds in common with another node. 5All new encodings can be found at https://github.com/rafael-patronilo/arbolom/tree/terms-criterion-asp/encodings/repairs.

Listing 2: New ASP minimization statements with parameterized priority. We identify nodes as negative if one of its activators is inactive (Line 18) or one of its inhibitors active (Line 19), and as positive if it is not negative (Line 20–21). Likewise, we determine which regulators appear in the original function and in the modified one (Lines 22–23) and which compounds exhibit changes in sign between the original and the modified function (Lines 24–25). Finally, we discard all solutions for which an experimental observation of the considered compound difers from the value computed by the modified function (Lines 26–27).

Let us discuss the major changes applied to this encoding. First and foremost, it difers conceptionally. Namely, it no longer includes the minimization statements directly, since these are formatted and added by python code, allowing their priority to be set at runtime. We explain how this works in a moment.

This principal modification causes a significant change in this encoding relative to the original one of ARBoLoM, required for handling the minimization of the change in the number of terms. We have removed the restriction which fixated the number of nodes of the functions to consider. This restriction was required by the deepening search to restrict the search space to the number of nodes being currently considered. As this minimization of the change in the number of nodes has been moved to the ASP program, this is neither required nor desirable as we wish for the search space to contain all possible functions regardless of the number of nodes. The constant max_node_number (formerly node_number) now represents the upper bound for the number of the terms rather than an exact restriction. This upper bound was already calculated in ARBoLoM to restrict the range of the deepening search as described in Section 2.3. We still calculate this value in the same way, by counting unique positive observations of the compound being repaired. We note that equivalent modifications were applied to the encodings for synchronous and asynchronous updates.

Regarding the minimization of changes, the tool admits now as additional input the parameters for the priority of criteria. This can be used within the Python code to prepare a number of optimization statements to be added to the ASP encoding. Listing 2 shows an abstract presentation of the minimization statements added by the python code, each with the placeholder P which is substituted with the correct priority for each statement according to the user’s selection. Note that this is indeed an abstracted representation, and each of the lines can be associated with a diferent value, permitting in principle a very flexible attribution of priorities. Our focus though is to assign these in accordance with the characteristics of modifications considered in the previous sections and also used as separating comments in the listing.

In more detail, we have added two new minimization statements which count and minimize changes in the number of terms for both addition (Line 2) and removal (Line 3). This accounts for the minimization of changes previously handled by the iterative deepening process. Lines 5 and 6 account for changes in the number of regulators and allows their minimization, while Line 8 captures minimization of sign changes of the involved regulators. Finally, Lines 10 and 11 handle minimization in changes of terms, by accounting for regulators that have been added to a node or removed from it. The resulting minimized numbers of modifications are also extracted from the results and can be used for analysis and comparison.

In addition, we also modified the minimize statements in Lines 5, 6, 10 and 11 in the sense of internalising former auxiliary predicates inside of these minimize statements. While this change has Abbreviation

MCC

FY TCR SP Th no efect on the result of the optimization itself, this kind of encoding commonly improves processing times in search with ASP and may thus impact positively overall performance of the repair program.

4. Evaluation

We now assess in a detailed manner how these modifications afect the overall performance, studying in particular to what extent changing the order of preference itself impacts on performance and how ARBoLoM with a preference order of criteria aligned with that of ModRev compares with ModRev.

To that end, we have taken advantage of the benchmarks employed before for ARBoLoM [17] which were generated based on the same methodology used in [29]. Essentially real BNs are considered and corrupted, and sets of observations, generated from the original BN, are used to repair its corrupted version. The BNs are corrupted by applying four operations, with varying probability : • Altering each regulatory function with a probability (0.1 to 1), changing the number of terms and/or the format of the terms of the function. • Altering the sign of each edge (regulator) in the function with a probability (0.05 to 0.75). • Removing a regulator from the function with probability (0.01 to 0.15).

• Adding a new regulator to the function with probability (0.01 to 0.15)

Based on that, 24 diferent corruption configurations were defined, varying both the available corruption operations and respective probabilities. Then, five benchmarks were created using real biological models: MCC [23], FY [30], TCR [31], SP [32], and Th [33] – some information on these is summarized in Table 1. For each configuration 100 corrupted models were generated, resulting in a total of 2400 models for each benchmark. Aside the corrupted models, each benchmark includes four sets of observations generated using the original model, ranging from a set with 1 experiment with 3 timepoints to a set with 5 experiments with 20 timepoints.

In addition, to test the tool on a larger network, a benchmark using a model that combines all the ifve real models was created – its corresponding data can be inferred from Table 1 as these 5 models are distinct for all but one compound shared between two BNs. Even though this network is thus a bit disconnected, this still provides a challenge, as the corruptions are generated arbitrarily, and possible ifxes need to inspect the entire search space. Given the size of this new network, in this benchmark only 10 corrupted models for each of the 24 configurations were generated and only one set of observations with 5 experiments and 20 timepoints.

Out of the six benchmarks the original ARBoLoM was tested on, so far we have evaluated the new version of the tool on two of them: the TCR benchmark, which is the largest (by number of compounds) among the 5 original networks and consequentially the model whose benchmark takes longer to repair on average; and the benchmark combining all five models, which we will refer to as All5. We ran our tests on a Linux machine with an AMD Ryzen 7 9700X CPU and 32GB of RAM.

We tested five orders of priority which we consider significant, analyzing both the time the revision process takes and the cost of each criterion in the optimal repairs, that is the number of changes of each type applied in the repaired function. Table 2 shows the mean, min and max costs of each criterion considering each of the five orders of priority. Note that the order A is equivalent to the order used in the original ARBoLoM. Orders B, C and D are obtained by moving the criterion with the highest priority in the previous order to the lowest priority, therefore cycling the highest priority between all criteria. Order E is diferent. It is equivalent to the order used in ModRev [ 16], that is, it first prioritizes changes to the regulators, second changes to the signs of the regulators and lastly changes to the format of the regulatory function, be it term number or format of each term. This last order is therefore similar to order B with the diference that the same priority is assigned to reducing the change in term number and the change in the format of each term.

We can see that the criterion with the highest priority always has the lowest average cost, indicating that the prioritization works as intended. This is however not always the case for the other criteria. For example, in the order A, the average cost of the change in signs is higher than the cost in the change to term format, despite signs having a higher priority. This can be explained by the fact that the space of possible repairs has already been limited by the first and second criteria, and therefore it may not be possible to pick a set of repairs with fewer changes in signs than in the term format.

Table 3 shows the times for the whole revision process, consistency checking and repairing for each of the five orders of priority listed in Table 2. Note that we include the consistency checking time merely for context when analyzing the total revision time. Consistency checking should not depend on the order of the criteria as this is a parameter of the repair process, and the small diferences in the respective times between diferent orders are therefore most likely related to circumstantial factors of the testing environment. Variations in the repair time however can be related to the choice of criteria, but even these do not show much variation. Looking at the maximum times, we see that the tool seems to take under 5 seconds for any model, regardless of the interaction mode or order of priority in the repair criteria, and, on average for all cases, revision is performed in less than 0.5 seconds. This shows that for TCR, and therefore likely for the other four smaller real models as well, this new version of ARBoLoM is as eficient as the original [ 17], with the advantage of flexibility in the order of the criteria.

We were also interested in understanding how our new solution performs on a larger model. Table 4 shows the times for the benchmark All5 in asynchronous mode, using each of the five orders of priority we have established. We note that times overall increase, but results are still obtained quickly with average repair times varying between 1 and 4 seconds and maximum values between 7 and 33 seconds. Notably, among the 5 orders, the one corresponding to that previously employed by ARBoLoM (order A) turns out to be the slowest, followed by the one used by ModRev (order E). We deem that, for A, this results from the fact that there is in principle a large number of nodes to be added, and the optimization process takes more time due to that. For E, we conjecture that this is influenced by the two conjoined criteria, as mixing them might make it more complicated to distinguish between diferent solutions.

Still, this shows that the solution employed by ARBoLoM is clearly superior to that of ModRev, even when using the same criteria, as revision always succeeds and with, on average, interactive response times. ModRev, on the other hand, only succeeds in 79% of the cases and takes on average around 105 seconds on those instances it successfully revises, even if on a slightly slower system [17].

We also include on the right of Table 4, for order A, the times for the former version of ARBoLoM using deepening search. On average, deepening search proves to be over two times as fast and on the worst case it proves four times and over 20 seconds faster. Despite the new version being notably slower, in exchange we obtain the flexibility, and even for this large BN, we are able to provide repairs well below a minute in asynchronous mode for all models, regardless of the order of priority for the criteria.

We cannot provide the same reassurance for synchronous mode. Our experiments so far show that synchronous mode poses a significant challenge, with the repair of multiple compounds timing out. All these compounds have a high max number of nodes (terms) relative to the compounds which get repaired within reasonable time. Recall that this upper bound for the number of terms in the repaired function is calculated by counting the number of unique positive observations of the compound. This is a limitation of the new version over the former version, which uses deepening search to progressively try diferent numbers of terms and therefore is not susceptible to problems related to a large upper limit. We are currently studying possible strategies to mitigate this issue. One easy solution would be to use the deepening search strategy whenever it is possible, so we can at least provide the same performance in those situations, while retaining the flexibility of criteria otherwise.

Finally, as described in Section 3, we modified existing ASP minimization criteria, internalising auxiliary predicates into the minimize statements, with the goal of improving performance. To understand if and how much our changes to the existing ASP minimization criteria afect the performance, we transposed these changes to the former version of ARBoLoM (with the deepening search) and test on the benchmark All5 in synchronous mode, which, due to the overall expected longer revision times, should allow us to give a better idea of the impact on performance. Table 5 shows the times for both versions, both obtained in the same testing environment. Surprisingly, there is no significant diference between both versions, and we will have to investigate this matter further.

5. Conclusions

We have introduced a new version of ARBoLoM, a tool for the revision of Boolean Networks given a set of experimental observations. The main improvement lies in the new capacity for the user to be able to freely choose the preference order of application of optimization criteria, employed to ensure that revision is done with minimal changes. Our assessment shows that this is indeed viable in the sense that revision of a real BN, for several variants of preference orders, can be done very fast. This includes the specific order employed by another tool, ModRev [ 16], validating the previous assessment [17], now for identical optimization criteria, that ARBoLoM’s solution based on ASP is indeed superior by a considerable margin.

We have however also seen, that for larger models, dropping the iterative deepening search in general has a negative impact on performance, in particular if the search space in terms of possible nodes to be added increases. We plan to incorporate the previous solution of ARBoLoM for this particular choice of criteria to take advantage of the improved performance. We also will do more exhaustive experiments with a wider set of real BNs, aiming to confirm our positive results here, and to tackle open issues including the ongoing study of the impact of diferent forms of encoding the optimizations.

Acknowledgments

We thank the anonymous reviewers for their valuable feedback. This work is supported by project BIOREVISE (2023.13327.PEX) and by UID/04516/NOVA Laboratory for Computer Science and Informatics (NOVA LINCS) with the financial support of FCT.IP.

Declaration on Generative AI The author(s) have not employed any generative AI tools.

[1]

Siegel ,

Radulescu ,

M. L.

Borgne ,

Veber ,

Ouy ,

Lagarrigue , Qualitative analysis of the relation between DNA microarray data and behavioral models of regulation networks , Biosystems 84 ( 2006 ) 153 - 174 . doi: 10 .1016/j.biosystems. 2005 . 10 .006.

[2]

Jong , Modeling and simulation of genetic regulatory systems: A literature review , Journal of computational biology : a journal of computational molecular cell biology 9 ( 2002 ) 67 - 103 . doi: 10 .1089/10665270252833208.

[3]

Glass , S. A . Kaufman, The logical analysis of continuous, non-linear biochemical control networks , Journal of Theoretical Biology 39 ( 1973 ) 103 - 129 . doi:https://doi.org/10.1016/ 0022 - 5193 ( 73 ) 90208 - 7 .

[4]

Thomas , Boolean formalization of genetic control circuits , Journal of Theoretical Biology 42 ( 1973 ) 563 - 585 . doi:https://doi.org/10.1016/ 0022 - 5193 ( 73 ) 90247 - 6 .

[5]

Salinas ,

Gómez ,

Aracena , Existence and non existence of limit cycles in boolean networks , in: Automata and Complexity , volume 42 , Springer, 2022 , pp. 233 - 252 . doi: 10 .1007/ 978-3- 030 -92551-2_ 15 .

[6]

Ostrowski ,

Paulevé ,

Schaub ,

Siegel ,

Guziolowski , Boolean network identification from perturbation time series data combining dynamics abstraction and logic programming , Biosystems 149 ( 2016 ) 139 - 153 . doi:https://doi.org/10.1016/j.biosystems. 2016 . 07 .009, selected papers from the Computational Methods in Systems Biology 2015 conference .

[7]

Réda ,

Wilczyński , Automated inference of gene regulatory networks using explicit regulatory modules , Journal of Theoretical Biology 486 ( 2020 ) 110091 . doi: 10 .1016/j.jtbi. 2019 . 110091 .

[8]

Mitsos ,

I. N.

Melas ,

Siminelakis ,

A. D.

Chairakaki ,

Saez-Rodriguez ,

L. G.

Alexopoulos , Identifying drug efects via pathway alterations using an integer linear programming optimization formulation on phosphoproteomic data , PLoS Computational Biology 5 ( 2009 ) e1000591 . doi: 10 . 1371/journal.pcbi. 1000591 .

[9]

Schaub ,

Siegel ,

Videla , Reasoning on the Response of Logical Signaling Networks with ASP , 2014 , pp. 49 - 92 . doi: 10 .1002/9781119005223.ch2.

[10]

Videla ,

Guziolowski ,

Eduati ,

Thiele ,

Grabe ,

Saez-Rodriguez ,

Siegel , Revisiting the training of logic models of protein signaling networks with ASP , in: Computational Methods in Systems Biology , Springer Berlin Heidelberg, 2012 , pp. 342 - 361 . doi: 10 .1007/ 978-3- 642 -33636-2_ 20 .

[11]

Merhej ,

Schockaert ,

M. D.

Cock , Repairing inconsistent answer set programs using rules of thumb: A gene regulatory networks case study , International Journal of Approximate Reasoning 83 ( 2017 ) 243 - 264 . doi: 10 .1016/j.ijar. 2017 . 01 .012.

[12]

Lemos , I. Lynce ,

Monteiro , Repairing boolean logical models from time-series data using answer set programming , Algorithms for Molecular Biology 14 ( 2019 ) 9 . doi: 10 .1186/ s13015-019-0145-8.

[13]

Chai , Reachability Analysis and Revision of Dynamics of Biological Regulatory Networks , Ph.D. thesis , École centrale de Nantes, 2019 . URL: https://theses.hal.science/tel-02145438/.

[14]

Gebser ,

Guziolowski ,

Ivanchev ,

Schaub ,

Siegel ,

Thiele ,

Veber , Repair and prediction (under inconsistency) in large biological networks with answer set programming , Principles of Knowledge Representation and Reasoning: Proceedings of the 12th International Conference, KR 2010 ( 2010 ). URL: https://cdn.aaai.org/ocs/1334/1334-7439-1-PB.pdf.

[15]

Mobilia ,

Rocca ,

Chorlton ,

Fanchon , L. Trilling, Logical modeling and analysis of regulatory genetic networks in a non monotonic framework , in: Bioinformatics and Biomedical Engineer-