<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Learning from Interpretation Transition using Feed-Forward Neural Networks</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Enguerrand Gentet</string-name>
          <email>enguerrand.gentet@ens-cachan.fr</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sophie Tourret</string-name>
          <email>tourret@nii.ac.jp</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Katsumi Inoue</string-name>
          <email>inoue@nii.ac.jp</email>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>ENS (Cachan)/Paris-Sud University (Orsay)</institution>
          <country country="FR">France</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>National Institute of Informatics (Tokyo)</institution>
          <country country="JP">Japan</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Tokyo Institute of Technology (Tokyo)</institution>
          <country country="JP">Japan</country>
        </aff>
      </contrib-group>
      <fpage>27</fpage>
      <lpage>33</lpage>
      <abstract>
        <p>Understanding the evolution of dynamical systems is an ILP topic with many application domains, e.g., multi-agent systems, robotics and systems biology. In this paper, we present a method relying on an artificial neural network (NN) to learn rules describing the evolution of a dynamical system of Boolean variables. The experimental results show the potential of this approach, which opens the way to many extensions naturally supported by NNs, such as the handling of noisy data, continuous variables or time-delayed systems.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>[Fig. 1: Architecture of the feed-forward NN. The input layer (i_1, ..., i_nvar) receives x(t); a single hidden layer (h_1, ..., h_nhid) feeds the output layer (o_1, ..., o_nvar), which predicts x(t + 1); w_{l,i,j} denotes the weight of the link from the ith neuron of layer l to the jth neuron of layer l + 1.]</p>
      <p>
We adopt the representation of dynamical systems used in [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. A system is a
finite state vector evolving through time, x(t) = (x1(t), x2(t), ..., xnvar(t)), where
each xi(t) is a Boolean variable. In systems biology these variables can represent,
e.g., the presence or absence of some genes or proteins inside a cell. The aim
of NN-LFIT is to output a normal logic program P that satisfies the condition
x(t + 1) = TP(x(t)) for any t, where TP is the immediate consequence
operator for P [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. The rules of P are of the form ∀t, xi(t + 1) ← F(x(t)) for all i in
{1, ..., nvar}, where F is a Boolean formula in propositional logic (PL). The
standard terminology and notation of PL are used¹, i.e., when referring to literals
(variables or negations of variables), terms (conjunctions of literals) and formulæ.
We are especially concerned with formulæ in disjunctive normal form (DNF),
i.e., disjunctions of terms. Note that this formalism only allows us to describe
the simplest of dynamical systems, namely those that are purely Boolean and
without delays, i.e., where x(t + 1) depends only on x(t).
      </p>
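      <p>For illustration, here is a toy two-variable program (a hypothetical example, not one of the benchmarks used later) together with one application of its TP operator, written as a short Python sketch:</p>
      <preformat>
# Toy illustration of the T_P operator for the program
# P = { x1(t+1) ← x2(t),  x2(t+1) ← x1(t) ∧ ¬x2(t) }:
# one synchronous transition from x(t) to x(t+1).
def tp(x):
    x1, x2 = x
    return (x2, x1 and not x2)

# From x(t) = (1, 0): tp((True, False)) == (False, True), i.e. x(t+1) = (0, 1).
print(tp((True, False)))
      </preformat>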
      <p>
        The type of NN used in NN-LFIT reflects the simplicity of the systems
considered. We use feed-forward NNs [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ], where the information moves in only one
direction, forward, starting from the input layer, going through the hidden
layers and ending at the output layer. We furthermore restrict ourselves to
a single hidden layer, i.e., a total of three layers, because it considerably
simplifies the architecture of the NN and its treatment. This does not limit the
accuracy of the NN as long as there are enough neurons in the hidden layer [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. The notations
related to the NN are introduced in Fig. 1. Formally, each ith neuron on the lth
layer is connected to each jth neuron on the (l + 1)th layer by a link characterized
by its weight, denoted w_{l,i,j} ∈ R, and the output of each neuron is computed
from the weighted sum of all its inputs², e.g., output(o_k) = f(∑_{j=1}^{nhid} w_{2,j,k} · output(h_j)),
where f is a sigmoid function. The state vector x(t) is directly fed to the input
layer and the output layer predicts the values of x(t + 1).
¹ An introduction to logic is available in, e.g., [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
² Due to space limitations, details about the inner mechanisms of NNs, e.g., biases, are omitted.
      </p>
      <p>The last parameter
remaining to choose is the number of neurons on the hidden layer nhid, which
is tuned specifically by NN-LFIT to suit each problem. To determine the
correct weights of an NN, it must be trained on the available data. The standard
approach consists in splitting the available data into two sets: the training set, on
which the training of the NN is performed, and the test set, on which the
performance of the trained NN is measured. Usually 80% of the training set is used to
tune the weights of the NN while the remaining 20%, called the validation set,
is used to tune the NN parameters (in our case, the nhid value). The
training method used in NN-LFIT is standard: backpropagation with an
adaptive rule on the gradient step and L2 regularization to avoid overfitting the
training data. The error made by the trained NN on each data set (written
Etrain, Eval and Etest respectively) is the ratio of incorrect predictions made by
each output neuron, averaged over all output neurons.</p>
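      <p>To make this concrete, the following minimal Python sketch (an illustration, not the implementation used in the experiments) shows the three-layer forward pass and the error measure; as in the text, biases are omitted:</p>
      <preformat>
# Minimal sketch of the three-layer feed-forward NN described above (numpy);
# all names are illustrative and biases are omitted, as in the text.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class ThreeLayerNN:
    def __init__(self, nvar, nhid, seed=0):
        rng = np.random.default_rng(seed)
        # w1[i, j] is the weight w_{1,i,j} of the link from input neuron i
        # to hidden neuron j; w2[j, k] is w_{2,j,k} from hidden j to output k.
        self.w1 = rng.normal(scale=0.5, size=(nvar, nhid))
        self.w2 = rng.normal(scale=0.5, size=(nhid, nvar))

    def predict(self, x_t):
        # x_t is the Boolean state vector x(t); the output estimates x(t+1).
        hidden = sigmoid(x_t @ self.w1)   # output(h_j)
        return sigmoid(hidden @ self.w2)  # output(o_k)

def error_rate(nn, states_t, states_t1):
    # Ratio of incorrect Boolean predictions averaged over output neurons,
    # i.e. the Etrain / Eval / Etest measure used in the text.
    pred = nn.predict(np.asarray(states_t, dtype=float)) > 0.5
    return float(np.mean(pred != np.asarray(states_t1, dtype=bool)))
      </preformat>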
    </sec>
    <sec id="sec-2">
      <title>The NN-LFIT algorithm</title>
      <p>The purpose of this section is to introduce the NN-LFIT algorithm. This
algorithm automatically constructs a model of a system from the observation of its
state transitions and generates transition rules that describe the dynamics of the
system. The main steps of NN-LFIT are listed below (a code sketch of the overall
pipeline follows the list):
Step 1: Create the model of the system.</p>
      <p>1. Choose the number of hidden neurons nhid and train the NN.
(a) Initialize nhid with a trial-and-error algorithm.</p>
      <p>(b) Refine nhid with a basic constructive algorithm.</p>
      <p>2. Simplify the NN by pruning useless links.</p>
      <p>Step 2: Extract the rules.
1. Extract logical rules in DNF using a black-box algorithm.</p>
      <p>2. Simplify logical rules into DNF form with an external tool.</p>
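      <p>The pipeline announced above can be summarized by the following Python skeleton (a sketch only; every helper is a stand-in for the corresponding step, and several of them are sketched later in this section):</p>
      <preformat>
# High-level skeleton of NN-LFIT; split_data, constructive_refinement and
# simplify_dnf are hypothetical stand-ins for the steps described below.
def nn_lfit(transitions, nvar):
    # transitions: observed (x(t), x(t+1)) Boolean state pairs.
    train_data, val_data, test_data = split_data(transitions)
    nn = init_architecture(train_data, val_data, nvar)       # Step 1.1(a)
    nn = constructive_refinement(nn, train_data, val_data)   # Step 1.1(b)
    nn = prune_links(nn, *train_data)                        # Step 1.2
    rules = {}
    for k in range(nvar):                 # Step 2: one rule per variable
        dnf = extract_dnf(nn, k)          # Step 2.1: black-box extraction
        rules[k] = simplify_dnf(dnf)      # Step 2.2: e.g. via primer
    return rules                          # test_data serves to measure Etest
      </preformat>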
      <p>
        The NN building steps are inspired by the work presented in [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ].
Step 1 - Creation of the model. The first building step is to generate a fully
connected NN with an architecture well fitted to learn the dynamics of the
observed system. We first use an initialization algorithm and then refine the
architecture with a constructive algorithm.
      </p>
      <p>Initialization algorithm The initial number of neurons on the hidden layer,
nhid, is chosen using a simple trial-and-error algorithm. It consists in training
the NN with several architectures, incrementing the initial number of hidden
neurons starting from one, and stopping when Eval no longer decreases after
a few tries. Every time we try a new architecture, we randomly initialize all
the weights.</p>
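      <p>One possible reading of this initialization, as a Python sketch (ThreeLayerNN and error_rate come from the sketch in the previous section; train_nn, the seeding and the patience value are illustrative assumptions):</p>
      <preformat>
# Trial-and-error choice of nhid: grow the hidden layer from one neuron,
# retraining from fresh random weights each time, and stop once Eval has
# not improved for a few consecutive tries.
def init_architecture(train_data, val_data, nvar, patience=3):
    best_nn, best_err, failures, nhid = None, 1.0, 0, 1
    while patience > failures:
        nn = ThreeLayerNN(nvar, nhid, seed=nhid)  # fresh random weights
        train_nn(nn, train_data)  # hypothetical backprop + L2 trainer
        err = error_rate(nn, *val_data)           # Eval
        if best_err > err:
            best_nn, best_err, failures = nn, err, 0
        else:
            failures += 1
        nhid += 1
    return best_nn
      </preformat>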
      <p>Constructive algorithm The architecture is improved by using a basic
constructive algorithm. It uses the same principle as the initialization algorithm,
except that every time we add a hidden neuron, we keep all the trained
weights attached to the former neurons.
Pruning algorithm The purpose of this step is to remove useless links. To
do so we introduce the notion of link efficiency. To compute the efficiency
of a specific link, we multiply its weight by the weights of every other link
starting from (or ending at) the same hidden neuron it ends at (or starts
from). In other words, the efficiency of a link quantifies the best contribution
among all the paths going through this link. It is therefore logical to remove
links with low efficiency because they have less effect on the predictions
than others. We use a simple dichotomous search to remove as many
links as possible without increasing Etrain. After the pruning algorithm has
been run, if some hidden neurons have lost all their links to the output layer
or all their links from the input layer, they can be removed.</p>
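      <p>Under the reading that the efficiency of a link is the largest-magnitude contribution among the input-to-output paths going through it, the efficiency computation and the dichotomous pruning can be sketched as follows (illustrative code, reusing ThreeLayerNN and error_rate from the earlier sketch):</p>
      <preformat>
import copy
import numpy as np

def link_efficiencies(nn):
    # Input-to-hidden link w1[i, j]: |w1[i, j]| * max_k |w2[j, k]|.
    eff1 = np.abs(nn.w1) * np.abs(nn.w2).max(axis=1)[np.newaxis, :]
    # Hidden-to-output link w2[j, k]: max_i |w1[i, j]| * |w2[j, k]|.
    eff2 = np.abs(nn.w1).max(axis=0)[:, np.newaxis] * np.abs(nn.w2)
    return eff1, eff2

def prune_links(nn, states_t, states_t1, steps=20):
    # Dichotomous search for the highest efficiency threshold whose pruning
    # leaves Etrain unchanged; links below the threshold are zeroed out.
    eff1, eff2 = link_efficiencies(nn)
    base_err = error_rate(nn, states_t, states_t1)
    lo, hi = 0.0, max(eff1.max(), eff2.max())
    for _ in range(steps):
        mid = (lo + hi) / 2.0
        candidate = copy.deepcopy(nn)
        candidate.w1 = np.where(eff1 >= mid, nn.w1, 0.0)
        candidate.w2 = np.where(eff2 >= mid, nn.w2, 0.0)
        if base_err >= error_rate(candidate, states_t, states_t1):
            lo = mid  # pruning at this threshold is safe
        else:
            hi = mid
    nn.w1 = np.where(eff1 >= lo, nn.w1, 0.0)
    nn.w2 = np.where(eff2 >= lo, nn.w2, 0.0)
    return nn
      </preformat>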
      <p>Extracting rules from the fully connected NN right after steps 1.(a) and 1.(b)
is possible. However, as shown in the experimental results, the rules extracted
after the simplification step are both simpler and more accurate than those
extracted before. In addition, thanks to the simplification of the NN (step 1.2),
the rule extraction process is less time consuming.</p>
      <p>
        Step 2 - Extraction of the rules. To extract the rules underlying the transition
system from the NN, each output neuron oi is considered independently. First
the sub-NN Ni, made of oi plus all the input and hidden neurons that can reach
oi and their connections to each other, is extracted from the main NN. Then,
Ni is used as a black box to construct the rules. All possible input vectors are
fed to Ni and only those that activate oi are kept. The union of these vectors
is converted into a DNF formula F that is then simplified using a tool called
primer [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ]. For example, let us consider a system (x1(t), ..., x4(t)) and the
NN N obtained by applying Step 1 of NN-LFIT on it. We focus on the neuron
o2 of N and consider that, due to the pruning algorithm, o2 only depends on
i1, i2 and i4. We start by querying all the possible combinations of (i1, i2, i4)
inputs, keeping only the ones that activate o2. Now consider that o2 is activated
only in the following cases: (1) i1 is on, i2 and i4 are off; (2) i1 and i2 are on
and i4 is off; (3) i1, i2 and i4 are on. Then o2 is represented by the formula:
F = (i1 ∧ ¬i2 ∧ ¬i4) ∨ (i1 ∧ i2 ∧ ¬i4) ∨ (i1 ∧ i2 ∧ i4). Finally, primer returns
the simpler but equivalent formula F′ = (i1 ∧ ¬i4) ∨ (i1 ∧ i2). Going back to the
original transition system, the rule describing the evolution of x2(t) extracted
from N is thus: x2(t + 1) ← (x1(t) ∧ ¬x4(t)) ∨ (x1(t) ∧ x2(t)).
      </p>
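      <p>The extraction for one output neuron can be sketched as follows (illustrative code, assuming the ThreeLayerNN sketch above, where a zeroed weight encodes a pruned link). On the running example, the three activating vectors for o2 yield exactly the three terms of F, to be handed to primer:</p>
      <preformat>
from itertools import product
import numpy as np

def extract_dnf(nn, k, threshold=0.5):
    # Black-box extraction for output neuron o_k: enumerate all inputs of
    # the sub-NN N_k and keep the vectors that activate o_k as DNF terms.
    nvar = nn.w1.shape[0]
    # Inputs i with a surviving path i -> some hidden j -> output k.
    deps = [i for i in range(nvar)
            if np.any(np.logical_and(nn.w1[i, :] != 0, nn.w2[:, k] != 0))]
    terms = []
    for bits in product([0.0, 1.0], repeat=len(deps)):  # 2^|deps| queries
        x = np.zeros(nvar)
        x[deps] = bits
        if nn.predict(x)[k] > threshold:  # o_k is activated
            # One term per activating vector, e.g. (i1 ∧ ¬i2 ∧ ¬i4):
            # a list of (variable index, polarity) pairs.
            terms.append([(i, b == 1.0) for i, b in zip(deps, bits)])
    return terms  # a disjunction of terms, simplified afterwards by primer
      </preformat>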
    </sec>
    <sec id="sec-3">
      <title>Experimental results</title>
      <p>
        The benchmarks used in this paper are three Boolean networks from [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] also
used for evaluating LFIT in [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], describing the cell cycle regulation of budding
yeast, fission yeast and mammals. We randomly assign the 2^nvar transitions
describing these networks to the test set and the training set (which includes the
validation set). Although it is standard to put around 80% of the available data
in the training set, we want to simulate the fact that real-world biological data
are incomplete, hence we start by analyzing the influence of the size of the training
set on the accuracy of the NN (see Fig. 2)³. The accuracy is measured by Etest and
averaged over 30 random allocations of the data in the different sets. We observe
that each successive sub-step of NN-LFIT improves the accuracy of the model and
that, as expected, Etest decreases when the size of the training set increases. It
reaches an error rate of only 1% when training on only 15% of the data and
becomes negligible when the training covers 50% of the data. In comparison,
LFIT [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] has a nearly constant error rate on the test set (36% and 33% respectively
on the mammalian and fission benchmarks) for all sizes of the training set. The
following experiments are conducted by allocating 15% of the data to the training
set, and the results are also averaged over 30 random allocations.
      </p>
      <p>Table 1 shows the parameters of the NN architectures produced by
NN-LFIT and their corresponding Etest. The numbers of neurons and links decrease
significantly during the pruning step (16% fewer hidden neurons and 65% fewer
links) along with Etest (29% reduction), showing that the simplification step not
only reduces the complexity of the NN but also improves the model's performance
through efficient generalization.</p>
      <p>Finally we evaluate the correctness and simplicity of the rules learned by
NN-LFIT. For each variable xi, we write Ri for the corresponding inferred rule
and R*i for the original rule. Considering each term D in Ri and each term D*
in R*i, we identify three categories: true positives (valid) when D ⇒ R*i holds;
false positives (wrong) when D ∧ ¬R*i is satisfiable; and false negatives (missing)
when D* ∧ ¬Ri is satisfiable. For each variable, the distribution of these categories
after the construction and pruning steps of NN-LFIT is shown in Fig. 3⁴. The
pruning step reduces the number of terms (true and false positives) in almost all
the rules, which means they are simpler. Moreover, the proportion of false positives
and negatives diminishes, reflecting the increase in the accuracy of the rules
observed in Fig. 2.
³ The results for the budding benchmark are omitted due to space limitations.
⁴ Note that a rule of a logic program as defined in [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] is a term here, except for constant
rules, e.g., x1 in Fig. 3b, which is always false and thus contains no term.</p>
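      <p>Since the benchmarks are small, these categories can be checked by brute-force enumeration, as in the following illustrative sketch where terms and rules are modelled as Boolean predicates over an assignment vector:</p>
      <preformat>
from itertools import product

def satisfiable(formula, nvar):
    # Enumerate all 2^nvar assignments (feasible on these benchmarks).
    return any(formula(v) for v in product([False, True], repeat=nvar))

def categorize_term(term, original_rule, nvar):
    # term: one conjunct D of the inferred rule Ri, as a predicate.
    if satisfiable(lambda v: term(v) and not original_rule(v), nvar):
        return "false positive"  # wrong: D ∧ ¬R*i is satisfiable
    return "true positive"       # valid: D entails R*i
# Symmetrically, a term D* of the original rule is a false negative
# (missing) when D* ∧ ¬Ri is satisfiable.
      </preformat>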
    </sec>
    <sec id="sec-4">
      <title>Conclusion</title>
      <p>
        In this paper, we present NN-LFIT, a method using feed-forward NNs to extract
logic programs describing the dynamics of systems from state measurements.
Experimental results indicate good overall performance in terms of correctness
and simplicity of the obtained rules, even when learning from only a fraction of
the data. Improvements and extensions of NN-LFIT exploiting more capabilities
of NNs are planned. One such improvement is to extract the rules using a
decompositional approach as in, e.g., [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], which details a sound but incomplete extraction
algorithm improving the complexity-quality trade-off. Considered extensions
also include the handling of noisy data and of systems with continuous variables,
both of which feed-forward NNs naturally support. It should also be possible
to use recurrent NNs to model systems with delays, where x(t) depends not only
on x(t − 1) but also on some x(t − k) for k greater than one. Equipped with such
extensions, the field of application of NN-LFIT would encompass problems such
as those found in the DREAM challenges [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], including real-life data.
      </p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>Ricardo</given-names>
            <surname>Caferra</surname>
          </string-name>
          .
          <article-title>Logic for computer science and artificial intelligence</article-title>
          . John Wiley &amp; Sons,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>Vladimir</given-names>
            <surname>Cherkassky</surname>
          </string-name>
          ,
          <string-name>
            <surname>Jerome H Friedman</surname>
          </string-name>
          ,
          and Harry Wechsler.
          <article-title>From statistics to neural networks: theory and pattern recognition applications</article-title>
          , volume
          <volume>136</volume>
          . Springer Science &amp; Business Media
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Jean-Paul Comet</surname>
            , Jonathan Fromentin, Gilles Bernot, and
            <given-names>Olivier</given-names>
          </string-name>
          <string-name>
            <surname>Roux</surname>
          </string-name>
          .
          <article-title>A formal model for gene regulatory networks with time delays</article-title>
          .
          <source>In Computational Systems Biology and Bioinformatics</source>
          , pages
          <fpage>1</fpage>
          –
          <lpage>13</lpage>
          . Springer,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>Elena</given-names>
            <surname>Dubrova</surname>
          </string-name>
          and
          <string-name>
            <given-names>Maxim</given-names>
            <surname>Teslenko</surname>
          </string-name>
          .
          <article-title>A SAT-based algorithm for finding attractors in synchronous Boolean networks</article-title>
          .
          <source>IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB)</source>
          ,
          <volume>8</volume>
          (
          <issue>5</issue>
          ):
          <fpage>1393</fpage>
          –
          <lpage>1399</lpage>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>Artur S.</given-names>
            <surname>d'Avila Garcez</surname>
          </string-name>
          and
          <string-name>
            <given-names>Gerson</given-names>
            <surname>Zaverucha</surname>
          </string-name>
          .
          <article-title>The connectionist inductive learning and logic programming system</article-title>
          .
          <source>Applied Intelligence</source>
          ,
          <volume>11</volume>
          (
          <issue>1</issue>
          ):
          <fpage>59</fpage>
          –
          <lpage>77</lpage>
          ,
          <year>1999</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>AS d'Avila Garcez</surname>
          </string-name>
          , Krysia Broda, and
          <string-name>
            <surname>Dov M Gabbay.</surname>
          </string-name>
          <article-title>Symbolic knowledge extraction from trained neural networks: A sound approach</article-title>
          .
          <source>Artificial Intelligence</source>
          ,
          <volume>125</volume>
          (
          <issue>1</issue>
          ):
          <fpage>155</fpage>
          –
          <lpage>207</lpage>
          ,
          <year>2001</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7. Alex Greenfield, Aviv Madar, Harry Ostrer, and Richard Bonneau. DREAM4:
          <article-title>Combining genetic and dynamic information to identify biological networks and dynamical models</article-title>
          .
          <source>PloS one</source>
          ,
          <volume>5</volume>
          (
          <issue>10</issue>
          ):
          <fpage>e13397</fpage>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <given-names>Kurt</given-names>
            <surname>Hornik</surname>
          </string-name>
          , Maxwell Stinchcombe,
          <string-name>
            <given-names>and Halbert</given-names>
            <surname>White</surname>
          </string-name>
          .
          <article-title>Multilayer feedforward networks are universal approximators</article-title>
          .
          <source>Neural networks</source>
          ,
          <volume>2</volume>
          (
          <issue>5</issue>
          ):
          <fpage>359</fpage>
          –
          <lpage>366</lpage>
          ,
          <year>1989</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <given-names>Katsumi</given-names>
            <surname>Inoue</surname>
          </string-name>
          , Tony Ribeiro, and
          <string-name>
            <given-names>Chiaki</given-names>
            <surname>Sakama</surname>
          </string-name>
          .
          <article-title>Learning from interpretation transition</article-title>
          .
          <source>Machine Learning</source>
          ,
          <volume>94</volume>
          (
          <issue>1</issue>
          ):
          <fpage>51</fpage>
          –
          <lpage>79</lpage>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          SM Kamruzzaman and Md Monirul Islam.
          <article-title>An algorithm to extract rules from artificial neural networks for medical diagnosis problems</article-title>
          .
          <source>International Journal of Information Technology</source>
          ,
          <volume>12</volume>
          (
          <issue>8</issue>
          ),
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Andreas D Lattner</surname>
            , Andrea Miene, Ubbo Visser, and
            <given-names>Otthein</given-names>
          </string-name>
          <string-name>
            <surname>Herzog</surname>
          </string-name>
          .
          <article-title>Sequential pattern mining for situation and behavior prediction in simulated robotic soccer</article-title>
          .
          <source>In Robot Soccer World Cup</source>
          , pages
          <fpage>118</fpage>
          –
          <lpage>129</lpage>
          . Springer,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Jens</surname>
            <given-names>Lehmann</given-names>
          </string-name>
          ,
          <string-name>
            <given-names>Sebastian</given-names>
            <surname>Bader</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Pascal</given-names>
            <surname>Hitzler</surname>
          </string-name>
          .
          <article-title>Extracting reduced logic programs from artificial neural networks</article-title>
          .
          <source>Applied intelligence</source>
          ,
          <volume>32</volume>
          (
          <issue>3</issue>
          ):
          <fpage>249</fpage>
          –
          <lpage>266</lpage>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13. David Martínez, Guillem Alenyà, Carme Torras, Tony Ribeiro, and
          <string-name>
            <given-names>Katsumi</given-names>
            <surname>Inoue</surname>
          </string-name>
          .
          <article-title>Learning relational dynamics of stochastic domains for planning</article-title>
          .
          <source>Proceedings of ICAPS 2016</source>
          , pages
          <fpage>235</fpage>
          –
          <lpage>243</lpage>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Stephen</surname>
            <given-names>Muggleton</given-names>
          </string-name>
          , Luc De Raedt, David Poole, Ivan Bratko, Peter Flach, Katsumi Inoue, and
          <string-name>
            <given-names>Ashwin</given-names>
            <surname>Srinivasan</surname>
          </string-name>
          .
          <article-title>ILP turns 20 – biography and future challenges</article-title>
          .
          <source>Machine Learning</source>
          ,
          <volume>86</volume>
          (
          <issue>1</issue>
          ):
          <fpage>3</fpage>
          –
          <lpage>23</lpage>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Alessandro</surname>
            <given-names>Previti</given-names>
          </string-name>
          , Alexey Ignatiev, Antonio Morgado, and
          <string-name>
            <surname>Joao</surname>
          </string-name>
          Marques-Silva.
          <article-title>Prime compilation of non-clausal formulae</article-title>
          .
          <source>In Proceedings of the 24th International Conference on Artificial Intelligence</source>
          , pages
          <fpage>1980</fpage>
          –
          <lpage>1987</lpage>
          . AAAI Press,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Tony</surname>
            <given-names>Ribeiro</given-names>
          </string-name>
          , Morgan Magnin, Katsumi Inoue, and
          <string-name>
            <given-names>Chiaki</given-names>
            <surname>Sakama</surname>
          </string-name>
          .
          <article-title>Learning delayed influences of biological systems</article-title>
          .
          <source>Frontiers in bioengineering and biotechnology, 2</source>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17. Daniel Svozil, Vladimir Kvasnicka, and
          <string-name>
            <given-names>Jiri</given-names>
            <surname>Pospichal</surname>
          </string-name>
          .
          <article-title>Introduction to multi-layer feed-forward neural networks</article-title>
          .
          <source>Chemometrics and intelligent laboratory systems</source>
          ,
          <volume>39</volume>
          (
          <issue>1</issue>
          ):
          <fpage>43</fpage>
          –
          <lpage>62</lpage>
          ,
          <year>1997</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>