<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Learning from Interpretation Transition using Feed-Forward Neural Networks</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Enguerrand Gentet</string-name>
          <email>enguerrand.gentet@ens-cachan.fr</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sophie Tourret</string-name>
          <email>tourret@nii.ac.jp</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Katsumi Inoue</string-name>
          <email>inoue@nii.ac.jp</email>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>ENS (Cachan)/Paris-Sud University (Orsay)</institution>
          <country country="FR">France</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>National Institute of Informatics (Tokyo)</institution>
          <country country="JP">Japan</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Tokyo Institute of Technology (Tokyo)</institution>
          <country country="JP">Japan</country>
        </aff>
      </contrib-group>
      <fpage>27</fpage>
      <lpage>33</lpage>
      <abstract>
        <p>Understanding the evolution of dynamical systems is an ILP topic with many application domains, e.g., multi-agent systems, robotics and systems biology. In this paper, we present a method relying on an artificial neural network (NN) to learn rules describing the evolution of a dynamical system of Boolean variables. The experimental results show the potential of this approach, which opens the way to many extensions naturally supported by NNs, such as the handling of noisy data, continuous variables or time-delayed systems.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>[Fig. 1: Architecture of the feed-forward NN. The input layer (i_1, ..., i_nvar) receives x(t); a single hidden layer (h_1, ..., h_nhid) feeds the output layer (o_1, ..., o_nvar), which predicts x(t + 1); w_{l,i,j} denotes the weight of the link from the ith neuron of layer l to the jth neuron of layer l + 1.]</p>
      <p>
We adopt the representation of dynamical systems used in [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. A system is a
finite state vector evolving through time, x(t) = (x1(t), x2(t), ..., xnvar(t)), where
each xi(t) is a Boolean variable. In systems biology these variables can represent,
e.g., the presence or absence of some genes or proteins inside a cell. The aim
of NN-LFIT is to output a normal logic program P that satisfies the condition
x(t + 1) = TP(x(t)) for any t, where TP is the immediate consequence
operator for P [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. The rules of P are of the form ∀t, xi(t + 1) ← F(x(t)) for all i in
{1, ..., nvar}, where F is a Boolean formula in propositional logic (PL). The
standard terminology and notation of PL are used¹, i.e., when referring to literals
(variables or negations of variables), terms (conjunctions of literals) and formulæ.
We are especially concerned with formulæ in disjunctive normal form (DNF),
i.e., disjunctions of terms. Note that this formalism only allows us to describe
the simplest of dynamical systems, namely those that are purely Boolean and
without delays, i.e., where x(t + 1) depends only on x(t).
      </p>
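      <p>For illustration, here is a toy two-variable program (a hypothetical example, not one of the benchmarks used later) together with one application of its TP operator, written as a short Python sketch:</p>
      <preformat>
# Toy illustration of the T_P operator for the program
# P = { x1(t+1) ← x2(t),  x2(t+1) ← x1(t) ∧ ¬x2(t) }:
# one synchronous transition from x(t) to x(t+1).
def tp(x):
    x1, x2 = x
    return (x2, x1 and not x2)

# From x(t) = (1, 0): tp((True, False)) == (False, True), i.e. x(t+1) = (0, 1).
print(tp((True, False)))
      </preformat>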
      <p>
        The type of NN used in NN-LFIT reflects the simplicity of the systems
considered. We use feed-forward NNs [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ], where the information moves in only one
direction, forward, starting from the input layer, going through the hidden
layers and ending at the output layer. We furthermore restrict ourselves to
a single hidden layer, i.e., a total of three layers, because it considerably
simplifies the architecture of the NN and its treatment. This does not limit the
accuracy of the NN as long as there are enough neurons in the hidden layer [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. The notations
related to the NN are introduced in Fig. 1. Formally, each ith neuron on the lth
layer is connected to each jth neuron on the (l + 1)th layer by a link characterized
by its weight, denoted w_{l,i,j} ∈ R, and the output of each neuron is computed
from the weighted sum of all its inputs², e.g., output(o_k) = f(∑_{j=1}^{nhid} w_{2,j,k} · output(h_j)),
where f is a sigmoid function. The state vector x(t) is directly fed to the input
layer and the output layer predicts the values of x(t + 1).
¹ An introduction to logic is available in, e.g., [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
² Due to space limitations, details about the inner mechanisms of NNs, e.g., biases, are omitted.
      </p>
      <p>The last parameter
remaining to choose is the number of neurons on the hidden layer nhid, which
is tuned specifically by NN-LFIT to suit each problem. To determine the
correct weights of an NN, it must be trained on the available data. The standard
approach consists in splitting the available data into two sets: the training set, on
which the training of the NN is performed, and the test set, on which the
performance of the trained NN is measured. Usually 80% of the training set is used to
tune the weights of the NN while the remaining 20%, called the validation set,
is used to tune the NN parameters (in our case, the nhid value). The
training method used in NN-LFIT is standard: backpropagation with an
adaptive rule on the gradient step and L2 regularization to avoid overfitting the
training data. The error made by the trained NN on each data set (written
Etrain, Eval and Etest respectively) is the ratio of incorrect predictions made by
each output neuron, averaged over all output neurons.</p>
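      <p>To make this concrete, the following minimal Python sketch (an illustration, not the implementation used in the experiments) shows the three-layer forward pass and the error measure; as in the text, biases are omitted:</p>
      <preformat>
# Minimal sketch of the three-layer feed-forward NN described above (numpy);
# all names are illustrative and biases are omitted, as in the text.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class ThreeLayerNN:
    def __init__(self, nvar, nhid, seed=0):
        rng = np.random.default_rng(seed)
        # w1[i, j] is the weight w_{1,i,j} of the link from input neuron i
        # to hidden neuron j; w2[j, k] is w_{2,j,k} from hidden j to output k.
        self.w1 = rng.normal(scale=0.5, size=(nvar, nhid))
        self.w2 = rng.normal(scale=0.5, size=(nhid, nvar))

    def predict(self, x_t):
        # x_t is the Boolean state vector x(t); the output estimates x(t+1).
        hidden = sigmoid(x_t @ self.w1)   # output(h_j)
        return sigmoid(hidden @ self.w2)  # output(o_k)

def error_rate(nn, states_t, states_t1):
    # Ratio of incorrect Boolean predictions averaged over output neurons,
    # i.e. the Etrain / Eval / Etest measure used in the text.
    pred = nn.predict(np.asarray(states_t, dtype=float)) > 0.5
    return float(np.mean(pred != np.asarray(states_t1, dtype=bool)))
      </preformat>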
    </sec>
    <sec id="sec-2">
      <title>The NN-LFIT algorithm</title>
      <p>The purpose of this section is to introduce the NN-LFIT algorithm. This
algorithm automatically constructs a model of a system from the observation of its
state transitions and generates transition rules that describe the dynamics of the
system. The main steps of NN-LFIT are listed below (a code sketch of the overall
pipeline follows the list):
Step 1: Create the model of the system.</p>
      <p>1. Choose the number of hidden neurons nhid and train the NN.
(a) Initialize nhid with a trial-and-error algorithm.</p>
      <p>(b) Refine nhid with a basic constructive algorithm.</p>
      <p>2. Simplify the NN by pruning useless links.</p>
      <p>Step 2: Extract the rules.
1. Extract logical rules in DNF using a black-box algorithm.</p>
      <p>2. Simplify logical rules into DNF form with an external tool.</p>
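      <p>The pipeline announced above can be summarized by the following Python skeleton (a sketch only; every helper is a stand-in for the corresponding step, and several of them are sketched later in this section):</p>
      <preformat>
# High-level skeleton of NN-LFIT; split_data, constructive_refinement and
# simplify_dnf are hypothetical stand-ins for the steps described below.
def nn_lfit(transitions, nvar):
    # transitions: observed (x(t), x(t+1)) Boolean state pairs.
    train_data, val_data, test_data = split_data(transitions)
    nn = init_architecture(train_data, val_data, nvar)       # Step 1.1(a)
    nn = constructive_refinement(nn, train_data, val_data)   # Step 1.1(b)
    nn = prune_links(nn, *train_data)                        # Step 1.2
    rules = {}
    for k in range(nvar):                 # Step 2: one rule per variable
        dnf = extract_dnf(nn, k)          # Step 2.1: black-box extraction
        rules[k] = simplify_dnf(dnf)      # Step 2.2: e.g. via primer
    return rules                          # test_data serves to measure Etest
      </preformat>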
      <p>
        The NN building steps are inspired by the work presented in [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ].
Step 1 - Creation of the model. The first building step is to generate a fully
connected NN with an architecture well fitted to learn the dynamics of the
observed system. We first use an initialization algorithm and then refine the
architecture with a constructive algorithm.
      </p>
      <p>Initialization algorithm The initial number of neurons on the hidden layer,
nhid, is chosen using a simple trial-and-error algorithm. It consists in training
the NN with several architectures, incrementing the initial number of hidden
neurons starting from one, and stopping when Eval no longer decreases after
a few tries. Every time we try a new architecture, we randomly initialize all
the weights.</p>
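      <p>One possible reading of this initialization, as a Python sketch (ThreeLayerNN and error_rate come from the sketch in the previous section; train_nn, the seeding and the patience value are illustrative assumptions):</p>
      <preformat>
# Trial-and-error choice of nhid: grow the hidden layer from one neuron,
# retraining from fresh random weights each time, and stop once Eval has
# not improved for a few consecutive tries.
def init_architecture(train_data, val_data, nvar, patience=3):
    best_nn, best_err, failures, nhid = None, 1.0, 0, 1
    while patience > failures:
        nn = ThreeLayerNN(nvar, nhid, seed=nhid)  # fresh random weights
        train_nn(nn, train_data)  # hypothetical backprop + L2 trainer
        err = error_rate(nn, *val_data)           # Eval
        if best_err > err:
            best_nn, best_err, failures = nn, err, 0
        else:
            failures += 1
        nhid += 1
    return best_nn
      </preformat>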
      <p>Constructive algorithm The architecture is improved by using a basic
constructive algorithm. It uses the same principle as the initialization algorithm,
except that every time we add a hidden neuron, we keep all the trained
weights attached to the former neurons.
Pruning algorithm The purpose of this step is to remove useless links. To
do so we introduce the notion of link efficiency. To compute the efficiency
of a specific link, we multiply its weight by the weights of every other link
starting from (or ending at) the same hidden neuron it ends at (or starts
from). In other words, the efficiency of a link quantifies the best contribution
among all the paths going through this link. It is therefore logical to remove
links with low efficiency because they have less effect on the predictions
than others. We use a simple dichotomous search to remove as many
links as possible without increasing Etrain. After the pruning algorithm has
been run, if some hidden neurons have lost all their links to the output layer
or all their links from the input layer, they can be removed.</p>
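      <p>Under the reading that the efficiency of a link is the largest-magnitude contribution among the input-to-output paths going through it, the efficiency computation and the dichotomous pruning can be sketched as follows (illustrative code, reusing ThreeLayerNN and error_rate from the earlier sketch):</p>
      <preformat>
import copy
import numpy as np

def link_efficiencies(nn):
    # Input-to-hidden link w1[i, j]: |w1[i, j]| * max_k |w2[j, k]|.
    eff1 = np.abs(nn.w1) * np.abs(nn.w2).max(axis=1)[np.newaxis, :]
    # Hidden-to-output link w2[j, k]: max_i |w1[i, j]| * |w2[j, k]|.
    eff2 = np.abs(nn.w1).max(axis=0)[:, np.newaxis] * np.abs(nn.w2)
    return eff1, eff2

def prune_links(nn, states_t, states_t1, steps=20):
    # Dichotomous search for the highest efficiency threshold whose pruning
    # leaves Etrain unchanged; links below the threshold are zeroed out.
    eff1, eff2 = link_efficiencies(nn)
    base_err = error_rate(nn, states_t, states_t1)
    lo, hi = 0.0, max(eff1.max(), eff2.max())
    for _ in range(steps):
        mid = (lo + hi) / 2.0
        candidate = copy.deepcopy(nn)
        candidate.w1 = np.where(eff1 >= mid, nn.w1, 0.0)
        candidate.w2 = np.where(eff2 >= mid, nn.w2, 0.0)
        if base_err >= error_rate(candidate, states_t, states_t1):
            lo = mid  # pruning at this threshold is safe
        else:
            hi = mid
    nn.w1 = np.where(eff1 >= lo, nn.w1, 0.0)
    nn.w2 = np.where(eff2 >= lo, nn.w2, 0.0)
    return nn
      </preformat>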
      <p>Extracting rules from the fully connected NN right after steps 1.(a) and 1.(b)
is possible. However, as shown in the experimental results, the rules extracted
after the simplification step are both simpler and more accurate than those
extracted before. In addition, thanks to the simplification of the NN (step 1.2),
the rule extraction process is less time consuming.</p>
      <p>
        Step 2 - Extraction of the rules. To extract the rules underlying the transition
system from the NN, each output neuron oi is considered independently. First
the sub-NN Ni, made of oi plus all the input and hidden neurons that can reach
oi and their connections to each other, is extracted from the main NN. Then,
Ni is used as a black box to construct the rules. All possible input vectors are
fed to Ni and only those that activate oi are kept. The union of these vectors
is converted into a DNF formula F that is then simplified using a tool called
primer [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ]. For example, let us consider a system (x1(t), ..., x4(t)) and the
NN N obtained by applying Step 1 of NN-LFIT on it. We focus on the neuron
o2 of N and consider that, due to the pruning algorithm, o2 only depends on
i1, i2 and i4. We start by querying all the possible combinations of (i1, i2, i4)
inputs, keeping only the ones that activate o2. Now consider that o2 is activated
only in the following cases: (1) i1 is on, i2 and i4 are off; (2) i1 and i2 are on
and i4 is off; (3) i1, i2 and i4 are on. Then o2 is represented by the formula:
F = (i1 ∧ ¬i2 ∧ ¬i4) ∨ (i1 ∧ i2 ∧ ¬i4) ∨ (i1 ∧ i2 ∧ i4). Finally, primer returns
the simpler but equivalent formula F′ = (i1 ∧ ¬i4) ∨ (i1 ∧ i2). Going back to the
original transition system, the rule describing the evolution of x2(t) extracted
from N is thus: x2(t + 1) ← (x1(t) ∧ ¬x4(t)) ∨ (x1(t) ∧ x2(t)).
      </p>
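      <p>The extraction for one output neuron can be sketched as follows (illustrative code, assuming the ThreeLayerNN sketch above, where a zeroed weight encodes a pruned link). On the running example, the three activating vectors for o2 yield exactly the three terms of F, to be handed to primer:</p>
      <preformat>
from itertools import product
import numpy as np

def extract_dnf(nn, k, threshold=0.5):
    # Black-box extraction for output neuron o_k: enumerate all inputs of
    # the sub-NN N_k and keep the vectors that activate o_k as DNF terms.
    nvar = nn.w1.shape[0]
    # Inputs i with a surviving path i -> some hidden j -> output k.
    deps = [i for i in range(nvar)
            if np.any(np.logical_and(nn.w1[i, :] != 0, nn.w2[:, k] != 0))]
    terms = []
    for bits in product([0.0, 1.0], repeat=len(deps)):  # 2^|deps| queries
        x = np.zeros(nvar)
        x[deps] = bits
        if nn.predict(x)[k] > threshold:  # o_k is activated
            # One term per activating vector, e.g. (i1 ∧ ¬i2 ∧ ¬i4):
            # a list of (variable index, polarity) pairs.
            terms.append([(i, b == 1.0) for i, b in zip(deps, bits)])
    return terms  # a disjunction of terms, simplified afterwards by primer
      </preformat>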
    </sec>
    <sec id="sec-3">
      <title>Experimental results</title>
      <p>
        The benchmarks used in this paper are three Boolean networks from [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] also
used for evaluating LFIT in [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], describing the cell cycle regulation of budding
yeast, fission yeast and mammals. We randomly assign the 2^nvar transitions
describing these networks to the test set and the training set (which includes the
validation set). Although it is standard to put around 80% of the available data
in the training set, we want to simulate the fact that real-world biological data
are incomplete, hence we start by analyzing the influence of the size of the training
set on the accuracy of the NN (see Fig. 2)³. The accuracy is measured by Etest and
averaged over 30 random allocations of the data in the different sets. We observe
that each successive sub-step of NN-LFIT improves the accuracy of the model and
that, as expected, Etest decreases when the size of the training set increases. It
reaches an error rate of only 1% when training on only 15% of the data and
becomes negligible when the training covers 50% of the data. In comparison,
LFIT [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] has a nearly constant error rate on the test set (36% and 33% respectively
on the mammalian and fission benchmarks) for all sizes of the training set. The
following experiments are conducted by allocating 15% of the data to the training
set, and the results are also averaged over 30 random allocations.
      </p>
      <p>Table 1 shows the parameters of the NN architectures produced by
NN-LFIT and their corresponding Etest. The numbers of neurons and links decrease
significantly during the pruning step (16% fewer hidden neurons and 65% fewer
links) along with Etest (29% reduction), showing that the simplification step not
only reduces the complexity of the NN but also improves the model's performance
through efficient generalization.</p>
      <p>Finally we evaluate the correctness and simplicity of the rules learned by
NN-LFIT. For each variable xi, we write Ri for the corresponding inferred rule
and R*i for the original rule. Considering each term D in Ri and each term D*
in R*i, we identify three categories: true positives (valid) when D ⇒ R*i holds;
false positives (wrong) when D ∧ ¬R*i is satisfiable; and false negatives (missing)
when D* ∧ ¬Ri is satisfiable. For each variable, the distribution of these categories
after the construction and pruning steps of NN-LFIT is shown in Fig. 3⁴. The
pruning step reduces the number of terms (true and false positives) in almost all
the rules, which means they are simpler. Moreover, the proportion of false positives
and negatives diminishes, reflecting the increase in the accuracy of the rules
observed in Fig. 2.
³ The results for the budding benchmark are omitted due to space limitations.
⁴ Note that a rule of a logic program as defined in [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] is a term here, except for constant
rules, e.g., x1 in Fig. 3b, which is always false and thus contains no term.</p>
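      <p>Since the benchmarks are small, these categories can be checked by brute-force enumeration, as in the following illustrative sketch where terms and rules are modelled as Boolean predicates over an assignment vector:</p>
      <preformat>
from itertools import product

def satisfiable(formula, nvar):
    # Enumerate all 2^nvar assignments (feasible on these benchmarks).
    return any(formula(v) for v in product([False, True], repeat=nvar))

def categorize_term(term, original_rule, nvar):
    # term: one conjunct D of the inferred rule Ri, as a predicate.
    if satisfiable(lambda v: term(v) and not original_rule(v), nvar):
        return "false positive"  # wrong: D ∧ ¬R*i is satisfiable
    return "true positive"       # valid: D entails R*i
# Symmetrically, a term D* of the original rule is a false negative
# (missing) when D* ∧ ¬Ri is satisfiable.
      </preformat>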
    </sec>
    <sec id="sec-4">
      <title>Conclusion</title>
      <p>
        In this paper, we present NN-LFIT, a method using feed-forward NNs to extract
logic programs describing the dynamics of systems from state measurements.
Experimental results indicate good overall performance in terms of correctness
and simplicity of the obtained rules, even when learning from only a fraction of
the data. Improvements and extensions of NN-LFIT exploiting more capabilities
of NNs are planned. One such improvement is to extract the rules using a
decompositional approach as in, e.g., [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], which details a sound but incomplete extraction
algorithm improving the complexity-quality trade-off. Considered extensions
also include the handling of noisy data and of systems with continuous variables,
both of which feed-forward NNs naturally support. It should also be possible
to use recurrent NNs to model systems with delays, where x(t) depends not only
on x(t − 1) but also on some x(t − k) for k greater than one. Equipped with such
extensions, the field of application of NN-LFIT would encompass problems such
as those found in the DREAM challenges [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], including real-life data.
      </p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>Ricardo</given-names>
            <surname>Caferra</surname>
          </string-name>
          .
          <article-title>Logic for computer science and artificial intelligence</article-title>
          . John Wiley &amp; Sons,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>Vladimir</given-names>
            <surname>Cherkassky</surname>
          </string-name>
          ,
          <string-name>
            <surname>Jerome H Friedman</surname>
          </string-name>
          ,
          and Harry Wechsler.
          <article-title>From statistics to neural networks: theory and pattern recognition applications</article-title>
          , volume
          <volume>136</volume>
          . Springer Science &amp; Business Media
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Jean-Paul Comet</surname>
            , Jonathan Fromentin, Gilles Bernot, and
            <given-names>Olivier</given-names>
          </string-name>
          <string-name>
            <surname>Roux</surname>
          </string-name>
          .
          <article-title>A formal model for gene regulatory networks with time delays</article-title>
          .
          <source>In Computational Systems Biology and Bioinformatics</source>
          , pages
          <fpage>1</fpage>
          –
          <lpage>13</lpage>
          . Springer,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>Elena</given-names>
            <surname>Dubrova</surname>
          </string-name>
          and
          <string-name>
            <given-names>Maxim</given-names>
            <surname>Teslenko</surname>
          </string-name>
          .
          <article-title>A SAT-based algorithm for finding attractors in synchronous Boolean networks</article-title>
          .
          <source>IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB)</source>
          ,
          <volume>8</volume>
          (
          <issue>5</issue>
          ):
          <fpage>1393</fpage>
          –
          <lpage>1399</lpage>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>Artur S.</given-names>
            <surname>d'Avila Garcez</surname>
          </string-name>
          and
          <string-name>
            <given-names>Gerson</given-names>
            <surname>Zaverucha</surname>
          </string-name>
          .
          <article-title>The connectionist inductive learning and logic programming system</article-title>
          .
          <source>Applied Intelligence</source>
          ,
          <volume>11</volume>
          (
          <issue>1</issue>
          ):
          <fpage>59</fpage>
          –
          <lpage>77</lpage>
          ,
          <year>1999</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>AS d'Avila Garcez</surname>
          </string-name>
          , Krysia Broda, and
          <string-name>
            <surname>Dov M Gabbay.</surname>
          </string-name>
          <article-title>Symbolic knowledge extraction from trained neural networks: A sound approach</article-title>
          .
          <source>Artificial Intelligence</source>
          ,
          <volume>125</volume>
          (
          <issue>1</issue>
          ):
          <fpage>155</fpage>
          –
          <lpage>207</lpage>
          ,
          <year>2001</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7. Alex Greenfield, Aviv Madar, Harry Ostrer, and Richard Bonneau. DREAM4:
          <article-title>Combining genetic and dynamic information to identify biological networks and dynamical models</article-title>
          .
          <source>PloS one</source>
          ,
          <volume>5</volume>
          (
          <issue>10</issue>
          ):
          <fpage>e13397</fpage>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <given-names>Kurt</given-names>
            <surname>Hornik</surname>
          </string-name>
          , Maxwell Stinchcombe,
          <string-name>
            <given-names>and Halbert</given-names>
            <surname>White</surname>
          </string-name>
          .
          <article-title>Multilayer feedforward networks are universal approximators</article-title>
          .
          <source>Neural networks</source>
          ,
          <volume>2</volume>
          (
          <issue>5</issue>
          ):
          <fpage>359</fpage>
          –
          <lpage>366</lpage>
          ,
          <year>1989</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <given-names>Katsumi</given-names>
            <surname>Inoue</surname>
          </string-name>
          , Tony Ribeiro, and
          <string-name>
            <given-names>Chiaki</given-names>
            <surname>Sakama</surname>
          </string-name>
          .
          <article-title>Learning from interpretation transition</article-title>
          .
          <source>Machine Learning</source>
          ,
          <volume>94</volume>
          (
          <issue>1</issue>
          ):
          <fpage>51</fpage>
          –
          <lpage>79</lpage>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          SM Kamruzzaman and Md Monirul Islam.
          <article-title>An algorithm to extract rules from artificial neural networks for medical diagnosis problems</article-title>
          .
          <source>International Journal of Information Technology</source>
          ,
          <volume>12</volume>
          (
          <issue>8</issue>
          ),
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Andreas D Lattner</surname>
            , Andrea Miene, Ubbo Visser, and
            <given-names>Otthein</given-names>
          </string-name>
          <string-name>
            <surname>Herzog</surname>
          </string-name>
          .
          <article-title>Sequential pattern mining for situation and behavior prediction in simulated robotic soccer</article-title>
          .
          <source>In Robot Soccer World Cup</source>
          , pages
          <fpage>118</fpage>
          –
          <lpage>129</lpage>
          . Springer,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Jens</surname>
            <given-names>Lehmann</given-names>
          </string-name>
          ,
          <string-name>
            <given-names>Sebastian</given-names>
            <surname>Bader</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Pascal</given-names>
            <surname>Hitzler</surname>
          </string-name>
          .
          <article-title>Extracting reduced logic programs from artificial neural networks</article-title>
          .
          <source>Applied intelligence</source>
          ,
          <volume>32</volume>
          (
          <issue>3</issue>
          ):
          <fpage>249</fpage>
          –
          <lpage>266</lpage>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13. David Martínez, Guillem Alenyà, Carme Torras, Tony Ribeiro, and
          <string-name>
            <given-names>Katsumi</given-names>
            <surname>Inoue</surname>
          </string-name>
          .
          <article-title>Learning relational dynamics of stochastic domains for planning</article-title>
          .
          <source>Proceedings of ICAPS 2016</source>
          , pages
          <fpage>235</fpage>
          –
          <lpage>243</lpage>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Stephen</surname>
            <given-names>Muggleton</given-names>
          </string-name>
          , Luc De Raedt, David Poole, Ivan Bratko, Peter Flach, Katsumi Inoue, and
          <string-name>
            <given-names>Ashwin</given-names>
            <surname>Srinivasan</surname>
          </string-name>
          .
          <article-title>ILP turns 20 – biography and future challenges</article-title>
          .
          <source>Machine Learning</source>
          ,
          <volume>86</volume>
          (
          <issue>1</issue>
          ):
          <fpage>3</fpage>
          –
          <lpage>23</lpage>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Alessandro</surname>
            <given-names>Previti</given-names>
          </string-name>
          , Alexey Ignatiev, Antonio Morgado, and
          <string-name>
            <surname>Joao</surname>
          </string-name>
          Marques-Silva.
          <article-title>Prime compilation of non-clausal formulae</article-title>
          .
          <source>In Proceedings of the 24th International Conference on Artificial Intelligence</source>
          , pages
          <fpage>1980</fpage>
          –
          <lpage>1987</lpage>
          . AAAI Press,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Tony</surname>
            <given-names>Ribeiro</given-names>
          </string-name>
          , Morgan Magnin, Katsumi Inoue, and
          <string-name>
            <given-names>Chiaki</given-names>
            <surname>Sakama</surname>
          </string-name>
          .
          <article-title>Learning delayed influences of biological systems</article-title>
          .
          <source>Frontiers in bioengineering and biotechnology, 2</source>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17. Daniel Svozil, Vladimir Kvasnicka, and
          <string-name>
            <given-names>Jiri</given-names>
            <surname>Pospichal</surname>
          </string-name>
          .
          <article-title>Introduction to multi-layer feed-forward neural networks</article-title>
          .
          <source>Chemometrics and intelligent laboratory systems</source>
          ,
          <volume>39</volume>
          (
          <issue>1</issue>
          ):
          <fpage>43</fpage>
          –
          <lpage>62</lpage>
          ,
          <year>1997</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>