<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Extracting Propositional Rules from Feed-forward Neural Networks - A New Decompositional Approach</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Sebastian Bader</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
<string-name>Steffen Hölldobler</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Valentin Mayer-Eichberger</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
<institution>International Center for Computational Logic, Technische Universität Dresden</institution>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>In this paper, we present a new decompositional approach for the extraction of propositional rules from feed-forward neural networks of binary threshold units. After decomposing the network into single units, we show how to extract rules describing a unit's behavior. This is done using a suitable search tree which allows the pruning of the search space. Furthermore, we present some experimental results, showing a good average runtime behavior of the approach.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
<title>INTRODUCTION AND MOTIVATION</title>
      <p>As the knowledge stored in a neural network is hidden in its
weights, humans have no direct access to it. The goal of rule
extraction is to obtain a human-readable description of the
output units' behavior with respect to the input units. This is
usually done using if-then rules describing under which
conditions the output units will be active. Throughout this paper,
we will use networks of bipolar binary threshold units and
show how to extract propositional if-then rules.</p>
<p>Rule extraction attempts can be divided into pedagogical
and decompositional approaches. The former conceives the
network as a black box, while the latter decomposes the network,
constructs intermediate rules and recomposes those. For a
general introduction to the field we refer to [Tickle et al.,
1998].</p>
<p>Example 1. Figures 1 and 2 show a simple network and the
results of a naive pedagogical extraction. All possible inputs
are presented to the network and the network is evaluated.
For each active output unit, a rule is constructed such that its
precondition matches the current input.</p>
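<p>As an illustration only (not part of the original presentation), the naive pedagogical extraction of Example 1 can be sketched for a single bipolar threshold unit; the weights, threshold and unit names below are hypothetical:

```python
from itertools import product

def unit_output(weights, theta, inputs):
    # bipolar binary threshold unit: active (+1) iff the weighted
    # input sum reaches the threshold theta
    net = sum(w * x for w, x in zip(weights, inputs))
    return 1 if net >= theta else -1

def naive_rules(weights, theta, names):
    # enumerate all 2^n bipolar input vectors; for each input that
    # activates the unit, emit a rule whose precondition matches it
    rules = []
    for inputs in product((-1, 1), repeat=len(names)):
        if unit_output(weights, theta, inputs) == 1:
            body = [n if x == 1 else "not " + n
                    for n, x in zip(names, inputs)]
            rules.append(" and ".join(body))
    return rules

print(naive_rules([1.0, 1.0], 0.5, ["A", "B"]))
```

The exponential blow-up criticized in the text is visible directly: the loop always evaluates all 2^n possible inputs.</p>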
<p>Obviously, the naive pedagogical approach presented in
Example 1 has an exponential complexity, as there are
exponentially many different inputs, even for a single unit. In
this paper, we will present new algorithms which allow for an
efficient extraction. Even though the problem itself is worst-case
exponential, our algorithms show a good average-case
behavior. The approach presented here is closely related to
the work by Krishnan et al. [Krishnan et al., 1999], but we
use a modified search tree that can be constructed as the need
arises. Furthermore, we use a different set of pruning rules.</p>
      <p>This paper is organized as follows: After reviewing some
preliminary notions in Section 2, we present our approach in
Section 3. This is followed by a presentation of some
experimental results in Section 4. Finally we draw some
conclusions and discuss further work in Section 5.
</p>
    </sec>
    <sec id="sec-2">
      <title>PRELIMINARIES</title>
      <p>In this section we will briefly introduce some required
notations from artificial neural networks and propositional rules
and programs. For a more detailed introduction we refer to
[Bishop, 1995; Rojas, 1996] and [Lloyd, 1988], respectively.</p>
      <sec id="sec-2-1">
        <title>Artificial Neural Networks</title>
        <p>A connectionist system (also called artificial neural network)
is a directed graph of simple computational units (see
Figure 1). Every unit has an activation value which is
propagated along its weighted output connections to other units.
Some distinguished units serve as input units, whose
activation will be set from outside. All other units compute their
activation based on the activation of their predecessor units
and the weights of the connections. In our case, we will deal
with so-called bipolar binary threshold units only, i.e. units
whose activation is either +1 or −1. A unit will be active
(+1) if its current input reaches its threshold, and inactive
otherwise. Some of the units serve as output units. In
Figure 1, those are marked with little outgoing arrows (G and
H). Throughout this paper, we will use wAB to refer to the
weight of the connection from unit A to unit B and assume
the weight to be 0, if there is no such connection.</p>
<p>In this paper we will show how to extract propositional if-then
rules from a neural network. These rules consist of a
precondition and a consequence. We will consider rules with atomic
consequences only, i.e. the consequence of a rule is a
propositional variable. Furthermore, we will restrict the
preconditions to be conjunctions of (possibly negated) propositional
variables only. We will treat rules with empty preconditions
as facts, i.e. the consequence is assumed to be true without
any condition. As propositional logic programs are just sets
of those rules, we will use some notations customary in the
logic programming community, as exemplified in Example 2.
Example 2. Here is a simple propositional logic program,
i.e. a set of propositional if-then rules, which will serve as a
running example. The intended meaning of the rules is given
on the right.</p>
<p>P2 = { G ← Ā ∧ B.    % G is true, if A is false and B is true
       G ← A ∧ B̄.    % G is true, if A is true and B is false
       H. }           % H is always true</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>THE COOP APPROACH</title>
      <p>In this section, we will describe a new decompositional
approach for the extraction of propositional rules from
feedforward neural networks of bipolar binary threshold units.
First, we will show how to decompose a feed-forward
network and introduce the required notations. In Section 3.2,
we will show how to extract rules from a single perceptron.
Afterwards, those intermediate results are composed.
</p>
      <sec id="sec-3-1">
        <title>Decomposition</title>
        <p>As mentioned above, we will decompose the network into
simple perceptrons, i.e. a single unit together with its
incoming connections. To simplify the notions below, we introduce
the positive and negative form of a perceptron.</p>
        <p>Definition 3 (Perceptron). A network consisting of a set of
input units connected to a single output unit A is called a
perceptron (denoted PA).</p>
<p>Definition 4 (Positive and negative form). The positive
form PA+ of a given perceptron PA is obtained by multiplying
all negative weights by −1 and negating the corresponding
input symbols. The negative form PA− is obtained by
multiplying all positive weights by −1 and negating the
corresponding inputs.</p>
        <p>Example 5. Figure 3 depicts two simple perceptrons, which
can be seen as parts of the network shown in Figure 1. The
positive and negative forms are depicted in Figure 4. For PG
we have PG+ = PG, as there are no negative weights.</p>
<p>In the sequel, we will often need to clamp some of the input
units U of a given perceptron to be active. This will be done
by input patterns. The intended meaning is that all units
occurring in an input pattern I ⊆ U are considered to be
active, while the states of the non-included input units are not
fixed, i.e. the remaining units in U \ I may be active or inactive.
Therefore, an input pattern defines an upper and a lower bound
on the possible input of the perceptron.</p>
<p>Definition 6 (Input pattern). Let PA be a perceptron and U
be the set of input units. A subset I ⊆ U is called an input
pattern. The minimal and maximal input wrt. the input pattern
I are defined as imin(I) = Σ_{i∈I} w_iA − Σ_{u∈U\I} |w_uA| and
imax(I) = Σ_{i∈I} w_iA + Σ_{u∈U\I} |w_uA|, respectively.</p>
<p>Example 7. For PG+ from Figure 4 and I = {C, D} we have
imin(I) = 1.0 + 2.0 − 3.0 − 5.0 = −5.0 and imax(I) =
1.0 + 2.0 + 3.0 + 5.0 = 11.0.</p>
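<p>Definition 6 can be stated compactly in code; the following sketch (assuming the weights are given as a plain dictionary, which is our own representation) reproduces the numbers of Example 7:

```python
def imin(weights, pattern):
    # clamped units contribute their weight; all other units are assumed
    # to take the worst case, i.e. each contributes -|w|
    return (sum(weights[u] for u in pattern)
            - sum(abs(w) for u, w in weights.items() if u not in pattern))

def imax(weights, pattern):
    # dual best case: each non-clamped unit contributes +|w|
    return (sum(weights[u] for u in pattern)
            + sum(abs(w) for u, w in weights.items() if u not in pattern))

# positive form of P_G (Example 7) with I = {C, D}
w_G = {"C": 1.0, "D": 2.0, "E": 3.0, "F": 5.0}
print(imin(w_G, {"C", "D"}), imax(w_G, {"C", "D"}))  # -5.0 11.0
```
</p>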
<p>Next, we will introduce coalitions and oppositions as
special types of input patterns. While coalitions can be seen as
conditions to make a perceptron active, oppositions prevent a
perceptron from getting active. If an input pattern is a
coalition then the perceptron will be active, independent of
the state of all non-clamped input units. If it is an opposition,
the perceptron will always be inactive.</p>
        <p>Definition 8 (Coalition). Let PA be a perceptron with
threshold θ , I be some input pattern for PA and imin(I) be
the corresponding minimal input. I is called a coalition, if
imin(I) ≥ θ. A coalition I is called minimal, if none of its
subsets I′ ⊂ I is a coalition.</p>
        <p>Definition 9 (Opposition). Let PA be a perceptron with
threshold θ , I be some input pattern for PA and imax(I) be
the corresponding maximal input. I is called an opposition,
if imax(I) &lt; θ. An opposition I is called minimal, if none of
its subsets I′ ⊂ I is an opposition.</p>
<p>Example 10. For PG+ from Figure 4, we find I = {C, D, F}
to be a coalition, as imin(I) = 1.0 + 2.0 + 5.0 − 3.0 =
5.0 &gt; 4.0. For PG−, we find J = {F̄} to be an opposition, as
imax(J) = 1.0 + 2.0 + 3.0 − 5.0 = 1.0 &lt; 4.0.</p>
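<p>Definitions 8 and 9 then reduce to two one-line predicates. The sketch below (weights as dictionaries, negated inputs written with a "bar" suffix; this encoding is our own) reproduces Example 10:

```python
def imin(weights, I):
    # minimal input wrt. pattern I (Definition 6)
    return sum(weights[u] for u in I) - sum(
        abs(w) for u, w in weights.items() if u not in I)

def imax(weights, I):
    # maximal input wrt. pattern I (Definition 6)
    return sum(weights[u] for u in I) + sum(
        abs(w) for u, w in weights.items() if u not in I)

def is_coalition(weights, theta, I):
    return imin(weights, I) >= theta   # Definition 8

def is_opposition(weights, theta, I):
    return imax(weights, I) < theta    # Definition 9

# Example 10: positive form of P_G (threshold 4.0) and I = {C, D, F}
w_pos = {"C": 1.0, "D": 2.0, "E": 3.0, "F": 5.0}
print(is_coalition(w_pos, 4.0, {"C", "D", "F"}))   # True
# negative form of P_G and J = {Fbar}: negated inputs, negated weights
w_neg = {"Cbar": -1.0, "Dbar": -2.0, "Ebar": -3.0, "Fbar": -5.0}
print(is_opposition(w_neg, 4.0, {"Fbar"}))         # True
```
</p>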
        <p>The set of coalitions and the set of oppositions can be used
to describe the behavior of a perceptron. Furthermore, it is
sufficient to consider the set of minimal coalitions and
minimal oppositions respectively, which are uniquely determined.
Those are the results of the extraction algorithm presented in
the next section.
</p>
      </sec>
      <sec id="sec-3-2">
        <title>Extracting Coalitions and Oppositions</title>
        <p>Here, we will show how to construct the set of minimal
coalitions for a given perceptron. To keep notions and algorithms
simple, we will first consider positive perceptrons only. At
the end of the section we will show how to extract the set of
minimal oppositions from a negative perceptron, and
furthermore, how to apply the algorithms to arbitrary perceptrons.
Before presenting the algorithms, we will try to convey the
underlying intuitions. For positive perceptrons with inputs
from {−1, +1} only, we observe the following:
1. The empty input pattern (no unit needs to be active)
generates the smallest minimal input.
2. The full input pattern (all inputs are active) generates the
largest minimal input.
3. If an input pattern is a coalition, all supersets are
coalitions as well.</p>
        <p>Starting from the empty input pattern (observation 1),
input symbols are added according to their weights, such that
inputs with larger weights are added first. If all inputs are
added, but no coalition is found, we can conclude that there
is none (observation 2). As soon as a coalition is found, all
supersets are known to be coalitions as well (observation 3)
and the algorithm can continue with adding the next-smaller
input instead. Algorithm 1 constructs a search tree used to
guide the extraction. Each node of the tree represents the
input pattern containing all symbols on the path to the root.
Example 11. For the perceptron PG+ shown in Figure 3, we
have wCG = 1.0, wDG = 2.0, wEG = 3.0 and wFG = 5.0,
therefore, we determine the order F ≻ E ≻ D ≻ C. Applying
Algorithm 1, we obtain the search tree shown in Figure 5
on the left. For PH+, we have wD̄H = 3.0 and wFH = 2.0,
therefore, we determine the order D̄ ≻ F and obtain the tree
shown in Figure 5 on the right.</p>
<p>As mentioned above, while looking for a coalition we will
generate input patterns by adding symbols according to their
weights.</p>
<p>Input: A positive perceptron PA+.
Output: A coalition search tree suitable for Alg. 2.
1  Fix an order ≻ on the input units such that: if wBA ≥ wCA then B ≻ C.
2  Create a root node (representing the empty pattern).
3  Add a child labeled X for each input symbol X (sorted left to right descending wrt. ≻).
4  foreach new child labeled Y do
5      add a new sub-child for every symbol Z with Y ≻ Z (sorted descending wrt. ≻).
Algorithm 1: Construction of the coalition search tree.</p>
<p>The search can be done by a left-depth-first traversal of
the tree just constructed. The following two rules are used to
prune the tree and hence the search space:
1. If the minimal input of a node is above the threshold, cut all
children.
2. If the minimal inputs of a node and all its descendants
are below the threshold, cut all right siblings.</p>
<p>Rule 1 reflects Observation 3 from above, because child
nodes represent supersets of the corresponding input pattern.
If neither a certain node nor any of its descendants is a coalition,
we can cut all right-hand siblings, as their minimal inputs will
be even smaller. This follows from the order of the symbols
used while constructing the tree. The complete extraction of
the smallest set of minimal coalitions is given as Algorithm 2.</p>
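<p>Since Algorithm 2 itself is not reproduced here, the following sketch combines the lazily constructed search tree of Algorithm 1 with the two pruning rules. It is our reading of the procedure, not the authors' original code:

```python
def minimal_coalitions(weights, theta):
    # weights: positive form of a perceptron (unit -> weight); theta: threshold
    order = sorted(weights, key=lambda u: -weights[u])  # larger weights first
    found = []

    def imin(pattern):
        # minimal input wrt. an input pattern (Definition 6)
        return sum(weights[u] for u in pattern) - sum(
            weights[u] for u in weights if u not in pattern)

    def visit(pattern, start):
        # children of a node are the symbols after its last symbol in the
        # order, so the tree of Algorithm 1 is built only as far as needed
        for i in range(start, len(order)):
            cand = pattern + [order[i]]
            if imin(cand) >= theta:
                found.append(set(cand))  # rule 1: a coalition; all supersets
                continue                 # are coalitions too, so cut children
            before = len(found)
            visit(cand, i + 1)
            if len(found) == before:     # rule 2: neither this node nor any
                break                    # descendant is a coalition, so the
                                         # right siblings cannot be either

    if imin([]) >= theta:                # the unit is always active
        return [set()]
    visit([], 0)
    return found

# positive form of P_G with threshold 4.0 (cf. Example 12)
print(minimal_coalitions({"C": 1.0, "D": 2.0, "E": 3.0, "F": 5.0}, 4.0))
```

On the running example this visits only a handful of nodes and returns the two minimal coalitions {E, F} and {C, D, F}.</p>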
<p>Example 12. For the perceptrons PG+ and PH+ shown in
Figure 4, Algorithm 2 returns CG = {{E, F}, {C, D, F}} and
CH = {{D̄}, {F}}.</p>
<p>Even though the worst-case complexity is exponential¹, the
algorithm performs reasonably well, as demonstrated in
Section 4. This follows from the effectiveness of the pruning
rules and, as a consequence, from the fact that the search tree
does not need to be computed completely.</p>
<p>While using +1 and −1 as activation values and the
positive form for the extraction of coalitions, we find that the
extraction of oppositions is "dual" while working on the negative
form of the perceptron. Therefore, we will list the differences
only:
• For oppositions, negative perceptrons are used as inputs.
• In Algorithm 1, the order must be reversed, i.e. if wBA ≤
wCA then B ≻ C.
• In Algorithm 2, instead of computing the minimal
input, we compute the maximal input and check
whether it is below the threshold.</p>
<p>Example 13. Applying the modified algorithm to the
perceptron PD (i.e. unit D from Figure 1 with its incoming
connections) yields OD = {{A}, {B̄}}.</p>
<p>We used the positive form of a perceptron to extract
coalitions and the negative form to extract oppositions. For the
sequel we will understand a negated input symbol occurring
in some input pattern to clamp the corresponding input unit
as inactive. Note that an input pattern can thus be used for the
original perceptron as well as for the positive and negative
form. This allows us to apply the algorithms to arbitrary
perceptrons². Table 1 shows all intermediate extraction results.</p>
<p>¹Assume a perceptron with n equal weights and a threshold
of 0. Then there are C(n, ⌈n/2⌉) coalitions.</p>
        <sec id="sec-3-2-1">
          <title>Composing the Intermediate Results</title>
          <p>Table 1: Minimal coalitions and minimal oppositions per unit.
Unit | Minimal Coalitions    | Minimal Oppositions
C    | {{A}, {B}}            | {{Ā, B̄}}
D    | {{Ā, B}}              | {{A}, {B̄}}
E    | {{A, B̄}}              | {{Ā}, {B}}
F    | {∅}                   | ∅
G    | {{E, F}, {C, D, F}}   | not needed
H    | {{D̄}, {F}}            | not needed</p>
          <p>In this section, we will show how to compose the intermediate
results to obtain a description of the output unit's behavior
wrt. the input units.</p>
          <p>The intended meaning of a set of coalitions like CG =
{{E, F }, {C, D, F }} is, that ”E and F ”, or ”C, D and F ”
should be active in order to make neuron G active, this can be
represented as the propositional formula ((E ∧ F ) ∨ (C ∧ D ∧
F )). We will refer to the propositional formula obtained from
a set of coalitions CF as pf(CF ). If there is no coalition for a
given perceptron PF , i.e. CF = ∅, we can conclude that there
is no input such that F will be active, hence pf(CF ) = false.
In contrast, for CF = {∅}, we can conclude that F will always
be active, hence pf(CF ) = true. Analogously, the intended
meaning of a set of oppositions like OD = {{A}, { B¯}} is,
that whenever A is active or B is inactive, the neuron D will
be inactive. This can be represented as (A ∨ B¯). Again, we
will refer to the corresponding formula as pf(OF ).</p>
          <p>Algorithm 3 takes a feed-forward network and one output
unit A and returns a propositional formula describing A’s
behavior with respect to the network’s input units. It will create
and manipulate a propositional formula F , which finally can
be rewritten as a logic program.</p>
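<p>A possible reading of Algorithm 3 in code keeps formulae in disjunctive normal form (each clause a set of signed literals), so that replacing a symbol by pf(C_B) or pf(O_B) becomes a simple distribution step. The representation and helper names below are our own:

```python
def to_dnf(patterns):
    # a set of input patterns as a DNF: a list of clauses, each a frozenset
    # of (symbol, polarity) literals; '~X' denotes a negated symbol
    return [frozenset((s.lstrip('~'), not s.startswith('~')) for s in p)
            for p in patterns]

def compose(net, inputs, out):
    # net[A] = (coalitions, oppositions) of unit A; returns a DNF over the
    # input units describing when `out` is active
    dnf = to_dnf(net[out][0])
    while True:
        new, changed = [], False
        for clause in dnf:
            hidden = [l for l in clause if l[0] not in inputs]
            if not hidden:
                new.append(clause)
                continue
            changed = True
            sym, pos = hidden[0]
            # positive occurrence -> coalitions, negated -> oppositions
            for r in to_dnf(net[sym][0] if pos else net[sym][1]):
                new.append((clause - {(sym, pos)}) | r)
        dnf = new
        if not changed:
            break
    # drop contradictory clauses (X and not-X together)
    return [c for c in dnf if all((s, not p) not in c for s, p in c)]

# Table 1 for the running example (inputs A and B)
net = {
    "C": ([{"A"}, {"B"}], [{"~A", "~B"}]),
    "D": ([{"~A", "B"}], [{"A"}, {"~B"}]),
    "E": ([{"A", "~B"}], [{"~A"}, {"B"}]),
    "F": ([set()], []),                      # F is always active
    "G": ([{"E", "F"}, {"C", "D", "F"}], []),
    "H": ([{"~D"}, {"F"}], []),
}
print(compose(net, {"A", "B"}, "G"))
```

For G this yields exactly the two clauses (A ∧ ¬B) and (¬A ∧ B); for H the result contains the empty clause, i.e. H is always active, matching Example 14.</p>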
        </sec>
        <sec id="sec-3-2-2">
<title>Extracting Rules for a Whole Network</title>
          <p>Input: A network N and an output unit A.
Output: A formula describing A's behavior.
1  Initialize F as F = pf(CA).
2  while there occur symbols in F referring to non-input units of N do
3      Pick one occurrence of a (possibly negated) non-input symbol B.
4      if B is negated then replace B̄ with pf(OB) else replace B with pf(CB).
5  Return F.
Algorithm 3: Extracting one unit of a given network.</p>
          <p>²As mentioned above, the positive and negative forms were
introduced to keep notions and algorithms simple. They will actually
never be constructed in a real implementation. Instead, the
algorithms could be modified by adding case distinctions.</p>
<p>Example 14. Applying Algorithm 3 to the network from
Figure 1 we obtain³:
G = (E ∧ F) ∨ (C ∧ D ∧ F)
  = ((A ∧ B̄) ∧ true) ∨ ((A ∨ B) ∧ (Ā ∧ B) ∧ true)
  = (A ∧ B̄) ∨ (Ā ∧ B)
H = (D̄ ∨ F)
  = ((A ∨ B̄) ∨ true)
  = true</p>
          <p>Note that the formulae G = (A ∧ B̄) ∨ (Ā ∧ B) and H = true
could also be represented as program P2 from Example 2.</p>
          <p>The non-determinism introduced in line 3 of Algorithm 3
is a don’t-care non-determinism, i.e. we are free to choose
any symbol without changing the result. But an “informed
heuristic” could speed up the extraction. In Example 14 we
applied the usual laws of propositional logic after applying
Algorithm 3. In fact, those rules can also be applied before,
i.e. directly after replacing a symbol with the corresponding
coalition or opposition.</p>
          <p>Proposition 15. Algorithm 3 is sound and complete.</p>
<p>I.e. for a given feed-forward network and a given output
unit A, the algorithm always returns a correct formula
describing A's behavior wrt. the network's input units. The
proposition follows from the fact that the network contains no
cycles and from the correctness of the laws of propositional
logic.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>DISCUSSION</title>
      <p>In this section, we will briefly discuss some related work, the
extension of the approach to non-binary units and report on
some preliminary experimental results.
</p>
      <sec id="sec-4-1">
        <title>Related Work</title>
<p>As mentioned in the introduction, our approach is closely
related to the COMBO algorithm introduced in [Krishnan et al.,
1999]. But, in contrast to that approach, our search tree can be
built incrementally. Each level of the combination trees
constructed for the COMBO algorithm needs to be sorted wrt.
the minimal input. This involves the construction, evaluation
and sorting of possibly exponentially many nodes of the tree,
even though they might later be cut off.</p>
      </sec>
      <sec id="sec-4-2">
        <title>Extension to Non-Binary Units</title>
<p>We are currently investigating the applicability of the
COOP approach to non-binary units. A first approach, which was
taken in the experiments described below, employs bipolar
sigmoidal (tanh) units instead of binary ones. Consequently,
the network becomes trainable by standard learning
methods, like back-propagation. After some iterations, all weights
were multiplied by 2, yielding steeper activation functions.
We stopped the training process whenever the error made by
the network did not increase significantly after multiplying
the weights. In this case units behave like binary ones, i.e.
small activation values do not occur any more, only values
close to −1 and +1. Therefore, we can treat those units as
binary threshold units and apply our algorithms directly.</p>
<p>³Note that several replacements were done in one line and
parentheses were omitted if unnecessary. The last lines were obtained by
applying the usual laws of propositional logic.</p>
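<p>The effect of repeatedly doubling the weights can be illustrated numerically: as the steepness grows, a bipolar sigmoidal unit approaches the bipolar threshold function. This snippet is purely illustrative and not part of the experimental setup:

```python
import math

def activation(x, k):
    # bipolar sigmoidal unit with steepness k: tanh(k * x)
    return math.tanh(k * x)

# doubling the weights doubles the effective steepness; for a fixed
# positive net input the activation quickly saturates towards +1
for k in (1, 2, 4, 8, 16):
    print(k, round(activation(0.5, k), 4))
```

Once the activations stay close to −1 and +1 for all training inputs, the unit can be replaced by a binary threshold unit without changing the network's behavior on those inputs.</p>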
<p>As a second approach, we try to adapt the following idea
from [d'Avila Garcez et al., 2001]. A unit is considered
active (inactive) whenever its activation value is above (below)
some threshold amin (amax). For a given amin, we can
compute a necessary minimal input imin. For a given perceptron
with threshold θ, we could now apply our algorithm by
using θ − imin as threshold, and instead of using +1 and −1 as
activation values, we use amin and amax. This allows us to apply
the algorithms described above. We believe that soundness
will be preserved; whether this holds for completeness as
well will be investigated in the future.</p>
<p>To evaluate the average runtime behavior of the algorithm,
we generated random perceptrons for which we computed
the number of visited nodes wrt. the number of input
symbols. This, together with the total number of nodes in the
tree, is depicted in Figure 6 on the left.⁴ The plot shows that
only a small fraction of the nodes is visited. Nevertheless, the
number of visited nodes seems to grow exponentially. This
is not surprising, as the number of minimal coalitions grows
exponentially as well.</p>
<p>Furthermore, we computed the ratio of visited nodes to
coalitions found – again wrt. the number of input symbols.
The results are shown in Figure 6 on the right. This test
indicates that the number of visited nodes per found coalition
seems to increase for a larger number of input symbols. For
the variant "coop2", this ratio seems to stabilize around 4, i.e.
the algorithm needs to visit 4 times as many nodes as it finds
coalitions. In this variant we used some more techniques to
improve the pruning rules, which are beyond the scope of this
paper. E.g., we cached the minimal input values necessary
before entering a node in the tree. If this input is not reached,
the subtree can be pruned. Furthermore, we tried to identify
equivalent subtrees.</p>
<p>⁴Instead of measuring the time, we used the number of nodes,
because we used a very preliminary implementation in Prolog only.</p>
<p>P1 = { cl ← a1 ∧ b1.
       cl ← a2 ∧ b2.
       cl ← a3 ∧ b3.
       cl ← e1. }
P2 = { cl ← a1 ∧ b1 ∧ c̄1 ∧ d̄1 ∧ ē1 ∧ f̄1.
       cl ← a1 ∧ b̄1 ∧ c1 ∧ d̄1 ∧ ē1 ∧ f̄1.
       . . . 11 clauses more . . .
       cl ← ā1 ∧ b̄1 ∧ c̄1 ∧ d1 ∧ ē1 ∧ f1.
       cl ← ā1 ∧ b̄1 ∧ c̄1 ∧ d̄1 ∧ e1 ∧ f1. }
P3 = { cl ← d1 ∧ e3.
       cl ← b̄3 ∧ ē4. }</p>
<p>The MONK's problems, as described in [Thrun et al., 1991], are
learning problems where robots are described by the
following six attributes:
• head shape is {round (a1), square (a2), octagon (a3)},
• body shape is {round (b1), square (b2), octagon (b3)},
• is smiling is {yes (c1), no (c2)},
• holding is {sword (d1), balloon (d2), flag (d3)},
• colour is {red (e1), yellow (e2), green (e3), blue (e4)},
• has tie is {yes (f1), no (f2)}.</p>
        <p>The following three classifications are to be learned:
1. head shape = body shape or the colour is red
2. exactly two of the six attributes take their first value
3. colour = green and holding = sword, or the colour ≠
blue and body shape ≠ octagon.
We used bipolar sigmoidal networks with 17 input units
(labeled a1, . . . , f2), a single output unit (labeled cl) and either
1 (problem 1) or 2 (problems 2 and 3) hidden units.
Furthermore, we allowed shortcut connections from the input to the
output layer. These architectures were chosen to minimize
the size of the networks. We used a single train-test set,
containing all available examples. After training the networks
and multiplying the weights as described above, we applied
the COOP algorithm to extract the single perceptrons.
Afterwards, the results were composed as described above and
further refined using the integrity constraints resulting from the
encoding (i.e., e1 and e2 can not be active simultaneously).
Finally, we obtained the logic programs P1, P2 and P3 shown above.</p>
<p>For the extraction algorithm of a single perceptron
(Section 3.2) we will further investigate whether the ideas
underlying the "M-of-N" approach by Towell and Shavlik [Towell and
Shavlik, 1993] can help to speed up the system. Furthermore,
we will try to develop some dedicated "informed heuristics",
as mentioned in Section 3.3, to guide the extraction on the
level of whole networks. Another candidate for further
improvement is the caching of intermediate results while
composing the coalitions and oppositions.</p>
<p>First experiments, presented in Section 4, indicate that
our approach shows a good average-case complexity, but a
detailed analysis needs to be done in the future. Furthermore,
we would like to evaluate our algorithm and compare it to
other approaches using benchmark problems, like the MONK's
problems or problems from molecular biology as described in
[d'Avila Garcez et al., 2001].</p>
      </sec>
      <sec id="sec-4-3">
        <title>Acknowledgments</title>
<p>We would like to thank two anonymous referees for their
valuable comments on a preliminary version of this paper.
Sebastian Bader is supported by the GK334 of the German
Research Foundation (DFG).</p>
        <p>The programs describe the required classifications. E.g.
program P1 encodes: head shape = body shape (a1 ∧ b1 or
a2 ∧ b2 or a3 ∧ b3) or the colour is red (e1). This shows that
the COOP-approach is able to extract meaningful rules from a
trained neural network, even though this is just a preliminary
experiment on some artificial domain.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>CONCLUSIONS</title>
<p>In this paper, we presented a new decompositional approach
to extract propositional if-then rules from a feed-forward
network of binary threshold units. Our approach is sound and
complete, i.e. every rule extracted from the network is
correct and all contained rules are extracted. Even though our
running example is a 3-layered feed-forward network, the
approach is not limited to layered architectures, but applies to
arbitrary cycle-free networks.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <source>[Bishop</source>
          , 1995]
          <string-name>
            <surname>Christopher</surname>
            <given-names>M.</given-names>
          </string-name>
          <string-name>
            <surname>Bishop</surname>
          </string-name>
          .
          <article-title>Neural Networks for Pattern Recognition</article-title>
          . Oxford University Press,
          <year>1995</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <string-name>
            <surname>[d'Avila Garcez</surname>
          </string-name>
          et al.,
          <year>2001</year>
          ] Artur S.
          <string-name>
            <surname>d'Avila Garcez</surname>
          </string-name>
          , Krysia Broda, and
          <string-name>
            <surname>Dov</surname>
            <given-names>M.</given-names>
          </string-name>
          <string-name>
            <surname>Gabbay</surname>
          </string-name>
          .
          <article-title>Symbolic knowledge extraction from trained neural networks: A sound approach</article-title>
          .
          <source>Artificial Intelligence</source>
          ,
          <volume>125</volume>
          :
          <fpage>155</fpage>
          -
          <lpage>207</lpage>
          ,
          <year>2001</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [Krishnan et al.,
          <year>1999</year>
          ]
          <string-name>
            <given-names>R.</given-names>
            <surname>Krishnan</surname>
          </string-name>
          , G. Sivakumar, and
          <string-name>
            <given-names>P.</given-names>
            <surname>Bhattacharya</surname>
          </string-name>
          .
          <article-title>A search technique for rule extraction from trained neural networks</article-title>
          .
          <source>Non-Linear Anal.</source>
          ,
          <volume>20</volume>
          (
          <issue>3</issue>
          ):
          <fpage>273</fpage>
          -
          <lpage>280</lpage>
          ,
          <year>1999</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <source>[Lloyd</source>
          ,
          <year>1988</year>
          ] John W. Lloyd.
          <source>Foundations of Logic Programming</source>
          . Springer, Berlin,
          <year>1988</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          <source>[Rojas</source>
          , 1996]
          <string-name>
            <given-names>Raul</given-names>
            <surname>Rojas</surname>
          </string-name>
          .
          <source>Neural Networks</source>
          . Springer,
          <year>1996</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          <string-name>
            <surname>[Thrun</surname>
          </string-name>
          et al.,
          <year>1991</year>
          ]
          <string-name>
            <given-names>S.</given-names>
            <surname>Thrun</surname>
          </string-name>
          et al.
          <article-title>The MONK's problems: A performance comparison of different learning algorithms</article-title>
          .
          <source>Technical Report CMU-CS-91-197</source>
          , Carnegie Mellon University, Computer Science Department, Pittsburgh, PA,
          <year>1991</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [Tickle et al.,
          <year>1998</year>
          ] Alan.
          <string-name>
            <given-names>B.</given-names>
            <surname>Tickle</surname>
          </string-name>
          , Robert Andrews, Mostefa Golea, and
          <string-name>
            <given-names>Joachim</given-names>
            <surname>Diederich</surname>
          </string-name>
          .
          <article-title>The truth will come to light: directions and challenges in extracting the knowledge embedded within mined artificial neural networks</article-title>
          .
          <source>IEEE Transactions on Neural Networks</source>
,
          <volume>9</volume>
          (
          <issue>6</issue>
          ):
          <fpage>1057</fpage>
          -
          <lpage>1068</lpage>
          ,
          <year>1998</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          <source>[Towell and Shavlik</source>
          , 1993] Geoffrey G. Towell and
          <string-name>
            <given-names>Jude W.</given-names>
            <surname>Shavlik</surname>
          </string-name>
          .
          <article-title>Extracting refined rules from knowledge-based neural networks</article-title>
          .
          <source>Machine Learning</source>
          ,
          <volume>13</volume>
          :
          <fpage>71</fpage>
          -
          <lpage>101</lpage>
          ,
          <year>1993</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>