Extracting Propositional Rules from Feed-forward Neural Networks — A New Decompositional Approach

Sebastian Bader and Steffen Hölldobler and Valentin Mayer-Eichberger
International Center for Computational Logic, Technische Universität Dresden, Germany

Abstract

In this paper, we present a new decompositional approach for the extraction of propositional rules from feed-forward neural networks of binary threshold units. After decomposing the network into single units, we show how to extract rules describing a unit's behavior. This is done using a suitable search tree which allows the pruning of the search space. Furthermore, we present some experimental results, showing a good average runtime behavior of the approach.

Figure 1: A simple 3-layer feed-forward connectionist system. Depicted are the thresholds of the nodes (inside the circles) and the weights of the connections. The rectangular nodes (A and B) serve as input units. [Only the caption is reproduced here.]

1 INTRODUCTION AND MOTIVATION

As the knowledge stored in a neural network is hidden in its weights, humans have no direct access to it. The goal of rule extraction is to obtain a human-readable description of the output units' behavior with respect to the input units. This is usually done using if-then rules describing under which conditions the output units will be active. Throughout this paper, we will use networks of bipolar binary threshold units and show how to extract propositional if-then rules.

Rule extraction attempts can be divided into pedagogical and decompositional approaches. The former treats the network as a black box, while the latter decomposes the network, constructs intermediate rules and recomposes those. For a general introduction to the field we refer to [Tickle et al., 1998].

Example 1. Figures 1 and 2 show a simple network and the result of a naive pedagogical extraction. All possible inputs are presented to the network and the network is evaluated. For each active output unit, a rule is constructed such that its precondition matches the current input.

Obviously, the naive pedagogical approach presented in Example 1 has exponential complexity, as there are exponentially many different inputs, even for a single unit. In this paper, we present new algorithms which allow for an efficient extraction. Even though the problem itself is worst-case exponential, our algorithms show a good average-case behavior. The approach presented here is closely related to the work by Krishnan et al. [Krishnan et al., 1999], but we use a modified search tree that can be constructed as the need arises. Furthermore, we use a different set of pruning rules.

This paper is organized as follows: After reviewing some preliminary notions in Section 2, we present our approach in Section 3. This is followed by a presentation of some experimental results in Section 4. Finally, we draw some conclusions and discuss further work in Section 5.

2 PRELIMINARIES

In this section we briefly introduce some required notations from artificial neural networks and from propositional rules and programs. For a more detailed introduction we refer to [Bishop, 1995; Rojas, 1996] and [Lloyd, 1988], respectively.

2.1 Artificial Neural Networks

A connectionist system (also called artificial neural network) is a directed graph of simple computational units (see Figure 1). Every unit has an activation value which is propagated along its weighted output connections to other units. Some distinguished units serve as input units, whose activation is set from outside. All other units compute their activation based on the activation of their predecessor units and the weights of the connections. In our case, we deal with so-called bipolar binary threshold units only, i.e. units whose activation is either +1 or −1. A unit is active (+1) if its current input exceeds its threshold, and inactive otherwise. Some of the units serve as output units; in Figure 1, those are marked with little outgoing arrows (G and H). Throughout this paper, we will use wAB to refer to the weight of the connection from unit A to unit B and assume the weight to be 0 if there is no such connection.
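As a minimal illustration (not part of the paper), the following Python sketch evaluates one bipolar binary threshold unit; the weights and the threshold are those of the perceptron PG introduced in Section 3 (Figure 3), and the function name is ours.

    # Minimal sketch (not from the paper): one bipolar binary threshold unit.
    # A unit is active (+1) iff its weighted input exceeds its threshold.
    def activation(weights, threshold, inputs):
        """weights: unit -> weight, inputs: unit -> +1/-1."""
        net = sum(w * inputs[u] for u, w in weights.items())
        return 1 if net > threshold else -1

    w_G = {"C": 1.0, "D": 2.0, "E": 3.0, "F": 5.0}   # perceptron P_G (Figure 3)
    theta_G = 4.0
    print(activation(w_G, theta_G, {"C": -1, "D": -1, "E": 1, "F": 1}))   # 1
    print(activation(w_G, theta_G, {"C": 1, "D": 1, "E": 1, "F": -1}))    # -1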
 A   B  |  G   H
−1  −1  | −1  +1        P1 = { H ← Ā ∧ B̄.
−1  +1  | +1  +1               G ← Ā ∧ B.    H ← Ā ∧ B.
+1  −1  | +1  +1               G ← A ∧ B̄.    H ← A ∧ B̄.
+1  +1  | −1  +1               H ← A ∧ B. }

Figure 2: A naive pedagogical extraction of the network shown in Figure 1. The table shows all possible inputs together with the corresponding outputs. The logic program corresponds exactly to the +1's of the output patterns.

Figure 3: Two simple perceptrons PG and PH. [Only the caption is reproduced here; the diagrams show unit G with incoming weights 1.0, 2.0, 3.0, 5.0 from C, D, E, F and threshold 4, and unit H with incoming weights −3.0 and 2.0 from D and F and threshold −2.]

Figure 4: From left to right: PG+, PG−, PH+ and PH−. [Only the caption is reproduced here.]

2.2 Propositional Rules and Logic Programs

In this paper we show how to extract propositional if-then rules from a neural network. These rules consist of a precondition and a consequence. We consider rules with atomic consequences only, i.e. the consequence of a rule is a propositional variable. Furthermore, we restrict the preconditions to be conjunctions of (possibly negated) propositional variables. We treat rules with empty preconditions as facts, i.e. the consequence is assumed to be true without any condition. As propositional logic programs are just sets of such rules, we use some notations customary in the logic programming community, as exemplified in Example 2.

Example 2. Here is a simple propositional logic program, i.e. a set of propositional if-then rules, which will serve as a running example. The intended meaning of the rules is given on the right.

P2 = { G ← Ā ∧ B.   % G is true, if A is false and B is true
       G ← A ∧ B̄.   % G is true, if A is true and B is false
       H. }         % H is always true
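Such rules have a straightforward machine representation. The following sketch (our illustration, not part of the paper) encodes P2 as pairs of a head atom and a set of body literals; the encoding and all names are our own choice.

    # Sketch (not from the paper): rules as (head, body) pairs, where a body is
    # a set of literals and a literal is (atom, polarity), False meaning negated.
    P2 = [
        ("G", {("A", False), ("B", True)}),   # G <- not A and B
        ("G", {("A", True), ("B", False)}),   # G <- A and not B
        ("H", set()),                         # empty precondition: H is a fact
    ]

    def body_holds(body, interpretation):
        """True iff every literal of the body holds in the interpretation."""
        return all(interpretation[atom] == polarity for atom, polarity in body)

    print(body_holds(P2[0][1], {"A": False, "B": True}))   # True
    print(body_holds(P2[1][1], {"A": False, "B": True}))   # False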
3 THE COOP APPROACH

In this section, we describe a new decompositional approach for the extraction of propositional rules from feed-forward neural networks of bipolar binary threshold units. First, we show how to decompose a feed-forward network and introduce the required notations. In Section 3.2, we show how to extract rules from a single perceptron. Afterwards, those intermediate results are composed.

3.1 Decomposition

As mentioned above, we decompose the network into simple perceptrons, i.e. a single unit together with its incoming connections. To simplify the notions below, we introduce the positive and negative form of a perceptron.

Definition 3 (Perceptron). A network consisting of a set of input units connected to a single output unit A is called a perceptron (denoted PA).

Definition 4 (Positive and negative form). The positive form PA+ of a given perceptron PA is obtained by multiplying all negative weights by −1 and negating the corresponding input symbols. The negative form PA− is obtained by multiplying all positive weights by −1 and negating the corresponding inputs.

Example 5. Figure 3 depicts two simple perceptrons, which can be seen as parts of the network shown in Figure 1. The positive and negative forms are depicted in Figure 4. For PG we have PG+ = PG, as there are no negative weights.

In the sequel, we will often need to clamp some of the input units U of a given perceptron to be active. This is done by input patterns. The intended meaning is that all units occurring in an input pattern I ⊆ U are considered to be active, while the states of the non-included input units are not fixed, i.e. the remaining units in U \ I might be active or inactive. Therefore, an input pattern defines an upper and a lower bound on the possible input of the perceptron.

Definition 6 (Input pattern). Let PA be a perceptron and U be the set of input units. A subset I ⊆ U is called an input pattern. The minimal and maximal input wrt. the input pattern I are defined as imin(I) = Σ_{i∈I} wiA − Σ_{u∈U\I} |wuA| and imax(I) = Σ_{i∈I} wiA + Σ_{u∈U\I} |wuA|, respectively.

Example 7. For PG+ from Figure 4 and I = {C, D} we have imin(I) = 1.0 + 2.0 − 3.0 − 5.0 = −5.0 and imax(I) = 1.0 + 2.0 + 3.0 + 5.0 = 11.0.

Next, we introduce coalitions and oppositions as special types of input patterns. While coalitions can be seen as conditions that make a perceptron active, oppositions prevent a perceptron from becoming active. If some input pattern is a coalition, then the perceptron will be active independently of the state of all non-clamped input units. If it is an opposition, the perceptron will always be inactive.

Definition 8 (Coalition). Let PA be a perceptron with threshold θ, I be some input pattern for PA and imin(I) be the corresponding minimal input. I is called a coalition, if imin(I) ≥ θ. A coalition I is called minimal, if none of its subsets I′ ⊂ I is a coalition.

Definition 9 (Opposition). Let PA be a perceptron with threshold θ, I be some input pattern for PA and imax(I) be the corresponding maximal input. I is called an opposition, if imax(I) < θ. An opposition I is called minimal, if none of its subsets I′ ⊂ I is an opposition.

Example 10. For PG+ from Figure 4, we find I = {C, D, F} to be a coalition, as imin(I) = 1.0 + 2.0 + 5.0 − 3.0 = 5.0 > 4.0. For PG−, we find J = {F̄} to be an opposition, as imax(J) = 1.0 + 2.0 + 3.0 − 5.0 = 1.0 < 4.0.

The set of coalitions and the set of oppositions can be used to describe the behavior of a perceptron. Furthermore, it is sufficient to consider the sets of minimal coalitions and minimal oppositions, respectively, which are uniquely determined. Those are the results of the extraction algorithm presented in the next section.
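The bound computations of Definition 6 and the tests of Definitions 8 and 9 translate directly into code. The following sketch (our illustration; the function names are ours) reproduces Examples 7 and 10, writing a negated input of the negative form, e.g. C̄, as "~C".

    # Sketch (not from the paper): minimal/maximal input of an input pattern
    # (Definition 6) and the coalition/opposition tests (Definitions 8 and 9).
    def i_min(weights, pattern):
        return sum(w if u in pattern else -abs(w) for u, w in weights.items())

    def i_max(weights, pattern):
        return sum(w if u in pattern else abs(w) for u, w in weights.items())

    def is_coalition(weights, theta, pattern):
        return i_min(weights, pattern) >= theta

    def is_opposition(weights, theta, pattern):
        return i_max(weights, pattern) < theta

    w_G_pos = {"C": 1.0, "D": 2.0, "E": 3.0, "F": 5.0}        # P_G^+
    w_G_neg = {"~C": -1.0, "~D": -2.0, "~E": -3.0, "~F": -5.0}  # P_G^-
    theta_G = 4.0

    print(i_min(w_G_pos, {"C", "D"}), i_max(w_G_pos, {"C", "D"}))  # -5.0 11.0 (Example 7)
    print(is_coalition(w_G_pos, theta_G, {"C", "D", "F"}))         # True (Example 10)
    print(is_opposition(w_G_neg, theta_G, {"~F"}))                 # True (Example 10)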
3.2 Extracting Coalitions and Oppositions

Here, we show how to construct the set of minimal coalitions for a given perceptron. To keep notions and algorithms simple, we first consider positive perceptrons only. At the end of the section we show how to extract the set of minimal oppositions from a negative perceptron, and furthermore, how to apply the algorithms to arbitrary perceptrons. Before presenting the algorithms, we try to convey some underlying intuitions. For positive perceptrons with inputs from {−1, +1} only, we observe the following:

1. The empty input pattern (no unit needs to be active) generates the smallest minimal input.
2. The full input pattern (all inputs are active) generates the biggest minimal input.
3. If an input pattern is a coalition, all its supersets are coalitions as well.

Starting from the empty input pattern (observation 1), input symbols are added according to their weights, such that inputs with larger weights are added first. If all inputs are added but no coalition is found, we can conclude that there is none (observation 2). As soon as a coalition is found, all its supersets are known to be coalitions as well (observation 3) and the algorithm can continue with adding the next-smaller input instead. Algorithm 1 constructs a search tree used to guide the extraction. Each node of the tree represents the input pattern containing all symbols on the path to the root.

Input: A positive perceptron PA+.
Output: A coalition search tree suitable for Alg. 2.
1 Fix an order ≻ on the input units such that: if wBA ≥ wCA then B ≻ C.
2 Create a root node (representing the empty pattern).
3 Add a child labeled X for each input symbol X (sorted left to right descending wrt. ≻).
4 foreach new child labeled Y do
5   add a new sub-child for every symbol Z with Y ≻ Z (sorted descending wrt. ≻).
Algorithm 1: Construction of the coalition search tree.

Example 11. For the perceptron PG+ shown in Figure 3, we have wCG = 1.0, wDG = 2.0, wEG = 3.0 and wFG = 5.0; therefore, we determine the order F ≻ E ≻ D ≻ C. Applying Algorithm 1, we obtain the search tree shown in Figure 5 on the left. For PH+, we have wD̄H = 3.0 and wFH = 2.0; therefore, we determine the order D̄ ≻ F and obtain the tree shown in Figure 5 on the right.

As mentioned above, while looking for a coalition we generate input patterns by adding symbols according to their weights. This can be done by a left-depth-first search using the tree just constructed. The following two rules are used to prune the tree and hence the search space:

1. If the minimal input of a node is above the threshold, cut all children.
2. If the minimal inputs of a node and all its descendants are below the threshold, cut all right siblings.

Rule 1 reflects Observation 3 from above, because child nodes represent supersets of the corresponding input pattern. If neither a certain node nor any of its descendants is a coalition, we can cut all its right-hand siblings, as their minimal inputs will be even smaller. This follows from the order of the symbols used while constructing the tree. The complete extraction of the smallest set of minimal coalitions is given as Algorithm 2.

Input: A positive perceptron PA+ with threshold θA.
Output: The set of minimal coalitions.
1 Construct the search tree for PA+ using Algorithm 1.
2 Make the root node the current node.
3 while there is a current node do
4   Compute imin for the current node.
5   if imin ≥ θA then
6     Mark the current node as a coalition and cut all children (pruning rule 1).
7   else
8     if the current node has no child then
9       while the parent of the current node is the direct predecessor do
10        Cut all unvisited siblings of the parent.
11        Use the parent as current node.
12  Make the next unvisited node (left-depth-first) the current node.
13  if there is no unvisited node then stop.
14 Return the set of coalitions corresponding to the marked nodes.
Algorithm 2: Constructing minimal coalitions.

Example 12. For the perceptrons PG+ and PH+ shown in Figure 4, Algorithm 2 returns CG = {{E, F}, {C, D, F}} and CH = {{D̄}, {F}}.
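The following sketch (our illustration, not the authors' implementation, which was written in Prolog) renders Algorithms 1 and 2 as a single recursive procedure for a positive perceptron: the search tree is generated on demand in descending weight order, pruning rule 1 corresponds to the early return, and pruning rule 2 to the break. A negated input of the positive form, such as D̄, is written as "~D".

    # Sketch (not from the paper): minimal coalitions of a positive perceptron.
    def minimal_coalitions(weights, theta):
        units = sorted(weights, key=weights.get, reverse=True)   # the order >
        def i_min(pattern):
            return sum(w if u in pattern else -abs(w) for u, w in weights.items())
        found = []
        def visit(pattern, start):
            if i_min(pattern) >= theta:
                found.append(set(pattern))
                return True                      # pruning rule 1: cut all children
            subtree_has_coalition = False
            for k in range(start, len(units)):
                if visit(pattern | {units[k]}, k + 1):
                    subtree_has_coalition = True
                else:
                    break                        # pruning rule 2: cut right siblings
            return subtree_has_coalition
        visit(frozenset(), 0)
        return found

    # P_G^+ and P_H^+ from Figure 4 (cf. Example 12).
    print(minimal_coalitions({"C": 1.0, "D": 2.0, "E": 3.0, "F": 5.0}, 4.0))
    # -> [{'E', 'F'}, {'C', 'D', 'F'}]  (element order may vary)
    print(minimal_coalitions({"~D": 3.0, "F": 2.0}, -2.0))
    # -> [{'~D'}, {'F'}]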
Figure 5: The search trees (a) for the extraction of the perceptron G (with θG = 4 and F ≻ E ≻ D ≻ C) and (b) for H (with θH = −2 and D̄ ≻ F). (Minimal) coalitions are marked with a gray background (and a thick border) and unvisited nodes are shaded. Every node corresponds to the input pattern containing all symbols on the path to the root, as exemplified for the two minimal coalitions. The numbers denote the corresponding minimal input. The triangular lines show the path taken by Algorithm 2. [Only the caption is reproduced here.]

Even though the worst-case complexity is exponential¹, the algorithm performs reasonably well, as demonstrated in Section 4. This follows from the effectiveness of the pruning rules and, as a consequence, from the fact that the search tree does not need to be computed completely.

¹ Assume a perceptron with n equal weights and a threshold of 0. Then there are (n choose ⌈n/2⌉) minimal coalitions.

While using +1 and −1 as activation values and the positive form for the extraction of coalitions, we find that the extraction of oppositions is "dual" when working on the negative form of the perceptron. Therefore, we list the differences only:

• For oppositions, negative perceptrons are used as inputs.
• In Algorithm 1, the order must be reversed, i.e. if wBA ≤ wCA then B ≻ C.
• In Algorithm 2, instead of computing the minimal input, we compute the maximal input and check whether it is below the threshold.

Example 13. Applying the modified algorithm to the perceptron PD (i.e. unit D from Figure 1 with its incoming connections) yields OD = {{A}, {B̄}}.

We used the positive form of a perceptron to extract coalitions and the negative form to extract oppositions. For the sequel we will understand a negated input symbol occurring in some input pattern to clamp the corresponding input unit as inactive. Note that an input pattern can thus be used for the original perceptron as well as for its positive and negative form. This allows us to apply the algorithms to arbitrary perceptrons². Table 1 shows all intermediate extraction results.

² As mentioned above, the positive and negative forms were introduced to keep notions and algorithms simple. They will actually never be constructed in a real implementation. Instead, the algorithms could be modified by adding case distinctions.

Unit   Minimal Coalitions        Minimal Oppositions
C      {{A}, {B}}                {{Ā, B̄}}
D      {{Ā, B}}                  {{A}, {B̄}}
E      {{A, B̄}}                  {{Ā}, {B}}
F      {∅}                       ∅
G      {{E, F}, {C, D, F}}       not needed
H      {{D̄}, {F}}                not needed

Table 1: Minimal coalitions and oppositions for the network from Figure 1.
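Complementing the sketch after Example 12, the dual extraction of oppositions can be rendered as follows (again our illustration): the order is reversed and the maximal input is tested against the threshold. The weights used for unit D (wAD = −2.0, wBD = 2.0, threshold 2) are our reading of Figure 1, which is not reproduced here; they are consistent with the entries for D in Table 1.

    # Sketch (not from the paper): minimal oppositions via the dual search on
    # the negative form (reversed order, maximal input, test "< theta").
    def minimal_oppositions(weights, theta):
        units = sorted(weights, key=weights.get)          # reversed order
        def i_max(pattern):
            return sum(w if u in pattern else abs(w) for u, w in weights.items())
        found = []
        def visit(pattern, start):
            if i_max(pattern) < theta:
                found.append(set(pattern))
                return True
            subtree_has_opposition = False
            for k in range(start, len(units)):
                if visit(pattern | {units[k]}, k + 1):
                    subtree_has_opposition = True
                else:
                    break
            return subtree_has_opposition
        visit(frozenset(), 0)
        return found

    # P_D^- of unit D (our reading of Figure 1): inputs A (-2.0) and ~B (-2.0).
    print(minimal_oppositions({"A": -2.0, "~B": -2.0}, 2.0))
    # -> [{'A'}, {'~B'}]  (matches O_D = {{A}, {B-bar}} of Example 13)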
3.3 Composition of the Intermediate Results

In this section, we show how to compose the intermediate results to obtain a description of the output unit's behavior wrt. the input units.

The intended meaning of a set of coalitions like CG = {{E, F}, {C, D, F}} is that "E and F" or "C, D and F" should be active in order to make neuron G active. This can be represented as the propositional formula ((E ∧ F) ∨ (C ∧ D ∧ F)). We will refer to the propositional formula obtained from a set of coalitions CF as pf(CF). If there is no coalition for a given perceptron PF, i.e. CF = ∅, we can conclude that there is no input such that F will be active, hence pf(CF) = false. In contrast, for CF = {∅} we can conclude that F will always be active, hence pf(CF) = true. Analogously, the intended meaning of a set of oppositions like OD = {{A}, {B̄}} is that whenever A is active or B is inactive, the neuron D will be inactive. This can be represented as (A ∨ B̄). Again, we will refer to the corresponding formula as pf(OF).

Algorithm 3 takes a feed-forward network and one output unit A and returns a propositional formula describing A's behavior with respect to the network's input units. It creates and manipulates a propositional formula F, which finally can be rewritten as a logic program.

Input: A network N and an output unit A.
Output: A formula describing A's behavior.
1 Initialize F as F = pf(CA).
2 while there occur symbols in F referring to non-input units of N do
3   Pick one occurrence of a (possibly negated) non-input symbol B.
4   if B is negated then replace B̄ with pf(OB) else replace B with pf(CB).
5 Return F.
Algorithm 3: Extracting one unit of a given network.

Example 14. Applying Algorithm 3 to the network from Figure 1 we obtain³:

G = (E ∧ F) ∨ (C ∧ D ∧ F)
  = ((A ∧ B̄) ∧ true) ∨ ((A ∨ B) ∧ (Ā ∧ B) ∧ true)
  = (A ∧ B̄) ∨ (Ā ∧ B)

H = (D̄ ∨ F)
  = ((A ∨ B̄) ∨ true)
  = true

Note that the formulae G = (A ∧ B̄) ∨ (Ā ∧ B) and H = true could also be represented as program P2 from Example 2.

³ Note that several replacements were done in one line and parentheses were omitted if unnecessary. The last lines were obtained by applying the usual laws of propositional logic.

The non-determinism introduced in line 3 of Algorithm 3 is a don't-care non-determinism, i.e. we are free to choose any symbol without changing the result. An "informed heuristic" could, however, speed up the extraction. In Example 14 we applied the usual laws of propositional logic after applying Algorithm 3. In fact, those rules can also be applied before, i.e. directly after replacing a symbol with the corresponding coalition or opposition.

Proposition 15. Algorithm 3 is sound and complete.

I.e. for a given feed-forward network and a given output unit A, the algorithm always returns a correct formula describing A's behavior wrt. the network's input units. The proposition follows from the fact that the network contains no cycles and from the correctness of the laws of propositional logic.
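The substitution step of Algorithm 3 can be carried out mechanically on formulas kept in disjunctive normal form. The following sketch (our illustration, not part of the paper; all names and the DNF representation are ours) restates the extraction results of Table 1 as Python data and reproduces the formulas for G and H from Example 14.

    # Sketch (not from the paper): formulas in disjunctive normal form are lists
    # of conjunctions, each conjunction a set of literals (atom, polarity).
    # pf({emptyset}) = true is the list containing the empty conjunction.
    def substitute(dnf, literal, replacement):
        """Replace every occurrence of `literal` by the DNF `replacement`."""
        result = []
        for conj in dnf:
            if literal not in conj:
                result.append(conj)
            else:
                result.extend((conj - {literal}) | r for r in replacement)
        return result

    TRUE = [set()]
    COAL = {"C": [{("A", True)}, {("B", True)}],
            "D": [{("A", False), ("B", True)}],
            "E": [{("A", True), ("B", False)}],
            "F": TRUE}
    OPP = {"D": [{("A", True)}, {("B", False)}]}

    # pf(C_G) = (E and F) or (C and D and F); eliminate the hidden units as in
    # Algorithm 3, then drop contradictory conjunctions (laws of logic).
    G = [{("E", True), ("F", True)}, {("C", True), ("D", True), ("F", True)}]
    for unit in ("E", "F", "C", "D"):
        G = substitute(G, (unit, True), COAL[unit])
    G = [c for c in G if not any((a, not p) in c for a, p in c)]
    print(G)   # two rules remain: (A and not B), (not A and B) -- cf. Example 14

    # pf(C_H) = (not D) or F; the negated occurrence of D is replaced by pf(O_D).
    H = [{("D", False)}, {("F", True)}]
    H = substitute(substitute(H, ("D", False), OPP["D"]), ("F", True), COAL["F"])
    print(set() in H)   # True: H contains the empty conjunction, i.e. H = true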
4 DISCUSSION

In this section, we briefly discuss some related work and the extension of the approach to non-binary units, and we report on some preliminary experimental results.

4.1 Related Work

As mentioned in the introduction, our approach is closely related to the COMBO algorithm introduced in [Krishnan et al., 1999]. But, in contrast to that approach, our search tree can be built incrementally. Each level of the combination trees constructed for the COMBO algorithm needs to be sorted wrt. the minimal input. This involves the construction, evaluation and sorting of possibly exponentially many nodes of the tree, even though they might be cut off.

4.2 Extension to Non-Binary Units

We are currently investigating the applicability of the COOP approach to non-binary units. A first approach, which was taken in the experiments described below, employs bipolar sigmoidal (tanh) units instead of binary ones. Consequently, the network becomes trainable by standard learning methods, like back-propagation. After some iterations, all weights were multiplied by 2, yielding steeper activation functions. We stopped the training process whenever the error made by the network did not increase significantly after multiplying the weights. In this case the units behave like binary ones, i.e. small activation values do not occur any more, only values close to −1 and +1. Therefore, we can treat those units as binary threshold units and apply our algorithms directly.

As a second approach, we try to adapt the following idea from [d'Avila Garcez et al., 2001]. A unit is considered active (inactive) whenever its activation value is above (below) some threshold amin (amax). For a given amin, we can compute a necessary minimal input imin. For a given perceptron with threshold θ, we could now apply our algorithm by using θ − imin as threshold and, instead of using +1 and −1 as activation values, using amin and amax. This allows us to apply the algorithms described above. We believe that soundness will be preserved, but whether this holds for completeness as well will be investigated in the future.
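The first approach can be checked mechanically: after scaling, a tanh unit should only produce activations close to ±1 on the inputs it sees, and may then be treated as a threshold unit. The following sketch (ours, not from the paper) illustrates such a saturation check; scaling the threshold together with the weights, the factor, the tolerance and the sample weights are our own illustrative choices.

    # Sketch (not from the paper): does a tanh unit behave like a binary one
    # after its weights (and threshold) are multiplied by a factor?
    import math

    def tanh_unit(weights, theta, inputs):
        return math.tanh(sum(w * x for w, x in zip(weights, inputs)) - theta)

    def effectively_binary(weights, theta, samples, factor=2.0, tol=0.05):
        """True if, after scaling, every activation on the samples is within
        `tol` of +1 or -1."""
        w = [factor * v for v in weights]
        t = factor * theta
        return all(abs(abs(tanh_unit(w, t, x)) - 1.0) <= tol for x in samples)

    samples = [(-1, -1), (-1, 1), (1, -1), (1, 1)]   # all bipolar inputs
    print(effectively_binary([3.0, 1.0], 0.0, samples))   # True for these made-up weights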
4.3 Some Preliminary Experimental Results

To evaluate the average runtime behavior of the algorithm, we generated random perceptrons for which we computed the number of visited nodes wrt. the number of input symbols.⁴ This, together with the total number of nodes in the tree, is depicted in Figure 6 on the left. The plot shows that only a small fraction of the nodes is visited. Nevertheless, the number of visited nodes seems to grow exponentially. This is not surprising, as the number of minimal coalitions grows exponentially as well.

⁴ Instead of measuring the time, we used the number of nodes, because we used a very preliminary implementation in Prolog only.

Furthermore, we computed the ratio of visited nodes and coalitions found, again wrt. the number of input symbols. The results are shown in Figure 6 on the right. This test indicates that the number of visited nodes per found coalition seems to increase for a larger number of input symbols. For the variant "coop2", this ratio seems to stabilize around 4, i.e. the algorithm needs to visit about four times as many nodes as it finds coalitions. In this variant we used some further techniques to improve the pruning rules, which are beyond the scope of this paper. E.g., we cached the minimal input values necessary before entering a node in the tree; if this input is not reached, the subtree can be pruned. Furthermore, we tried to identify equivalent subtrees.

Figure 6: The plot on the left shows the total number of nodes in the tree and the average number of visited nodes wrt. the number of input symbols. The plot on the right shows the ratio of visited nodes and found coalitions. [Only the caption is reproduced; the curves are "# total" and "# visited" (left) and "coop" and "coop2" (right), plotted over 2 to 20 input symbols.]

4.4 The Monks Problems

The monks problems, as described in [Thrun et al., 1991], are learning problems where robots are described by the following six attributes:

• head shape is {round (a1), square (a2), octagon (a3)},
• body shape is {round (b1), square (b2), octagon (b3)},
• is smiling is {yes (c1), no (c2)},
• holding is {sword (d1), balloon (d2), flag (d3)},
• colour is {red (e1), yellow (e2), green (e3), blue (e4)},
• has tie is {yes (f1), no (f2)}.

The following three classifications are to be learned:

1. head shape = body shape, or the colour is red,
2. exactly two of the six attributes take their first value,
3. colour = green and holding = sword, or colour ≠ blue and body shape ≠ octagon.

We used bipolar sigmoidal networks with 17 input units (labeled a1, ..., f2), a single output unit (labeled cl) and either 1 (problem 1) or 2 (problems 2 and 3) hidden units. Furthermore, we allowed shortcut connections from the input to the output layer. These architectures were chosen to minimize the size of the networks. We used a single train-test set containing all available examples. After training the networks and multiplying the weights as described above, we applied the COOP algorithm to extract the single perceptrons. Afterwards, the results were composed as described above and further refined using the integrity constraints resulting from the encoding (e.g., e1 and e2 cannot be active simultaneously). Finally, we obtained the following logic programs:

P1 = { cl ← a1 ∧ b1.
       cl ← a2 ∧ b2.
       cl ← a3 ∧ b3.
       cl ← e1. }

P2 = { cl ← a1 ∧ b1 ∧ c̄1 ∧ d̄1 ∧ ē1 ∧ f̄1.
       cl ← a1 ∧ b̄1 ∧ c1 ∧ d̄1 ∧ ē1 ∧ f̄1.
       ... 11 clauses more
       cl ← ā1 ∧ b̄1 ∧ c̄1 ∧ d1 ∧ ē1 ∧ f1.
       cl ← ā1 ∧ b̄1 ∧ c̄1 ∧ d̄1 ∧ e1 ∧ f1. }

P3 = { cl ← d1 ∧ e3.
       cl ← b̄3 ∧ ē4. }

The programs describe the required classifications. E.g., program P1 encodes: head shape = body shape (a1 ∧ b1 or a2 ∧ b2 or a3 ∧ b3) or the colour is red (e1). This shows that the COOP approach is able to extract meaningful rules from a trained neural network, even though this is just a preliminary experiment on an artificial domain.
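As a small sanity check (ours, not part of the paper), the extracted program P1 can be compared against classification 1 by brute force over all head shapes, body shapes and colours; the remaining three attributes occur neither in P1 nor in the classification and are therefore omitted.

    # Sketch (not from the paper): P1 agrees with classification 1 on all robots.
    from itertools import product

    heads = ["round", "square", "octagon"]
    bodies = ["round", "square", "octagon"]
    colours = ["red", "yellow", "green", "blue"]

    def atoms(head, body, colour):
        """Truth values of the input atoms relevant to classification 1."""
        return {"a1": head == "round", "a2": head == "square", "a3": head == "octagon",
                "b1": body == "round", "b2": body == "square", "b3": body == "octagon",
                "e1": colour == "red"}

    RULE_BODIES = [{"a1", "b1"}, {"a2", "b2"}, {"a3", "b3"}, {"e1"}]   # bodies of P1

    ok = all(
        any(all(atoms(h, b, c)[x] for x in body) for body in RULE_BODIES)
        == (h == b or c == "red")
        for h, b, c in product(heads, bodies, colours)
    )
    print(ok)   # True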
5 CONCLUSIONS

In this paper, we presented a new decompositional approach to extract propositional if-then rules from a feed-forward network of binary threshold units. Our approach is sound and complete, i.e. every rule extracted from the network is correct and all contained rules are extracted. Even though our running example is a 3-layered feed-forward network, the approach is not limited to layered architectures; it merely requires cycle-free networks.

For the extraction algorithm of a single perceptron (Section 3.2) we will further investigate whether ideas underlying the "M-of-N" approach by Towell and Shavlik [Towell and Shavlik, 1993] can help to speed up the system. Furthermore, we will try to develop some dedicated "informed heuristics", as mentioned in Section 3.3, to guide the extraction on the level of whole networks. Another candidate for further improvement is the caching of intermediate results while composing the coalitions and oppositions.

First experiments, presented in Section 4, indicate that our approach shows a good average-case complexity, but a detailed analysis needs to be done in the future. Furthermore, we would like to evaluate our algorithm and compare it to other approaches using benchmark problems, like the monks problems or problems from molecular biology as described in [d'Avila Garcez et al., 2001].

Acknowledgments

We would like to thank two anonymous referees for their valuable comments on a preliminary version of this paper. Sebastian Bader is supported by the GK334 of the German Research Foundation (DFG).

References

[Bishop, 1995] Christopher M. Bishop. Neural Networks for Pattern Recognition. Oxford University Press, 1995.

[d'Avila Garcez et al., 2001] Artur S. d'Avila Garcez, Krysia Broda, and Dov M. Gabbay. Symbolic knowledge extraction from trained neural networks: A sound approach. Artificial Intelligence, 125:155–207, 2001.

[Krishnan et al., 1999] R. Krishnan, G. Sivakumar, and P. Bhattacharya. A search technique for rule extraction from trained neural networks. Non-Linear Anal., 20(3):273–280, 1999.

[Lloyd, 1988] John W. Lloyd. Foundations of Logic Programming. Springer, Berlin, 1988.

[Rojas, 1996] Raul Rojas. Neural Networks. Springer, 1996.

[Thrun et al., 1991] S. Thrun et al. The MONK's problems: A performance comparison of different learning algorithms. Technical Report CMU-CS-91-197, Carnegie Mellon University, Computer Science Department, Pittsburgh, PA, 1991.

[Tickle et al., 1998] Alan B. Tickle, Robert Andrews, Mostefa Golea, and Joachim Diederich. The truth will come to light: directions and challenges in extracting the knowledge embedded within trained artificial neural networks. IEEE Transactions on Neural Networks, 9(6):1057–1068, 1998.

[Towell and Shavlik, 1993] Geoffrey G. Towell and Jude W. Shavlik. Extracting refined rules from knowledge-based neural networks. Machine Learning, 13:71–101, 1993.