<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Rules Acquisition from Classic, Deep and Neuro-Fuzzy Systems</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Alexey Averkin</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sergey Yaryshev</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Educational and Scientific Laboratory of Artificial Intelligence, Neuro-technologies and Business Analytics, Plekhanov Russian University of Economics</institution>
          ,
          <addr-line>Moscow</addr-line>
          ,
          <country country="RU">Russia</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Federal Research Centre of Informatics and Computer Science of RAS</institution>
          ,
          <addr-line>Moscow</addr-line>
          ,
          <country country="RU">Russia</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Plekhanov Russian University of Economics</institution>
          ,
          <addr-line>Moscow</addr-line>
          ,
          <country country="RU">Russia</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2020</year>
      </pub-date>
      <abstract>
        <p>This article attempts to give an overview of several algorithms for extracting rules from artificial neural networks. The goal of this article is to find critical links between three important parts of artificial intelligence: production models, fuzzy logic and deep learning. Such an approach will stimulate researchers in the field of soft computing to develop applied systems in the field of explainable artificial intelligence and machine learning.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>This article presents the basic methods of machine learning and explainable artificial
intelligence that can help in extracting rules and other models of knowledge
representation not only from data, but from the artificial neural networks themselves. The paper discusses
classifications of rule-based learning methods for neural networks and the current
state of technologies for extracting rules from neural networks. Next, we formulate the main
problems that arise when extracting rules from neural networks, as well as the main methods
for solving them. A number of rule extraction algorithms are described in detail below. The
last part discusses specific issues that arise when working with deep neural networks and neuro-fuzzy
systems. It also proposes algorithms that can efficiently extract rules from these more
complex and neuromorphic neural networks.</p>
      <p>Artificial neural networks are well-known parallel computing models that are highly
effective in solving complex artificial intelligence problems such as pattern recognition and text
analysis. However, many users are afraid to use them in critical situations because
they are a "black box": explaining how a neural network makes a particular
decision is a very difficult problem.</p>
      <p>This is a serious problem because it is difficult to trust the decisions of a neural network that
solves real problems without the ability to explain how those decisions were made. This is particularly true
for security-critical tasks, in which hidden errors can lead to dangerous consequences for humans
or to large military, political or economic losses. Moreover, understanding how neural networks
extract, accumulate and modify formal knowledge is important and necessary for the evolution
of machine learning methods and explainable artificial intelligence. For example, increasing
the transparency of neural networks reveals "hidden dependencies" that are not present in the
input data but appear as a result of processing by the neural network. To overcome these
shortcomings of neural networks, data scientists have come up with the idea of extracting rules
directly from neural networks. In this
way, we establish an additional link between symbolic and connectionist (sub-symbolic)
models of knowledge representation in artificial intelligence.</p>
      <p>Most authors focus on extracting the most understandable rules while at the same time
trying to mimic the behavior of the neural network as accurately as possible. After the
appearance in 1992, in Jang's doctoral dissertation, of a method for isomorphic representation
of fuzzy rules in the form of a neuro-fuzzy system, tremendous work was done in this area,
which culminated in the creation of the fields of soft computing and computational
intelligence. Since then, many methods for extracting rules from neural networks have been
developed and critically investigated, and in most cases excellent results have been obtained.</p>
      <p>But while there are currently quite a few effective algorithms for extracting rules directly
from neural networks, few have been explicitly tested on deep neural networks. In
addition, most authors focus on networks with very few hidden layers. Only in the past few years
have several innovative analyses of specific methods for extracting rules from modified deep
networks emerged, along with some approaches that could accomplish this task.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Methods for Extracting Rules from the Neural Network</title>
      <p>In artificial intelligence problems, neural networks and machine learning methods based on
knowledge representation constitute two different approaches to solving classification
problems. Both are methods for designing models that assign classes to
experimental data. For most tasks, neural network training methods are very accurate.</p>
      <p>However, neural networks have one major weakness: their ability to make
understandable what phenomenon they model is weaker than that of approaches based on knowledge
representation models. The knowledge a neural network acquires during training is difficult to understand,
because it is represented by a huge number of parameters [Crav94].</p>
      <p>Increasing the intelligibility of neural networks by extracting knowledge representation
models has two important advantages. First, it gives the user a clear understanding of how
the neural network uses input data to make decisions. Second, it can reveal hidden functions
in the neural network and explain the work of individual neurons. Identifying important
attributes or identifying the causes of neural network errors is also part of the understanding
process. In an attempt to make the black box of neural networks more understandable, methods
for extracting knowledge representation models reduce the trade-off between accuracy and
comprehensibility [Joh06].</p>
      <p>A more understandable presentation of the results is required if, for example,
the neural network is to be used in security-critical applications such as military operations
or nuclear power plants. In such situations it is important for the system user to be able to
verify the artificial neural network's output for all possible
input data [Andr95].</p>
      <p>To formalize the task of extracting rules from a neural network, one usually constructs a
comprehensible description of a hypothesis whose behavior approximates the
network's predictions [Crav96].</p>
      <p>To distinguish different approaches to rule extraction from neural networks, a
multidimensional taxonomy is used [Andr95]. The first parameter it describes is the expressive power
of the extracted rules (e.g., IF-THEN rules or fuzzy production rules).</p>
      <p>The second parameter is called transparency and describes the strategy followed by the
rule extraction algorithm. If the method uses the neural network only as a black
box, we call it a pedagogical approach. If the algorithm takes into account the neural
network topology, we call the approach decompositional. If the algorithm combines elements of
both pedagogical and decompositional methods, the approach is called eclectic. The
third parameter is the quality of the extracted rules. Since quality is a rather general term, it
is divided into several criteria, namely accuracy, fidelity, consistency and comprehensibility.
While accuracy measures the ability to correctly classify previously unseen examples,
fidelity measures the extent to which the rules can mimic the behavior of the neural network [Joh06].
Consistency can only be measured when the rule extraction algorithm involves training
the neural network itself rather than operating on an already trained
network. The resulting set of rules is considered consistent when it correctly classifies
the test data across different training sessions. Comprehensibility here is regarded as a measure of rule
size: short and few rules are considered more comprehensible [Andr95].</p>
      <p>In this review we will focus only on the three parameters described above. We will focus on methods
that do not impose special requirements on how the neural network was trained before the
rules were extracted [Thrun93]. In addition, we will investigate only algorithms capable of
extracting rules from feed-forward neural networks. In accordance with [Crav99], we
require that an algorithm provide a high level of generality.</p>
      <p>Let us consider some methods of rule extraction that correspond to the above description,
starting with the decompositional approach. As mentioned above, decompositional approaches
to rule extraction from neural networks work at the level of individual neurons. Usually, a decompositional
method analyzes each neuron and forms rules imitating the behavior of that neuron. Among the
possible decompositional approaches we consider the KT algorithm, the Tsukimoto polynomial
algorithm and rule extraction via decision tree induction.</p>
      <p>The KT algorithm was one of the first decompositional approaches for extracting rules from
neural networks [Fu94]. The KT algorithm describes each neuron with IF-THEN rules found by
a heuristic search for combinations of input attributes that exceed the neuron's threshold. To find suitable
combinations, the KT method applies a tree search: a rule (represented as a node in the tree) at
one level generates its child nodes by adding an additional available attribute [Tsuki00]. In
addition, the algorithm uses several heuristics to stop the tree from growing when
further improvement is impossible.</p>
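As an illustration, the combinatorial search at the heart of the KT method can be sketched in a few lines of Python. This is a toy version for a single thresholded neuron: the weights, bias, rule-length cap and the worst-case treatment of the remaining attributes are illustrative assumptions, not details of the original algorithm.

```python
from itertools import combinations

def kt_rules(weights, bias, max_len=3):
    """Find attribute subsets whose summed weights push the neuron over its
    threshold even when all remaining attributes are least favorable."""
    rules = []
    names = sorted(weights)
    for k in range(1, max_len + 1):
        for combo in combinations(names, k):
            # worst case: attributes outside the combo contribute their minimum
            worst_rest = sum(min(weights[a], 0.0) for a in names if a not in combo)
            if sum(weights[a] for a in combo) + worst_rest + bias > 0:
                # skip supersets of an already-found (shorter) rule
                if not any(set(r) <= set(combo) for r in rules):
                    rules.append(combo)
    return rules

# illustrative weights and bias for one neuron
w = {"x1": 2.0, "x2": 1.5, "x3": -1.0, "x4": 0.2}
print(kt_rules(w, bias=-2.4))  # -> [('x1', 'x2')]
```

Each returned tuple reads as a rule: IF x1 AND x2 THEN the neuron fires, regardless of the remaining inputs.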
      <p>The Tsukimoto polynomial algorithm for extracting rules from a neural network is very similar to the KT
method. It also uses a multilevel decompositional algorithm to extract IF-THEN rules for each
neuron, and it follows the same strategy of finding inputs that exceed the neuron's threshold. The
main advantage of the Tsukimoto method is its computational complexity, which is polynomial,
while that of the KT method is exponential [Fu94]. The algorithm achieves polynomial complexity by
searching for the corresponding terms in the space of multilinear functions. In the second stage,
these terms are used to create IF-THEN rules. In the last stage, the Tsukimoto algorithm tries
to optimize comprehensibility by removing insignificant attributes from the rules.</p>
      <p>Another method, rule extraction by decision tree induction, was introduced in [Tsuki01].
The CRED algorithm converts each output vertex of a neural network into a decision tree whose
nodes test hidden layer units and whose leaves represent classes. Intermediate rules
are then extracted. Next, another decision tree is created for each branching point used in these
rules, using branching points on the input layer of the neural network. Extracting rules from
this second decision tree yields a description of the states of the hidden neurons in terms of
the input variables. As a last step, the intermediate rules describing the output layer through the
hidden layer and the rules describing the hidden layer in terms of the network inputs are
substituted into one another and combined into rules describing the output of the neural
network directly in terms of its input data.</p>
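The two-stage substitution can be illustrated with a deliberately small sketch. The toy "network", its weights and the hand-set tree tests below are assumptions for illustration; a real CRED implementation would learn both decision trees from data.

```python
def hidden(x):
    # toy "hidden layer": two step units over two inputs (illustrative weights)
    h1 = 1.0 if 0.8 * x[0] + 0.3 * x[1] > 0.5 else 0.0
    h2 = 1.0 if -0.6 * x[0] + 0.9 * x[1] > 0.2 else 0.0
    return h1, h2

def network_class(x):
    # Stage 1 (output tree): IF h1 > 0.5 AND h2 <= 0.5 THEN class 1
    h1, h2 = hidden(x)
    return 1 if h1 > 0.5 and h2 <= 0.5 else 0

def substituted_rule(x):
    # Stage 2: each hidden test rewritten in terms of the inputs, then
    # substituted into the stage-1 rule:
    #   h1 > 0.5  <=>  0.8*x0 + 0.3*x1 > 0.5
    #   h2 <= 0.5 <=> -0.6*x0 + 0.9*x1 <= 0.2
    return (0.8 * x[0] + 0.3 * x[1] > 0.5) and (-0.6 * x[0] + 0.9 * x[1] <= 0.2)

# the substituted input-level rule agrees with the network on a sample grid
samples = [(a / 4, b / 4) for a in range(5) for b in range(5)]
assert all(substituted_rule(x) == (network_class(x) == 1) for x in samples)
```

Here the agreement is exact by construction; with learned trees, fidelity would be measured empirically on held-out samples.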
      <p>The main class of pedagogical approaches to rule extraction from neural networks includes
validity interval analysis, rule extraction by sampling, and rule extraction by reverse engineering.</p>
      <p>Pedagogical approaches do not consider the internal structure of the neural network. The
basis of pedagogical approaches is to treat the trained neural network as a single
object, or "black box" [Tick98]. The main idea is to extract rules by directly mapping inputs
to outputs [Thrun95].</p>
      <p>Pedagogical approaches usually have access only to the neural network's function. This
function gives the output of the neural network for a given input but provides
no insight into the network's inner structure or weights. This class of
algorithms tries to find a relationship between the possible input and output variations created by
the neural network; some of them use the given training data, and some do not.</p>
      <p>Rule extraction based on interval analysis uses validity interval analysis to extract rules
that simulate neural network behavior [Crav96]. The main idea of this method is to find input
intervals on which the neural network output is stable, i.e., the predicted class remains the same
for small changes of the inputs. Thus, interval analysis provides a basis for precise, reliable rules.</p>
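The stability idea can be approximated in a few lines. True validity interval analysis propagates intervals through the network's weights; the sketch below merely samples an interval densely, and the one-input "network" is an illustrative assumption.

```python
def predict(x):
    # toy one-input "network": class 1 iff x > 0.35 (illustrative boundary)
    return 1 if x > 0.35 else 0

def stable_on(lo, hi, steps=100):
    """Return True if the predicted class is the same at every sampled point
    of the interval [lo, hi]."""
    classes = {predict(lo + (hi - lo) * i / steps) for i in range(steps + 1)}
    return len(classes) == 1

print(stable_on(0.4, 0.9))   # class 1 throughout -> True
print(stable_on(0.3, 0.5))   # crosses the decision boundary -> False
```

A stable interval translates directly into a rule of the form IF 0.4 <= x <= 0.9 THEN class 1.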
      <p>Rule extraction by sampling covers a series of methods that follow the same strategy of
extracting rules from a neural network with the help of sampling, i.e., they create an extensive set of
data as a basis for rule extraction. The sampled data set is then passed to a
standard learning algorithm that generates rules simulating the behavior of the neural network.
In [Joh06] it is shown that using sampled data outperforms using the original training data
in rule extraction tasks.</p>
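A minimal sketch of the sampling strategy follows. The black-box decision function is a stand-in for a trained network, and a one-threshold stump stands in for a full rule learner; both are assumptions for illustration only.

```python
import random

def black_box(x):
    # stand-in for a trained network's decision function (illustrative weights)
    return 1 if 0.7 * x[0] + 0.3 * x[1] > 0.5 else 0

# step 1: sample an extensive data set and label it by querying the black box
random.seed(0)
sampled = [(random.random(), random.random()) for _ in range(2000)]
labeled = [(x, black_box(x)) for x in sampled]

def best_stump(data):
    """Choose the feature/threshold pair that best mimics the network's labels."""
    best = None
    for f in (0, 1):
        for t in [i / 50 for i in range(51)]:
            acc = sum((x[f] > t) == (y == 1) for x, y in data) / len(data)
            if best is None or acc > best[0]:
                best = (acc, f, t)
    return best

# step 2: fit the interpretable model to the sampled, network-labeled data
acc, feat, thr = best_stump(labeled)
print(f"IF x{feat} > {thr:.2f} THEN class 1  (fidelity {acc:.2f})")
```

The stump's accuracy against the black box's labels is exactly the fidelity criterion from the taxonomy above.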
      <p>One of the first methods to follow this strategy was the Trepan algorithm [Taha99]. It
is very similar to the "divide and conquer" algorithm of C4.5 [Quin93] in searching the training
data for split points that separate instances of different classes. The main differences
from the "divide and conquer" method are a best-first expansion strategy for the tree structure,
additional branching points and the possibility of drawing additional training examples in deeper tree nodes.
As a result, the algorithm creates a decision tree, which can be transformed into a set of
rules.</p>
      <p>Another very common pedagogical approach that uses sampling to extract rules
from a neural network is presented in [Crav96]. The algorithm, called binarized input-output rule
extraction (BIO-RE), can only handle neural networks with binary or binarized input attributes. Binarized
input-output rule extraction creates all possible input combinations and requests the results
from the network. Using the neural network output, a truth table entry is created for each example.
From the truth table it is then straightforward to derive the rules.</p>
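The truth-table construction is easy to reproduce for a toy binary network; the majority function below is an illustrative stand-in for a trained network with binarized inputs.

```python
from itertools import product

def net(bits):
    # stand-in binary "network": fires iff at least two of three inputs are on
    return int(sum(bits) >= 2)

# enumerate every binary input vector and query the network
truth_table = {bits: net(bits) for bits in product((0, 1), repeat=3)}

# read one conjunctive rule off each positive row of the truth table
positive = [bits for bits, out in truth_table.items() if out == 1]
for bits in positive:
    terms = [f"x{i}=1" if b else f"x{i}=0" for i, b in enumerate(bits)]
    print("IF " + " AND ".join(terms) + " THEN class 1")
```

In practice the per-row rules would then be simplified (e.g. by Boolean minimization); the exhaustive enumeration is what limits BIO-RE to small numbers of binary inputs.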
      <p>The ANN-DT method is another decision-tree-based sampling method for describing neural network
behavior [Taha99]. The general algorithm is based on the CART method with some changes to the
original implementation. ANN-DT uses a sampling method to extend the learning set in such a way that
it remains representative of the original learning set. This is achieved by using the nearest-neighbor
method, which calculates the distance from a sampling point to the nearest point of the
learning set [Taha99] and compares it with the original value. The STARE algorithm [Towell93] implements
the principle of creating a large set of examples in its first stage. By analogy with BIO-RE, the
STARE method also builds large truth tables for learning. The advantage of STARE is its ability
to work with continuous input data. To generate the truth tables, the algorithm permutes the
input data and samples each continuous attribute at high frequency across
its whole range of values. Another example of a pedagogical approach using training data sampling is KDRuleEx
[Sethi12]. Similar to Trepan, the KDRuleEx algorithm generates additional training
examples when the basis for the next branching point is insufficient. KDRuleEx uses evolutionary
methods to create the new learning examples. The technique produces a decision table, which can
be easily converted into IF-THEN rules.</p>
      <p>The eclectic approach to rule extraction includes elements of both pedagogical and
decompositional approaches [Crav99]. In particular, an eclectic approach uses knowledge about the
internal architecture and the weight vectors of the neural network to complement a symbolic learning
algorithm [Andr95]. The FERNN algorithm [Seti00] tries to identify the relevant hidden neurons as well as the relevant
inputs to the network. For this purpose, a decision tree is built using the well-known C4.5
algorithm. The rule extraction process leads to the generation of M-of-N and IF-THEN rules.
Given a set of correctly classified training examples, FERNN analyses the activation values of
each hidden unit. For each hidden vertex, the activation values are sorted in ascending
order. The C4.5 algorithm is then used to find the best branching point to form the decision tree.
The problems of rule extraction from artificial neural networks are only a small part of the
problem of explainability of sub-symbolic models (e.g. deep neural networks), a problem that was
not present in classic AI (namely, rule-based expert systems and models). These problems
belong to the field of eXplainable AI (XAI), which is admittedly a crucial part of the practical
deployment of AI models [Arr19], [Arya19].</p>
    </sec>
    <sec id="sec-3">
      <title>3. Extracting Rules from Deep Neural Networks and Neuro-Fuzzy Networks</title>
      <p>At present, the direction of rule extraction using neuro-fuzzy models is actively developing.
Fuzzy rule-based systems (FRBS), developed using fuzzy logic, have become a rapidly
growing field over the past few years. These algorithms have proven their strengths in tasks such
as controlling complex systems and creating fuzzy controllers. The relationship between the two
approaches (ANN and FRBS) has been carefully studied, and results have been obtained on
their mutual correspondence [Aver18]. This leads to two extremely important conclusions.
First, we can apply the methods used in one model to the other model. Secondly, we can
present the knowledge embedded in a neural network in the more understandable algebraic
language of fuzzy production rules. In other words, we can obtain an algebraic interpretation of
neural networks [Pilato18], [Aver18].</p>
      <p>Since 2012, a stormy development of deep learning neural networks has been under way. One of the first
revolutionary deep neural networks was AlexNet, which won the annual ImageNet competition and
was trained on the ImageNet data set containing 15 million images. One of the recent winners, in
2016, was the Chinese University of Hong Kong's neural network containing 269 layers.</p>
      <p>In order to obtain a clear semantic interpretation for deep networks, it is possible to
use a fuzzy neural network in place of the fully connected layers at the end of the network.
For example, ANFIS (Adaptive Neuro-Fuzzy Inference System) [Jang] is a multilayer feed-forward
network. This architecture has five layers: a fuzzification layer, a product layer, a
normalization layer, a defuzzification layer, and a summation output layer. ANFIS has the properties of both a
neural network and a fuzzy logic system. The goal of fusing fuzzy logic architectures with
neural networks is to design an architecture that uses fuzzy logic to represent knowledge
in a clear way, while the neural network optimizes its parameters during training. ANFIS is used
in many applications such as function approximation, intelligent control and time series
forecasting. Deep neural networks and fuzzy neural networks can be combined in different
ways. A hypothetical system can be created using two components [Bon17]. The first is a
deep learning feature generator, which can be used to create representative features
directly from text. Initially, the deep learning system is trained on unlabeled
data. Once these features are extracted from the deep learning system, they are
integrated into fuzzy inference systems. These systems may include both features found by
the deep learning process and subjective information from the analyst. The two parts
together can be used for classification purposes. In this way, the final system is able to
report both the classification results and the specific features and rules that were
activated in producing them. In addition, the resulting system can be further used
by the analyst as a form of feedback.</p>
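The five ANFIS layers can be sketched as a minimal forward pass. The version below is a first-order Sugeno system with two rules over one input; the Gaussian membership functions, rule consequents and all parameter values are illustrative assumptions, not a trained model.

```python
import math

def gauss(x, c, s):
    # Gaussian membership function with center c and width s
    return math.exp(-((x - c) ** 2) / (2 * s ** 2))

def anfis(x):
    # layer 1: fuzzification (membership in "low" and "high")
    # layer 2: firing strengths (trivial product with one input per rule)
    w1, w2 = gauss(x, 0.0, 1.0), gauss(x, 2.0, 1.0)
    # layer 3: normalization of the firing strengths
    n1, n2 = w1 / (w1 + w2), w2 / (w1 + w2)
    # layer 4: first-order Sugeno rule consequents
    f1, f2 = 0.5 * x + 1.0, -0.3 * x + 2.0
    # layer 5: weighted sum of the rule outputs
    return n1 * f1 + n2 * f2

print(round(anfis(1.0), 3))  # -> 1.6
```

Because every layer is differentiable, the membership and consequent parameters can be fit by gradient descent, which is exactly what makes the rules of such a network trainable alongside a deep feature extractor.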
      <p>A very interesting approach is proposed in [Fan17], where the author establishes a
fundamental connection between two important areas of artificial intelligence: deep learning
and fuzzy logic. He shows the benefit that deep learning research can gain by
rethinking many of its trial-and-error heuristics through the lens of fuzzy logic, thus distilling
the essential ingredients with a rigorous foundation. The author proposes the deep generalized Hamming
network (GHN), which not only can be thoroughly analyzed and interpreted within the
framework of fuzzy logic theory, but also demonstrates fast learning speed, well-controlled behavior,
and state-of-the-art performance on various learning objectives. [Zilke16] presents
another approach for incorporating such rule-based methodology in neural networks by embedding
fuzzy inference systems in deep learning networks.</p>
      <p>Thanks to the theory of fuzzy sets, using fuzzy relationships and rules, one can create an
effective model for predicting time series with a large number of inputs and one output (the forecast).
Such an approach allows us to provide a kind of justification for the operation of an artificial
neural network using neuro-fuzzy models on the one hand and fuzzy cognitive maps on the other.
We have developed a hybrid modular forecasting model that combines the theory of fuzzy
logic, cognitive maps and artificial neural networks. The modular system as a whole consists
of several specialized modules. In general, these modules have the following characteristics:
1. System modules are specific and have specialized computing architectures to recognize and
respond to specific subtasks of a large common task.
2. Each module, as a rule, is independent of other modules in its functioning and does not
affect the operation of other modules.
3. Modules have a simpler architecture compared to the system as a whole. Thus, a module
is faster than a complex monolithic system.
4. The results of each individual module are combined using a special integration module (in
our case, the forecast consensus module), through which the highest forecast accuracy of the
entire system is achieved.
The system has three main modules responsible for the forecasting task. The ANFIS neuro-fuzzy network performs a
time series forecast based on numerical indicators and gives us the so-called quantitative
forecast, the results of which pass through a verification system (an assessment of the adequacy of the
forecast); if the forecast meets the required accuracy, it is transmitted to the
next module. In parallel with the neuro-fuzzy network, a module with a fuzzy cognitive map
is working; it receives data on the effect of events on the time series as input, and a cognitive
map is constructed in which all factors influencing a specific predicted indicator are taken
into account. At the output, the cognitive map gives us a forecast together with the probability of its
fulfillment, that is, with the consonance of a factor that tells us whether the forecast will be
fulfilled or not. Further, all the data received from these modules is sent to the third module,
which operates on the basis of an ANFIS network that aggregates the information received
from the previous modules and gives the final consensus forecast. Figure 1 presents a model
of the forecasting system.</p>
    </sec>
    <sec id="sec-5">
      <title>4. Conclusion</title>
      <p>This paper attempts to provide a review of several rule extraction algorithms for
artificial neural networks. Some of the state-of-the-art algorithms from each category,
namely decompositional, pedagogical and eclectic, are discussed. Currently, deep learning provides an
acceptable solution for many problems. It is a new machine learning area that is believed to
move machine learning a step ahead. But it is still a black-box system.</p>
      <p>Quite a number of authors are trying to establish a link between two important areas of
artificial intelligence: deep learning and fuzzy logic. Until recently, fuzzy logic was rarely used
in machine learning. Extracting fuzzy rules is one way to help semantically interpret neural
networks. This research will allow researchers in fuzzy logic to develop artificial intelligence
applications and solve problems that are of interest to machine learning.</p>
      <p>Acknowledgements. The paper is partially supported by RFBR grants 20-07-00770 A and 20-010-00828 A.</p>
      <p>[Seti00] R. Setiono. FERNN: An algorithm for fast extraction of rules from neural networks. Applied Intelligence, 12(1-2):15-25, 2000.
[Aver18] A.N. Averkin. Hybrid Neural Networks and Time Series Forecasting. Springer, 934(1):230-239, 2018.
[Pilato18] G. Pilato. Prediction and Detection of User Emotions Based on Neuro-Fuzzy Neural Networks in Social Networks. Springer, 875(2):118-126, 2018.
[Zilke16] J.R. Zilke. DeepRED - Rule Extraction from Deep Neural Networks. Springer, 457-473, 2016.
[Fan17] L. Fan. Revisit Fuzzy Neural Network: Demystifying Batch Normalization and ReLU with Generalized Hamming Network. NIPS, 1923-1932, 2017.
[Quin93] J.R. Quinlan. C4.5: Programs for Machine Learning. Morgan Kaufmann, 1993.
[Bon17] D. Bonanno. An approach to explainable deep learning using fuzzy inference. Next Generation Analyst V, 102070D, 2017.
[Jang] J.-S.R. Jang. ANFIS: Adaptive-network-based fuzzy inference system. IEEE Transactions on Systems, Man, and Cybernetics, 23:665-685, 1993.
[Arr19] A.B. Arrieta. Explainable Artificial Intelligence (XAI): Concepts, Taxonomies, Opportunities and Challenges toward Responsible AI. Information Fusion, arXiv-1910, 2019.
[Arya19] V. Arya. One Explanation Does Not Fit All: A Toolkit and Taxonomy of AI Explainability Techniques. arXiv preprint, arXiv:1909.03012, 2019.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [Crav94]
          <string-name>
            <given-names>M.</given-names>
            <surname>Craven</surname>
          </string-name>
          .
          <article-title>Using sampling and queries to extract rules from trained neural networks</article-title>
          . ICML,
          <fpage>37</fpage>
          -
          <lpage>45</lpage>
          ,
          <year>1994</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [Joh06]
          <string-name>
            <given-names>U.</given-names>
            <surname>Johansson</surname>
          </string-name>
          .
          <article-title>Rule ex-traction from opaque models-a slightly diferent perspective</article-title>
          .
          <source>Machine Learning and Applications</source>
          . ICMLA'
          <volume>06</volume>
          . 5th International Conference,
          <volume>11</volume>
          (
          <issue>2</issue>
          ):
          <fpage>22</fpage>
          -
          <lpage>27</lpage>
          ,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [Crav99]
          <string-name>
            <given-names>M.</given-names>
            <surname>Craven</surname>
          </string-name>
          .
          <article-title>Rule extraction: Where do we go from here</article-title>
          . University of Wisconsin Machine Learning Research Group Working Paper,
          <fpage>99</fpage>
          -
          <lpage>108</lpage>
          ,
          <year>June 1999</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [Sethi12]
          <string-name>
            <given-names>K.</given-names>
            <surname>Sethi</surname>
          </string-name>
          .
          <article-title>KDRuleEx: A novel approach for enhancing user comprehensibility using rule extraction</article-title>
          .
          <source>Intelligent Systems, Modelling and Simulation (ISMS)</source>
          ,
          <fpage>55</fpage>
          -
          <lpage>60</lpage>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [Andr95]
          <string-name>
            <given-names>R.</given-names>
            <surname>Andrews</surname>
          </string-name>
          .
          <article-title>Survey and critique of techniques for extracting rules from trained artificial neural networks</article-title>
          .
          <source>Knowledge-based systems</source>
          ,
          <volume>8</volume>
          (
          <issue>6</issue>
          ):
          <fpage>373</fpage>
          -
          <lpage>389</lpage>
          ,
          <year>1995</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [Crav96]
          <string-name>
            <given-names>M.W.</given-names>
            <surname>Craven</surname>
          </string-name>
          .
          <article-title>Extracting comprehensible models from trained neural networks</article-title>
          .
          <source>PhD thesis</source>
          , University of Wisconsin-Madison,
          <year>June 1996</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [Thrun93]
          <string-name>
            <given-names>S.</given-names>
            <surname>Thrun</surname>
          </string-name>
          .
          <article-title>Extracting provably correct rules from artificial neural networks</article-title>
          .
          <source>Technical report</source>
          , University of Bonn, Institut für Informatik III,
          <year>1993</year>
          . [Fu94]
          <string-name>
            <given-names>L.</given-names>
            <surname>Fu</surname>
          </string-name>
          .
          <article-title>Rule generation from neural networks</article-title>
          .
          <source>Systems, Man and Cybernetics</source>
          , IEEE Transactions,
          <volume>24</volume>
          (
          <issue>8</issue>
          ):
          <fpage>1114</fpage>
          -
          <lpage>1124</lpage>
          ,
          <year>1994</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [Tsuki00]
          <string-name>
            <given-names>H.</given-names>
            <surname>Tsukimoto</surname>
          </string-name>
          .
          <article-title>Extracting rules from trained neural networks</article-title>
          .
          <source>IEEE Transactions on Neural Networks</source>
          ,
          <volume>11</volume>
          (
          <issue>2</issue>
          ):
          <fpage>377</fpage>
          -
          <lpage>389</lpage>
          ,
          <year>2000</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [Tsuki01]
          <string-name>
            <given-names>H.</given-names>
            <surname>Tsukimoto</surname>
          </string-name>
          .
          <article-title>Rule extraction from neural networks via decision tree induction</article-title>
          .
          <source>International Joint Conference on Neural Networks (IJCNN)</source>
          , volume
          <volume>3</volume>
          , pages
          <fpage>1870</fpage>
          -
          <lpage>1875</lpage>
          ,
          <year>2001</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [Tick98]
          <string-name>
            <given-names>A.B.</given-names>
            <surname>Tickle</surname>
          </string-name>
          .
          <article-title>The truth will come to light: directions and challenges in extracting the knowledge embedded within trained artificial neural networks</article-title>
          .
          <source>IEEE Transactions on Neural Networks</source>
          ,
          <volume>9</volume>
          (
          <issue>6</issue>
          ):
          <fpage>1057</fpage>
          -
          <lpage>1068</lpage>
          ,
          <year>June 1998</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [Thrun95]
          <string-name>
            <given-names>S.</given-names>
            <surname>Thrun</surname>
          </string-name>
          .
          <article-title>Extracting rules from artificial neural networks with distributed representations</article-title>
          .
          <source>Advances in Neural Information Processing Systems</source>
          , pages
          <fpage>505</fpage>
          -
          <lpage>512</lpage>
          ,
          <year>1995</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [Crav96]
          <string-name>
            <given-names>M.W.</given-names>
            <surname>Craven</surname>
          </string-name>
          .
          <article-title>Extracting tree-structured representations of trained networks</article-title>
          .
          <source>Advances in Neural Information Processing Systems</source>
          , pages
          <fpage>24</fpage>
          -
          <lpage>30</lpage>
          ,
          <year>1996</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [Taha99]
          <string-name>
            <given-names>I.A.</given-names>
            <surname>Taha</surname>
          </string-name>
          .
          <article-title>Symbolic interpretation of artificial neural networks</article-title>
          .
          <source>IEEE Transactions on Knowledge and Data Engineering</source>
          ,
          <volume>11</volume>
          (
          <issue>3</issue>
          ):
          <fpage>448</fpage>
          -
          <lpage>463</lpage>
          ,
          <year>June 1999</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [Towell93]
          <string-name>
            <given-names>G.G.</given-names>
            <surname>Towell</surname>
          </string-name>
          .
          <article-title>Extracting refined rules from knowledge-based neural networks</article-title>
          .
          <source>Machine learning</source>
          ,
          <volume>13</volume>
          (
          <issue>1</issue>
          ):
          <fpage>71</fpage>
          -
          <lpage>101</lpage>
          ,
          <year>1993</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>