Two ways of using artificial neural networks
            in knowledge discovery from chemical materials data⋆

                                                     Martin Holeňa

                      Institute of Computer Science, Academy of Sciences of the Czech Republic
                                       Pod Vodárenskou věží 2, 18207 Prague
                                     martin@cs.cas.cz, web:cs.cas.cz/~martin

Abstract. In the application area of chemical materials, data mining methods have been used for more than a decade. From the very beginning, by far the most popular have been methods based on artificial neural networks. However, they are frequently used without awareness of the difference between the numeric nature of knowledge obtained from data by neural network regression and the symbolic nature of knowledge obtained by some other data mining methods. This paper explains that within the surrogate modelling approach, which plays an important role in this area, using numeric knowledge is justified. At the same time, it recalls the possibility of obtaining symbolic knowledge from neural networks in the form of logical rules and describes a recently proposed method for the extraction of Boolean rules in disjunctive normal form. Both ways of using neural networks are illustrated on examples from this application area.

⋆ The research reported in this paper has been supported by the grant No. 201/08/1744 of the Grant Agency of the Czech Republic and partially supported by the Institutional Research Plan AV0Z10300504.


1    Introduction

The search for new chemical materials, e.g., catalytic materials for a plethora of chemical reactions, produces large amounts of data. To discover useful knowledge from those data, statistical as well as machine-learning data mining methods have been used in this area since the late 1990s, the former represented in particular by the analysis of variance, decision trees and support vector regression, the latter by the main variants of feed-forward neural networks.

This paper summarizes experience from nearly ten years of using and developing neural-network-based data mining methods for catalytic data. Artificial neural networks are the most popular regression model in this application area. The survey [10] lists more than 20 published applications of multilayer perceptrons (MLPs) to catalytic data, as well as several applications of radial basis function networks. The role of feed-forward neural nets as a regression model predicting the catalytic performance of materials (such as yield, conversion, selectivity) is due partially to their preceding success in other areas, but mostly to their ability to serve as universal approximators in very general function spaces [12, 14, 18]. This ability is particularly valuable in the context of the highly nonlinear nature of the dependencies encountered in catalysis (cf. Figure 1).

Fig. 1. A 3-dimensional cut of a neural-network regression of the yield of a reaction product on the composition of the catalytic material.

However, there seems to be little awareness, among researchers using artificial neural networks in catalysis, of the difference between the symbolic nature of the knowledge obtained from data by analysis of variance and decision trees, and the numeric nature of the knowledge obtained by neural network regression.

Motivated by the situation just outlined, the paper presents two strategies for the application of artificial neural networks to data about chemical materials. The first strategy relies on the numeric knowledge from neural network regression. Although numeric knowledge is much less understandable to humans than symbolic knowledge (in terms of [4], it has a high "data fit", but a low "mental fit"), we show that in this application area, it can be very useful if directly integrated
with the optimization of materials performance in an approach called surrogate modelling. In that context, the possibility of increasing the accuracy of neural network regression by means of boosting is also mentioned. The other strategy relies on employing rule extraction methods to obtain symbolic knowledge from trained neural networks.

These two strategies also determine the structure of the paper. In Section 2, the surrogate modelling approach is described. Section 3 then explains a method for the extraction of logical rules from trained neural networks. In the respective sections, both strategies are illustrated using real-world examples.

2    Neural networks used as surrogate models

From the point of view of theoretical computer science, the search for the most suitable chemical materials entails complex optimization tasks. As objective functions, those tasks use various properties of the materials, e.g., in the case of catalytic materials, properties quantifying their catalytic performance, such as yield, conversion, or selectivity. A crucial feature of such objective functions is that they cannot be expressed analytically; their values must be obtained empirically. For their optimization, it is not possible to employ the most common optimization methods, such as steepest descent, conjugate gradient methods or the Levenberg-Marquardt method. Indeed, to obtain sufficiently precise numerical estimates of gradients or second-order derivatives of the empirical objective function, those methods need to evaluate the function at points some of which would lie closer to each other than the empirical error of catalytic measurements. That is why methods not requiring any derivatives have been used to solve such optimization tasks, such as the simplex method, and most frequently genetic and other evolutionary algorithms [2]. To compensate for the missing information about derivatives, these methods need a rather large number of objective function evaluations. In the context of catalysis, this is a serious disadvantage because the evaluation of the empirical objective functions used in the search for optimal catalysts is often costly and time-consuming. Testing a generation of catalytic materials proposed by an evolutionary algorithm typically takes several days and costs thousands of euros.

The usual approach to decreasing the cost and time of optimization of empirical objective functions is to evaluate the function only at points considered to be most important for the progress of the employed optimization method, and to evaluate a suitable regression model of it otherwise. That model is termed a surrogate model of the function, and the approach is referred to as surrogate modelling [17, 20, 23, 27]. Needless to say, the time and costs needed to evaluate a regression model are negligible compared to the time and costs needed to evaluate empirical functions such as yield or conversion. However, it must not be forgotten that the agreement between the results obtained with a surrogate model and those obtained with the original function depends on the accuracy of the model.

The fact that feed-forward neural networks are the most frequent regression models in catalysis suggests them as the most natural candidate for surrogate models in this area. Indeed, several good examples of the application of neural-network-based surrogate modelling to the optimization of the performance of catalytic materials have been published during the last five years [3, 6, 21, 25]. Within the overall context of the application of artificial neural networks to mining catalytic data, however, they are still rare.

Although surrogate modelling has also been applied to conventional optimization [5], it is most frequently encountered in connection with evolutionary algorithms because for them, the approach leads to the approximation of the fitness function, whose usefulness in evolutionary computation is already known [13, 19]. For the progress of evolutionary optimization, the most important criteria for selecting points are, on the one hand, closeness to the global optimum (indicated by the highest values of the fitness function) and, on the other hand, contribution to the diversity of the population.

In the literature, various possibilities of combining evolutionary optimization with surrogate modelling have been discussed [17, 24, 27]. Nevertheless, all of them are controlled by one of two basic approaches:

A. The individual-based control consists in choosing between the evaluation of the empirical objective function and the evaluation of its surrogate model individual-wise, basically in the following steps:
   (i) An initial set E of individuals is collected, in which the considered empirical fitness η was evaluated (for example, the populations of the first several generations of the evolutionary algorithm).
   (ii) The surrogate model is constructed using the set of pairs {(x, η(x)) : x ∈ E}.
   (iii) The evolutionary algorithm is run with the fitness η replaced by the model for one generation with a population Q of size qP, where P is the desired population size for the optimization of η, and q is a prescribed ratio (e.g., q = 10 or q = 100).
   (iv) A subset 𝒫 ⊂ Q of size P is selected so as to contain those individuals from Q that are most important according to the considered criteria for the progress of optimization.
   (v) For x ∈ 𝒫, the empirical fitness is evaluated.
   (vi) The set E is replaced by E ∪ 𝒫 and the algorithm returns to step (ii).

B. The generation-based control consists in choosing between both kinds of evaluation generation-wise, basically in the following steps:
   (i) An initial set E of individuals in which the considered empirical fitness η was evaluated is collected, as in the individual-based control.
   (ii) The surrogate model is constructed using the set of pairs {(x, η(x)) : x ∈ E}.
   (iii) Relying on the error of the surrogate model, measured with a prescribed error measure (e.g., mean squared error, MSE, or mean absolute error, MAE), an appropriate number g_m of generations is chosen, during which η should be replaced by the model.
   (iv) The evolutionary algorithm is run with the fitness η replaced by the model for g_m generations with populations 𝒫_1, ..., 𝒫_{g_m} of size P.
   (v) The evolutionary algorithm is run with the empirical fitness η for a prescribed number g_e of generations (frequently, g_e = 1) with populations 𝒫_{g_m+1}, ..., 𝒫_{g_m+g_e}.
   (vi) The set E is replaced by E ∪ 𝒫_{g_m+1} ∪ ··· ∪ 𝒫_{g_m+g_e} and the algorithm returns to step (ii).

The agreement between the results that are obtained with a surrogate model and those that would be obtained if the empirical objective function were evaluated depends on the accuracy of the model. A popular approach to increasing the accuracy of learning methods is boosting, i.e., the construction of a strong learner through combining weak learners. It is important to realize that boosted surrogate models are only particular kinds of surrogate models and that their interaction with optimization algorithms in optimization tasks follows the same rules as the interaction of surrogate models in general. In particular, in the above outlines of individual-based and generation-based control, boosting is always performed in step (ii), which has to be replaced with:

(ii'a) The set {(x, η(x)) : x ∈ E} is divided into k disjoint subsets of size ⌊|E|/k⌋ or ⌈|E|/k⌉, where | | denotes the cardinality of a set, ⌊ ⌋ the lower integer bound of a real number, and ⌈ ⌉ its upper integer bound.
(ii'b) For each j = 1, ..., k, a surrogate model F_1^j is constructed, using only data not belonging to the j-th subset.
(ii'c) A k-fold crossvalidation of regression boosting is performed, and the error of the boosting approximation is in each iteration measured with the prescribed error measure on the validation data.
(ii'd) The first iteration i in which the average error of the boosting approximation on the validation data is lower than in the (i+1)-th iteration is taken as the final iteration of boosting.
(ii'e) Boosting using the complete set {(x, η(x)) : x ∈ E} is performed up to the final iteration found in step (ii'd), and the result of the application of the employed boosting method in each such iteration of boosting is taken as the boosted surrogate model in that iteration.

2.1   An illustration

A particular method for MLP boosting has been presented in [11]. That method will now be employed in surrogate modelling with data from the investigation of catalytic materials for the high-temperature synthesis of hydrocyanic acid (HCN) [16]. The composition of most of those materials was designed by means of a specific genetic algorithm (GA) for heterogeneous catalysis [26]. As usual in the evolutionary optimization of catalytic materials, the GA configuration was determined by the experimental conditions in which the optimization was performed: the number of channels of the reactor in which the materials were tested, as well as the time and financial resources available for those expensive tests. In the reported investigation, the algorithm was running for 7 generations of population size 92, and in addition 52 other catalysts with manually designed composition were investigated. Consequently, data about 696 catalytic materials were available. The considered MLPs had 14 input neurons (4 of them coding the catalyst support, the other 10 corresponding to the proportions of 10 metal additives forming the active shell) and 3 output neurons, corresponding to 3 kinds of catalytic activity considered as fitness functions.

For boosting, only data about catalysts from the 1st to 6th generations of the GA and about the 52 catalysts with manually designed composition were used, thus altogether data about 604 catalytic materials. Data about catalysts from the 7th generation were completely excluded and left out for testing. The set of architectures to which boosting was applied was restricted to MLPs with 1 and 2 hidden layers and was delimited by means of the heuristic pyramidal condition: the number of neurons in a subsequent layer must not exceed the number of neurons in a previous layer. Let n_I, n_H and n_O denote the numbers of input, hidden and output neurons, respectively, and n_H1 and n_H2 denote the numbers of neurons in the first and second hidden layer, respectively. Then the pyramidal condition entails the following 90 architectures:
(i) one hidden layer and 3 ≤ n_H ≤ 14 (12 architectures);
(ii) two hidden layers and 3 ≤ n_H2 ≤ n_H1 ≤ 14 (78 architectures).

As was mentioned above, boosting can be combined both with the individual-based and with the generation-based control of surrogate modelling. In the reported investigation of catalytic materials for HCN synthesis, the individual-based control was employed.

The error measure employed in the crossvalidation in step (ii'c) was MSE. The distribution of the final iterations of boosting, found for MLPs with the 90 considered architectures in step (ii'd), is depicted in Figure 2. We can see that only for 16 MLPs was the 1st iteration already the final one. For the remaining 74 MLPs, boosting improved the average MSE on the validation data for at least 1 iteration. The mean and median of the distribution of the final iterations were 6.6 and 5, respectively.

Fig. 2. Distribution of the final iterations of boosting of the 90 MLPs with 1-hidden-layer architectures fulfilling 3 ≤ n_H ≤ 14 and 2-hidden-layer architectures fulfilling 3 ≤ n_H2 ≤ n_H1 ≤ 14.

For testing with the data from the 7th generation of the evolutionary algorithm, we used only the five MLPs most promising from the point of view of the average MSE on the validation data in the final iteration of boosting. These were the following MLPs:

 – a 1-hidden-layer MLP, with n_H = 11 and the 3rd iteration of boosting being the final iteration,
 – four 2-hidden-layer MLPs, with (n_H1, n_H2) = (10, 4), (10, 6), (13, 5), (14, 8) and the final iterations of boosting 19, 32, 31 and 29, respectively.

For each of them, the validation proceeded as follows:
 1. In each iteration up to the final one, a single MLP was trained with data about all the 604 catalytic materials used for boosting.
 2. In each iteration up to the final iteration of boosting, the boosted surrogate model was constructed for the trained MLP, according to step (ii'e).
 3. From the values predicted by the boosted surrogate model for the 92 materials from the 7th generation of the GA, and from the measured values, the boosting MSE was calculated.

The results are summarized in Figure 3, broken down by the properties corresponding to the MLP outputs: the conversions of CH4 and NH3 and the yield of HCN. They clearly confirm the usefulness of boosting for the five considered architectures. For each of them, boosting leads to an overall decrease of the MSE of the conversion of CH4 and of the HCN yield on new data from the 7th generation of the GA, and this decrease is uninterrupted or nearly uninterrupted until the final boosting iteration. On the other hand, boosting did not lead to any decrease of the error of the conversion of NH3, which, however, is from the beginning much lower than the errors of the two other performance measures (notice that the scale of the y-axis is 10 times finer for the conversion of NH3 than for the conversion of CH4 and the HCN yield). The explanation for the different behavior of the conversion of NH3 is the substantially lower variability of its values in the seventh generation of the GA, used for validating the usefulness of boosting (standard deviation, SD: 2.8, interquartile range, IQR: 1.6), compared to the conversion of CH4 (SD: 26.1, IQR: 45.0) and the HCN yield (SD: 20.1, IQR: 35.9). Owing to such low variability, the conversion of NH3 appears effectively as nearly constant during the validation of boosting, which in turn accounts for a nearly constant MSE.

3    Neural-network based rules extraction from data

The architecture of a trained neural network and the weights and biases that determine the regression model computed by the network inherently represent the knowledge contained in the data used to train the network. As was already mentioned in the introduction, such a representation is not comprehensible to humans, being very far from the symbolic, modular and often vague way in which they themselves represent knowledge. Therefore, methods for the extraction of symbolic knowledge from trained neural networks have been investigated since the late 1980s. Most frequently, the extracted knowledge has the form of a Boolean implication:

    IF the input variables fulfil an input condition C_I
    THEN the output variables are likely to fulfil an output condition C_O.    (1)


Fig. 3. History of the boosting MSE on the data from the 7th generation of the GA for MLPs with the 5 architectures
included in the validation of boosting, decomposed to the properties corresponding to the MLP outputs.
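Two bookkeeping steps of the crossvalidated boosting of Section 2, whose test error is shown in Fig. 3, can be sketched in Python: the split of step (ii'a) into k subsets of size ⌊|E|/k⌋ or ⌈|E|/k⌉, and the stopping rule of step (ii'd). This is only a minimal sketch under stated assumptions, not the implementation used in [11]: the function names are hypothetical, the value k = 10 in the usage example is an assumption (the text does not state k), and returning the last computed iteration when the error never increases is likewise an assumption.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def split_into_folds(items: Sequence[T], k: int) -> List[List[T]]:
    """Step (ii'a): k disjoint subsets of size floor(n/k) or ceil(n/k)."""
    if not 1 <= k <= len(items):
        raise ValueError("k must lie between 1 and the number of items")
    base, extra = divmod(len(items), k)
    folds: List[List[T]] = []
    start = 0
    for j in range(k):
        # The first `extra` folds receive one additional element.
        size = base + (1 if j < extra else 0)
        folds.append(list(items[start:start + size]))
        start += size
    return folds


def final_iteration(avg_validation_errors: Sequence[float]) -> int:
    """Step (ii'd): first 1-based iteration i whose average validation
    error is lower than the error in iteration i + 1."""
    errors = list(avg_validation_errors)
    if not errors:
        raise ValueError("at least one iteration is required")
    for i in range(len(errors) - 1):
        if errors[i] < errors[i + 1]:
            return i + 1
    # Assumption (not stated in the text): if the error never increases,
    # the last computed iteration is taken as the final one.
    return len(errors)


if __name__ == "__main__":
    # 604 materials were used for boosting; k = 10 is a hypothetical choice.
    folds = split_into_folds(range(604), 10)
    print([len(fold) for fold in folds])
    # Made-up average validation MSEs for five boosting iterations.
    print(final_iteration([5.0, 4.2, 3.9, 4.1, 3.5]))
```

With the made-up error sequence above, the third iteration is returned, because the error rises from the third to the fourth iteration; a sequence whose error rises immediately yields 1, which corresponds to the case of the MLPs for which the 1st iteration was already the final one.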


In addition, implications and equivalences of important kinds of fuzzy logic are also frequently extracted [8, 15]. In general, extracted formulas of a formal logic are called rules. Over the last two decades, various rule extraction methods have been proposed for neural networks, but so far none of them has become a common standard (cf. the survey papers [1, 15, 22] and the monograph [7]). Here, a method for the extraction of Boolean implications from multilayer perceptrons with n inputs and m outputs will be sketched that finds, for each output condition of the form

    C_O: the value y of the output variables lies in a rectangular area R ⊂ ℝ^m,    (2)

one or more input conditions of the form

    C_I: the value x of the input variables lies in a polyhedron P ⊂ ℝ^n.    (3)

Hence, this method extracts rules of the form

    IF x ∈ P THEN y ∈ R.    (4)

A detailed explanation of the method can be found in [9]. Its main principles can be summarized as follows:

 – An m-dimensional rectangular area R with borders perpendicular to the m coordinate axes has to be chosen in advance in the output space of a trained MLP with sigmoid activation functions. The reason for choosing such an area is that in the space of evaluations of m free variables, each m-dimensional rectangular area is the validity set of the conjunction of some m univariate Boolean predicates. That conjunction then serves as the consequent of the rule to extract.
 – The activation functions in the hidden neurons are approximated with piecewise-linear sigmoid activation functions. This can be done with an arbitrary precision.
 – The products of individual linearity intervals of all the activation functions determine areas in the input space in which the final approximating mapping computed by the multilayer perceptron is linear.
 – In each such area, all points mapped to R form a polyhedron, which may possibly be empty or may be concatenated with polyhedra from some of the neighboring areas to a larger polyhedron.
 – The union of all the nonempty concatenated polyhedra P_1, ..., P_q defines the antecedent of a rule in a combined form

       IF x ∈ P_1 ∪ ··· ∪ P_q THEN y ∈ R,    (5)
   which is equivalent to a logical disjunction of q rules of the simple form (4):

       IF x ∈ P_1 THEN y ∈ R
                ...                    (6)
       IF x ∈ P_q THEN y ∈ R.

To increase the comprehensibility of the extracted rules, visualization by means of 2- or 3-dimensional cuts of the set P_1 ∪ ··· ∪ P_q can be used (Figure 4).

Fig. 4. A two-dimensional cut of the union of polyhedra from the antecedent of a rule of the form (5) extracted from a trained MLP. The cut corresponds to input variables recording the molar proportions of oxides of Mn and Ga in the catalytic material, for the consequent "propene yield > 8%".

Usually, logical rules of the form (4) are the final results of this rule-extraction method. Nonetheless, there is one exception: when the polyhedron P is also rectangular with borders perpendicular to the axes or, more generally, when P can be approximately replaced with such a rectangular area R_I in the input space. Then the above rule (4) can be approximately expressed in the conjunctive form

    IF x_1 ∈ I_1 & ... & x_{n_I} ∈ I_{n_I} THEN y ∈ R.    (7)

Here, I_1, ..., I_{n_I} are intervals that constitute the projections of R_I into the n_I input dimensions. Each such interval can be restricted both from below and from above, restricted only from below or only from above, or can even be the complete set of real numbers. However, dimensions for which the corresponding projection of R_I equals the complete real axis are usually not included in (7), since they would not provide any new knowledge. Finally, observe that due to (5) and (7), the final extracted rule is in the disjunctive normal form.

In the rule-extraction method outlined above, the possibility of replacing a polyhedron P with a rectangular area R_I is assessed according to the following principles:
 1. The resulting dissatisfaction with points that either belong to P but do not belong to R_I, or belong to R_I but do not belong to P (i.e., with points from the symmetric difference R_I Δ P), has to remain within a prescribed tolerance ε, and R_I has to be minimal in the input space among rectangular areas of some specified kind with dissatisfaction within that tolerance.
 2. The dissatisfaction with points from R_I Δ P depends solely on those points and is increasing with respect to inclusion. Consequently, it can be measured using some monotone measure on the input space, possibly depending on P.
 3. To be eligible for replacement, P has to cover at least one point of the available data.

For principle 2, the most attractive monotone measures, due to their straightforward interpretability, are:
 – The joint empirical distribution of the input variables in the available data.
 – The conditional empirical distribution of the input variables in the available data, conditioned by P.

Rules of the form (7) are also very convenient from the visualization point of view: since cuts of rectangular areas coincide with the corresponding projections of those areas, no variables need to be fixed at particular values.

3.1   An illustration

As an example, Figure 5 shows three-dimensional cuts determining the antecedents of conjunctive-form rules extracted from a trained MLP with 5 input neurons and 1 output neuron such that:
(i) the input neurons correspond to variables that record the molar proportions of the oxides of Fe, Ga, Mg, Mn and Mo in the catalytic material;
(ii) the output neuron corresponds to a variable recording propene yield.
The extracted rules are listed in Table 1.

4    Conclusion

The paper dealt with employing feed-forward neural networks for knowledge discovery from data about chemical materials. It has shown that in this application area, obtaining numeric knowledge by neural-network regression is justified, in spite of the fact that numeric knowledge is substantially less human-understandable

Rule  Antecedent                                                              Consequent
1     24% < Ga proportion < 33% & 31% < Mg proportion < 39%                   C3H6 yield > 8%
      & Mo proportion < 7% & Fe, Mn proportions = 0
2     Ga proportion ≈ 36% & 28% < Mg proportion < 38%                         C3H6 yield > 8%
      & Fe, Mn, Mo proportions = 0
3     Fe proportion < 12% & Ga proportion ≈ 38% & 29% < Mg proportion < 36%   C3H6 yield > 8%
      & Mo proportion < 9% & Mn proportion = 0

Table 1. Antecedents of the rules of the form (7) extracted using the method described in this section for the consequent "propene yield > 8%" from a trained MLP with 5 input neurons and 1 output neuron, where the variables to which those neurons correspond are interpreted as described in (i) and (ii) above.
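To make the disjunctive normal form of Table 1 concrete, the three antecedents can be encoded as conjunctions of interval conditions and checked against a material composition. The following Python sketch is only an illustration and not part of the extraction method itself: the class and function names are hypothetical, and the tolerance of 0.5 percentage points used to interpret "≈" is an assumption, since the table does not quantify it.

```python
import math
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Interval:
    """Interval of admissible values; open unless closed=True."""
    low: float = -math.inf
    high: float = math.inf
    closed: bool = False

    def contains(self, value: float) -> bool:
        if self.closed:
            return self.low <= value <= self.high
        return self.low < value < self.high


APPROX_TOL = 0.5  # hypothetical half-width (percentage points) for "≈"
ZERO = Interval(0.0, 0.0, closed=True)  # encodes "proportion = 0"


def approx(center: float) -> Interval:
    """Interval standing in for 'proportion ≈ center' (an assumption)."""
    return Interval(center - APPROX_TOL, center + APPROX_TOL)


Rule = Dict[str, Interval]

# Antecedents of Table 1, proportions in percent. A variable missing from
# a rule would be unconstrained, as in the conjunctive form (7).
RULES: List[Rule] = [
    {"Ga": Interval(24, 33), "Mg": Interval(31, 39),
     "Mo": Interval(high=7), "Fe": ZERO, "Mn": ZERO},
    {"Ga": approx(36), "Mg": Interval(28, 38),
     "Fe": ZERO, "Mn": ZERO, "Mo": ZERO},
    {"Fe": Interval(high=12), "Ga": approx(38), "Mg": Interval(29, 36),
     "Mo": Interval(high=9), "Mn": ZERO},
]


def antecedent_holds(rule: Rule, x: Dict[str, float]) -> bool:
    """Conjunction: every listed variable lies in its interval."""
    return all(iv.contains(x[name]) for name, iv in rule.items())


def matching_rules(rules: List[Rule], x: Dict[str, float]) -> List[int]:
    """1-based numbers of the rules whose antecedent holds for x."""
    return [i for i, rule in enumerate(rules, start=1)
            if antecedent_holds(rule, x)]


def predicts_consequent(rules: List[Rule], x: Dict[str, float]) -> bool:
    """Disjunction over rules: consequent 'propene yield > 8%' is predicted."""
    return bool(matching_rules(rules, x))


if __name__ == "__main__":
    sample = {"Fe": 0.0, "Ga": 30.0, "Mg": 35.0, "Mn": 0.0, "Mo": 5.0}
    print(matching_rules(RULES, sample), predicts_consequent(RULES, sample))
```

For the made-up composition in the usage example, only the antecedent of rule 1 holds, so the disjunction predicts the consequent. Storing each rule as a mapping from variable names to intervals mirrors the observation in the text that dimensions whose projection is the whole real axis are simply omitted.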


 2. M. Baerns and M. Holeňa: Combinatorial development of solid catalytic materials. Design of high-throughput experiments, data analysis, data mining. World Scientific, Singapore, 2009.
 3. L.A. Baumes, D. Farrusseng, M. Lengliz, and C. Mirodatos: Using artificial neural networks to boost high-throughput discovery in heterogeneous catalysis. QSAR and Combinatorial Science, 23, 2004, 767–778.
 4. M. Berthold and D. Hand (eds): Intelligent data analysis. An introduction. Springer Verlag, Berlin, 1999.
 5. A.J. Brooker, J. Dennis, P.D. Frank, D.B. Serafini, V. Torczon, and M. Trosset: A rigorous framework for optimization by surrogates. Structural and Multidisciplinary Optimization, 17, 1998, 1–13.
                                                                6. D. Farrusseng, F. Clerc, C. Mirodatos, N. Azam,
Fig. 5. A three-dimensional projection of the union of rect-
                                                                   F. Gilardoni, J.W. Thybaut, P. Balasubramaniam, and
angular areas that replace, following the method described
                                                                   G.B. Marin: Development of an integrated informatics
in this section, the union of of polyhedra from the an-
                                                                   toolbox: HT kinetic and virtual screening. Combina-
tecedent of a combined form rule extracted from a trained
                                                                   torial Chemistry and High Throughput Screening, 10,
MLP. The projection corresponds to input variables re-
                                                                   2007, 85–97.
cording the molar proportion of oxides of Ga, Mg and Mo
                                                                7. A.S.A. Garcez, L.C. Lamb, and D.M. Gabbay: Neural-
in a catalytic material. The numbers 1, 2, 3 refer to the
                                                                   symbolic cognitive reasoning. Springer Verlag, Berlin,
antecedents of the rules in Table 1.
                                                                   2009.
                                                                8. M. Holeňa: Extraction of fuzzy logic rules from data
                                                                   by means of artiﬁcial neural networks. Kybernetika,
than symbolic knowledge. Its justiﬁcation consists in              41, 2005, 297–314.
the possibility to use such knowledge in the optimiza- 9. M. Holeňa: Piecewise-linear neural networks and their
tion tasks entailed by search for new materials in the      relationship to rule extraction from data. Neural Com-
surrogate modelling approach.                               putation,   18, 2006, 2813–2853.
    In addition to justifying the speciﬁc need for nu-  10. M.   Holeňa  and M. Baerns: Computer-aided strategies
meric knowledge from neural network regression in           for  catalyst  development. In: G. Ertl, H. Knözinger,
this application area, the paper recalled the possibil-     F.  Schüth, and  J. Eitkamp, (eds), Handbook of Hetero-
ity to obtain symbolic knowledge in the form of logical     geneous   Catalysis,  Wiley-VCH, Weinheim, 2008, 66-81.
rules from trained neural networks. It explained a re-  11. M.   Holeňa,  D. Linke, and N. Steinfeldt: Boosted neu-
cently proposed method for the extraction of Boolean        ral  networks   in evolutionary computation. In: Neural
rules in disjunctive normal form, and illustrated it on     Information     Processing. Lecture  Notes in Computer
                                                            Science 5864, Springer Verlag, Berlin, 2009, 131–140.
data about catalytic materials.
                                                               12. K. Hornik: Approximation capabilities of multilayer
                                                                   neural networks. Neural Networks, 4, 1991, 251–257.
References                                                     13. Y. Jin, M. Hüsken, M. Olhofer, and B. Sendhoﬀ: Neu-
                                                                   ral networks for ﬁtness approximation in evolutionary
 1. R. Andrews, J. Diederich, and A.B. Tickle: Survey and          optimization. In Y. Jin, (ed.), Knowledge Incorpora-
    critique of techniques for extracting rules from trained       tion in Evolutionary Computation, Springer Verlag,
    artiﬁcical neural networks. Knowledge Based Systems,           Berlin, 2005, 281–306.
    8, 1995, 378–389.
14. V. Kůrková: Neural networks as universal approxima-
    tors. In: M. Arbib, (ed.), Handbook of Brain Theory
    and Neural Networks, MIT Press, Cambridge, 2002,
    1180–1183.
15. S. Mitra and Y. Hayashi: Neuro-fuzzy rule generation:
    Survey in soft computing framework. IEEE Transac-
    tions on Neural Networks, 11, 2000, 748–768.
16. S. Möhmel, N. Steinfeldt, S. Engelschalt, M. Holeňa,
    S. Kolf, U. Dingerdissen, D. Wolf, R. Weber, and
    M. Bewersdorf: New catalytic materials for the
    high-temperature synthesis of hydrocyanic acid from
    methane and ammonia by high-throughput approach.
    Applied Catalysis A: General, 334, 2008, 73–83.
17. Y.S. Ong, P.B. Nair, A.J. Keane, and K.W. Wong:
    Surrogate-assisted evolutionary optimization frameworks
    for high-fidelity engineering design problems. In:
    Y. Jin, (ed.), Knowledge Incorporation in Evolutionary
    Computation, Springer Verlag, Berlin, 2005, 307–331.
18. A. Pinkus: Approximation theory of the MLP model
    in neural networks. Acta Numerica, 8, 1998, 277–283.
19. A. Ratle: Accelerating the convergence of evolution-
    ary algorithms by ﬁtness landscape approximation.
    In: A.E. Eiben, T. Bäck, M. Schoenauer, and H.-
    P. Schwefel, (eds), Parallel Problem Solving from Na-
    ture, Springer Verlag, Berlin, 1998, 87–96.
20. A. Ratle: Kriging as a surrogate ﬁtness landscape in
    evolutionary optimization. Artiﬁcial Intelligence for
    Engineering Design, Analysis and Manufacturing, 15,
    2001, 37–49.
21. U. Rodemerck, M. Baerns, and M. Holeňa: Application
    of a genetic algorithm and a neural network for
    the discovery and optimization of new solid catalytic
    materials. Applied Surface Science, 223, 2004, 168–174.
22. A.B. Tickle, R. Andrews, M. Golea, and J. Diederich:
    The truth will come to light: Directions and challenges
    in extracting rules from trained artiﬁcial neural net-
    works. IEEE Transactions on Neural Networks, 9,
    1998, 1057–1068.
23. H. Ulmer, F. Streichert, and A. Zell: Model-assisted
    steady state evolution strategies. In: GECCO 2003: Ge-
    netic and Evolutionary Computation, Springer Verlag,
    Berlin, 2003, 610–621.
24. H. Ulmer, F. Streichert, and A. Zell: Model assisted
    evolution strategies. In: Y. Jin, (ed.), Knowledge Incor-
    poration in Evolutionary Computation, Springer Ver-
    lag, Berlin, 2005, 333–355.
25. S. Valero, E. Argente, V. Botti, J.M. Serra, P. Serna,
    M. Moliner, and A. Corma: DoE framework for catalyst
    development based on soft computing techniques.
    Computers and Chemical Engineering, 33, 2009, 225–238.
26. D. Wolf, O.V. Buyevskaya, and M. Baerns: An evo-
    lutionary approach in the combinatorial selection and
    optimization of catalytic materials. Applied Catalysis
    A: General, 200, 2000, 63–77.
27. Z.Z. Zhou, Y.S. Ong, P.B. Nair, A.J. Keane, and K.Y.
    Lum: Combining global and local surrogate models to
    accelerate evolutionary optimization. IEEE Transactions
    on Systems, Man and Cybernetics. Part C: Applications
    and Reviews, 37, 2007, 66–76.