              Fine tuning run parameter values
               in rule-based machine learning*

                  Sotiris Moschoyiannis and Vasily Shcherbinin

            Department of Computer Science, University of Surrey,
                      Guildford, Surrey, GU2 7XH, UK
          (s.moschoyiannis@surrey.ac.uk, v.shcherbinin@surrey.ac.uk)



        Abstract. Rule-based machine learning focuses on learning or evolving
        a set of rules that represents the knowledge captured by the system.
        Due to its inherent complexity, a certain amount of fine tuning is re-
        quired before it can be applied to a particular problem. However, there
        is limited information available to researchers when it comes to setting
        the corresponding run parameter values. In this paper, we investigate
        the run parameters of Learning Classifier Systems (LCSs) as applied to
        single-step problems. In particular, we study two LCS variants, XCS for
        reinforcement learning and UCS for supervised learning, and examine
        the effect that different parameter values have on enhancing the model
        prediction, increasing accuracy and reducing the resulting rule set size.

        Keywords: condition-action rules · learning classifier systems · rein-
        forcement learning · supervised learning · fine tuning LCS


1     Introduction
The key feature of rule-based machine learning (RBML) is that it revolves
around rules, and therefore its outcome is interpretable by both computers
and humans. RBML approaches include learning classifier systems (LCSs),
association rule learning and artificial immune systems. The focus here is on LCSs,
which have been used in a variety of settings, from games [1] to industrial control
[2] to policy [3] to transport [4] to controlling Boolean networks [5].
    However, rule-based approaches involve a number of parameters, and the qual-
ity of the outcome largely relies on assigning them appropriate values. There is
a distinct lack of guidance on how the run parameters should be set in order
to build a good LCS, for both reinforcement learning and supervised learning
problems. Here, the investigation is aimed at single-step problems; that is, learn-
ing problems where the reward from effecting the action on the environment is
known immediately rather than after several steps, as in multi-step problems.
*
    This research was partly funded by EIT Digital IVZW, Real-Time Flow project,
    activity 18387-SGA2018. We thank Matthew R. Karlsen for fruitful discussions.
    Copyright © 2019 for this paper by its authors. Use permitted under Creative Com-
    mons License Attribution 4.0 International (CC BY 4.0).
    In this paper we study the interplay of the LCS with the data being learned
in benchmark (single-step) problems. In particular, we focus the investigation
on XCS, the most popular LCS variant for reinforcement learning, and UCS,
the classic variant for supervised learning problems. UCS was specifically built
for knowledge extraction (learning patterns in complex datasets) and provides a
contrast to the reinforcement-learning XCS, allowing us to see the effects of
various parameter values from a different LCS angle.
    We focus on the run parameters suggested in seminal work in the field of LCS
such as [6],[7] and use the values therein as a starting point. Neither work gives
information on the source of the values. We aim to fill this gap in this paper by
providing explicit guidelines on how to set the values of these key parameters of
LCS for both reinforcement and supervised learning. We test both algorithms
against balanced and unbalanced, generalisable problems.
    The remainder of the paper is structured as follows. Section 2 provides a
brief overview of LCS, XCS and UCS. Section 3 introduces the two benchmark
problems used in the study while Section 4 outlines the methodology and ex-
periments we carried out. In Section 5 we discuss the results for both learning
algorithms and provide explicit guidelines for setting their run parameters. We
conclude and consider future work in Section 6.


2    Rule-based Machine Learning: an overview of LCSs

Learning classifier systems (LCSs) revolve around rules - matching, predicting,
updating and discovering them. A rule follows an IF : THEN expression and comprises
two parts: the condition (or state) and the action (or class, or phenotype).
    A rule is associated with a number of parameters: (i) the reward prediction
p estimates the reward if this rule's action is eventually effected on the environ-
ment, (ii) the prediction error ε, and (iii) a fitness F which reflects the quality of
the rule. These parameters are updated in every iteration of the learning cycle
(cf. Sections 2.1, 2.2). A classifier is a rule with its associated parameters.
    The condition of a rule is expressed in a ternary alphabet: an attribute can
have a value of 0 or 1, but also a # which stands for the 'don't care' value, i.e.,
it matches both 0 and 1. This is what allows rules to be generalised and made
applicable to more than one instance, which in turn allows the system to capture
useful data relationships without overfitting the rules to the training data. For
example, the rules 110:1 and 111:1 can be generalised to 11#:1 (e.g., cloudy and
windy points to taking an umbrella irrespective of evening or morning). The
action part of a rule represents the predicted or recommended action.
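    To make the matching semantics concrete, the following minimal Python
sketch (the names are illustrative, not taken from the paper's codebase) encodes
a condition as a string over {0, 1, #} and tests whether it matches a binary input:

    def matches(condition: str, state: str) -> bool:
        """A condition matches a state when every specified
        (non-'#') attribute agrees with the input bit."""
        return all(c == '#' or c == s for c, s in zip(condition, state))

    # The generalised rule 11#:1 matches both 110 and 111.
    assert matches("11#", "110") and matches("11#", "111")
    assert not matches("11#", "010")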
    The LCS functional cycle, shown in Figure 1, comprises three components:
the environment, the learning machine and, optionally, rule compression (post-
processing). The environment provides as input a series of sensory situations,
typically abstracted to a dataset containing so-called training instances.
    The first step of the learning cycle is to receive input from the environment
(1). The next step is to identify and choose all rules in the rule population [P]
whose condition part matches that of the training instance (environment input)
           Fig. 1. The learning cycle for a generic LCS, adapted from [7].



(2). Matching compares the current input to all rules in the population [P]; if
all specified attributes of a rule match, the rule is added to the match set [M] (3).
    The next step produces the action set [A] (4), which comprises the rules in
[M] that advocate the chosen action. This set forms the input used in the GA
and also specifies which rules are updated after each iteration. We will have
more to say on this in Sections 2.1 and 2.2.
    In addition to the parameters p, ε, F discussed earlier, rules in [P] are asso-
ciated with further parameters: θage - the number of iterations that have occurred
since the rule was created; θexp - the number of times the rule has belonged to an
action set [A]; actionSetSize - the size of the action sets to which the rule has
belonged; n - the number of copies of the rule appearing in the population,
called its numerosity.
    If the match set is empty, or the number of distinct actions in it falls below
the pre-set minimal number of actions, θmna, the Covering mechanism is applied
(5) - this introduces new rules into the population [P] (rule discovery).
During the process, some of the attributes in the rule's condition and action
are generated using values taken from the current environment input; the rest
are set to #, with probability P#. Since LCS rule populations start off empty,
covering is also used for population initialisation. If there is at least one action
in the action set, the next step is to update the rule parameters for the classifiers
that have made it into the action set [A] (6) (cf. Sections 2.1 and 2.2).
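    A minimal sketch of the covering step, under the assumption of a binary
input string and a given list of actions (p_hash and actions are illustrative
parameter names):

    import random

    def cover(state: str, p_hash: float, actions: list):
        """Build a rule matching the current input; each attribute is
        generalised to '#' with probability P# (p_hash here)."""
        condition = ''.join('#' if random.random() < p_hash else bit
                            for bit in state)
        # In XCS the action is chosen among those not yet present in [M];
        # a uniform choice stands in for that here.
        return condition, random.choice(actions)

    rule = cover("110101", p_hash=0.33, actions=[0, 1])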
    Additionally, a mechanism called Subsumption is often applied (7), in which
pairs of rules are examined: whenever one rule (the subsumer) matches all attributes
of the other rule, is more general and just as accurate, it absorbs the other, more
specific, rule. An error threshold ε0 determines the maximum error a rule may
have and still be able to subsume another.
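    The subsumption test can be sketched as follows; the experience threshold
theta_sub is an assumption of this sketch (the paper only names ε0), and the
Classifier fields are illustrative:

    from dataclasses import dataclass

    @dataclass
    class Classifier:
        condition: str
        action: int
        error: float = 0.0
        experience: int = 0

    def is_more_general(general: str, specific: str) -> bool:
        """general matches every input that specific matches."""
        return all(g == '#' or g == s
                   for g, s in zip(general, specific)) and general != specific

    def subsumes(sub: Classifier, other: Classifier,
                 eps_0: float = 0.01, theta_sub: int = 20) -> bool:
        """sub absorbs other if it is experienced, accurate (error < eps_0),
        advocates the same action and is strictly more general."""
        return (sub.experience > theta_sub and sub.error < eps_0
                and sub.action == other.action
                and is_more_general(sub.condition, other.condition))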
    Next, a genetic algorithm (GA) is applied (8). It operates on the action set
[A], with a frequency determined by θga, to produce new rules by crossover and
mutation. When activated, the GA runs just once. Parent rules are selected
to produce two child rules. With probability χ a two-point crossover operates
on these offspring rules, and each attribute is then mutated with probability µ.
Several selection mechanisms exist - recent literature suggests a preference for
tournament selection, where a portion of classifiers is randomly selected from [A]
and the classifier with the highest fitness is chosen as a parent.
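    A sketch of the two GA operators on ternary conditions; the mutation shown
toggles an attribute between the current input bit and the '#' wildcard, which
is one common convention rather than the paper's confirmed choice:

    import random

    def two_point_crossover(c1: str, c2: str, chi: float):
        """With probability chi, swap the segment between two cut points."""
        if random.random() < chi:
            i, j = sorted(random.sample(range(len(c1) + 1), 2))
            # Tuple assignment evaluates both right-hand sides first.
            c1, c2 = c1[:i] + c2[i:j] + c1[j:], c2[:i] + c1[i:j] + c2[j:]
        return c1, c2

    def mutate(condition: str, state: str, mu: float) -> str:
        """Each attribute mutates with probability mu, toggling between
        the current input bit and the '#' wildcard."""
        return ''.join(
            (bit if c == '#' else '#') if random.random() < mu else c
            for c, bit in zip(condition, state))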
    The last step involves a deletion mechanism, which is applied to cap the
population at N rules (9). Whenever the population size, which is the sum of
all rule numerosities, exceeds N, rules are deleted from the population until
the size limit is met. A classifier's experience θexp must exceed a threshold θdel
before its (low) fitness can increase its deletion likelihood. The population cap,
combined with the selective deletion of rules, results in an overall drive to improve
the quality of the population.
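    Deletion is commonly realised as numerosity-weighted roulette-wheel selection
over deletion "votes"; the vote below (action set size estimate, inflated for
experienced low-fitness rules) follows the standard XCS scheme, and all field
names are assumptions of this sketch:

    import random
    from dataclasses import dataclass

    @dataclass
    class Classifier:
        fitness: float
        numerosity: int
        experience: int
        action_set_size: float

    def deletion_vote(cl, mean_fitness, theta_del=20, delta=0.1):
        """Base vote is the action set size estimate times numerosity;
        experienced, unfit rules get a proportionally larger vote."""
        vote = cl.action_set_size * cl.numerosity
        if (cl.experience > theta_del
                and cl.fitness / cl.numerosity < delta * mean_fitness):
            vote *= mean_fitness / (cl.fitness / cl.numerosity)
        return vote

    def select_for_deletion(population):
        mean_f = (sum(c.fitness for c in population)
                  / sum(c.numerosity for c in population))
        votes = [deletion_vote(c, mean_f) for c in population]
        return random.choices(population, weights=votes, k=1)[0]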
    Optionally, a post-processing step is applied to the rule population after
training. This is known as rule compaction (see Fig. 1) and it aims to remove
rules that are inexperienced, inaccurate, or that have a low fitness but whose
error ε does not exceed the specified limit ε0.


2.1   eXtended Classifier System (XCS)

An LCS can function by receiving only the utility of the action effected on the
environment (reinforcement learning). The eXtended Classifier System (XCS)
is an accuracy-based LCS developed originally by [8] to target reinforcement
learning problems. The key differentiating characteristics of XCS follow.
    Firstly, once the match set [M] has been created, a prediction array is cre-
ated, which supplies the fitness-weighted predicted reward for each of the actions
within [M]. This is used in action selection - in explore mode, with probability
pexpl, the system randomly selects from the available actions in [M]; otherwise,
in exploit mode, it chooses the action with the highest predicted reward p as indi-
cated by the array. The rules within the match set (possibly more than one) that
advocate the selected action form the action set [A] (Fig. 1).
    Secondly, there is a feedback mechanism which detects the effect of the se-
lected action when executed on the environment and grants a particular reward
or payoff. On a single-step problem such as those addressed in this paper, this
reward is used directly in updating the rule parameters.
    The updating can only happen in [A]. Parameter updates follow the time-
weighted recency average (TWRA) principle [7], which can be described as:

 AverageNew = AverageCurrent + β(ValueCurrent − AverageCurrent)                (1)

This is directly inspired by the Widrow-Hoff update, which is commonly used in
artificial neural networks (ANNs) for updating the weights of the inputs:

                           νi+1 = νi + β(ui − νi )                             (2)
When a new value ui is received, the equation above is used to update the estimate
νi at iteration i. The value of νi+1 is thus a combination of the old value of ν and
the newly received value u. The balance between using old and new values is
controlled by β, which is called the learning rate parameter. If β = 0, then no
learning takes place; if β = 1, then only the new value is stored and there is no
recollection of past values. The value of β therefore lies in the range [0, 1]. The
reward prediction p is given by:

                 p ← p + β(r − p),                  0 ≤ β ≤ 1                      (3)

where r is the reward received from the environment. The error ε between the
received reward and the predicted reward is updated by:
                      
                 ε ← ε + (|r − p| − ε)/θexp ,    if θexp < 1/β
                 ε ← ε + β(|r − p| − ε),         otherwise                         (4)

following the same concept of split balance described earlier for p, but this time
applied to ε (cf. Fig. 6). The fitness of a rule is based on its accuracy [6], given by:
                             
                             1,              if  < 0
                        κ=            −ν                                       (5)
                             α ∗ ( ) , otherwise
                                    0
where ε0 is a predefined error threshold, as discussed before. If the error ε asso-
ciated with a rule (the classifier's error) is greater than the error threshold ε0, its
accuracy decreases, based on α and ν. When α is set to a low value (approaching
0), there is an acute drop in the accuracy value; at higher values of α, ν drives
the rate of change. If ν is set to a high value, the accuracy will decrease fast; if
ν is set to a low value, accuracy will drop slowly. A form of fitness sharing is
applied in XCS, given by:

                                 κ0 = κ / Σcl∈[A] κcl                              (6)


where cl denotes a classifier in the action set [A], κcl the accuracy associated
with cl, and κ the accuracy of the present classifier.
   The classifier fitness (cf. Fig. 6) can thus be calculated and updated using:

                              F ← F + β(κ0 − F )                                   (7)
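    To tie equations (3)-(7) together, here is a minimal Python sketch of the
action set update, following the update order of Fig. 6 (prediction, then error,
then fitness); the Classifier fields and parameter defaults are illustrative
assumptions, not prescriptions:

    from dataclasses import dataclass

    @dataclass
    class Classifier:
        p: float = 10.0       # reward prediction
        error: float = 0.0    # prediction error (epsilon)
        kappa: float = 0.0    # accuracy
        fitness: float = 0.01
        experience: int = 0

    def update_action_set(action_set, reward, beta=0.2,
                          eps_0=0.01, alpha=0.1, nu=5):
        """Apply Eqs. (3)-(7) to every classifier in [A]."""
        for cl in action_set:
            cl.experience += 1
            cl.p += beta * (reward - cl.p)              # Eq. (3)
            diff = abs(reward - cl.p) - cl.error
            if cl.experience < 1 / beta:                # Eq. (4), averaged
                cl.error += diff / cl.experience        # while inexperienced
            else:
                cl.error += beta * diff
            cl.kappa = (1.0 if cl.error < eps_0         # Eq. (5)
                        else alpha * (cl.error / eps_0) ** (-nu))
        total = sum(cl.kappa for cl in action_set)
        for cl in action_set:
            # Eqs. (6)-(7): relative accuracy drives the fitness update.
            cl.fitness += beta * (cl.kappa / total - cl.fitness)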

2.2   sUpervised Classifier System (UCS)
The sUpervised Classifier System (UCS) is an accuracy-based LCS developed by
[9] specifically for supervised learning problems. In supervised learning, the cor-
rect action is available from the environment. This allows UCS to focus on data
mining and knowledge discovery in database type problems, e.g., performance
analysis of CBA in [10]. UCS and XCS share key principles such as a niche GA
and defining fitness based on accuracy, but also have key differences.
    Firstly, since every environment input comes with the correct action in super-
vised scenarios, the match set [M] is split into two sets - a correct set [C] and an
incorrect set [I] (recall step (4) in Fig. 1) - depending on whether a population
rule's action matches the action supplied with the current input. If the correct
action is not represented in [M], covering is triggered to create a rule matching the
current input and assigned the correct action. This ability to cover without having
to guess a suitable action is a further benefit of UCS.
    Secondly, the fact that the correct action is known simplifies the reward function.
The measure of accuracy is simplified, as the performance of each rule is explic-
itly available. Each classifier in UCS has an additional parameter, correctTrack,
which is incremented every time the classifier is part of the correct set [C]. The
classifier accuracy replaces the reward prediction p of XCS and is calculated as:

                          accuracy = correctTrack / θexp

Fitness in UCS is simpler than in XCS [7] and is given by fitness = accuracy^ν,
where ν determines the pressure on the rules to be accurate (recall Sec. 2.1).
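    A corresponding sketch for the UCS update, where accuracy is explicit and
fitness is its ν-th power (the field names are again illustrative):

    from dataclasses import dataclass

    @dataclass
    class Classifier:
        action: int
        correct_track: int = 0
        experience: int = 0
        accuracy: float = 0.0
        fitness: float = 0.0

    def update_ucs(match_set, correct_action, nu=10):
        """Split [M] by action and update accuracy-based fitness."""
        for cl in match_set:
            cl.experience += 1
            if cl.action == correct_action:    # rule belongs to [C]
                cl.correct_track += 1
            cl.accuracy = cl.correct_track / cl.experience
            cl.fitness = cl.accuracy ** nu     # fitness = accuracy^nu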


3     Benchmark problems

In this section we describe the benchmark problems used in this research.


3.1   Multiplexer

The Boolean n-bit Multiplexer defines a set of learning problems that conceptu-
ally describe the behaviour of an electronic multiplexer (MUX), a device that
selects one of multiple analog or digital input signals and forwards it to a
single output line [11]. The 6-bit multiplexer is shown in Figure 2.
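    The n-bit multiplexer class can be computed directly: the first k address
bits (with n = k + 2^k, so k = 2 for the 6-bit and k = 3 for the 11-bit problem)
index into the remaining register bits. A sketch, e.g. for generating training
instances:

    import random

    def multiplexer_class(bits: str, k: int) -> int:
        """The first k bits form an address that selects a register bit."""
        address = int(bits[:k], 2)
        return int(bits[k + address])

    # 6-bit MUX (k = 2): address '10' selects register bit 2.
    assert multiplexer_class("101110", k=2) == 1

    def random_instance(k: int):
        n = k + 2 ** k
        bits = ''.join(random.choice('01') for _ in range(n))
        return bits, multiplexer_class(bits, k)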




           Fig. 2. 6-bit Multiplexer class determination - example run [7]
   Multiplexer problems exhibit epistasis³ (the class is determined via interaction
between multiple features, rather than a single one) and also heterogeneous⁴
behaviour (for different instances, the class value is determined by a different
subset of features) [7]. This elevates the complexity of the problem due to the
non-linear association between the condition and the corresponding action.
³
    A type of gene interaction - a single gene is under the influence of other distinct genes.
⁴
    To be understood here as consisting of different or diverse components.
   In addition, multiplexer problems contain knowledge-based patterns. Figure
3 shows that the entire domain can be separated into 8 maximally general rules;
theoretically this is the minimum (optimal) number of rules that the LCS must
generate to completely cover the problem. Figure 4 shows the class symmetry,
which is characteristic of a balanced problem.


Fig. 3. 6-bit Multiplexer patterns - address bits on the vertical axis; register bits on the
horizontal; cells are actions. Shades represent rules that cover a pattern within the domain.


               Fig. 4. 6-bit Multiplexer patterns - domain class symmetry


   The multiplexer benchmark problem was chosen for use in this project as an
example of a balanced, generalisable problem. Specifically, the 11-bit Multiplexer
was used.

3.2     Position
The Position problem benchmark is an example of an unbalanced multi-class
problem where, given a binary string of length l as input, the output is deter-
mined by the index position of the left-most one-valued bit. Although it is a
generalisable problem, the Position benchmark differs from other benchmarks
in that it is unbalanced - this is clearly seen in Figure 5, where there are many
more instances of class 0 than of class 2, for example. The 9-bit Position
benchmark was used in this paper.
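    The Position class function is equally direct, which also makes the imbalance
easy to see: half of all inputs start with a 1 (class 0), a quarter with 01 (class 1),
and so on. A sketch, assuming the convention that the all-zero string maps to
class l (the paper does not spell this edge case out):

    def position_class(bits: str) -> int:
        """Index of the left-most 1; all-zero input gets class len(bits)."""
        return bits.index('1') if '1' in bits else len(bits)

    assert position_class("00100") == 2
    # Imbalance: 2^(l-1) strings fall in class 0, but only one string
    # each falls in class l-1 and class l.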




    Fig. 5. 6-bit Position problem - example of an unbalanced generalisable problem




4     Experiments: methodology and implementation⁵

We started with the "recommended" parameter values suggested by [7,6] and
then altered each parameter in turn, sweeping its value from the smallest to
the largest recommended value whilst keeping all other parameters constant. For
every experiment, the resulting model cross-validation accuracy, number of rules
in [P] and run time were recorded. Each experiment was repeated 10 times, to
minimise stochastic effects in the LCS, and the averaged values were recorded.
    k-Fold Cross-Validation was used to evaluate the XCS and UCS models using
a set of training data. The most common values for the number of folds k are 5
and 10 [12], as they have been shown "empirically to yield test error rate esti-
mates that suffer neither from excessively high bias nor from very high variance"
[13]. In our case, k = 5 was chosen, as it allowed us to generate sufficiently large
training and testing data sets, representative of the data being learned.
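    A sketch of this evaluation loop, assuming an LCS wrapper exposing
scikit-learn-style fit/score methods (hypothetical names; the actual implementation
is in the codebase linked below):

    import numpy as np
    from sklearn.model_selection import KFold

    def cross_validate(model_factory, X, y, k=5, seed=0):
        """Average test accuracy over k folds (k = 5 as in the paper)."""
        kf = KFold(n_splits=k, shuffle=True, random_state=seed)
        scores = []
        for train_idx, test_idx in kf.split(X):
            model = model_factory()            # fresh XCS/UCS per fold
            model.fit(X[train_idx], y[train_idx])
            scores.append(model.score(X[test_idx], y[test_idx]))
        return float(np.mean(scores))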
    The XCS algorithm implementation was created following closely [6], which,
since its publication, has become the standard to follow when implementing
XCS. Minor modifications in our implementation:
1. The order of updating rule parameters is changed for convenience - see Fig. 6.
2. The calculation of accuracy is done via a power law function, as proposed in [14],
rather than via the exponential function used in the original.
3. Covering occurs based on the classifiers in [M], rather than the mean prediction
of the population suggested in [15], to increase program execution speed.
4. The genetic algorithm is applied to the action set, as described in [16].
For UCS, the implementation follows [9].
⁵
    The codebase used for our experiments is available on Gitlab by pointing your
    browser to: https://gitlab.eps.surrey.ac.uk/vs00162/lcs-suite




        Fig. 6. Updating XCS prediction payoff, prediction error and fitness


    The run parameters investigated in this work can be seen in Figure 7.




                Fig. 7. Run parameters for LCS under investigation




5    Results - Guidelines for run parameter values

Parameter I - number of iterations.
The iteration parameter (I) specifies the maximum number of learning iterations
before the evaluation of the algorithm commences. During a single iteration, an
input is received from the environment and the LCS performs a learning cycle
over it. Learning continues until either a specified number of iterations is
reached or the training accuracy reaches a certain threshold. Here, we opted for
the former, as this allows for a fairer test and comparison.
   Results shown in Figures 8 and 9 demonstrate the relationship between num-
ber of iterations and accuracy, rule set size, and time for both benchmark prob-
lems, for both XCS (reinforcement learning) and UCS (supervised learning). It





     Fig. 8. I in XCS (left) and UCS (right) on the 11-bit multiplexer problem.




       Fig. 9. I in XCS (left) and UCS (right) on the 9-bit Position problem


can be seen that the number of iterations and the time it takes to complete the
learning cycle are directly proportional. This is true for both benchmark prob-
lems and is not affected by the algorithm type. An increase in the number of
iterations causes a decrease in the number of rules generated, indicating that the
population is becoming more optimised and generalised. It also has a positive
effect on the accuracy of the algorithm.
    Guideline. For problems where the input has a length of 20 attributes or less,
acceptable results may be achieved with [10,000-50,000] iterations. For larger
and more complex problems, a larger number of iterations should be considered.
Parameter N - Rule population size.
The maximum population size parameter (N ) specifies the allowed maximum
number of rules in the population. Once N is exceeded, the deletion mechanism
is activated. It is usually one of the first parameters to be tweaked in an LCS
and has an important influence on the result obtained by the system.
    Results shown in Figures 10 and 11 demonstrate the relationship between the
population size and accuracy, rule set size, and time for both benchmark problems.




    Fig. 10. N in XCS (left) and UCS (right) on the 11-bit multiplexer problem.




      Fig. 11. N in XCS (left) and UCS (right) on the 9-bit Position problem.


    It can be seen that an increase in the population size yields a dramatic
increase in accuracy, following a logarithmic pattern. In the Position problem
accuracy is almost constant (see Fig. 11), while a sharp rise is observed when N
is smaller than 500. The results allow us to draw several conclusions:
(1) Parameter N plays a vital role in a learning classifier system and it is crucial
to get its value right for the system to yield adequate results.
(2) If N is set too low, the LCS will not be able to maintain good classifiers and
generate a good model, resulting in low accuracy when predicting new instances.
(3) If N is set too high, unnecessary computational resources will be consumed
without any benefit. This is not only inefficient but can also lengthen the time
the algorithm takes to converge to an optimal solution.
The challenge therefore is to find a value of N that is as low as possible, but at
the same time provides adequate accuracy.
    Guideline. N in the range of [1000, 4000] will yield adequate accuracy for
most classification tasks, with smaller problems requiring values at the lower end
and larger, more complex problems values at the higher end. If rule compression
is applied, the rule population size will need to be set on the generous side of the
proposed range.
Parameter P# - "don't care" probability.
The parameter P# specifies the probability of inserting a "don't care" symbol,
denoted by #, into a classifier generated during the covering stage. Results
shown in Figures 12 and 13 demonstrate the relationship between P#, algorithm
accuracy, rule set size and time elapsed - without compression.




   Fig. 12. P # in XCS (left) and UCS (right) on the 11-bit multiplexer problem.




     Fig. 13. P # in XCS (left) and UCS (right) on the 9-bit Position problem.


The challenge is to set P# to a value that enables the system to generate rules
that are not overly general (such classifiers may maintain high fitness and succeed
in certain cases, but may lead to degradation of overall performance [5]), yet at
the same time not overly specific, since over-specific rules require a larger rule set
to cover the entire problem map, which in turn might require an increase of N,
leading to an unnecessary increase in running time.
    Guideline. For problems that are not evidently generalisable, P# must be
set low - in the range of [0, 0.3], with more complex situations requiring P# to
be set as low as 0. For problems with cleanly generalisable patterns, or where high
generality is needed, P# should be set at the higher end of the range, i.e., [0.7,
0.99]. In situations where determining P# is too difficult, the rule specificity
limit (RSL) [17] is recommended instead.
Parameter ν - Fitness exponent.
The fitness exponent parameter (ν) is central to updating the fitness of a rule.
Results shown in Figures 14 and 15 demonstrate that optimal values for ν for




    Fig. 14. ν in XCS (left) and UCS (right) on the 11-bit multiplexer problem.




      Fig. 15. ν in XCS (left) and UCS (right) on the 9-bit Position problem.


XCS and UCS differ. For XCS, the optimal value appears to be ν = 5; at this
value, the best balance between accuracy, number of rules generated and run
time is achieved. For UCS, the results demonstrate that as the fitness exponent
increases, algorithm accuracy increases whilst the number of devised rules
decreases, even when rule compaction has not been switched on.
    Guideline. Optimal values of ν for XCS and UCS differ. For XCS, the optimal
value appears to be ν = 5, whilst for UCS working on cleanly generalisable
problems and data, ν must be set high, closer to ν = 10.
Parameter χ - Crossover probability.
The crossover probability parameter (χ) determines the probability of applying
crossover to some selected classifiers during activation of the genetic algorithm.
Results shown in Figures 16 and 17 demonstrate that an increase in χ results in
a considerable decrease of the algorithm accuracy - the best results are achieved
when χ = 0.7. The number of rules also decreases. This can be explained by the
fact that increasing χ encourages the algorithm to generate more generalised
rules, so more rules are subsumed during the compression stage. The caveat is
that more general rules do not necessarily guarantee better resulting algorithm
accuracy.


    Fig. 16. χ in XCS (left) and UCS (right) on the 11-bit multiplexer problem.


      Fig. 17. χ in XCS (left) and UCS (right) on the 9-bit Position problem.
    Guideline. If rule compression is not used, values in the range of [0.7, 0.9]
will yield appropriate results; otherwise, the recommended value for χ is 0.7.
Parameter µ - Mutation probability.
The mutation probability parameter (µ) determines the probability of mutation
for every attribute of a classifier generated by the genetic algorithm. A range
of values from [0.2, 0.5] was investigated for this parameter, as per the recommen-
dations in [7]. Results shown in Figures 18 and 19 demonstrate that as the
value of µ increases, the accuracy decreases, the rule count grows and the com-
putation time increases. Also note that for the multiplexer problem (with or
without compression), the optimal value for µ appears to be 0.05, resulting in
a prediction accuracy of 100%, which is a stronger result than when µ was set
in the [0.2, 0.5] range. A similar trend was observed for the Position problem.


    Fig. 18. µ in XCS (left) and UCS (right) on the 11-bit multiplexer problem.
    Guideline. µ proved to be domain specific, hence no single value can be rec-
ommended; rather, µ must be calculated using the formula µ = 1/l, as suggested
in [18], where l is the number of attributes in the condition of the rules.
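For instance, for the 11-bit multiplexer used here this formula gives µ = 1/11 ≈ 0.09,
of the same order as the 0.05 optimum observed above, while for the 9-bit Position
problem it gives µ = 1/9 ≈ 0.11.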
    Guidelines for other parameters. β = 0.1 must be used for problems
where noise is present, the data is unbalanced or the complexity is high, whilst
β = 0.2 must be used for clean, straightforward problems. The parameter α must
be set to 0.1. For clean problems, ε0 must be set low, in the range [0, 0.01] - for
complex problems, and problems that are expected to yield a high amount of
error for each classifier, ε0 must be set higher.


      Fig. 19. µ in XCS (left) and UCS (right) on the 9-bit Position problem.


6   Concluding remarks and future work
The reader may notice an alignment of our guidelines for certain parameters with
the suggestions for XCS in [7] and [6], which effectively comprise the only closely
related work. It is therefore worth noting that (i) in addition to XCS, super-
vised learning (UCS) has also been addressed here, and (ii) different benchmark
problems have been considered, so the alignment reinforces our guidelines. In ad-
dition, we report on how rule-based machine learning algorithms, such as XCS
and UCS, react to a problem with in-built class imbalance (recall Fig. 5).
    The codebase used for our experiments is available on Gitlab by pointing
your browser to: https://gitlab.eps.surrey.ac.uk/vs00162/lcs-suite .
    Previous work has been concerned with the application of rule-based machine
learning to learning individual passengers’ preferences and making personalised
recommendations for their onward journeys, in the event of major disruption
[4], [19]. In this context, XCS was adapted to work with integer valued rather
than boolean parameters, resulting in the so-called XCSI, as most variables
in the problem domain were not binary. XCS and reinforcement learning have
been applied to the problem of controllability in complex networks, and more
specifically to the control of random Boolean networks (RBNs) in [5]. In brief,
XCS can be successfully used to evolve rule sets that can direct the network
from any state to a target attractor state, or switch between attractor states.
The network is understood to operate effectively at attractor states, hence this
notion of controllability in complex networks. The rule-based machine learning
algorithm in this case learns to develop a series of interventions, at most one at
each time step, to successfully control the underlying network in what is a multi-
step problem. The work in [20] then investigates whether the machine learning
algorithm produces optimal paths to the specified attractor state.
    There was little guidance in the literature on setting the run parameter values
for XCSI [21] in the one-step problem (of making the correct recommendation to
a passenger), or for XCS in the multi-step problem of directing a random
Boolean network to an attractor state. This was in part the motivation for the
work presented in this paper. The lack of guidance on setting the run parameter
values can be a barrier to adopting a rule-based machine learning approach, and
hence to capitalising on the fact that the outcome of this form of machine
learning is interpretable.
    Recent developments in applications of rule-based machine learning include
ExSTraCS [22], a scalable learning classifier system for supervised learning
problems, as well as attempts to control dynamical systems such as predator-prey
models or the well-known S-shaped population growth model [23]. Such systems
involve real- rather than boolean-valued parameters, and in a certain important
sense control becomes continuous rather than discrete. In this context, an
adaptation of XCS is necessary so as to work with real-valued parameters, the
so-called XCSR [14]. The challenge for XCSR is to identify the initialisation of
the model which leads to a desired state. This is possible after determining the
control or driver nodes [24] and the series of interventions needed for steering
the network to that state.
    Preliminary investigations in [25] have shown that XCSR can control non-
linear systems of order 2 by appealing to adaptive action mapping [26]. Further
work is required to overcome issues of scaling up, as well as the action chain
learning problem, which is intrinsic to reinforcement learning in multi-step problems.
    The RuleML hub for interoperating structured knowledge [27] considers
condition-action rules, similar to those found in rule-based machine learning
and LCS. This opens up the possibility of 'web-based AI' types of
architectures and applications.
    Another possible direction for future work concerns evolving DMN rules using
XCS. DMN-based rules [28] are expressive but may become extremely complex
for a human to construct. Using XCS to evolve DMN rules could produce complex
and adaptive rule sets without the need for human intervention in the system.

References
 1. K. Shafi and H. Abbass, “A survey of learning classifier systems in games,” IEEE
    Computational Intelligence Magazine, vol. 12, pp. 42–55, 02 2017.
 2. W. N. L. Browne, “The development of an industrial learning classifier system
    for data-mining in a steel hot strip mill,” in Applications of Learning Classifier
    Systems, pp. 223–259, Springer Berlin, 2004.
 3. P. Krause, A. Razavi, S. Moschoyiannis, and A. Marinos, “Stability and complex-
    ity in digital ecosystems,” in IEEE Digital Ecosystems and Technologies (IEEE
    DEST), pp. 85–90, 2009.
 4. M. R. Karlsen and S. Moschoyiannis, “Learning condition–action rules for person-
    alised journey recommendations,” in RuleML + RR 2018, LNCS 11092, pp. 293–
    301, Springer, 2018.
 5. M. R. Karlsen and S. Moschoyiannis, “Evolution of control with learning classifier
    systems,” Applied Network Science, vol. 3, pp. 3–30, Aug 2018.
 6. M. V. Butz and S. W. Wilson, “An algorithmic description of XCS,” Soft Computing,
    vol. 6, no. 3, pp. 144–153, 2001.
 7. R. J. Urbanowicz and W. N. Browne, Introduction to learning classifier systems.
    Berlin, Germany. Springer, 2017.
 8. M. V. Butz and S. W. Wilson, “An algorithmic description of XCS,” in Interna-
    tional Workshop on Learning Classifier Systems, pp. 253–272, Springer, 2000.
 9. E. Bernadó-Mansilla and J. M. Garrell-Guiu, “Accuracy-based learning classifier
    systems: Models, analysis and applications to classification tasks,” Evolutionary
    Computation, vol. 11, no. 3, pp. 209–238, 2003.
10. J. Filip and T. Kliegr, “Classification based on associations (CBA) - A performance
    analysis,” in Proceedings of RuleML+RR 2018 Challenge, 2018.
11. T. Dean, Network+ Guide to Networks. Course Tech Cengage Learning, 2013.
12. M. Kuhn and K. Johnson, Applied predictive modeling. Springer, NY, USA., 2016.
13. G. James, D. Witten, T. Hastie, and R. Tibshirani, An introduction to statistical
    learning: with applications in R. Springer, New York, NY, USA, 2015.
14. S. W. Wilson, “Get real! XCS with continuous-valued inputs,” in Learning Classi-
    fier Systems, pp. 209–219, Springer, 2000.
15. S. W. Wilson, “Classifier fitness based on accuracy,” Evolutionary Computation,
    vol. 3, pp. 149–175, June 1995.
16. S. W. Wilson, “Generalization in the XCS classifier system,” in 3rd Annual Genetic
    Programming Conference (GP-98), 1998.
17. R. J. Urbanowicz and J. H. Moore, “ExSTraCS 2.0: description and evaluation of
    a scalable learning classifier system,” Evolutionary Intelligence, vol. 8, no. 2-3,
    pp. 89–116, 2015.
18. K. Deb, “Multi-objective optimisation using evolutionary algorithms: An introduc-
    tion,” in Multi-objective Evolutionary Optimisation for Product Design and Man-
    ufacturing, pp. 3–34, Springer, 2011.
19. M. R. Karlsen and S. Moschoyiannis, “Customer Segmentation of Wireless Trajec-
    tory Data,” in Technical Report, University of Surrey, 2019.
20. M. R. Karlsen and S. K. Moschoyiannis, “Optimal control rules for random boolean
    networks,” in International Workshop on Complex Networks and their Applica-
    tions, pp. 828–840, Springer, 2018.
21. S. W. Wilson, “Compact rulesets from XCSI,” in International Workshop on Learn-
    ing Classifier Systems, pp. 197–208, Springer, 2001.
22. R. J. Urbanowicz and J. H. Moore, “ExSTraCS 2.0: description and evaluation of
    a scalable learning classifier system,” Evolutionary Intelligence, vol. 8, no. 2-3,
    pp. 89–116, 2015.
23. J. D. Sterman, Business Dynamics: Systems Thinking and Modeling for a Complex
    World. McGraw-Hill Higher Education, 2000.
24. S. Moschoyiannis, N. Elia, A. Penn, D. J. B. Lloyd, and C. Knight, “A web-
    based tool for identifying strategic intervention points in complex systems,” in
    Proceedings of the Games for the Synthesis of Complex Systems (CASSTING’16
    @ ETAPS 2016), vol. 220 of EPTCS, pp. 39–52, 2016.
25. V. B. Georgiev, M. R. Karlsen, L. Schoenenberger, and S. Moschoyiannis, “On
    controlling non-linear systems with XCSR,” in 13th SICC Workshop – Complexity
    and the City, 2018.
26. M. Nakata, P. L. Lanzi, and K. Takadama, “Selection strategy for XCS with adaptive
    action mapping,” in 15th Annual Conference on Genetic and Evolutionary Computation, 2016.
27. H. Boley, “The RuleML Knowledge-Interoperation Hub,” in RuleML 2016, LNCS
    9718, pp. 19–33, Springer, 2016.
28. D. Calvanese, M. Dumas, F. M. Maggi, and M. Montali, “Semantic DMN: For-
    malizing decision models with domain knowledge,” in RuleML+RR 2017, LNCS
    10364, pp. 70–86, Springer, 2017.