<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Tuning Hyperparameters of Classification Based on Associations (CBA)</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Tomáš Kliegr</string-name>
          <email>tomas.kliegr@vse.cz</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jaroslav Kuchař</string-name>
          <email>jaroslav.kuchar@fit.cvut.cz</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Information and Knowledge Engineering, Faculty of Informatics and Statistics, University of Economics</institution>
          ,
          <addr-line>W. Churchill Sq. 1938/4, Prague 3</addr-line>
          ,
          <country country="CZ">Czech Republic</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Web Intelligence Research Group, Faculty of Information Technology, Czech Technical University in Prague</institution>
          ,
          <addr-line>Thákurova 9, 160 00, Prague 6</addr-line>
          ,
          <country country="CZ">Czech Republic</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Classification models composed of crisp rules provide excellent explainability. A limitation of many conventional rule learning algorithms is the separate-and-conquer strategy, which may be slow on large data. Association rule classification (ARC) is an alternative approach that can be very fast on massive datasets but is highly sensitive to the correct choice of metaparameters. Most existing ARC algorithms use default thresholds of 50% for minimum confidence and 1% for minimum support, which can result in excessively long rule generation or underperforming models. Due to the high cost that can be associated with the evaluation of a single combination, it is impractical to use standard metaparameter optimization approaches. In this paper, we introduce two threshold tuning algorithms specifically designed for ARC. Evaluation on 22 standard UCI datasets shows promising results in terms of model size and accuracy in comparison with the default thresholds. The implementation of the proposed algorithms is made available in the R packages rCBA and arc, both available in the CRAN repository.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>Association rule classifiers (ARC) are formed by
selecting a subset of rules from a high number of candidates,
which are generated by association rule learning
algorithms known for their excellent performance on big and
sparse datasets. The large base of candidate rules or
frequent itemsets provides opportunities for achieving a good
balance between predictive performance and
interpretability of the produced models.</p>
      <p>
        An ARC algorithm has two fundamental steps:
candidate generation, and building of a classifier by selecting a
subset of the generated candidates. While most research
has focused on the classifier building phase, the candidate
generation phase has not received much attention. Most
ARC algorithms, including state-of-the-art approaches such as
Interpretable Decision Sets (IDS) [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], Scalable Bayesian
Rule Lists (SBRL) [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], or Bayesian Rule Sets (BRS) [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]
rely on simple heuristics for generating the candidates,
such as step-wise increases in support threshold by 5%
until a fixed desired number of candidate frequent itemsets is
reached.
      </p>
      <p>
        Candidate generation can fundamentally affect all facets
of ARC models, including speed of model building, size
of the generated models, and particularly the predictive
performance. In this paper, we provide two alternative
approaches to rule generation. We focus on approaches
applicable to the rule generation step of the
Classification Based on Associations (CBA) algorithm [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. While there
are newer approaches, CBA is still one of the best
rule-based classification algorithms in terms of the balance
between comprehensibility of the model, predictive power,
and scalability [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ].
      </p>
      <p>The two tuning algorithms that we describe are based
on different principles. The first approach is a heuristic,
which aims to produce a user-set number of rules by
varying minimum support, minimum confidence, and
maximum antecedent length thresholds. The second approach
is a supervised algorithm, in which each metaparameter
setting is used to create a classifier, which is then
evaluated through internal validation. As the optimization
algorithm, we adopt simulated annealing.</p>
      <p>This paper is organized as follows. Section 2 briefly
introduces the CBA algorithm. Section 3 covers the two
proposed threshold tuning algorithms. Section 4 presents
evaluation and Section 5 summarizes limitations of the
presented work and provides outlook for future extensions.
The conclusions summarize the contributions of our
proposal, briefly discussing possible applications.
</p>
    </sec>
    <sec id="sec-2">
      <title>Association Rule Classifiers</title>
      <p>
        The first association rule classification algorithm was
Classification based on Associations (CBA) [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. While
there were multiple follow-up algorithms providing
marginal improvements in classification performance (e.g.
CPAR [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], CMAR [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]), the structure of most ARC
algorithms follows, with some deviations, that of CBA [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]:
      </p>
      <sec id="sec-2-1">
        <title>1. learn classification association rules,</title>
      </sec>
      <sec id="sec-2-2">
        <title>2. prune the set of rules,</title>
      </sec>
      <sec id="sec-2-3">
        <title>3. classify new objects.</title>
        <p>Rule learning In this phase, some algorithms such as CBA
learn complete association rules of the form antecedent →
consequent. The learning step returns all rules matching
the minimum confidence and minimum support
thresholds. The confidence of a rule is defined as conf(r) =
a/(a + b), where a is the number of correctly classified
objects, i.e. those matching the rule antecedent as well as the rule
consequent, and b is the number of misclassified objects,
i.e. those matching the antecedent but not the consequent.
The support of a rule is defined as supp(r) = a/n, where n
is the number of all objects (relative support), or simply as
a (absolute support). Additionally, the rule mining setup
is constrained so that only the target class values can occur
in the consequent of the rules.</p>
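        <p>As an illustration, the definitions above can be computed directly over a small transactional dataset. The following Python sketch is our own illustration (not part of the referenced implementations); it represents objects as sets of attribute=value items:</p>

```python
# Illustrative sketch: computing rule confidence and relative support.
# A rule is a pair (antecedent, consequent), both frozensets of items.

def conf_supp(rule, transactions):
    """Return (confidence, relative support) of a rule."""
    antecedent, consequent = rule
    # a: correctly classified objects (match antecedent and consequent)
    a = sum(1 for t in transactions
            if antecedent.issubset(t) and consequent.issubset(t))
    # b: misclassified objects (match antecedent, not consequent)
    b = sum(1 for t in transactions
            if antecedent.issubset(t) and not consequent.issubset(t))
    n = len(transactions)
    confidence = a / (a + b) if a + b > 0 else 0.0
    support = a / n  # 'a' alone would be the absolute support
    return confidence, support

transactions = [
    frozenset({"outlook=sunny", "windy=no", "play=yes"}),
    frozenset({"outlook=sunny", "windy=yes", "play=no"}),
    frozenset({"outlook=rainy", "windy=no", "play=yes"}),
    frozenset({"outlook=sunny", "windy=no", "play=yes"}),
]
rule = (frozenset({"outlook=sunny"}), frozenset({"play=yes"}))
print(conf_supp(rule, transactions))  # a=2, b=1 → (0.666..., 0.5)
```

        <p>Here the rule matches three objects, two of them correctly, so conf(r) = 2/3 and supp(r) = 2/4.</p>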
        <p>In some newer methods, the first step involves
generating frequent itemsets rather than complete rules. An
example of such a method is IDS, which does not impose
a minimum confidence threshold. It takes as input
the result of frequent itemset mining (i.e.
conjunctions of conditions). Rules are then formed within IDS by
splitting the frequent itemset into antecedent and
consequent parts.</p>
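        <p>The splitting step can be sketched as follows. This is our own simplified illustration, not the IDS implementation; we assume class items are distinguishable by a "class=" prefix:</p>

```python
# Hypothetical sketch: forming rules from a frequent itemset by splitting
# off the class item(s) into the consequent (IDS-style rule formation).

def split_itemset(itemset, class_prefix="class="):
    """Return (antecedent, consequent) pairs, one per class item present."""
    class_items = {i for i in itemset if i.startswith(class_prefix)}
    rules = []
    for c in sorted(class_items):
        antecedent = frozenset(itemset) - {c}
        rules.append((antecedent, frozenset({c})))
    return rules

itemset = {"outlook=sunny", "windy=no", "class=play"}
print(split_itemset(itemset))
# one candidate rule: {outlook=sunny, windy=no} → {class=play}
```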
        <p>
          In both approaches, adaptations of standard frequent
itemset generation and association rule learning algorithms
such as apriori [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ] or FP-growth [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ] are used.
Rule pruning What is performed during the pruning phase
varies strongly from algorithm to algorithm. CBA uses a
simple and fast heuristic, which first sorts the rules and
then removes redundant ones. A rule is considered
redundant if it does not correctly classify any of the instances
that remain after removing the instances covered by rules
with higher priority. In contrast,
the IDS algorithm uses computationally intensive
submodular optimization, which provides guarantees in terms of
the optimality of the selected subset of rules with respect
to a chosen balance between predictive performance and
interpretability.
        </p>
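        <p>The data-coverage heuristic can be sketched as follows. This is our own simplified illustration and omits parts of CBA's actual pruning (the default class and the total-error cutoff); rules are assumed to be pre-sorted by priority:</p>

```python
# Simplified sketch of data-coverage pruning: a rule is kept only if it
# correctly classifies at least one instance not yet covered by a
# higher-priority rule; covered instances are then removed.

def prune(sorted_rules, instances):
    """sorted_rules: list of (antecedent, class_item), highest priority first.
    instances: list of frozensets of items (including the class item)."""
    kept, remaining = [], list(instances)
    for antecedent, class_item in sorted_rules:
        covered = [x for x in remaining if antecedent.issubset(x)]
        if any(class_item in x for x in covered):  # correct on at least one
            kept.append((antecedent, class_item))
            remaining = [x for x in remaining if not antecedent.issubset(x)]
    return kept

instances = [
    frozenset({"a=1", "b=1", "class=yes"}),
    frozenset({"a=1", "b=0", "class=no"}),
]
rules = [
    (frozenset({"a=1", "b=1"}), "class=yes"),
    (frozenset({"a=1"}), "class=no"),
    (frozenset({"b=1"}), "class=yes"),  # redundant: its instances are covered
]
print(prune(rules, instances))  # keeps only the first two rules
```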
        <p>Classification phase The way classification is performed
depends primarily on whether the ARC algorithm
produces rule lists or rule sets. Rule lists are ordered, and
typically only the first matching rule in the rule list is used
to classify an instance. CBA produces rule lists. In
contrast, rule sets are unordered and typically all rules with
matching antecedents contribute to classifying an instance.
CPAR is an example of an algorithm that produces a rule
set.
</p>
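        <p>First-match classification over an ordered rule list can be sketched in a few lines. This is our own illustration of the general scheme, not CBA's exact implementation; a default class handles instances no rule matches:</p>

```python
# Minimal sketch: classify an instance with an ordered rule list.
# The first rule whose antecedent holds decides the class.

def classify(rule_list, default_class, instance):
    for antecedent, predicted_class in rule_list:
        if antecedent.issubset(instance):  # all rule conditions hold
            return predicted_class
    return default_class  # no rule matched

rule_list = [
    (frozenset({"outlook=sunny", "windy=yes"}), "play=no"),
    (frozenset({"outlook=sunny"}), "play=yes"),
]
instance = frozenset({"outlook=sunny", "windy=yes"})
print(classify(rule_list, "play=yes", instance))  # first match wins → play=no
```

        <p>With a rule set instead of a rule list, all matching rules would contribute, e.g. by voting.</p>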
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Automatic Tuning of Mining Parameters</title>
      <p>
        The minimum support threshold is a mandatory
hyperparameter of most, if not all, association rule learning
approaches, yet even the latest algorithms do little to tune
it algorithmically. The minimum confidence threshold is used
in a smaller number of algorithms, but when it is used, it
is likewise not tuned. We suspect that the reason is that these
thresholds are notoriously difficult to optimize due to
exponential complexity of the search space [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ].
Additionally, the classification performance is typically very
sensitive to the parameter setting. While lower values of
confidence and support and higher values of rule length
generally produce the best results, the side effect of such a setting
can be a disproportionately long time needed to build the
classifier, caused by a combinatorial explosion, and
consequently extreme memory requirements.
      </p>
      <p>We considered standard approaches such as pure
random or grid search. Since they do not use any
background knowledge of the algorithm, we found them to be
unsuitable for optimizing the hyperparameters of
association rule learning, because of the sudden steep increases
in state space complexity that can be triggered by small
changes in the value of a hyperparameter.</p>
      <p>In the following, we introduce our two proposals for
hyperparameter tuning for association rule classification.</p>
      <p>Simulated Annealing Optimization</p>
      <p>
        Algorithms 1 and 2 present our implementation of
hyperparameter optimization based on simulated
annealing [
        <xref ref-type="bibr" rid="ref12 ref13 ref14">12, 13, 14</xref>
        ]. The objective criterion that is optimized
is the accuracy of the model.
      </p>
      <p>The algorithm starts as a random search for one valid
initial solution providing a non-empty classifier. Each
subsequent classifier is evaluated using nested
cross-validation. Input data are internally divided into a train
and a validation subset with a stratified split. The
classifier is built with a generated setting on the train set. The
accuracy is computed using the created classifier on the
validation set. If the execution time of the evaluation is
over a predefined threshold, we stop the computation,
mark the setting as invalid, and set the computed accuracy
to null.</p>
      <p>Algorithm 2: Perturbate - Generating new setting for SA.
input : Current setting: currentSetting
Current setting status: resultStatus (timeout, success or empty rule set)
output: New setting: newSetting
1 begin
2   newSetting = currentSetting
3   // with uniform probability select one parameter
4   p = random("support", "confidence", "ruleLength")
5   switch p do
6     case "support" or "confidence" do
7       if resultStatus is timeout then
8         // increasing threshold can speed up execution
9         newSetting[p] = newSetting[p] + rand(0, 1 - newSetting[p])
10      else if resultStatus is empty then
11        // threshold may have been too high
12        newSetting[p] = newSetting[p] - rand(0, newSetting[p])
13      else
14        newSetting[p] = random(0, 1)
15    case "ruleLength" do
16      if resultStatus is timeout then
17        // shorter rule length can speed up execution
18        newSetting[p] = newSetting[p] - 1
19      else
20        newSetting[p] = rand(1, MAX_LENGTH)
21  return newSetting</p>
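      <p>The perturbation step transcribes directly into Python. The following is our own sketch following the pseudocode of Algorithm 2, not the rCBA implementation; MAX_LENGTH and the setting keys mirror the listing:</p>

```python
# Sketch of the Perturbate step: change exactly one parameter of the
# current setting, steered by the outcome of the previous evaluation.
import random

MAX_LENGTH = 5  # default maximum antecedent length (see Settings)

def perturbate(current_setting, result_status):
    new_setting = dict(current_setting)
    # with uniform probability select one parameter to change
    p = random.choice(["support", "confidence", "ruleLength"])
    if p in ("support", "confidence"):
        if result_status == "timeout":
            # increasing the threshold can speed up execution
            new_setting[p] += random.uniform(0, 1 - new_setting[p])
        elif result_status == "empty":
            # the threshold may have been too high
            new_setting[p] -= random.uniform(0, new_setting[p])
        else:
            new_setting[p] = random.random()
    else:  # "ruleLength"
        if result_status == "timeout":
            # a shorter rule length can speed up execution
            new_setting[p] -= 1
        else:
            new_setting[p] = random.randint(1, MAX_LENGTH)
    return new_setting

setting = {"support": 0.01, "confidence": 0.5, "ruleLength": 3}
print(perturbate(setting, "timeout"))
```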
      <p>The evaluated new setting is accepted as a candidate for the
next iteration if 1) it is a valid setting not leading to a
timeout, and 2) its accuracy is better than that of the current
setting, or the computed probability of acceptance exceeds a
random value. As an optimization, we always remember the best
solution found so far, so that it can be used if the algorithm
terminates at a sub-optimal place.</p>
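      <p>The acceptance test can be sketched as follows, assuming the standard Metropolis-style criterion; the exact acceptance formula used in Algorithm 1 may differ:</p>

```python
# Illustrative simulated-annealing acceptance test (Metropolis criterion).
import math
import random

def accept(new_accuracy, current_accuracy, temperature):
    if new_accuracy is None:             # invalid setting (timeout): reject
        return False
    if new_accuracy > current_accuracy:  # strict improvement: accept
        return True
    # otherwise accept with a probability that shrinks with the accuracy
    # loss and with the gradually decreasing temperature
    probability = math.exp((new_accuracy - current_accuracy) / temperature)
    return probability > random.random()

print(accept(0.85, 0.80, temperature=1.0))  # improvement → True
```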
      <p>An important part of the algorithm is the generation of a
new setting based on the previous one. Only one parameter,
chosen from support, confidence, and rule length, is changed
during the generation of a new setting. If the current
setting was labeled as invalid, the support or confidence
is increased or the rule length decreased to overcome long
computation times and perform more restricted rule
mining. If the setting does not generate any rule, or no rule is
applicable, the support or confidence is decreased. In the
remaining situations, a random value is generated.</p>
      <sec id="sec-3-1">
        <title>Heuristic algorithm</title>
        <p>As an alternative to the supervised approach based on
simulated annealing, we also introduce an unsupervised
heuristic algorithm. While the search in the simulated
annealing approach uses accuracy as the objective function,
the heuristic algorithm only aims to return a user-set number
of rules. This approach is conceptually faster, since repeated
evaluations of the classification model are not performed.</p>
        <p>
          According to the recommendation in [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ], CBA
generates the best results when the rule generation step returns
at least 60,000 rules. The experiments performed in
[
          <xref ref-type="bibr" rid="ref4">4</xref>
          ] also provide recommended values for the minimum
confidence (50%) and support (1%) thresholds.
        </p>
        <p>
          The problem that our CBA-RG-auto algorithm
addresses is that on some datasets the combination of the
values suggested in [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ] fails. The principal reasons are
either that not enough rules are generated, or a combinatorial
explosion generating a high number of overly short (and thus
overly general) rules.
        </p>
        <p>The CBA-RG-auto algorithm (Alg. 3) takes two principal
parameters on input: the number of desired rules
(targetRuleCount) and the preferred time that can be spent
on tuning (totalTimeout). The algorithm then iteratively
refines the minimum support (supp) and confidence
(conf) thresholds. The mining time and the risk of
combinatorial explosion are controlled by adjusting the constraints on
the minimum and maximum number of conditions that can
appear in the antecedent of the rules (minLen and maxLen).
To guide the search process, the algorithm takes
several additional parameters on input. According to our
experiments, their values can typically be left at their defaults
(we used the same defaults in all experiments reported
in our evaluation).</p>
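        <p>The control loop can be sketched as follows. This is our own condensed illustration of the idea behind Algorithm 3, not the arc implementation: mine_rules() is a hypothetical stand-in for the apriori call, and the stopping and step logic is simplified (e.g. the confidence step and per-iteration timeout handling are omitted):</p>

```python
# Condensed sketch of a CBA-RG-auto-style loop: relax the support threshold
# (or allow longer antecedents) until enough candidate rules are mined.
import time

def tune(mine_rules, target_rule_count=60000, total_timeout=100.0,
         init_support=0.01, init_conf=0.5, supp_step=0.05,
         min_len=2, init_maxlen=3, max_iterations=40):
    supp, conf, max_len = init_support, init_conf, init_maxlen
    start = time.time()
    best_rules = []
    for _ in range(max_iterations):
        if time.time() - start > total_timeout:
            break
        rules = mine_rules(supp, conf, min_len, max_len)
        best_rules = rules
        if len(rules) >= target_rule_count:
            break                  # enough candidates for CBA-CB
        if supp - supp_step > 0:
            supp -= supp_step      # relax support to obtain more rules
        else:
            max_len += 1           # allow longer antecedents instead

    return best_rules

# toy miner: pretends longer allowed rules yield proportionally more rules
toy = lambda supp, conf, lo, hi: ["rule"] * (hi * 100)
print(len(tune(toy, target_rule_count=1000)))  # → 1000
```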
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Evaluation</title>
      <p>In our benchmark, we aim to evaluate the performance of
the two proposed tuning steps against CBA with default
parameters as a baseline.</p>
      <p>For simulated annealing, we report on two setups, one
using default values of the metaparameters of simulated
annealing (denoted sa). To investigate the effect of the
metaparameters introduced in the simulated annealing
algorithm, we also include an approach denoted saopt, which
corresponds to the simulated annealing tuning algorithm
with metaparameter values optimized with random search.
For saopt, the configurations were evaluated against test
data to determine the upper bound of attainable accuracy.
As a result, saopt cannot be directly compared with the
remaining evaluated algorithms, which did not have access
to the test data during training.</p>
      <p>Algorithm 3: Automatic parameter tuning
heuristic algorithm (CBA-RG-auto)
input : train - training data
parameters: main: targetRuleCount, totalTimeout;
supplementary: initSupport = 0.01, initConf = 0.5,
confStep = 0.05, suppStep = 0.05, minLen = 2, initMaxlen = 3,
iterTimeout = 2, maxIterations = 40
output : rules - list of rules to be used as input for CBA-CB
1 begin
2   startTime ← currentTime(), supp ← initSupport, conf ← initConf,
    maxLen ← initMaxlen, iterations ← 0,
    maxLenDecreasedDueToTIMEOUT ← false, lastRuleCount ← -1
3   MAX_RULE_LEN ← number of explanatory attributes
4   while true do
5     iterations ← iterations + 1
6     if iterations = maxIterations then
7       break
8</p>
      <p>
        Datasets The evaluation was performed on 22 datasets
selected from the UCI repository [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ]. All selected datasets
were previously used in the evaluation of rule learning or
decision tree algorithms in one of the following seminal
papers: [
        <xref ref-type="bibr" rid="ref16 ref17 ref4 ref5">5, 16, 4, 17</xref>
        ]. Numerical attributes with more than 3
values were binned with entropy-based discretization [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ].
Ten-fold cross-validation was used to generate train-test
splits. The same splits were used for all evaluated
configurations.
      </p>
      <p>Implementation We have made implementations of all
evaluated algorithms available under an open-source
licence. We used the R package rCBA (version 0.4.3,
available via CRAN) to obtain results for the baseline CBA
run and for simulated annealing. The R package arc
(version 1.2, also available via CRAN) was used to obtain
results for the heuristic algorithm. Both implementations
use the apriori algorithm for the rule learning phase.</p>
      <p>Settings The classifier building phase of CBA does not
have any metaparameters. The rule learning phase requires
setting of rule mining parameters – minimum support,
minimum confidence and maximum rule length. The
starting parameters for the proposed threshold tuning methods
(Algorithms 1 – 3) are also listed below.</p>
      <p>Baseline CBA (base): 50% minimum confidence, 1%
minimum support, maximum rule length 3.</p>
      <p>Heuristic algorithm (heuristic): Default setting
is targetRuleCount = 60000, initSupport = 0.01,
initConf = 0.5, confStep = 0.05, suppStep = 0.05,
minLen = 2, initMaxlen = 3, iterTimeout = 2,
maxIterations = 40.</p>
      <p>Simulated Annealing (sa): Default setting for the SA
algorithm is INIT_TEMP = 100.0, ALPHA = 0.05,
MAX_LENGTH = 5, TIME_LIMIT = 10.</p>
      <p>Optimized Simulated Annealing (saopt): Random
search from the following intervals:
INIT_TEMP = 10.0 to 100.0, ALPHA = 0.01 to 0.5,
MAX_LENGTH = 3 to 10, TIME_LIMIT = 1 to 10.</p>
      <sec id="sec-4-1">
        <title>Results</title>
        <p>Results are reported in terms of accuracy (Table 1), rule
count (Table 2), average number of conditions in rules in
the model (Table 3), average model size computed as
average number of conditions × average rule count (Table 4),
and classifier build time (Table 5). Finally, Table 6
provides for each of the evaluated approaches an aggregate
number of wins in each of the five criteria above.
Baseline CBA The results show that CBA with default
parameter values performs surprisingly well, achieving the best
results in terms of overall size of the classifier on most
datasets (14 out of 22), while obtaining the best results on
5 datasets in terms of predictive performance.
Remarkably, there are three datasets (breast-w, credit-g, sonar) for
which the default parameter values generate models that
have the best accuracy and at the same time are the smallest in
terms of combined rule count and rule length.</p>
        <p>Despite the five wins, base CBA had the worst
average and median accuracy. Detailed examination of Table 1
shows that the default thresholds result in either very low
accuracy or excessive model size on several datasets; the drop in
accuracy is particularly strong on the glass and letter datasets.
The instability of the results is reflected by the high standard
deviation of accuracy.</p>
        <p>Heuristic The optimization heuristic provides the best
outcome in terms of predictive performance, both in terms
of accuracy and the number of wins against the other approaches.
This comes at the cost of creating larger models than those
generated by the other methods; the build time is also the highest.
One dataset (letter) could not even be processed. For accuracy,
the heuristic approach provides the most stable results, with the
lowest standard deviation.</p>
        <p>Simulated annealing When it comes to compact
models, very promising results were obtained by simulated
annealing with default parameters (sa), which produced the
smallest models in terms of rule count on 12 datasets. In
two cases (australian, hepatitis), this algorithm produced
much smaller models than the other methods with a small
gap in terms of accuracy. On the ionosphere dataset, sa
even generated a model which was the most accurate and at
the same time the smallest.</p>
        <p>The saopt algorithm generated almost consistently
better results than sa. However, this approach is not fully
comparable with the remaining two, because it used the
test set to select the best combination of hyperparameters.
It is included to show the possible effect of tuning the
hyperparameters of simulated annealing as opposed to only using
the default values.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Limitations and Future Work</title>
      <p>We acknowledge several limitations affecting our
preliminary study:</p>
      <p>Our benchmark did not account for the tradeoff
between rule count and accuracy. For example, a 1%
improvement in accuracy may need to be offset by a
much higher increase in the number of rules, which are
required to cover various specialized cases.</p>
      <p>We have not performed statistical testing of the
significance of differences between the algorithms.</p>
      <p>
        The baseline approaches could include some
previously proposed approaches for metaparameter
optimization, such as [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ].
      </p>
      <p>For the baseline CBA algorithm, we evaluated only the
setting with the maximum length of the antecedent set to 3,
as higher thresholds sometimes led to combinatorial
explosion.</p>
      <p>The included datasets are of small or moderate size;
evaluation on large datasets was not performed.</p>
      <p>We plan to address some of the limitations noted above
in a larger follow-up study. In future work, it would also
be interesting to adapt the proposed rule tuning heuristics
to the recent generation of association rule classification
algorithms. Unlike CBA, which uses a computationally
lightweight approach to selecting rules for the final
classifier, these algorithms typically subject the input rule set to
a much more sophisticated selection process, involving
optimization techniques such as Markov Chain Monte Carlo
(in SBRL), submodular optimization (in IDS), or simulated
annealing (in BRS).</p>
      <p>
        This adaptation may require experimentation with other
metaparameter optimization algorithms, such as
sequential model-based optimization (SMBO) approaches [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ]
or racing algorithms, e.g. F-race (irace) [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ], which were experimentally shown to
outperform SMBO on tasks with mixed types of parameters [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ].
      </p>
    </sec>
    <sec id="sec-6">
      <title>Conclusions</title>
      <p>In this paper, we have shown how the thresholds used in rule
generation can be tuned in both an unsupervised and a
supervised way to improve the results of association rule
classification algorithms in terms of predictive performance and
the size of the resulting model. Our results showed, somewhat
surprisingly, that the default thresholds recommended for
the CBA algorithm (1% minimum support and 50%
minimum confidence) provide on many datasets
results highly competitive with the best configuration found
with any of the proposed tuning algorithms. Despite this,
using these defaults cannot be unanimously recommended,
as the default settings work well on some datasets but
produce abysmal results on others. The proposed
unsupervised heuristic tuning algorithm provides the best predictive
accuracy and relatively stable results. The supervised
approach based on simulated annealing shows promising results
in terms of generating compact models.</p>
      <p>Possible applications include not only general
classification problems, but particularly the use of associative
classification for anomaly detection, where the results are
known to be very sensitive to the choice of the support
threshold [22].</p>
      <p>The implementation of the proposed algorithms is made
available in R packages rCBA and arc, which are available
in the CRAN repository.</p>
    </sec>
    <sec id="sec-7">
      <title>Acknowledgments</title>
      <p>The authors would like to thank the three anonymous
reviewers for insightful comments that helped to improve
the final version of the paper. This research was supported
by the Faculty of Information Technology, Czech Technical
University in Prague, and by the Faculty of Informatics and
Statistics, University of Economics, Prague, through
institutional support for research and grant IGA 12/2019.</p>
      <p>cal study on hyperparameter tuning of decision trees. arXiv
preprint arXiv:1812.02207, 2018.
[22] Brauckhoff, D.; Dimitropoulos, X.; Wagner, A.; et al.:
Anomaly extraction in backbone networks using
association rules. In Proceedings of the 9th ACM SIGCOMM
Conference on Internet Measurement, ACM, 2009, pp. 28-34.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <surname>Lakkaraju</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ; Bach,
          <string-name>
            <given-names>S. H.</given-names>
            ;
            <surname>Leskovec</surname>
          </string-name>
          ,
          <string-name>
            <surname>J.</surname>
          </string-name>
          :
          <article-title>Interpretable Decision Sets: A Joint Framework for Description and Prediction</article-title>
          .
          <source>In Proceedings of KDD '16</source>
          , New York, NY, USA: ACM,
          <year>2016</year>
          , ISBN 978-1-
          <fpage>4503</fpage>
          -4232-2, pp.
          <fpage>1675</fpage>
          -
          <lpage>1684</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <surname>Yang</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ; Rudin,
          <string-name>
            <given-names>C.</given-names>
            ;
            <surname>Seltzer</surname>
          </string-name>
          ,
          <string-name>
            <surname>M.</surname>
          </string-name>
          :
          <article-title>Scalable Bayesian rule lists</article-title>
          .
          <source>In Proceedings of the 34th International Conference on Machine Learning-Volume 70, JMLR. org</source>
          ,
          <year>2017</year>
          , pp.
          <fpage>3921</fpage>
          -
          <lpage>3930</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Rudin</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Doshi-Velez</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          <article-title>; et al.: A Bayesian framework for learning rule sets for interpretable classification</article-title>
          .
          <source>The Journal of Machine Learning Research, vol. 18, no. 1</source>
          ,
          <year>2017</year>
          : pp.
          <fpage>2357</fpage>
          -
          <lpage>2393</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Hsu</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ; Ma,
          <string-name>
            <surname>Y.</surname>
          </string-name>
          :
          <article-title>Integrating classification and association rule mining</article-title>
          .
          <source>In Proceedings of KDD'98</source>
          ,
          <year>1998</year>
          , pp.
          <fpage>80</fpage>
          -
          <lpage>86</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <surname>Alcala-Fdez</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ; Alcala,
          <string-name>
            <given-names>R.</given-names>
            ;
            <surname>Herrera</surname>
          </string-name>
          ,
          <string-name>
            <surname>F.</surname>
          </string-name>
          :
          <article-title>A fuzzy association rule-based classification model for high-dimensional problems with genetic rule selection and lateral tuning</article-title>
          .
          <source>IEEE Transactions on Fuzzy Systems, vol. 19, no. 5</source>
          ,
          <year>2011</year>
          : s.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <surname>Yin</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ; Han,
          <string-name>
            <surname>J</surname>
          </string-name>
          .: CPAR:
          <article-title>Classification based on Predictive Association Rules</article-title>
          .
          <source>In Proceedings of the SIAM International Conference on Data Mining</source>
          , San Franciso: SIAM Press,
          <year>2003</year>
          , pp.
          <fpage>369</fpage>
          -
          <lpage>376</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ; Han,
          <string-name>
            <given-names>J</given-names>
            .;
            <surname>Pei</surname>
          </string-name>
          ,
          <string-name>
            <surname>J.:</surname>
          </string-name>
          <article-title>CMAR: Accurate and Efficient Classification Based on Multiple Class-Association Rules</article-title>
          .
          <source>In Proceedings of the 2001 IEEE International Conference on Data Mining, ICDM '01</source>
          , Washington, DC, USA: IEEE,
          <year>2001</year>
          , ISBN 0-7695-1119-8, pp.
          <fpage>369</fpage>
          -
          <lpage>376</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <surname>Vanhoof</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Depaire</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>Structure of association rule classifiers: a review</article-title>
          .
          <source>In 2010 International Conference on Intelligent Systems and Knowledge Engineering (ISKE)</source>
          ,
          <year>November 2010</year>
          , pp.
          <fpage>9</fpage>
          -
          <lpage>12</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <surname>Agrawal</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Imielinski</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Swami</surname>
            ,
            <given-names>A. N.</given-names>
          </string-name>
          :
          <article-title>Mining Association Rules between Sets of Items in Large Databases</article-title>
          .
          <source>In SIGMOD</source>
          ,
          <year>1993</year>
          , pp.
          <fpage>207</fpage>
          -
          <lpage>216</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <surname>Han</surname>
            ,
            <given-names>J</given-names>
          </string-name>
          .;
          <string-name>
            <surname>Pei</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ; Yin, Y.; et al.:
          <article-title>Mining Frequent Patterns Without Candidate Generation: A Frequent-Pattern Tree Approach</article-title>
          .
          <source>Data Mining and Knowledge Discovery, vol. 8, no. 1</source>
          , January
          <year>2004</year>
          : pp.
          <fpage>53</fpage>
          -
          <lpage>87</lpage>
          , ISSN 1384-5810.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <surname>Coenen</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Leng</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ; Zhang, L.:
          <article-title>Threshold tuning for improved classification association rule mining</article-title>
          .
          <source>In Pacific-Asia Conference on Knowledge Discovery and Data Mining</source>
          , Springer,
          <year>2005</year>
          , pp.
          <fpage>216</fpage>
          -
          <lpage>225</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <surname>Černý</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          :
          <article-title>Thermodynamical approach to the traveling salesman problem: An efficient simulation algorithm</article-title>
          .
          <source>Journal of Optimization Theory and Applications</source>
          , vol.
          <volume>45</volume>
          , no. 1,
          <year>1984</year>
          : pp.
          <fpage>41</fpage>
          -
          <lpage>51</lpage>
          , ISSN 1573-2878.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <surname>Kirkpatrick</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Gelatt</surname>
            ,
            <given-names>C. D.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Vecchi</surname>
            ,
            <given-names>M. P.</given-names>
          </string-name>
          :
          <article-title>Optimization by Simulated Annealing</article-title>
          .
          <source>Science</source>
          , vol.
          <volume>220</volume>
          , no.
          <issue>4598</issue>
          ,
          <year>1983</year>
          : pp.
          <fpage>671</fpage>
          -
          <lpage>680</lpage>
          , ISSN 0036-8075, doi:10.1126/science.220.4598.671.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <surname>Johnson</surname>
            ,
            <given-names>D. S.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Aragon</surname>
            ,
            <given-names>C. R.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>McGeoch</surname>
            ,
            <given-names>L. A.</given-names>
          </string-name>
          ; et al.:
          <article-title>Optimization by Simulated Annealing: An Experimental Evaluation. Part I, Graph Partitioning</article-title>
          .
          <source>Oper. Res., vol. 37, no. 6</source>
          ,
          October
          <year>1989</year>
          : pp.
          <fpage>865</fpage>
          -
          <lpage>892</lpage>
          , ISSN 0030-364X, doi:10.1287/opre.37.6.865.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <surname>Dua</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Graff</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>UCI Machine Learning Repository</article-title>
          .
          <year>2017</year>
          . Available at: http://archive.ics.uci.edu/ml
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <surname>Hühn</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Hüllermeier</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          :
          <article-title>FURIA: an algorithm for unordered fuzzy rule induction</article-title>
          .
          <source>Data Mining and Knowledge Discovery, vol. 19, no. 3</source>
          ,
          <year>2009</year>
          : pp.
          <fpage>293</fpage>
          -
          <lpage>319</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <surname>Quinlan</surname>
            ,
            <given-names>J. R.</given-names>
          </string-name>
          :
          <article-title>Improved use of continuous attributes in C4.5</article-title>
          .
          <source>Journal of Artificial Intelligence Research, vol. 4</source>
          ,
          <year>1996</year>
          : pp.
          <fpage>77</fpage>
          -
          <lpage>90</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <surname>Fayyad</surname>
            ,
            <given-names>U. M.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Irani</surname>
            ,
            <given-names>K. B.</given-names>
          </string-name>
          :
          <article-title>Multi-Interval Discretization of Continuous-Valued Attributes for Classification Learning</article-title>
          .
          <source>In 13th International Joint Conference on Artificial Intelligence (IJCAI-93)</source>
          ,
          <year>1993</year>
          , pp.
          <fpage>1022</fpage>
          -
          <lpage>1029</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <surname>Bergstra</surname>
            ,
            <given-names>J. S.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Bardenet</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ; Bengio, Y.; et al.:
          <article-title>Algorithms for hyper-parameter optimization</article-title>
          .
          <source>In Advances in neural information processing systems</source>
          ,
          <year>2011</year>
          , pp.
          <fpage>2546</fpage>
          -
          <lpage>2554</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <surname>Birattari</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Yuan</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Balaprakash</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ; et al.:
          <article-title>F-Race and iterated F-Race: An overview</article-title>
          .
          <source>In Experimental Methods for the Analysis of Optimization Algorithms</source>
          , Springer,
          <year>2010</year>
          , pp.
          <fpage>311</fpage>
          -
          <lpage>336</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <surname>Mantovani</surname>
            ,
            <given-names>R. G.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Horváth</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Cerri</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ; et al.: An empiri-
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>