=Paper= {{Paper |id=None |storemode=property |title=Experimental Evaluation of the e-LICO Meta-Miner |pdfUrl=https://ceur-ws.org/Vol-950/planlearn2012_submission_4.pdf |volume=Vol-950 }} ==Experimental Evaluation of the e-LICO Meta-Miner== https://ceur-ws.org/Vol-950/planlearn2012_submission_4.pdf
      Experimental Evaluation of the e-LICO Meta-Miner
                                                               (Extended Abstract)

                              Phong Nguyen and Alexandros Kalousis and Melanie Hilario 1


1 Introduction

Operator selection is the task of selecting the right operator for building not only valid but also optimal data mining (DM) workflows in order to solve a new learning problem. One of the main achievements of the EU-FP7 e-LICO project2 has been the development of an Intelligent Data-Mining Assistant (IDA) that supports the DM user in the construction of such DM workflows, following a cooperative AI-planning approach [2] coupled with a new meta-learning approach for mining past DM experiments, referred to as the e-LICO meta-miner [3]. The idea of meta-mining [1] is to build meta-mined models from the full knowledge discovery process by analysing learning problems and algorithms in terms of their characteristics and core components within a declarative representation of the DM process, the Data Mining OPtimization ontology (DMOP)3.

In this paper, we provide experimental results to validate the e-LICO meta-miner's approach to the operator selection task. We experimented on a collection of real-world datasets with feature selection and classification workflows, comparing our tool with a default strategy based on the popularity of DM workflows. The results show the validity of our approach; in particular, our selection strategy ranks DM workflows appropriately with respect to the input learning problem. In the next section, we briefly review the meta-miner; in section 3, we present our results; and in section 4, we conclude.

1 University of Geneva, Switzerland, email: Phong.Nguyen@unige.ch
2 http://www.e-lico.eu
3 The DMOP is available at http://www.dmo-foundry.org

2 The e-LICO Meta-Miner

The role of the AI-planner is to plan valid DM workflows by reasoning on the applicability of DM operators at a given step i according to their pre/post-conditions. However, since several operators can have equivalent conditions, the number of resulting plans can be in the order of several thousands. The goal of the meta-miner is to select, at a given step i, among a set of candidate operators Ai the k best ones that will optimize the performance measure associated with the user goal g and its input meta-data m, in order to gear the AI-planner toward optimal plans. For this, the meta-miner makes use of a quality function Q which scores a given plan w by the quality q of the operators that form w as:

   Q(w | g, m) = q*(o_1 | g, m) · ∏_{i=2}^{|T(w)|} q(o_i | T(w_{i-1}), g, m)    (1)

where T(w_{i-1}) = [o_1, .., o_{i-1}] is the sequence of previous operators selected so far, and q* is an initial operator quality function. Thus the meta-miner qualifies a candidate operator by its conditional probability of being applied given all the preceding operators, and selects those that have maximum quality to be applied at a step i. In order to have reliable probabilities, the meta-miner makes use of frequent workflow patterns extracted from past DM processes with the help of the DMOP ontology, such that the operator quality function q is approximated as:

   q(o | T(w_{i-1}), g, m) ≈ aggr{ supp(f_i^o | g, m) / supp(f_{i-1} | g, m) }_{f_i^o ∈ F_i^o}    (2)

where aggr is an aggregation function, F_i^o is the set of frequent workflow patterns that match the current candidate workflow w_i^o built with a candidate operator o, and f_{i-1} is the pattern prefix of each pattern f_i^o ∈ F_i^o. More importantly, the quality of a candidate workflow w_i^o depends on the support function supp(f_i^o | g, m) of its matching patterns. As described in [3], this support function is defined by learning a dataset similarity measure which retrieves a dataset's nearest neighbors ExpN based on the input meta-data m. We refer the reader to [3] for more details. In the next section, we deliver experimental results that validate our meta-mining approach.

3 Experiments

To meta-mine real experiments, we selected 65 high-dimensional biological datasets representing genomic or proteomic microarray data. We applied to these bio-datasets 28 feature-selection-plus-classification workflows and 7 classification-only workflows, using ten-fold cross-validation. We used the following 4 feature selection algorithms: Information Gain, IG; Chi-square, CHI; ReliefF, RF; and recursive feature elimination with SVM, SVMRFE; we fixed the number of selected features to ten. For classification we used the following 7 algorithms: one-nearest-neighbor, 1NN; the C4.5 and CART decision tree algorithms; a Naive Bayes algorithm with normal probability estimation, NBN; a logistic regression algorithm, LR; and SVM with the linear, SVM l, and the rbf, SVM r, kernels. We used the implementations of these algorithms provided by the RapidMiner data mining suite with their default parameters. We ended up with a total of 65 × (28 + 7) = 2275 base-level DM experiments, on which we gathered all experimental meta-data (fold predictions and performance results, dataset meta-data and workflow patterns) for meta-mining [1].

We constrained the AI-planner so that it generates feature selection and/or classification workflows only. We did so in order for the past experiments to be really relevant to the type of workflows we want to design. Note that the AI-planner can also select operators with which we have not experimented. These are, for feature selection, Gini Index, Gini, and Information Gain Ratio, IGR, and, for classification, a Naive Bayes algorithm with kernel-based probability
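The scoring of equations (1) and (2) can be sketched as follows. This is only a minimal illustration, not the e-LICO implementation: it assumes that pattern supports have already been computed (conditioned on the goal g and meta-data m via the nearest-neighbor experiments ExpN), and that the unspecified aggregation function aggr is the mean.

```python
from math import prod

def operator_quality(matched_patterns, support):
    """Eq. (2): approximate q(o | T(w_{i-1}), g, m) by aggregating the
    support ratios supp(f_i^o) / supp(f_{i-1}) over the frequent patterns
    f_i^o that match the candidate workflow extended with operator o.
    `matched_patterns` is a list of (pattern, prefix) id pairs; `support`
    maps a pattern id to its support on the retrieved past experiments."""
    ratios = [support[f] / support[prefix] for f, prefix in matched_patterns]
    return sum(ratios) / len(ratios)  # 'aggr' taken to be the mean (assumption)

def plan_quality(q_initial, step_qualities):
    """Eq. (1): score a plan w as the initial operator quality q* times
    the product of the conditional qualities of the subsequent operators."""
    return q_initial * prod(step_qualities)
```

For example, with supports supp(f_{i-1}) = 0.5 and supp(f_i^o) = 0.25, a single matching pattern yields an operator quality of 0.5; candidates at step i are then ranked by this score.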
estimation, NBK, a Linear Discriminant Analysis algorithm, LDA, a Rule Induction algorithm, Ripper, a Random Tree algorithm, RDT, and a Neural Network algorithm, NNet.

3.1 Baseline Strategy

In order to assess how well our meta-miner performs, we need to compare it against a baseline. To define this baseline, we use as operator quality estimates simply their frequency of use within the community of RapidMiner users. We denote this quality estimate for an operator o by qdef(o). Additionally, we denote the quality of a DM workflow w, computed using the qdef(o) quality estimates, by Qdef(w), thus:

   Qdef(w) = ∏_{o_i ∈ T(w)} qdef(o_i)    (3)

The score qdef(o) focuses on the individual frequency of use of the DM operators, and does not account for longer-term interactions and combinations such as the ones captured by our frequent patterns. It thus simply reflects the popularity of the individual operators. The most frequently used classification operators were C4.5, followed by NBN and SVM l; for feature selection, the most frequently used were CHI and SVMRFE.

3.2 Evaluation and Comparison Strategy

The evaluation is done in a leave-one-dataset-out manner: we apply our selection strategies, built on the remaining 64 datasets, to generate workflows for the dataset that was left out. On the left-out dataset, we then determine the K best workflows using the baseline strategy as well as the meta-miner selection strategy. To compare the performance of the ordered set of workflows constructed by each strategy, we use the average estimated performance of the K workflows on the given dataset, which we denote by φa; we report the average of φa over all the datasets. Additionally, we estimate the statistical significance of the number of times, over all the datasets, that the meta-miner strategy achieves a higher φa than the baseline strategy; we denote this by φs. We estimated the neighborhood ExpN of a dataset using N = 5 nearest neighbors. We compare the performance of the baseline and of the meta-miner for K = 1, 3, 5 generated workflows in order to get a broad picture of their overall performance.

3.3 Performance Results and Comparisons

K=1. The top-1 workflow selected by the baseline strategy is CHI-C4.5. When we compare its performance against that of the top-1 workflow selected by the meta-miner, given in the first row of table 1, we see that the meta-mining strategy yields an average performance improvement of around 6% over the baseline strategy. In addition, its improvement over the baseline is statistically significant: the meta-miner wins on 53 of the 65 datasets, while the baseline wins on only 11.

K=3. The two workflows selected by the baseline strategy in addition to the top-1 are CHI-NBN and CHI-SVM l. When we extend the selection to the three best workflows, we obtain the results given in the second row of table 1, where we see that the average predictive performance improvement over the baseline strategy is around 2%. As before, the meta-miner achieves significantly better performance than the baseline on a larger number of datasets than vice versa.

K=5. The two workflows selected by the baseline strategy in addition to the top-3 are SVMRFE-C4.5 and SVMRFE-SVM l. We give the results of the five best workflows selected by the meta-miner in the last row of table 1, where we observe similar trends as before: a 2% average performance improvement and a statistically significant difference in the number of wins in favor of the meta-mining strategy.

                      φa        φs
   K=1   Qdef      71.92%    11/65
         Q         77.68%    53/65    p=2e-7
   K=3   Qdef      75.04%    22/65
         Q         77.28%    41/65    p=0.046
   K=5   Qdef      75.18%    18/65
         Q         77.14%    44/65    p=0.006

   Table 1. Performance results and comparisons for the top-K workflows.

3.4 Selected Workflows

We briefly discuss the top-K workflows selected by the meta-miner. For K = 1, on a plurality of datasets the meta-miner selects the LDA classifier, an algorithm we have not experimented with. This happens because, within the DMOP ontology, this algorithm is related both to the linear SVM, SVM l, and to the Naive Bayes algorithm, both of which perform well on our dataset collection. For K = 3 and K = 5, we additionally observe the selection of the previously unseen NNet and Ripper classifiers. These operator selections demonstrate the capability of the meta-miner to select new operators based on the algorithm similarities, given by the DMOP, that relate them to past ones.

4 Conclusion and Future Works

This is a preliminary study, but already we see that we are able to deliver better workflow suggestions, in terms of predictive performance, than the baseline strategy, while at the same time being able to suggest workflows consisting of operators with which we have never experimented. Future work includes more detailed experimentation and evaluation, and the construction of similarity measures combining both the dataset characteristics and the workflow patterns.

ACKNOWLEDGEMENTS

We would like to thank Jörg-Uwe Kietz and Simon Fischer for their contribution to the development and evaluation of the e-LICO meta-miner.

REFERENCES

[1] Melanie Hilario, Phong Nguyen, Huyen Do, Adam Woznica, and Alexandros Kalousis, 'Ontology-based meta-mining of knowledge discovery workflows', in Meta-Learning in Computational Intelligence, eds., N. Jankowski, W. Duch, and K. Grabczewski, Springer, (2011).
[2] Jörg-Uwe Kietz, Floarea Serban, Abraham Bernstein, and Simon Fischer, 'Towards Cooperative Planning of Data Mining Workflows', in Proc. of the ECML/PKDD09 Workshop on Third Generation Data Mining: Towards Service-oriented Knowledge Discovery (SoKD-09), (2009).
[3] Phong Nguyen, Alexandros Kalousis, and Melanie Hilario, 'A meta-mining infrastructure to support KD workflow optimization', in Proc. of the PlanSoKD-2011 Workshop at ECML/PKDD-2011, (2011).
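The popularity baseline of equation (3) amounts to the following sketch; the usage counts here are made up for illustration, not the real RapidMiner community statistics.

```python
from math import prod

def q_def(operator, usage_counts):
    """Popularity of a single operator: its relative frequency of use
    among all recorded operator applications."""
    return usage_counts[operator] / sum(usage_counts.values())

def Q_def(workflow, usage_counts):
    """Eq. (3): baseline quality of a workflow as the product of the
    individual popularities of its operators, ignoring the dataset."""
    return prod(q_def(o, usage_counts) for o in workflow)
```

With hypothetical counts {CHI: 3, C4.5: 5, NBN: 2}, the workflow CHI-C4.5 scores 0.3 × 0.5 = 0.15; ranking by Q_def thus always favors the most popular operators regardless of the input learning problem.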
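The comparison protocol can be sketched as follows. The paper does not name the significance test behind φs; the one-sided sign test shown here is one plausible choice and should be read as an assumption.

```python
from math import comb

def phi_a(perf, k):
    """phi_a: average estimated performance of the K best workflows
    selected by one strategy; `perf` maps each selected workflow to its
    estimated performance on the left-out dataset."""
    top_k = sorted(perf.values(), reverse=True)[:k]
    return sum(top_k) / k

def sign_test_p(wins, n):
    """One-sided sign test (assumed): probability of observing at least
    `wins` successes in `n` win/loss comparisons under a fair-coin null."""
    return sum(comb(n, i) for i in range(wins, n + 1)) / 2 ** n
```

For instance, 8 wins out of 10 per-dataset comparisons gives p = 56/1024 ≈ 0.055, just above the usual 0.05 threshold.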