=Paper=
{{Paper
|id=None
|storemode=property
|title=Experimental Evaluation of the e-LICO Meta-Miner
|pdfUrl=https://ceur-ws.org/Vol-950/planlearn2012_submission_4.pdf
|volume=Vol-950
}}
==Experimental Evaluation of the e-LICO Meta-Miner==
'''Phong Nguyen, Alexandros Kalousis and Melanie Hilario''' (University of Geneva, Switzerland; email: Phong.Nguyen@unige.ch)

''(Extended Abstract)''

===1 Introduction===

Operator selection is the task of selecting the right operator for building not only valid but also optimal data mining (DM) workflows in order to solve a new learning problem. One of the main achievements of the EU-FP7 e-LICO project (http://www.e-lico.eu) has been the development of an Intelligent Data-Mining Assistant (IDA) that assists the DM user in the construction of such DM workflows, following a cooperative AI-planning approach [2] coupled with a new meta-learning approach for mining past DM experiments, referred to as the e-LICO meta-miner [3]. The idea of meta-mining [1] is to build meta-mined models from the full knowledge discovery process by analysing learning problems and algorithms in terms of their characteristics and core components within a declarative representation of the DM process, the Data Mining OPtimization ontology (DMOP, available at http://www.dmo-foundry.org).

In this paper, we provide experimental results that validate the e-LICO meta-miner's approach to the operator selection task. We experimented on a collection of real-world datasets with feature selection and classification workflows, comparing our tool with a default strategy based on the popularity of DM workflows. The results show the validity of our approach; in particular, our selection strategy ranks DM workflows appropriately with respect to the input learning problem. In the next section, we briefly review the meta-miner; in section 3 we present our results, and in section 4 we conclude.
===2 The e-LICO Meta-Miner===

The role of the AI-planner is to plan valid DM workflows by reasoning on the applicability of DM operators at a given step i according to their pre/post-conditions. However, since several operators can have equivalent conditions, the number of resulting plans can be in the order of several thousands. The goal of the meta-miner is to select, among the set of candidate operators A<sub>i</sub> at a given step i, the k best ones that will optimize the performance measure associated with the user goal g and its input meta-data m, in order to gear the AI-planner toward optimal plans. For this, the meta-miner makes use of a quality function Q which scores a given plan w by the quality q of the operators that form w:

<math>Q(w \mid g, m) = q^{*}(o_1 \mid g, m) \prod_{i=2}^{|T(w)|} q(o_i \mid T(w_{i-1}), g, m) \qquad (1)</math>

where T(w<sub>i-1</sub>) = [o<sub>1</sub>, .., o<sub>i-1</sub>] is the sequence of operators selected so far, and q<sup>*</sup> is an initial operator quality function. Thus the meta-miner qualifies a candidate operator by its conditional probability of being applied given all the preceding operators, and selects the operators with maximum quality at step i. In order to have reliable probabilities, the meta-miner makes use of frequent workflow patterns extracted from past DM processes with the help of the DMOP ontology, such that the operator quality function q is approximated as:

<math>q(o \mid T(w_{i-1}), g, m) \approx \underset{f_i^{o} \in F_i^{o}}{\operatorname{aggr}} \left\{ \frac{\operatorname{supp}(f_i^{o} \mid g, m)}{\operatorname{supp}(f_{i-1} \mid g, m)} \right\} \qquad (2)</math>

where aggr is an aggregation function, F<sub>i</sub><sup>o</sup> is the set of frequent workflow patterns that match the current candidate workflow w<sub>i</sub><sup>o</sup> built with a candidate operator o, and f<sub>i-1</sub> is the pattern prefix of each pattern f<sub>i</sub><sup>o</sup> ∈ F<sub>i</sub><sup>o</sup>. More importantly, the quality of a candidate workflow w<sub>i</sub><sup>o</sup> depends on the support function supp(f<sub>i</sub><sup>o</sup> | g, m) of its matching patterns. As described in [3], this support function is defined by learning a dataset similarity measure which retrieves a dataset's nearest neighbors Exp<sub>N</sub> based on the input meta-data m; we refer the reader to [3] for more details. In the next section, we deliver experimental results that validate our meta-mining approach.
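To make the two quality functions concrete, the following is a minimal Python sketch of equations (1) and (2) on toy data. All support values, the one-pattern-per-candidate matching, and the choice of max as the aggregation function aggr are illustrative assumptions, not the meta-miner's actual implementation, which learns pattern supports from past DM experiments [3].

```python
# Toy sketch of equations (1) and (2): operator quality as an aggregated
# ratio of frequent-pattern supports, plan quality as a product of
# operator qualities. All numbers below are made up for illustration.

# supp(f | g, m) for a few hypothetical workflow patterns, keyed by the
# operator sequence the pattern covers.
support = {
    ("CHI",): 0.40,
    ("CHI", "C4.5"): 0.25,
    ("CHI", "SVM_l"): 0.10,
}

def operator_quality(prefix, op, aggr=max):
    """Eq. (2): aggregate supp(f_i^o)/supp(f_{i-1}) over the patterns
    matching the candidate workflow. Here each candidate matches exactly
    one pattern, so the aggregation is trivial."""
    ratios = [support[prefix + (op,)] / support[prefix]]
    return aggr(ratios)

def plan_quality(ops, q_initial):
    """Eq. (1): q*(o_1 | g, m) times the product of the conditional
    operator qualities q(o_i | T(w_{i-1}), g, m)."""
    q = q_initial[ops[0]]
    for i in range(1, len(ops)):
        q *= operator_quality(tuple(ops[:i]), ops[i])
    return q

q_init = {"CHI": 0.40}  # hypothetical initial quality q*
print(plan_quality(["CHI", "C4.5"], q_init))   # 0.40 * (0.25/0.40) = 0.25
print(plan_quality(["CHI", "SVM_l"], q_init))  # 0.40 * (0.10/0.40) = 0.10
```

Under these toy supports, CHI followed by C4.5 outranks CHI followed by SVM_l, exactly because its pattern retains more of the prefix's support.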
===3 Experiments===

To meta-mine real experiments, we selected 65 high-dimensional biological datasets representing genomic or proteomic microarray data. We applied to these bio-datasets 28 feature selection plus classification workflows and 7 classification-only workflows, using ten-fold cross-validation. We used the 4 following feature selection algorithms: Information Gain, IG; Chi-square, CHI; ReliefF, RF; and recursive feature elimination with SVM, SVMRFE; we fixed the number of selected features to ten. For classification we used the 7 following algorithms: one-nearest-neighbor, 1NN; the C4.5 and CART decision tree algorithms; a Naive Bayes algorithm with normal probability estimation, NBN; a logistic regression algorithm, LR; and SVM with the linear, SVM_l, and the rbf, SVM_r, kernels. We used the implementations of these algorithms provided by the RapidMiner data mining suite with their default parameters. We ended up with a total of 65 × (28 + 7) = 2275 base-level DM experiments, on which we gathered all experimental metadata (fold predictions and performance results, dataset metadata and workflow patterns) for meta-mining [1].

We constrained the AI-planner so that it generates feature selection and/or classification workflows only; we did so in order for the past experiments to be really relevant for the type of workflows we want to design. Note that the AI-planner can also select operators with which we have not experimented. These are, for feature selection, Gini Index, Gini, and Information Gain Ratio, IGR; and for classification, a Naive Bayes algorithm with kernel-based probability estimation, NBK, a Linear Discriminant Analysis algorithm, LDA, a Rule Induction algorithm, Ripper, a Random Tree algorithm, RDT, and a Neural Network algorithm, NNet.
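The experiment grid above can be enumerated directly; the short sketch below reproduces only the workflow and experiment counts, using the algorithm abbreviations from the text:

```python
from itertools import product

# The experiment grid: 4 feature-selection algorithms crossed with 7
# classifiers, plus 7 classification-only workflows, on 65 datasets.
feature_selection = ["IG", "CHI", "RF", "SVMRFE"]
classifiers = ["1NN", "C4.5", "CART", "NBN", "LR", "SVM_l", "SVM_r"]

workflows = [f"{fs}-{clf}" for fs, clf in product(feature_selection, classifiers)]
workflows += classifiers  # the classification-only workflows

n_datasets = 65
print(len(workflows))               # 28 + 7 = 35 workflows
print(n_datasets * len(workflows))  # 65 * 35 = 2275 base-level experiments
```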
As before, the meta-miner achieves significantly bet- Rule Induction algorithm, Ripper, a Random Tree algorithm, RDT, ter performance than the baseline in a larger number of baselines and a Neural Network algorithm, NNet. datasets than vice-versa. K=5. The two other workflows selected by the baseline strategy 3.1 Baseline Strategy additionally to the top-3 are SVMRFE-C4.5 and SVMRFE-SVM l. We give the results of the five best workflows selected by the meta- In order to assess how well our meta-miner performs, we need to miner in the last row of table 1, where we observe similar trends as compare it with some baseline. To define this baseline, we will use as before; 2% of average performance improvement and statistical dif- the operators quality estimates simply their frequency of use within ference in the number of improvement in favor of the meta-mining the community of the RapidMiner users. We will denote this quality strategy. estimate for an operator o by qdef (o). Additionaly, we will denote the quality of a DM workflow, w, computed using the qdef (o) quality φa φs Qdef 71.92% 11/65 estimations by Qdef (w), thus: K=1 Q 77.68% 53/65 p=2e-7 Y Qdef 75.04% 22/65 Qdef (w) = qdef (oi ) (3) K=3 Q 77.28% 41/65 p=0.046 oi ∈T (wf ) Qdef 75.18% 18/65 K=5 Q 77.14% 44/65 p=0.006 The score qdef (o) focuses on the individual frequency of use of the DM operators, and does not account for longer term interac- Table 1. Performance results and comparisons for the top-K workflows. tions and combinations such as the ones captured by our frequent patterns. It reflects thus simply the popularity of the individual oper- ators. In what concerns the most frequently used classification oper- ators, these were C4.5, followed by NBN, and SVM l. For the feature 3.4 Selected Workflows selection algorithms, the most frequently used were CHI and SVM- We will briefly discuss the top-K workflows selected by the meta- RFE. miner. 
===3.2 Evaluation and Comparison Strategy===

The evaluation is done in a leave-one-dataset-out manner: we apply our selection strategies using the remaining 64 datasets to generate workflows for the dataset that was left out. On the left-out dataset, we then determine the K best workflows using the baseline strategy as well as the meta-miner selection strategy. To compare the performance of the ordered set of workflows constructed by each strategy, we use the average estimated performance of the K workflows on the given dataset, which we denote by φ<sub>a</sub>; we report the average of φ<sub>a</sub> over all the datasets. Additionally, we estimate the statistical significance of the number of times, over all the datasets, that the meta-miner strategy achieves a higher φ<sub>a</sub> than the baseline strategy; we denote this by φ<sub>s</sub>. We estimated the neighborhood Exp<sub>N</sub> of a dataset using N = 5 nearest neighbors. We compare the performance of the baseline and of the meta-miner for K = 1, 3, 5 generated workflows in order to get a broad picture of their overall performance.

===3.3 Performance Results and Comparisons===

'''K=1.''' The top-1 workflow selected by the baseline strategy is CHI-C4.5. When we compare its performance against that of the top-1 workflow selected by the meta-miner, given in the first row of table 1, we see that the meta-mining strategy gives an average performance improvement of around 6% over the baseline strategy. In addition, its improvement over the baseline is statistically significant: it wins on 53 of the 65 datasets, while the baseline wins on only 11.

'''K=3.''' The two workflows selected by the baseline strategy in addition to the top-1 are CHI-NBN and CHI-SVM_l. When we extend the selection to the three best workflows, we obtain the results given in the second row of table 1, where we see that the average predictive performance improvement over the baseline strategy is around 2%. As before, the meta-miner achieves significantly better performance than the baseline on a larger number of datasets than vice versa.

'''K=5.''' The two workflows selected by the baseline strategy in addition to the top-3 are SVMRFE-C4.5 and SVMRFE-SVM_l. We give the results for the five best workflows selected by the meta-miner in the last row of table 1, where we observe similar trends as before: around 2% of average performance improvement, and a statistically significant number of improvements in favor of the meta-mining strategy.
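The φ<sub>s</sub> comparison can be illustrated with a simple sign test over per-dataset wins. The paper does not state the exact test used, so the one-sided binomial sign test below, with ties excluded, is an assumption:

```python
from math import comb

def sign_test_p(wins, losses):
    """One-sided binomial sign test: P(X >= wins) for X ~ Bin(n, 0.5),
    where n = wins + losses (ties are excluded, as is conventional)."""
    n = wins + losses
    return sum(comb(n, k) for k in range(wins, n + 1)) / 2 ** n

# Win counts mirroring the K=1 row of table 1: the meta-miner wins on 53
# datasets, the baseline on 11 (the remaining dataset treated as a tie).
p = sign_test_p(53, 11)
print(p < 0.05)  # True: the win count is far beyond chance level
```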
===3.4 Selected Workflows===

We briefly discuss the top-K workflows selected by the meta-miner. For K = 1, on a plurality of datasets the meta-miner selects the LDA classifier, an algorithm we have not experimented with. This happens because, within the DMOP ontology, this algorithm is related both to the linear SVM, SVM_l, and to the Naive Bayes algorithm, both of which perform well on our dataset collection. For K = 3 and K = 5, we additionally see the selection of the previously unseen NNet and Ripper classifiers. These operator selections demonstrate the capability of the meta-miner to select new operators based on their algorithm similarities, given by the DMOP, with operators used in past experiments.

===4 Conclusion and Future Works===

This is a preliminary study, but already we see that we are able to deliver better workflow suggestions, in terms of predictive performance, compared to the baseline strategy, while at the same time being able to suggest workflows consisting of operators with which we have never experimented. Future works include more detailed experimentation and evaluation, and the construction of similarity measures combining both the dataset characteristics and the workflow patterns.

===Acknowledgements===

We would like to thank Jörg-Uwe Kietz and Simon Fischer for their contribution to the development and evaluation of the e-LICO meta-miner.

===References===

[1] Melanie Hilario, Phong Nguyen, Huyen Do, Adam Woznica, and Alexandros Kalousis, 'Ontology-based meta-mining of knowledge discovery workflows', in Meta-Learning in Computational Intelligence, eds. N. Jankowski, W. Duch, and K. Grabczewski, Springer, 2011.

[2] Jörg-Uwe Kietz, Floarea Serban, Abraham Bernstein, and Simon Fischer, 'Towards Cooperative Planning of Data Mining Workflows', in Proc. of the ECML/PKDD09 Workshop on Third Generation Data Mining: Towards Service-oriented Knowledge Discovery (SoKD-09), 2009.

[3] Phong Nguyen, Alexandros Kalousis, and Melanie Hilario, 'A meta-mining infrastructure to support KD workflow optimization', in Proc. of the PlanSoKD-2011 Workshop at ECML/PKDD-2011, 2011.