=Paper=
{{Paper
|id=None
|storemode=property
|title=Closures and Partial Implications in Educational Data Mining
|pdfUrl=https://ceur-ws.org/Vol-876/paper8.pdf
|volume=Vol-876
}}
==Closures and Partial Implications in Educational Data Mining==
Closures and Partial Implications in Educational Data Mining

Diego García-Saiz<sup>1</sup>, Marta Zorrilla<sup>1</sup>, and José L. Balcázar<sup>2</sup>

<sup>1</sup> Mathematics, Statistics and Computation Department, University of Cantabria, Avda. de los Castros s/n, Santander, Spain. garciasad@unican.es, marta.zorrilla@unican.es

<sup>2</sup> LSI Department, UPC, Campus Nord, Barcelona. jose.luis.balcazar@upc.edu

Abstract. Educational Data Mining (EDM) is a growing field of application for data analysis techniques. We focus specifically on partial implications. There are two main problems. First, a support threshold is absolutely necessary, but setting it “right” is extremely difficult. Second, very large numbers of partial implications are often found, far more than an EDM user could inspect manually. Our recently developed program yacaree is an associator that tackles both problems. In an EDM context, it has proved competitive with respect to the number of partial implications it outputs. But “finding few rules” is not the same as “finding the right rules”. We extend the evaluation with a deeper quantitative analysis and a subjective evaluation on EDM datasets, eliciting the opinion of the instructors of the courses under analysis to assess the pertinence of the rules found by different association miners.

Keywords: Closure Lattices, Partial Implications, Association Rules

===1 Introduction===

Education has been evolving at all levels since the appearance of e-learning environments: Learning Content Management Systems (LCMS), Intelligent Tutoring Systems, and Adaptive Educational Hypermedia Systems. These systems log all the activity carried out by students and instructors. Adequately analyzed, this raw data might help instructors obtain a better understanding of their students and of their learning processes. In remote learning, instructors may never see their students in person.
Data analysis techniques could help them detect problems (lack of motivation, under-performance, drop-out, and so on) and, possibly, take action. Yet, unless the course itself is on data mining, the instructors are unlikely to know much about data mining techniques. If we want to help teachers of, say, philology or law, we need to develop data mining tools that do not require much tuning or technical understanding.

Here we focus on the particular case of mining partial implications [1] (a relaxed form of implication analysis in concept lattices [2]) and their close relatives, association rules [3]. Most of the available algorithms depend on one or more parameters whose values must be set by the user, and whose semantics are unlikely to be easy to understand for teachers of other disciplines. We have explored the output of five association algorithms on datasets from educational sources. We evaluated not only the number of partial implications found but also the subjective pertinence of the rules obtained. For this last task we worked in close cooperation with the end users, namely, the teachers of the online courses from which the datasets were obtained. Our conclusions take the form of strengths and weaknesses of each of the five algorithms compared.

One of the algorithms in the evaluation was a contribution of our group, demonstrated in [4] and described in more detail in [5]: the yacaree association miner. This associator extracts partial implications from the “iceberg” (frequent part of the) FCA lattice [6]. It attempts to offer a more user-friendly, parameter-less interface by self-tuning both the support threshold and a threshold on a relative form of confidence studied in [7]: the closure-based confidence boost.
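To make the idea of a relative form of confidence concrete, here is a minimal Python sketch of a plain confidence boost. It is not the closure-based variant. The sketch assumes the definition β(X ⇒ Y) = c(X ⇒ Y) / max{c(X′ ⇒ Y′) : (X′, Y′) ≠ (X, Y), X′ ⊆ X, X ∪ Y ⊆ X′ ∪ Y′, s(X′ ∪ Y′) ≥ τ}. That definition, the function names and the toy sessions are illustrative assumptions, not yacaree's code. The closure-based version of [7] may differ in detail, so values from this sketch need not match the Cboost columns reported in the tables.

```python
from fractions import Fraction
from itertools import chain, combinations


def subsets(items):
    """Yield every subset of `items` as a frozenset, including the empty set."""
    items = sorted(items)
    for combo in chain.from_iterable(combinations(items, r) for r in range(len(items) + 1)):
        yield frozenset(combo)


def support(transactions, itemset):
    """Exact fraction of transactions that contain every item of `itemset`."""
    return Fraction(sum(1 for t in transactions if itemset <= t), len(transactions))


def confidence(transactions, antecedent, consequent):
    """c(X => Y) = s(X u Y) / s(X); assumes s(X) > 0."""
    return support(transactions, antecedent | consequent) / support(transactions, antecedent)


def confidence_boost(transactions, antecedent, consequent, min_support):
    """Plain confidence boost under the ASSUMED definition stated in the text.

    Competitors are rules X' => Y' with X' a subset of X and X' u Y' a superset
    of X u Y (other than X => Y itself) whose support reaches min_support.
    Brute-force enumeration: only suitable for tiny toy data.
    """
    x, y = frozenset(antecedent), frozenset(consequent)
    whole = x | y
    all_items = frozenset().union(*transactions)
    best = None
    for x_prime in subsets(x):
        for extra in subsets(all_items - whole):
            y_prime = (whole | extra) - x_prime  # disjoint from X' by construction
            if not y_prime or (x_prime, y_prime) == (x, y):
                continue
            if support(transactions, x_prime | y_prime) < min_support:
                continue
            c = confidence(transactions, x_prime, y_prime)
            best = c if best is None else max(best, c)
    if not best:  # no competitor, or only zero-confidence ones: unbounded boost
        return float("inf")
    return confidence(transactions, x, y) / best


# Hypothetical toy sessions.
sessions = [frozenset(s) for s in (["a", "b"], ["a", "b"], ["a", "b", "c"], ["a"], ["b"], ["c"])]
print(confidence(sessions, frozenset({"a"}), frozenset({"b"})))
print(confidence_boost(sessions, {"a"}, {"b"}, Fraction(1, 10)))
```

In this toy case, a ⇒ b has confidence 3/4. Its best competitor is ∅ ⇒ a b, with confidence 1/2, so the boost is 3/2. Under this definition no competitor can exceed the rule's own confidence, so the boost is always at least 1.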
In [8], a two-page poster publication, we gave a preliminary description of this study. It contained only the quantitative analysis (part of Table 2 below) and used a version of yacaree that did not yet report rules of confidence 100%. This paper extends it substantially with further quantitative analyses and a qualitative, user-based, subjective evaluation of the usefulness of the resulting rules. The main question is whether the parameter-less interface comes at a price in terms of the usefulness of the output for the end user. Any parameter-free alternative should withstand a comparison of its output with that of other, “expert”-oriented algorithms. This comparison should clarify whether, in the subjective perception of the teacher, the outcome makes sense and is useful. Our main conclusion is that it does: when developed according to our strategy, a self-tuning associator can provide sensible numbers of partial implications that are useful and informative to the end user.

====1.1 Related work====

In the educational context, data mining techniques are used to understand learner behaviour [9], to recommend activities or topics [10], to offer learning experiences [11] or to provide instructional messages to learners [12]. The aim is to improve the effectiveness of the course, to promote group-based collaborative learning [13], or even to predict students’ performance [9]. Two interesting papers that detail and summarize the application of data mining to educational systems are [14] and [15]. The FCA community has also contributed in this arena. We must mention Romashkin et al. [16], who used closed sets of students and their marks to reveal interesting patterns and implications in student assessment data, especially to trace their dynamics. We must also mention Ignatov et al. [17], who showed that FCA taxonomies are a useful tool for representing object-attribute data, which helps to reveal some
frequent patterns and to present dependencies in data at a certain level of detail. They analyzed university applications to the Higher School of Economics as a case study. Another interesting work in this research line was carried out earlier by Belohlávek et al. [18] to evaluate questionnaires.

For the association rules technique in particular, we find several works. In [19], association rules are used to find mistakes often made together while students solve exercises in propositional logic. In [20], rules are used to discover the tools that virtual students frequently employ together during their learning sessions. In [21], association rules and collaborative filtering are used inside an architecture for making recommendations in courseware. However, association rule algorithms still have some drawbacks, as analyzed in [22]. First, since instructors are most often not data mining experts, choosing useful values for the parameters of the algorithms is difficult. Second, the output often contains a large number of rules, most of which are redundant, uninteresting for decision making and, in many cases, hard to understand. The authors of [22] offer some solutions, although none of them is automated or gathered in an algorithm. For example, they propose to use Predictive Apriori rather than the implementation of Apriori in Weka [23], since it only requires one parameter: the number of rules that the user wants to obtain. In [24], it is argued that cosine and added value (or, equivalently, lift) are well suited to educational data, and that instructors can interpret their results easily. In our opinion, these measures lack actionability because they are symmetric, which makes the rules less useful in decision making tasks.
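The symmetry objection can be checked numerically. The Python sketch below uses made-up toy sessions, an assumption for illustration, and computes confidence, lift and cosine for both a ⇒ b and b ⇒ a. Lift is s(XY) / (s(X) s(Y)) and cosine is s(XY) / √(s(X) s(Y)). Both are unchanged when antecedent and consequent are swapped. Confidence, s(XY) / s(X), is not.

```python
from fractions import Fraction
from math import sqrt

# Hypothetical toy sessions: item "a" is more frequent than item "b".
sessions = [frozenset(s) for s in (["a", "b"], ["a", "b"], ["a", "b"], ["a"], ["a"], ["c"])]


def s(*items):
    """Support of the itemset formed by `items`, as an exact fraction."""
    wanted = frozenset(items)
    return Fraction(sum(1 for t in sessions if wanted <= t), len(sessions))


def confidence(x, y):
    return s(x, y) / s(x)


def lift(x, y):
    return s(x, y) / (s(x) * s(y))


def cosine(x, y):
    return float(s(x, y)) / sqrt(float(s(x) * s(y)))


for x, y in (("a", "b"), ("b", "a")):
    print(f"{x} => {y}: conf={confidence(x, y)}, lift={lift(x, y)}, cosine={cosine(x, y):.3f}")
```

In this toy case, both orientations get lift 6/5 and the same cosine. Confidence, however, is 3/5 for a ⇒ b and 1 for b ⇒ a. Only confidence tells the user which direction the implication actually holds in.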
Orientation is a crucial and very suggestive property of association rules and partial implications. We consider that it must be preserved in an effective, asymmetric measure, as close as possible to confidence. Many measures of the intensity of implication are described, e.g., in [25] and [26].

===2 Case Studies===

This section contains our major contributions. We compare the output of five well-known association rule miners on five educational datasets. We then evaluate the subjective pertinence of the rules obtained, in close cooperation with the teachers of the two virtual courses analyzed.

====2.1 Association rule miners====

There is a long list of association rule miners. Large sets of references and surveys appear, e.g., at http://michael.hahsler.net/research/bib/association_rules/ and in all the main data mining reference works. For our comparison we chose the following algorithms:
* the implementation of Apriori by Borgelt [27];
* the implementation of Apriori in the Weka package [23];
* the Predictive Apriori implementation in Weka [28];
* the implementation of ChARM [29] available in the Coron System [30];
* our own closure-lattice-based associator yacaree [4].

The implementation of Apriori by Borgelt [27] represents the standard usage of association rules in data mining, as per [3]. This is particularly true of the way it handles the support and confidence parameters and of its restriction to association rules with a single item in the consequent. In this fully standard approach, one first constructs all frequent itemsets. Each item in each frequent itemset is then tried as the consequent, with the rest of the itemset as the antecedent, and the confidence of the rule is evaluated. The rule is reported if its confidence is high enough. This implementation is remarkably well streamlined for speed.
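The standard two-phase scheme just described can be sketched in Python as follows. This is a didactic, hypothetical implementation and not Borgelt's optimized code. It performs level-wise candidate generation with subset pruning, then tries each item of each frequent itemset as a single-item consequent. Thresholds are given as fractions of the number of transactions, and the toy sessions are made up.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Level-wise Apriori search; returns {frozenset itemset: support count}."""
    min_count = min_support * len(transactions)
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    level = {k: c for k, c in counts.items() if c >= min_count}
    frequent = dict(level)
    size = 2
    while level:
        # Join step: unions of two frequent (size-1)-itemsets with exactly `size` items.
        candidates = {a | b for a, b in combinations(level, 2) if len(a | b) == size}
        # Prune step: every (size-1)-subset of a candidate must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in level for s in combinations(c, size - 1))}
        level = {}
        for c in candidates:
            count = sum(1 for t in transactions if c <= t)
            if count >= min_count:
                level[c] = count
        frequent.update(level)
        size += 1
    return frequent


def single_consequent_rules(frequent, min_confidence):
    """Try each item of each frequent itemset as the (single) consequent."""
    rules = []
    for itemset, count in frequent.items():
        if len(itemset) < 2:
            continue
        for item in itemset:
            antecedent = itemset - {item}
            conf = count / frequent[antecedent]  # subsets of frequent sets are frequent
            if conf >= min_confidence:
                rules.append((antecedent, item, conf))
    return rules


# Hypothetical toy sessions.
sessions = [frozenset(s) for s in (["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "c"])]
freq = frequent_itemsets(sessions, min_support=0.5)
for antecedent, consequent, conf in single_consequent_rules(freq, min_confidence=0.66):
    print(sorted(antecedent), "=>", consequent, round(conf, 2))
```

On this toy data, every pair of items is frequent but the triple is not, so six two-item rules of confidence 2/3 are reported. Real implementations avoid rescanning the transactions for each candidate. This sketch keeps the scan for clarity.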
Borgelt’s implementation also offers an ample repertory of further evaluation measures (lift, normalized chi-square, and so on). We must warn that a specific flag must be set (as we did, “-o”) so that support is computed in accordance with the notion of support used in the other tools.

Weka is one of the oldest and most widespread open-source data mining suites, and its implementations are widely used. The implementation of Apriori in the Weka package is similar to the one just described, using confidence and support constraints, but it departs slightly from [3]. First, the rules generated can have more than one item in the consequent. Second, instead of fixing the support at the given threshold at once, it asks the user for a number of rules and a “delta” parameter. Support is initially set at 100% and iteratively reduced by “delta” until either the support threshold is reached or the requested number of rules is collected.

The Predictive Apriori implementation in Weka follows [28]. The advantage of this algorithm is that it only requires the user to set the number of rules to be discovered. This suits users who are not data mining experts, provided that, in some sense, “the right rules” are found. The algorithm automatically attempts to balance support and confidence optimally, using Bayesian criteria related to the so-called expected predictive accuracy. A disadvantage of this method is that it often requires longer running times than the previous ones.

These three implementations construct partial implications on the basis of all frequent itemsets. Our other two systems work on the basis of frequent closures, which make it possible to know the support of any frequent itemset without storing all of them. The Coron system [30] offers several implementations of different closed-set-based algorithms. These methods return the same set of closure-based partial implications, although they compute them in different ways.
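The property that makes closure-based miners economical can be shown directly. The closure of an itemset X is the intersection of all transactions containing X, and it has the same support as X. Storing only the frequent closed itemsets therefore suffices: the support of any frequent itemset is the largest support among the closed itemsets that contain it. The Python sketch below checks this on toy data. The brute-force enumeration and the function names are illustrative assumptions, not how ChARM or yacaree compute closures.

```python
from itertools import combinations


def support_count(transactions, itemset):
    return sum(1 for t in transactions if itemset <= t)


def closure(transactions, itemset):
    """Intersection of all transactions containing `itemset` (all items if none do)."""
    covering = [t for t in transactions if itemset <= t]
    if not covering:
        return frozenset().union(*transactions)
    return frozenset.intersection(*covering)


def frequent_closed_itemsets(transactions, min_count):
    """Brute force: closures of all itemsets reaching min_count (toy sizes only)."""
    items = sorted(frozenset().union(*transactions))
    closed = {}
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            x = frozenset(combo)
            count = support_count(transactions, x)
            if count >= min_count:
                closed[closure(transactions, x)] = count  # same count as x
    return closed


def support_from_closed(closed, itemset):
    """Support of a FREQUENT itemset = max support among its closed supersets."""
    return max(count for c, count in closed.items() if itemset <= c)


# Hypothetical toy sessions.
sessions = [frozenset(s) for s in (["a", "b", "c"], ["a", "b"], ["a", "b"], ["c"])]
closed = frequent_closed_itemsets(sessions, min_count=1)
print({tuple(sorted(c)): n for c, n in closed.items()})
```

On these sessions, only four closed itemsets remain (∅, {a, b}, {c} and {a, b, c}), yet they determine the support of every frequent itemset. For example, {a} inherits the support of its closure {a, b}.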
From Coron we used ChARM [29]. The specific method is not relevant here, because our evaluation does not yet include running times: we concentrate on the usefulness of the output.

The fifth implementation is our own association miner yacaree [4]. Like ChARM, it is based on closures and allows several items in the consequent of the partial implications. In each partial implication output by this system, both the antecedent and the total set of items of the rule are closed sets. The current version, 1.2.0, is the first to report rules of confidence 100%. The system works in three stages. First, it constructs the closure lattice up to a support bound that is adjusted autonomously during the run, according to the technological limitations, so that the user does not need to select it. Second, it constructs a basis of partial implications from these closures. Third, it filters the partial implications along the way using the closure-based confidence boost [7]. This measure compares the confidence of an association rule with that of other similar rules: a rule must offer a clear improvement on similar ones to be considered useful.

====2.2 Datasets====

For the case studies, we used data from two courses offered at the University of Cantabria. Both courses are eminently practical. The first one, “Introduction to multimedia methods”, teaches the students how to use a particular multimedia tool; in what follows, we call its data the multimedia dataset. The second one, “Basic administration of a UNIX-LINUX system” (the Linux dataset), teaches the students the basic utilities and tools needed to install and configure a LINUX operating system correctly. The multimedia course is built from web pages and includes some video tutorials, flash animations and interactive elements. The students must complete 4 exercises, 2 projects and one final exam online.
The course is open to all degrees, and 79 students were enrolled. Unlike the multimedia course, the Linux course only allows 24 students to enroll, all of them from a telecommunications degree. All the course materials are available from the first day. The contents of a previous edition of the course are also offered in PDF. These files can be kept locally and used for study in case a technical problem prevents access to the updated files, but they do not include all the contents of the present edition. During the course, the students must also deliver 6 practical exercises and pass two online exams. The course includes 38 self-tests, one for each topic of the course. Each week, the instructor indicates on the calendar the topics and self-tests that the students must work on.

We worked with five datasets. The first one, “linux materials”, gathers the access logs to the materials prepared by the instructor (HTML pages, PDF files, tests, and so on) as used by each student in each learning session of the Linux course. The datasets “linux resources” and “multimedia resources” are the session-wise logs of the resources and tools (assessment, content pages, forum, and so on) used by each student in each learning session. It was immediately apparent that one specific resource introduced some “noise” in these datasets. The “organizer” resource acts as the front page of most sessions (nearly 84% in Linux and 85% in multimedia), as the only alternative entry point is the forum. Hence it appears in many rules and creates many variants, mostly of low information content. We therefore prepared two datasets, “linux resources reduced” and “multimedia resources reduced”. They are identical to the second and third datasets respectively, except that the “organizer” resource is removed entirely. Table 1 shows the number of different items and transactions in each dataset.
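A session-wise transactional dataset of this kind, and its “reduced” variant, could be built along the following lines. This Python sketch is a hypothetical preprocessing step. The log format (student, session, resource) is assumed, and so is dropping sessions that become empty after “organizer” is removed, since the text does not say how such sessions were handled.

```python
from collections import defaultdict


def sessions_to_transactions(log):
    """Group (student, session_id, resource) records into one item set per session."""
    grouped = defaultdict(set)
    for student, session_id, resource in log:
        grouped[(student, session_id)].add(resource)
    return [frozenset(items) for _, items in sorted(grouped.items())]


def remove_item(transactions, item):
    """Drop `item` everywhere; discard transactions left empty (an assumption)."""
    reduced = (t - {item} for t in transactions)
    return [t for t in reduced if t]


# Hypothetical log records: (student, session, resource).
log = [
    ("s1", 1, "organizer"), ("s1", 1, "contentpage"), ("s1", 1, "assessment"),
    ("s1", 2, "organizer"),
    ("s2", 1, "discussion"), ("s2", 1, "organizer"),
]
transactions = sessions_to_transactions(log)
reduced = remove_item(transactions, "organizer")
print(len(transactions), "sessions ->", len(reduced), "after removing 'organizer'")
```

With this choice, a session that only visited the front page disappears from the reduced data entirely. A session that also used another resource keeps that resource.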
For the sake of better understanding, Fig. 1 shows a diagram of the intents of the concept lattice of the Linux dataset above 13% support.

Fig. 1. Intents of at least 13% support. (The lattice diagram is not reproduced. Its nodes are ∅ and combinations of the items contentpage, assignment, assessment, discussion and mygrades.)

{| class="wikitable"
|+ Table 1. Datasets description
! Name !! Transactions !! Items
|-
| Dataset1 (linux materials) || 407 || 22
|-
| Dataset2 (linux resources) || 2486 || 27
|-
| Dataset3 (linux resources reduced) || 2346 || 26
|-
| Dataset4 (multimedia resources) || 5892 || 27
|-
| Dataset5 (multimedia resources reduced) || 5643 || 26
|}

====2.3 Dataset results====

When comparing several association programs, one difficulty is always the setting of the parameters, particularly the support, because the value chosen might favor one algorithm more than another. In our case there is an extra difficulty: one of the participating algorithms, yacaree, tunes its own support threshold. To find fair grounds for comparison, we performed a brief preliminary experiment. Running on one of the “linux resources” datasets, yacaree took about four minutes (a bit long for a non-expert to wait) and went down to 0.02% support. At this low threshold, both Weka alternatives performed substantially worse: Predictive Apriori took 40 minutes, and Apriori ran out of memory even when given 2GB. Similar behavior occurred on the other datasets. Given this information, we fixed the support threshold at 1% for all computations and the confidence threshold at 66%, the initial value set by yacaree. In all runs, we left the number of rules to be found unbounded. For the Weka tools, which require this number, we set it very high (10000), even though this meant overriding their default value.
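For reference, the fixed thresholds translate into absolute counts as follows. The small Python computation below uses the transaction counts of Table 1. It assumes that “at least 1% support” means a count of at least ⌈n/100⌉ transactions, and it uses integer arithmetic to avoid floating-point rounding surprises. The helper names are hypothetical.

```python
# Transaction counts from Table 1.
transactions = {
    "linux materials": 407,
    "linux resources": 2486,
    "linux resources reduced": 2346,
    "multimedia resources": 5892,
    "multimedia resources reduced": 5643,
}


def min_count(n, percent=1):
    """Smallest integer count c with c / n >= percent / 100 (ceiling division)."""
    return -(-n * percent // 100)


def passes_confidence(count_xy, count_x, percent=66):
    """Integer version of the test count_xy / count_x >= percent / 100."""
    return 100 * count_xy >= percent * count_x


for name, n in transactions.items():
    print(f"{name}: {n} sessions, 1% support means at least {min_count(n)} sessions")
```

Under this reading, a rule on the “linux materials” dataset needs only 5 supporting sessions. On “multimedia resources” it needs 59. A rule holding in 2 of the 3 sessions containing its antecedent just clears the 66% confidence threshold.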
Table 2 shows the number of rules obtained with the different algorithms on our datasets. Entries marked “—” are cases where the algorithm could not complete within 6 hours.

{| class="wikitable"
|+ Table 2. Number of rules obtained on our datasets with the five algorithms (s = 1%, c = 66%)
! Dataset !! Weka Apriori !! Predictive Apriori !! Borgelt Apriori !! ChARM !! yacaree
|-
| Dataset1 (linux materials) || 2272 || 1730 || 524 || 366 || 40
|-
| Dataset2 (linux resources) || 7523 || over 10000 || 3751 || 5610 || 255
|-
| Dataset3 (linux resources reduced) || 4249 || over 10000 || 1876 || 2586 || 93
|-
| Dataset4 (multimedia resources) || 1442 || — || 1023 || 1427 || 182
|-
| Dataset5 (multimedia resources reduced) || 488 || — || 404 || 469 || 46
|}

'''Results from the “resources reduced” datasets.''' The results obtained with Apriori from Weka show an unmanageable number of rules, e.g., 4249 for the linux resources reduced dataset. The first 243 are implications of full confidence (100%), with low support and high redundancy. In Table 3, see rules 2 and 3, and rules 235, 236 and those following. Had we used the tool’s default parameter settings, we would have found essentially no information. The same happens with the multimedia dataset (we omit the table for space reasons).

Analyzing the output of Predictive Apriori is very costly, because it generates as many rules as we allow it to. When 10000 rules are requested on dataset2 and dataset3, obtaining them takes more than 20 minutes. The accuracy is still high at that point, so many further rules could be obtained. The first few rules returned turn out to have very low support and considerable redundancy (see Table 4).

{| class="wikitable"
|+ Table 3. Subset of association rules obtained with Apriori from Weka on the “linux resources reduced” dataset
! No. !! Association rule !! Supp. (%) !! Conf. (%)
|-
| 2 || announcement tracking ⇒ assessment || 1.7 || 100
|-
| 3 || announcement mygrades tracking ⇒ assessment || 1.6 || 100
|-
| 235 || assignments calendar contentpage discussion medialibrary syllabus ⇒ assessment || 1.0 || 100
|-
| 236 || assessment calendar contentpage discussion medialibrary syllabus ⇒ assignments || 1.0 || 100
|-
| 2523 || announcement assessment calendar syllabus ⇒ assignments contentpage || 1.2 || 78.0
|-
| 2524 || announcement assessment calendar syllabus ⇒ assignments discussion || 1.2 || 78.0
|-
| 2530 || announcement calendar mail ⇒ contentpage || 1.0 || 78.0
|-
| 2534 || announcement assignments calendar chat ⇒ contentpage || 1.0 || 78.0
|}

{| class="wikitable"
|+ Table 4. Subset of association rules obtained with Predictive Apriori from Weka on the “linux resources reduced” dataset
! No. !! Association rule !! Support !! Accuracy
|-
| 122 || assignments calendar search ⇒ syllabus || 0.85 || 0.95439
|-
| 123 || assignments chat weblinks ⇒ assessment syllabus || 0.85 || 0.95439
|-
| 124 || assignments chat weblinks ⇒ discussion syllabus || 0.85 || 0.95439
|-
| 125 || assignments discussion search ⇒ assessment syllabus || 0.85 || 0.95439
|}

Borgelt’s implementation also outputs a large number of rules: 1876 and 404 in the Linux and multimedia reduced datasets respectively, of which 141 and 2 are implications. This makes specific conclusions harder to reach. The rules tend to be small, highly redundant and built on low-support tools that are almost never used, so they are of little interest to the instructor. Table 5 illustrates this. Rules 99, 100 and 101 differ only slightly from rules 11, 12 and 13: they add the announcement tool to the antecedent, with very low support and similar confidence.

{| class="wikitable"
|+ Table 5. Subset of association rules obtained with Borgelt’s Apriori implementation on the “linux resources reduced” dataset
! No. !! Association rule !! Supp. (%) !! Conf. (%)
|-
| 11 || chat ⇒ discussion || 3.7 || 84.9
|-
| 12 || chat ⇒ assignments || 3.7 || 75.6
|-
| 13 || chat ⇒ assessment || 3.7 || 81.4
|-
| 99 || chat announcement ⇒ discussion || 2.0 || 84.8
|-
| 100 || chat announcement ⇒ assignments || 2.0 || 87.0
|-
| 101 || chat announcement ⇒ assessment || 2.0 || 93.5
|}

ChARM returns even more rules: 2586 in the Linux and 469 in the multimedia resources reduced datasets, of which 193 and 2 respectively are implications. As in the previous cases, the rules are highly redundant. See rules 3 to 6, and 7 and 8, in Table 6, and rules 10, 11, 12 and 31, 32, 33 in Table 7.

{| class="wikitable"
|+ Table 6. Subset of association rules obtained with ChARM on the “linux resources reduced” dataset
! No. !! Association rule !! Supp. (%) !! Conf. (%)
|-
| 3 || announcement, contentpage, medialibrary, syllabus ⇒ assessment || 1.02 || 96.00
|-
| 4 || announcement, assessment, medialibrary, syllabus ⇒ contentpage || 1.02 || 88.89
|-
| 5 || announcement, assessment, contentpage, medialibrary ⇒ syllabus || 1.02 || 70.59
|-
| 6 || announcement, medialibrary, syllabus ⇒ assessment, contentpage || 1.02 || 82.76
|-
| 7 || announcement, medialibrary, syllabus ⇒ contentpage || 1.07 || 86.21
|-
| 8 || announcement, contentpage, medialibrary ⇒ syllabus || 1.07 || 67.57
|}

{| class="wikitable"
|+ Table 7. Subset of association rules obtained with the ChARM algorithm on the “multimedia resources reduced” dataset
! No. !! Association rule !! Supp. (%) !! Conf. (%)
|-
| 10 || chat, contentpage, discussion ⇒ assessment || 1.13 || 81.01
|-
| 11 || assessment, chat, contentpage ⇒ discussion || 1.13 || 94.12
|-
| 12 || chat, contentpage ⇒ assessment, discussion || 1.13 || 71.91
|-
| 31 || contentpage, discussion, syllabus ⇒ assessment || 1.12 || 84.00
|-
| 32 || assessment, discussion, syllabus ⇒ contentpage || 1.12 || 66.32
|-
| 33 || assessment, contentpage, syllabus ⇒ discussion || 1.12 || 79.75
|}

yacaree still returns a somewhat high number of rules on the reduced resources datasets: 93 for dataset3 and 46 for dataset5. Even so, they reveal the resources that students frequently use together in each learning session and, at the same time, the kinds of sessions they carry out.
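The redundancy visible in Table 5 can be quantified by comparing each rule with its shorter-antecedent variant. The Python sketch below uses the confidences reported in Table 5 and a hypothetical ratio threshold of 1.2. It is a simplified illustration of requiring a clear improvement over similar rules. It is not yacaree's closure-based confidence boost.

```python
# Confidences (%) from Table 5: rules 11-13 ("chat => X") and their variants
# 99-101, which add "announcement" to the antecedent.
shorter = {"discussion": 84.9, "assignments": 75.6, "assessment": 81.4}
longer = {"discussion": 84.8, "assignments": 87.0, "assessment": 93.5}

THRESHOLD = 1.2  # hypothetical: demand at least 20% more confidence

for consequent, conf in longer.items():
    ratio = conf / shorter[consequent]
    verdict = "keep" if ratio >= THRESHOLD else "prune as a near-variant"
    print(f"chat announcement => {consequent}: ratio {ratio:.3f} -> {verdict}")
```

Here the ratios are about 0.999, 1.151 and 1.149, so all three variants are pruned at this threshold. At a threshold of 1.1, rules 100 and 101 would survive. This shows how much the outcome depends on the chosen bound.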
The reduction in the number of rules achieved by the confidence boost threshold is remarkable. Table 8 shows a subset of the most relevant rules obtained with yacaree on the linux resources reduced dataset. However, quite a few rules are also trivial or uninteresting for the instructor. For instance, rule 1 is trivial, because sending a task obviously requires the file manager tool. Rules 6, 18 and 19 offer no new information to the instructor: he uses the forum to set the exam dates, so he already knows about these kinds of sessions. Rules 7, 12, 36 and 50 gather sessions in which students want to know specific dates: deadlines for tasks or assessments, or exam dates. Rule 16 reveals quite a few sessions in which the students want to know their progress. Rules 8 and 10 gather the study sessions in which the students combine reading content pages with tackling self-tests.

{| class="wikitable"
|+ Table 8. Subset of association rules obtained with yacaree on the “linux resources reduced” dataset
! No. !! Association rule !! Supp. (%) !! Conf. (%) !! Lift !! Cboost
|-
| 1 || filemanager ⇒ assignments || 4.6 || 93.9 || 1.908 || 1.908
|-
| 6 || discussion whoisonline ⇒ assessment || 3.0 || 75.5 || 1.648 || 1.379
|-
| 18 || discussion mail ⇒ assessment || 3.2 || 72.1 || 1.574 || 1.268
|-
| 19 || announcement mail ⇒ assessment discussion || 1.6 || 80.9 || 3.381 || 1.267
|-
| 7 || announcement ⇒ assessment || 7.6 || 88.1 || 1.923 || 1.369
|-
| 12 || calendar ⇒ assessment || 9.1 || 75.9 || 1.656 || 1.337
|-
| 36 || calendar ⇒ assignments || 8.1 || 67.0 || 1.362 || 1.219
|-
| 50 || announcement calendar ⇒ assessment assignments || 2.6 || 77.2 || 2.941 || 1.200
|-
| 16 || tracking ⇒ mygrades || 6.8 || 80.3 || 2.409 || 1.272
|-
| 8 || contentpage mygrades ⇒ assessment || 3.8 || 84.8 || 1.850 || 1.369
|-
| 10 || contentpage discussion ⇒ assessment || 7.3 || 75.1 || 1.639 || 1.339
|}

Table 9 shows a subset of the most relevant rules obtained with yacaree on the multimedia resources reduced dataset. As in the previous case, some rules are trivial or uninteresting for the instructor. Examples are rule 1, already discussed, and rules 2 and 40, which gather sessions in which students wanted to know the dates of assignments. By contrast, rules 7, 14 and 36 showed the teacher that students visited the content pages and the forum in working sessions, to solve problems or clear up doubts about the tasks. She was also pleased to see that the learning objectives tool was used while studying the contents (rule 4). This means that students played the video tutorials she had recorded with great effort. Rule 3 told her about the joint use of the contents and weblinks tools; the latter contains the links to downloadable material. This reinforced her view that the material should be offered in both formats, online and downloadable.

{| class="wikitable"
|+ Table 9. Subset of association rules obtained with yacaree on the “multimedia resources reduced” dataset
! No. !! Association rule !! Supp. (%) !! Conf. (%) !! Lift !! Cboost
|-
| 1 || filemanager ⇒ assignments || 5.1 || 71.5 || 1.871 || 1.871
|-
| 2 || calendar ⇒ assignments || 6.1 || 74.9 || 1.961 || 1.610
|-
| 40 || announcement ⇒ assignments || 3.9 || 67.2 || 1.759 || 1.153
|-
| 3 || weblinks ⇒ contentpage || 3.7 || 78.2 || 2.105 || 1.588
|-
| 4 || learningobjectives ⇒ contentpage || 4.5 || 81.4 || 2.192 || 1.530
|-
| 7 || contentpage mygrades ⇒ assignments || 2.7 || 66.7 || 1.746 || 1.421
|-
| 14 || assignments whoisonline ⇒ discussion || 1.7 || 72.5 || 1.612 || 1.301
|-
| 36 || discussion weblinks ⇒ assignments || 1.9 || 73.4 || 1.923 || 1.180
|}

'''Results from the “resources” datasets, not reduced.''' From the point of view of a virtual course instructor who is not a data mining expert, removing the “organizer” item from the “resources” datasets is debatable; it is a typical data mining expert’s move. We consider it appropriate nonetheless. The designers of the e-learning platform could easily predict that the “organizer” item would be extremely frequent. The option of discarding it could therefore be built into a set of related data mining tools ahead of time. Still, we now briefly discuss what happens with the complete “resources” datasets.

With yacaree we obtain 255 rules in dataset2 and 182 in dataset4. In each case, one rule indicates that “organizer” is used in nearly 84% and 85% of the sessions respectively (see Tables 10 and 11). For a rule with an empty antecedent, support and confidence must coincide. Essentially, the output of yacaree is not very different from the previous cases. Many rules from the previous analysis now reappear in pairs, once with “organizer” and once without. In such a pair, the rule containing “organizer” may sometimes look redundant. However, its confidence boost value shows that its confidence is high enough to make it nonredundant (see Tables 10 and 11).

{| class="wikitable"
|+ Table 10. Subset of association rules obtained with yacaree on the “linux resources” dataset
! No. !! Association rule !! Supp. (%) !! Conf. (%) !! Lift !! Cboost
|-
| 2 || ⇒ organizer || 83.9 || 83.9 || 1.000 || 1.982
|-
| 158 || mygrades tracking ⇒ assessment organizer || 4.6 || 71.7 || 1.888 || 1.109
|-
| 287 || mygrades tracking ⇒ assessment || 5.0 || 78.6 || 1.818 || 1.096
|}

{| class="wikitable"
|+ Table 11. Subset of association rules obtained with yacaree on the “multimedia resources” dataset
! No. !! Association rule !! Supp. (%) !! Conf. (%) !! Lift !! Cboost
|-
| 1 || ⇒ organizer || 84.9 || 84.9 || 1.000 || 2.421
|-
| 9 || chat ⇒ discussion organizer || 2.0 || 77.6 || 2.324 || 1.283
|-
| 113 || chat ⇒ discussion || 2.2 || 84.2 || 1.954 || 1.085
|}

The extra effort needed to inspect the yacaree output is modest compared with the alternative algorithms. ChARM and Borgelt’s Apriori run into the same difficulties as on the reduced datasets, made worse by larger outputs. ChARM produces 5610 rules in dataset2 and 1427 in dataset4, and Borgelt’s implementation 3751 and 1023. These include a considerable number of rules whose only consequent is “organizer”. Intuitively, all of them merely reflect how prevalent this item is. Similarly, Weka Apriori obtains over 7000 rules in dataset2 and 1442 in dataset4. The first 568 of these are implications of 100% confidence, and 474 of those again have “organizer” as their only consequent. Predictive Apriori takes 45 minutes to complete and also generates a large number of rules, which we again limited to 10000. Once more, the first ones have “organizer” as their single consequent, and the next ones are long rules of very low support.

'''Results from the “linux materials” dataset.''' Table 12 shows some of the most relevant of the 40 rules selected by yacaree on this dataset, 16 of which are implications of confidence 100%. An output this small is easy for the instructor to inspect.

{| class="wikitable"
|+ Table 12. Subset of association rules obtained with yacaree on the “materials” dataset
! No. !! Association rule !! Supp. (%) !! Conf. (%) !! Lift !! Cboost
|-
| 1 || topic6 ⇒ topic-pdf || 13.3 || 100 || 2.544 || 2.544
|-
| 2 || topic7 ⇒ topic-pdf || 9.8 || 100 || 2.544 || 2.500
|-
| 3 || topic4 topic-pdf ⇒ topic5 || 6.4 || 76.5 || 5.764 || 2.266
|-
| 18 || topic1 topic3 ⇒ topic2 || 3.9 || 72.7 || 4.055 || 1.377
|-
| 6 || topic9 ⇒ topic10 topic-pdf || 5.7 || 100 || 7.537 || 1.917
|-
| 7 || topic10 topic7 ⇒ topic8 topic-pdf || 3.7 || 100 || 14.536 || 1.875
|-
| 23 || topic-pdf topic10 topic6 ⇒ topic8 || 2.9 || 66.7 || 9.690 || 1.286
|-
| 40 || exam2 topic-pdf ⇒ topic10 || 1.7 || 77.8 || 5.862 || 1.167
|-
| 9 || test2 ⇒ test1 test3 || 4.9 || 71.4 || 13.844 || 1.667
|-
| 10 || test9 ⇒ test6 test7 test8 topic-pdf topic10 || 2.5 || 66.7 || 27.133 || 1.667
|-
| 14 || test7 topic-pdf topic10 ⇒ test6 test8 test9 || 2.5 || 76.9 || 31.308 || 1.538
|-
| 23 || test9 ⇒ test8 topic-pdf topic10 || 3.4 || 93.3 || 23.742 || 1.273
|-
| 28 || test3 test4 ⇒ test5 topic-pdf || 2.7 || 73.3 || 14.213 || 1.222
|}

The rules show that the course is clearly divided into two parts: topics and tests up to number 5, and the remaining ones (see rules 2 and 18; rules 6, 7 and 23; and rules 9 to 28). The instructor observed that not all topics are actually studied: some are worked on only through self-tests (see rules 9 to 28, whose support is higher than that of the corresponding topic rules). He was very interested in these rules for two reasons. First, many of them indicate that students do not really study their assigned materials. Instead, they take the tests and look at the study materials only when they do not know the answer, reversing the intended order of use. Second, the rules show that the outdated, incomplete materials from the earlier edition of the course (topic-pdf appears in most rules) were used much more than intended. These materials were meant only as a fallback in case of connectivity problems, yet they were used even in sessions devoted to learning through self-tests.
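Since lift is confidence divided by the support of the consequent, a rule's Conf. and Lift columns determine the support of its consequent: s(Y) = c / lift. The Python lines below apply this identity to some rules of Tables 12 and 8. The values are copied from the tables, with confidences written as fractions; the published figures are rounded, so the results are approximate.

```python
# (rule, confidence as a fraction, lift), copied from Table 12 (first three,
# "materials" dataset) and Table 8 (last one, "linux resources reduced").
rules = [
    ("topic6 => topic-pdf", 1.000, 2.544),
    ("topic9 => topic10 topic-pdf", 1.000, 7.537),
    ("exam2 topic-pdf => topic10", 0.778, 5.862),
    ("tracking => mygrades", 0.803, 2.409),
]


def consequent_support(conf, lift):
    """lift = conf / s(Y), hence s(Y) = conf / lift."""
    return conf / lift


for rule, conf, lift in rules:
    print(f"{rule}: s(consequent) ~ {100 * consequent_support(conf, lift):.1f}%")
```

For the materials dataset, this gives roughly 39% of the sessions for topic-pdf. The second and third values nearly coincide (about 13.3%): within rounding, the sessions that touch topic10 also touch topic-pdf.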
The first seven rules shown in the table also seem to suggest that students checked to what extent the contents of each topic differed from the old compiled version and, since that version was easier to handle and search, frequently used it while taking the tests. Another interesting piece of information, as judged by the teacher, is that the topics in the second half of the course were consulted in more sessions than those in the first; this matched his perception that he had had to offer more “moral support” to students on the brink of failure towards the end of the course. Rule 38 shows a fairly high support for exam2, which is not the case for exam1, even though the exams are one-shot events. This unexpected support for exam2 was due to technical problems: half the students lost their connections and had to reconnect later in order to finish their exams, accounting for a misleadingly high number of sessions. (The instructor was surprised that our association rules could detect this.)

With Coron’s ChARM, many of the rules generated are somewhat redundant variants of the rules found by yacaree. Many other rules are also found: essentially, longish rules of confidence 100% (see Table 13). Browsing through hundreds of rules, however, is slow and not user-friendly, and we do not believe a regular instructor would have enough patience to find the most instructive rules among those returned by the algorithm.

Table 13. Subset of association rules obtained with Coron’s ChARM implementation on the “materials” dataset
No. | Association rule | Supp. | Conf.
6 | topic7 topic9 topic10 topic-pdf ⇒ topic8 | 1.23 | 100.00
7 | topic7 topic8 topic9 topic-pdf ⇒ topic10 | 1.23 | 100.00
8 | topic7 topic8 topic9 topic10 ⇒ topic-pdf | 1.23 | 100.00
9 | topic7 topic9 topic-pdf ⇒ topic8 topic10 | 1.23 | 100.00
65 | test5 test7 test8 test9 topic10 topic-pdf ⇒ test6 | 1.47 | 100.00
66 | test5 test6 test8 test9 topic10 topic-pdf ⇒ test7 | 1.47 | 100.00
67 | test5 test6 test7 test9 topic10 topic-pdf ⇒ test8 | 1.47 | 100.00

The same objection applies to Borgelt’s implementation and is even more serious with Weka Apriori, which produces 2272 rules, of which 1522 are again longish implications of confidence 100%. Still, one can see that some of the rules with several items in the consequent condense into a single line several rules that the classical scheme separates into one rule per consequent item. Predictive Apriori generates 1730 rules, of which the first handful are 100% confidence implications with topic-pdf (the old material) as consequent; the rest are mostly rules of rather low support.

3 Conclusions

One of the drawbacks of some data mining algorithms is their dependence on suitable parameter settings, which can be difficult for “non-expert data miners” to determine. Another is the difficulty of interpreting their results. Although the output of association rule miners can be considered easy for end-users to interpret, the large number of rules generated by the most commonly used algorithms, most of which contain facts that users will intuitively see as redundant, makes their interpretation and comprehension difficult.

Our comparison of different associators shows that they differ vastly in purely quantitative terms (as already advanced in [8] and confirmed in this work): most associators produce voluminous output, whereas yacaree provides several dozen rules that may contain good knowledge yet will not overwhelm the user.
The main question, then, is: are they “the right ones”? Our educational datasets seem to require a low support threshold, yet they include items of rather high support; this combination seriously hinders the ability of traditional association miners to offer interesting output. On the other hand, the most recent version of yacaree, which includes implications of confidence 100%, seems particularly well suited to these cases: it finds rules of both high and low support, and indeed we find that in most cases these rules “say different things”. All our conclusions have been thoroughly discussed with the instructors of the virtual courses to which the datasets refer.

In summary, yacaree offers several advantages for non-expert data miners. First, it offers a parameter-less interface, which makes it easier to use. Second, it generates a reduced number of rules, as it works with closed frequent itemsets, mines only a rule basis, and prunes the rules through the confidence boost parameter. Third, it shows support, confidence, lift and confidence boost together in the output, which allows end-users to better assess the rules once these measures have been properly explained.

The current (and previous) versions of yacaree have a limitation: by default, they limit the output to 50 rules; our study reveals that this limit should be removed or, at least, relaxed. Previous versions did not search for full implications, and only the current version (1.2.0) does; our study confirms that this feature must be kept, as previous versions missed a number of implications that our external user found interesting.

As a final conclusion, our interaction with the instructors involved in the virtual courses analyzed indicates that, for datasets derived from the logs of e-learning systems, the results of yacaree are superior to those of the other algorithms used in our case study.
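The rule reduction mentioned in these conclusions (closed itemsets plus pruning by confidence boost) can be illustrated with a brute-force Python sketch of a closure operator and of the confidence boost. It follows our reading of the definition in [7, 31]: the confidence of X ⇒ Y divided by the highest confidence among the other rules X' ⇒ Y' with X' ⊆ X, X ∪ Y ⊆ X' ∪ Y' and support of X' ∪ Y' above a threshold τ. This reading agrees with Table 10, where the Cboost of rule 287 (1.096) coincides with 78.6/71.7, the ratio between its confidence and that of rule 158, which has the same antecedent and a larger consequent; likewise, 84.2/77.6 ≈ 1.085 for rules 113 and 9 in Table 11. The sessions, function names and the value of τ are hypothetical, and the exhaustive enumeration is only meant for tiny examples, not as a description of yacaree's implementation.

```python
from itertools import combinations
from typing import FrozenSet, List

Itemset = FrozenSet[str]

# Hypothetical sessions, invented for illustration (not course data).
SESSIONS: List[Itemset] = [
    frozenset({"test1", "test2", "test3"}),
    frozenset({"test1", "test2", "test3", "topic-pdf"}),
    frozenset({"test2", "topic-pdf"}),
    frozenset({"topic1", "topic-pdf"}),
    frozenset({"topic1", "topic2"}),
]


def support(sessions: List[Itemset], items: Itemset) -> float:
    """Fraction of sessions that contain every item of `items`."""
    return sum(1 for s in sessions if items <= s) / len(sessions)


def closure(sessions: List[Itemset], items: Itemset) -> Itemset:
    """Items common to all sessions containing `items`; closed sets are fixpoints."""
    covering = [s for s in sessions if items <= s]
    if not covering:
        # Intersection over an empty family: the whole item universe.
        return frozenset().union(*sessions)
    return frozenset.intersection(*covering)


def confidence(sessions: List[Itemset], ante: Itemset, cons: Itemset) -> float:
    """Confidence of ante => cons; assumes ante occurs in some session."""
    return support(sessions, ante | cons) / support(sessions, ante)


def confidence_boost(
    sessions: List[Itemset], ante: Itemset, cons: Itemset, tau: float
) -> float:
    """Brute-force confidence boost of ante => cons (disjoint, cons non-empty).

    Divides the rule's confidence by the best confidence of any other rule
    X' => Y' with X' a subset of ante, ante | cons a subset of X' | Y', and
    support(X' | Y') > tau. Exponential in the number of items.
    """
    whole = ante | cons
    extras = sorted(frozenset().union(*sessions) - whole)
    best = 0.0
    for k in range(len(ante) + 1):
        for sub in combinations(sorted(ante), k):
            ante2 = frozenset(sub)  # candidate X', a subset of X
            for m in range(len(extras) + 1):
                for extra in combinations(extras, m):
                    whole2 = whole | frozenset(extra)  # candidate X' | Y'
                    if ante2 == ante and whole2 == whole:
                        continue  # skip the rule itself
                    if support(sessions, whole2) <= tau:
                        continue  # competitor below the support threshold
                    best = max(best, confidence(sessions, ante2, whole2 - ante2))
    return confidence(sessions, ante, cons) / best if best > 0 else float("inf")


if __name__ == "__main__":
    tau = 0.2  # hypothetical threshold; yacaree tunes its own
    # test1 always appears together with test2 and test3
    print(closure(SESSIONS, frozenset({"test1"})))
    # test2 => test1 adds nothing over test2 => test1 test3: boost 1.0
    print(confidence_boost(SESSIONS, frozenset({"test2"}), frozenset({"test1"}), tau))
    # test2 => test1 test3: boost about 1.67 (best competitor: => test1 test2 test3)
    print(confidence_boost(SESSIONS, frozenset({"test2"}), frozenset({"test1", "test3"}), tau))
```

In this toy example, a rule with boost 1, such as test2 ⇒ test1, says nothing beyond the rule with the larger consequent; discarding rules whose boost stays close to 1 is what keeps the output short.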
yacaree can be freely downloaded from SourceForge, and a link to it is provided on the web page on FCA software kindly maintained by Prof. Uta Priss.

References
1. Luxenburger, M.: Implications partielles dans un contexte. Mathématiques et Sciences Humaines 29 (1991) 35–55
2. Ganter, B., Wille, R.: Formal Concept Analysis: Mathematical Foundations. Springer-Verlag (1999)
3. Agrawal, R., Mannila, H., Srikant, R., Toivonen, H., Verkamo, A.I.: Fast discovery of association rules. In: Advances in Knowledge Discovery and Data Mining. AAAI/MIT Press (1996) 307–328
4. Balcázar, J.L.: Parameter-free association rule mining with yacaree. In Khenchaf, A., Poncelet, P., eds.: EGC. Volume RNTI-E-20 of Revue des Nouvelles Technologies de l’Information, Hermann-Éditions (2011) 251–254
5. Balcázar, J.L., García-Sáiz, D., de la Dehesa, J.: Iterator-based algorithms in self-tuning discovery of partial implications. ICFCA, Supplementary proceedings (2012)
6. Stumme, G., Taouil, R., Bastide, Y., Pasquier, N., Lakhal, L.: Computing iceberg concept lattices with Titanic. Data Knowl. Eng. 42(2) (2002) 189–222
7. Balcázar, J.L.: Formal and computational properties of the confidence boost in association rules. Available at: [http://personales.unican.es/balcazarjl]. Extended abstract appeared as [31] (2010)
8. Zorrilla, M.E., García-Sáiz, D., Balcázar, J.L.: Towards parameter-free data mining: Mining educational data with yacaree. [32] 363–364
9. Hung, J.L., Zhang, K.: Revealing online learning behaviors and activity patterns and making predictions with data mining techniques in online teaching. Journal of Online Learning and Teaching 4(4) (2008) 426–436
10. Zaïane, O.R.: Building a recommender agent for e-learning systems. In: Proc. of the International Conference on Computers in Education (ICCE), Washington, DC, USA, IEEE Computer Society (2002) 55–59
11. Au, T.W., Sadiq, S., Li, X.: Learning from experience: Can e-learning technology be used as a vehicle? In: Proceedings of the Fourth International Conference on e-Learning, Toronto: Academic Publishing Limited (2009) 32–39
12. Ueno, M., Okamoto, T.: Bayesian agent in e-learning. IEEE International Conference on Advanced Learning Technologies (2007) 282–284
13. Perera, D., Kay, J., Koprinska, I., Yacef, K., Zaïane, O.R.: Clustering and sequential pattern mining of online collaborative learning data. IEEE Transactions on Knowledge and Data Engineering 21(6) (2009) 759–772
14. Romero, C., Ventura, S.: Educational data mining: A review of the state-of-the-art. IEEE Transactions on Systems, Man and Cybernetics, Part C: Applications and Reviews 40(6) (2010) 601–618
15. Castro, F., Vellido, A., Nebot, A., Mugica, F.: Applying data mining techniques to e-learning problems. In Kacprzyk, J., Jain, L., Tedman, R., Tedman, D., eds.: Evolution of Teaching and Learning Paradigms in Intelligent Environment. Volume 62 of Studies in Computational Intelligence. Springer Berlin Heidelberg (2007) 183–221. doi:10.1007/978-3-540-71974-8_8
16. Romashkin, N., Ignatov, D.I., Kolotova, E.: How university entrants are choosing their department? Mining of university admission process with FCA taxonomies. [32] 229–234
17. Ignatov, D.I., Mamedova, S., Romashkin, N., Shamshurin, I.: What can closed sets of students and their marks say? [32] 223–228
18. Belohlávek, R., Sklenar, V., Zacpal, J., Sigmund, E.: Evaluation of questionnaires supported by formal concept analysis. In Eklund, P.W., Diatta, J., Liquiere, M., eds.: CLA. Volume 331 of CEUR Workshop Proceedings, CEUR-WS.org (2007)
19. Merceron, A., Yacef, K.: Mining student data captured from a web-based tutoring tool: Initial exploration and results. Journal of Interactive Learning Research 15(4) (2004) 319–346
20. Zorrilla, M.E., García-Saiz, D.: Mining service to assist instructors involved in virtual education. In Zorrilla, M.E., Mazón, J.N., Ferrández, Ó., Garrigós, I., Daniel, F., Trujillo, J., eds.: Business Intelligence Applications and the Web: Models, Systems and Technologies. Information Science Reference (IGI Global Publishers) (September 2011)
21. García, E., Romero, C., Ventura, S., de Castro, C.: An architecture for making recommendations to courseware authors using association rule mining and collaborative filtering. User Model. User-Adapt. Interact. 19(1-2) (2009) 99–132
22. García, E., Romero, C., Ventura, S., Calders, T.: Drawbacks and solutions of applying association rule mining in learning management systems. In: Procs of the International Workshop on Applying Data Mining in e-Learning. (2007) 13–22
23. Witten, I.H., Frank, E.: Data Mining: Practical Machine Learning Tools and Techniques (2nd ed.). Morgan Kaufmann (2005)
24. Merceron, A., Yacef, K.: Interestingness measures for association rules in educational data. In de Baker, R.S.J., Barnes, T., Beck, J.E., eds.: EDM, www.educationaldatamining.org (2008) 57–66
25. Geng, L., Hamilton, H.J.: Interestingness measures for data mining: A survey. ACM Comput. Surv. 38(3) (2006)
26. Lenca, P., Meyer, P., Vaillant, B., Lallich, S.: On selecting interestingness measures for association rules: User oriented description and multiple criteria decision aid. European Journal of Operational Research 184(2) (2008) 610–626
27. Borgelt, C.: Efficient implementations of apriori and eclat. In Goethals, B., Zaki, M.J., eds.: FIMI. Volume 90 of CEUR Workshop Proceedings, CEUR-WS.org (2003)
28. Scheffer, T.: Finding association rules that trade support optimally against confidence. In: 5th European Conference on Principles of Data Mining and Knowledge Discovery. (2001) 424–435
29. Zaki, M.J., Hsiao, C.J.: Efficient algorithms for mining closed itemsets and their lattice structure. IEEE Transactions on Knowledge and Data Engineering 17(4) (2005) 462–478
30. Kaytoue, M., Marcuola, F., Napoli, A., Szathmary, L., Villerd, J.: The Coron System. In Boumedjout, L., Valtchev, P., Kwuida, L., Sertkaya, B., eds.: 8th International Conference on Formal Concept Analysis (ICFCA) - Supplementary Proceedings. (2010) 55–58 (demo paper)
31. Balcázar, J.L.: Objective novelty of association rules: Measuring the confidence boost. In Yahia, S.B., Petit, J.M., eds.: EGC. Volume RNTI-E-19 of Revue des Nouvelles Technologies de l’Information, Cépaduès-Éditions (2010) 297–302
32. Pechenizkiy, M., Calders, T., Conati, C., Ventura, S., Romero, C., Stamper, J.C., eds.: Procs of the 4th International Conference on Educational Data Mining (EDM), Eindhoven, The Netherlands, July 6-8, 2011. www.educationaldatamining.org (2011)