Adaptive Class Association Rule Mining for Human Activity Recognition

Martin Atzmueller (1), Mark Kibanov (1), Naveed Hayat (1), Matthias Trojahn (2), and Dennis Kroll (3)

(1) University of Kassel, Research Center for Information System Design, Knowledge and Data Engineering Group, Kassel, Germany
{atzmueller, kibanov, hayat}@cs.uni-kassel.de
(2) Volkswagen AG, Wolfsburg, Germany
matthias.trojahn@volkswagen.de
(3) University of Kassel, Research Center for Information System Design, Chair for Communication Technology, Kassel, Germany
dennis.kroll@comtec.eecs.uni-kassel.de

Abstract. The analysis of human activity data is an important research area in the context of ubiquitous and social environments. Using sensor data obtained by mobile devices, e.g., utilizing accelerometer sensors contained in mobile phones, behavioral patterns and models can be obtained. However, the utilized models are often not simple for humans to interpret, which would facilitate assessment, evaluation and validation, e.g., in computational social science or in medical contexts. In this paper, we propose a novel approach for generating interpretable rule sets for classification: We present an adaptive framework for mining class association rules using subgroup discovery, and analyze different techniques for obtaining the final classifier. The approach is investigated in the context of human activity recognition. For our evaluation, we apply real-world activity data collected using mobile phone sensors.

1 Introduction

With more and more ubiquitous devices emerging in our daily lives, sensor data capturing human activities is becoming a universal data source for the analysis of human behavioral patterns, and for building according models. However, often such models are either black-box models like neural networks, or are rather complex, e.g., in the case of random forests or large decision trees. Rule-based models can then often provide simpler models with comparable accuracy, estimated using quality measures [6, 7], in order to facilitate human interpretation.

Copyright (c) 2015 by the paper's authors. Copying permitted only for private and academic purposes. In: M. Atzmueller, F. Lemmerich (Eds.): Proceedings of the 6th International Workshop on Mining Ubiquitous and Social Environments (MUSE), co-located with ECML PKDD 2015. Published at http://ceur-ws.org

In this paper, we propose a novel approach for class association rule mining using subgroup discovery. We present an adaptive framework for mining such rules, and demonstrate the effectiveness of the proposed approach using real-world activity data collected using mobile phone sensors. Specifically, we focus on activity recognition, a prominent research field with respect to the classification of human activities.

Class association rules are special association rules with a fixed class attribute in the rule consequent. In order to mine such rules, we apply subgroup discovery [4, 42], an exploratory approach for discovering interesting subgroups defined by a description, e.g., a conjunction of attribute-value pairs (i.e., a typical rule body), with respect to a binary target concept. In the case of class association rules, the respective class can be defined as the target concept (i.e., the rule head). Then, subgroup discovery can be adapted as a rule generator for class association rule mining.
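To make this correspondence concrete, the following minimal sketch (illustrative Python, not part of any of the systems referenced in this paper) represents selectors, a conjunctive subgroup description, and the resulting class association rule for a hypothetical activity class; all names and values are assumptions for illustration only.

```python
from dataclasses import dataclass

# A selector (basic pattern) tests a single attribute-value condition.
@dataclass(frozen=True)
class Selector:
    attribute: str
    value: object

    def covers(self, instance: dict) -> bool:
        return instance.get(self.attribute) == self.value

# A subgroup description is a conjunction of selectors (the rule body);
# fixing a class value as the target concept turns it into a class association rule.
@dataclass(frozen=True)
class ClassAssociationRule:
    body: tuple  # tuple of Selector objects, interpreted conjunctively
    head: str    # the fixed class attribute value (rule head / target concept)

    def matches(self, instance: dict) -> bool:
        return all(sel.covers(instance) for sel in self.body)

# Hypothetical example rule predicting a walking activity.
rule = ClassAssociationRule(
    body=(Selector("device_position", "trousers_pocket"),
          Selector("accel_variance", "high")),
    head="walk_normally",
)
print(rule.matches({"device_position": "trousers_pocket", "accel_variance": "high"}))
```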
As we will discuss below, there are further adaptations for mining the final rule set, which we integrate into a comprehensive framework for adaptive class association rule mining. Our contribution can be summarized as follows:

1. We adapt subgroup discovery to class association rule mining, and embed it into an adaptive approach for obtaining a rule set that aims to target a simple rule base with an adequate level of predictive power, i.e., combining simplicity and accuracy.
2. For constructing the rule base, we utilize standard methods of rule selection and evaluation, and demonstrate the integration into our framework.
3. We provide an evaluation using real-world activity data obtained by mobile phone sensors, and demonstrate the effectiveness of our approach by a comparison with typical descriptive models, i.e., using Ripper as a rule-based baseline, and C4.5 as a decision tree classifier.

The rest of the paper is structured as follows: Section 2 discusses related work. Then, Section 3 introduces the necessary background. After that, Section 4 introduces the adaptive framework for class association rule mining. In Section 5 we describe the applied dataset. Next, Section 6 presents the results of our experiments and discusses them in detail. Finally, Section 7 concludes with a summary and provides interesting options for future work.

2 Related Work

Below, we discuss related work concerning general approaches for the classification of sensor data, subgroup discovery, and associative classification.

2.1 Classification and Sensor Data

Classification of activities based on sensor data is a prominent research area. Several authors have investigated the topic using wearable sensors, e.g., as also integrated into mobile phones. These sensors can be attached to parts of the body like arms, legs or the hip. The first works in this regard were already done at the end of the 1990s [30]. In the study of Foerster et al. [23], 24 participants wore sensors on the sternum, wrist, thigh and lower leg, and nine activities were replicated. Also, Bao and Intille [16] asked 20 subjects to perform some everyday activities while wearing five biaxial accelerometers on different parts of the body.

Fabian et al. [21] developed a real-time mobile system to recognize six different activities in both standing and sitting positions. For this purpose, three motion band devices were attached to the wrist, hip and dominant ankle of the participants. These devices contained an accelerometer, a magnetometer and a gyroscope. While the training was done offline on a desktop PC, the subsequent recognition process was done in real time with a smartphone collecting the sensor data from the attached motion bands.

In this paper, we consider the field of wearable sensors, specifically those embedded in mobile phones, focusing on the accelerometer: Kwapisz et al. [29], for example, collected and labeled data from 29 users and tried to classify six basic activities (like standing or walking). Reddy et al. [39] considered the problem of using mobile phones to determine the transportation mode (such as walking, biking, or in motorized transport) and additionally used the GSM receiver of the device. Berchtold et al. [17] presented ActiServ, an architecture which creates an evolving activity classification system using feedback from the user community.
Yang [43] proposed a physical activity diary based on automatic sensor data classification for use in mobile healthcare and further applications (currently such applications emerge, e.g., Apple ResearchKit, http://researchkit.org).

In contrast to most of the presented works, we concentrate on a number of special activities, some of which assume active interaction with mobile phones. We also define a group of disrupt activities, i.e., activities which are similar to a usual activity, to examine whether the presented classifier can recognize small differences between activities. Furthermore, we consider up to 8 sensors for improving activity recognition. In contrast, the related work discussed above only uses the accelerometer or, in a few cases, a limited number of two or three sensors.

2.2 Subgroup Discovery

Subgroup discovery [2, 4, 15, 27, 42] has been established as a general and broadly applicable technique for descriptive and exploratory data mining: It aims at identifying descriptions of subsets of a dataset that show an interesting behavior with respect to certain interestingness criteria, formalized by a quality function, e.g., [4, 25, 27].

Overall, subgroup discovery and analytics are important tools for descriptive data mining: They can be applied, for example, for obtaining an overview on the relations in the data, for automatic hypotheses generation, and for data exploration. Prominent application examples include knowledge discovery in medical, technical, and social domains, e.g., [3, 10, 14, 15, 24, 31, 37]. Subgroup discovery is especially suited for identifying local patterns in the data, that is, nuggets that hold for specific subsets: It can uncover hidden relations captured in small subgroups, for which variables are only significantly correlated within these subgroups. Typically, the discovered patterns are especially easy to interpret by users and domain experts, cf. [11, 24, 25].

Standard subgroup discovery approaches commonly focus on a single target concept as the property of interest [25, 27, 31], while the quality function framework also enables multi-target concepts, e.g., [12, 28]. Furthermore, more complex target properties [20, 32] can be formalized as exceptional models, cf. [32]. In the case of a binary target variable, the share in a subgroup can be compared to the share in the dataset in order to detect deviations in (large) subgroups. This is also the approach considered in this paper, where we focus on a specific class (a set of classes, respectively) as the target concept(s). In addition to basic subgroup discovery, which aims at providing the obtained subgroups in an exploratory and descriptive fashion, we embed subgroup discovery as the basis of our rule generation approach. We apply an adaptive method that aims to generate rules with increasing complexity (and accuracy) based on a performance estimate of the current subgroup set. In addition, we apply a rule selection strategy in order to obtain the final set of class association rules for classification.

2.3 Associative Classification

Associative classification approaches integrate association rule mining and classification strategies. Thabtah [41] provides a survey on the field. This includes the first approach by Liu et al. [35] for class association rule mining, which combines association rule mining and subsequent rule selection in the CBA algorithm. It applies a covering strategy, selecting rules one by one, minimizing the total error. Alternative approaches include the CMAR algorithm by Li et al.
[34], which also applies covering, but allows for multiple rules to cover an instance. The CPAR algorithm by Yin and Han [44] integrates rule mining and selection, and achieves accuracy comparable to CBA and CMAR. In addition to the rule mining and selection techniques, there are several strategies for the final decision of how to combine the rules for the classification (voting of the matching rules), e.g., [40].

Compared to the approaches discussed above, our proposed approach applies subgroup discovery for class association rule mining, which allows for a suitable selection of a (complex) quality function for mining the rules, in contrast to the (simple) confidence/support-based approaches applied by association rule mining approaches. Then, for example, significance criteria can simply be embedded. Furthermore, the presented approach applies an adaptive strategy for balancing rule complexity (size) with predictive accuracy by applying a ruleset assessment function, in addition to the rule selection function. However, our framework is general in that respect: We do not enforce a specific strategy. Instead, this decision can be configured by the specific implementation of the framework. In our implementation throughout this paper, for example, we follow the rule selection strategy of CBA; the ruleset assessment is done by a median-based ranking of the according confidences of the rules, i.e., estimated by the respective shares of the class contained in the subgroups covered by the respective rules. We will describe these concepts below in more detail.

3 Background

Below, we first introduce some basic notation. After that, we summarize basics on subgroup discovery, before we sketch how to mine class association rules using subgroup discovery.

3.1 Basic Notation

Formally, a database DB = (I, A) is given by a set of individuals I and a set of attributes A. A selector or basic pattern sel_{a_i = v_j} is a Boolean function I → {0, 1} that is true if the value of attribute a_i ∈ A is equal to v_j for the respective individual. The set of all basic patterns is denoted by S. For a numeric attribute a_num, selectors sel_{a_num ∈ [min_j; max_j]} can be defined analogously for each interval [min_j; max_j] in the domain of a_num. The Boolean function is then set to true if the value of the respective attribute a_num is within the respective interval.

3.2 Patterns and Subgroups

Basic elements used in subgroup discovery are patterns and subgroups. Intuitively, a pattern describes a subgroup, i.e., the subgroup consists of instances that are covered by the respective pattern. It is easy to see that a pattern describes a fixed set of instances (subgroup), while a subgroup can also be described by different patterns, if there are different options for covering the subgroup's instances. In the following, we define these concepts more formally.

Definition 1. A subgroup description or (complex) pattern sd is given by a set of basic patterns sd = {sel_1, ..., sel_l}, where sel_i ∈ S, which is interpreted as a conjunction, i.e., sd(I) = sel_1 ∧ ... ∧ sel_l, with length(sd) = l.

Without loss of generality, we focus on a conjunctive pattern language using nominal attribute-value pairs as defined above in this paper; internal disjunctions can also be generated by appropriate attribute-value construction methods, if necessary.

Definition 2.
A subgroup (extension) sg_sd := ext(sd) := {i ∈ I | sd(i) = true} is the set of all individuals which are covered by the pattern sd.

As search space for subgroup discovery, the set 2^S of all possible patterns is used, that is, all combinations of the basic patterns contained in S. Then, appropriate efficient algorithms, e.g., [8, 13, 33], can be applied.

3.3 Interestingness of a Pattern

The interestingness of a pattern is determined by a quality function, which is selected according to the analysis task.

Definition 3. A quality function q: 2^S → R maps every pattern in the search space to a real number that reflects the interestingness of a pattern (or the extension of the pattern, respectively).

While a large number of quality functions has been proposed in the literature, many quality functions for a single target concept, e.g., in the binary or numerical case, trade off the size n = |ext(sd)| of a subgroup and the deviation t_sd − t_0, where t_sd is the average value of a given target concept in the subgroup identified by the pattern sd and t_0 is the average value of the target concept in the general population. In the binary case, the averages relate to the share of the target concept. Thus, typical quality functions are of the form

    q_a(sd) = n^a · (t_sd − t_0),  a ∈ [0; 1] .    (1)

For binary target concepts, this includes, for example, the weighted relative accuracy for the size parameter a = 1, or a simplified binomial function for a = 0.5. An extension to a target concept defined by a set of variables can be defined similarly, by extending common statistical tests.

While a quality function provides a ranking of the discovered subgroup patterns, often also a statistical assessment of the patterns is useful in data exploration. Quality functions that directly apply a statistical test, for example, the Chi-Square quality function, e.g., [4], provide a p-value for simple interpretation. However, the Chi-Square quality function estimates deviations in two directions. An alternative, which can also be directly mapped to a p-value, is given by the adjusted residual quality function q_r, since the values of q_r follow a standard normal distribution for large samples, cf. [1]:

    q_r = n(t_sd − t_0) / √( n · t_0 · (1 − t_0) · (1 − n/N) ) .    (2)

The result of top-k subgroup discovery is the set of the k patterns sd_1, ..., sd_k, where sd_i ∈ 2^S, with the highest interestingness according to the applied quality function. A subgroup discovery task can now be specified by a 5-tuple (DB, c, S, q, k). We focus on the case of a binary target concept c: I → {0, 1} specifying the property of interest: In the context of class association rule mining, it maps each instance in the dataset to a target value c corresponding to the respective class of the instance. The search space 2^S is defined by the set of basic patterns S.

Furthermore, we consider additional constraints with respect to the complexity of the patterns. We can restrict the length l of the descriptions to a certain maximal value, e.g., with length l = 1 we only consider subgroup descriptions containing one selector, with length l = 2 we consider a conjunction of two selectors, etc. Then, the complexity of the discovered patterns can also be adaptively adjusted as described in Section 4.

3.4 Subgroup Discovery for Mining Class Association Rules

For mining class association rules, we apply subgroup discovery, such that for every class c ∈ S, we create an according target concept c.
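As an illustration of the quality functions introduced above, the following minimal sketch computes q_a and the adjusted residual q_r for such a binary target concept from the subgroup size n, the target shares t_sd and t_0, and the population size N; the function and variable names are ours and are not taken from a particular subgroup discovery system.

```python
from math import sqrt

def q_a(n: int, t_sd: float, t_0: float, a: float = 0.5) -> float:
    """Generic quality function q_a(sd) = n^a * (t_sd - t_0), a in [0; 1].
    a = 1 yields the weighted relative accuracy, a = 0.5 a simplified binomial function."""
    return (n ** a) * (t_sd - t_0)

def q_r(n: int, t_sd: float, t_0: float, N: int) -> float:
    """Adjusted residual quality function (Equation 2); its values follow a
    standard normal distribution for large samples and map to a p-value."""
    return n * (t_sd - t_0) / sqrt(n * t_0 * (1 - t_0) * (1 - n / N))

# Hypothetical subgroup: 80 of 120 covered instances belong to the target class,
# while the class share in the whole dataset (N = 3077 instances) is 0.09.
print(q_a(120, 80 / 120, 0.09, a=0.5))
print(q_r(120, 80 / 120, 0.09, 3077))
```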
Then, we discover a set of the top-k patterns CAR_c = {sd_1^c, sd_2^c, ..., sd_k^c} for each target concept. It is easy to see that a subgroup pattern directly corresponds to a class association rule: the head of the rule is given by the target concept, while the body of the rule is given by the specific subgroup description.

Then, these rules can be applied for building the classifier. For that, a specific rule selection strategy needs to be applied after the total set of class association rules has been determined. It usually aims at selecting the subset with the best predictive power, e.g., using one of the algorithms discussed above in Section 2. When applying the model, different rule combination strategies can be used, e.g., taking the best rule, or aggregating the votes of the individual matching rules, cf. [40]. Basically, for each rule r that matches an instance i ∈ I that we want to classify, we can combine the classifications of the individual matching rules in order to obtain the final classification. The best rule strategy just selects the rule with the highest confidence (and its respective classification). In addition, we can apply voting methods for obtaining the final classification, cf. [40], i.e., for combining the individual predictions as votes for the final classification. Essentially, for classifying an individual (instance) i ∈ I, this works as follows:

    class(i) = arg max_{c_i ∈ C} Σ_{r ∈ R_i} weight(r) ,    (3)

where R_i is the subset of rules matching instance i ∈ I that predict class c_i, and C ⊆ S denotes the set of available classes in our dataset. The weight of a rule weight(r) depends on the chosen weighting method. Following [40], we applied the unweighted strategy, where weight_U(r) = 1 for all rules r, and the Laplacian weight strategy weight_L(r) = Laplace(r), where the Laplacian weight is determined according to the Laplace correction [18] of the estimated class probabilities on the applied dataset:

    Laplace(r) = (p_i^r + 1) / (Σ_{c_j ∈ C} p_j^r + |C|) ,    (4)

where p_j^r (and p_i^r) are the numbers of examples covered by rule r that belong to the respective classes c_j (and to the rule's class c_i, respectively).

4 An Adaptive Framework for Class Association Rule Mining

In this section, we provide an overview on the proposed approach, presenting our novel framework Carma, an Adaptive Framework for Class Association Rule Mining, and provide examples of its instantiation in Section 6. For our adaptive framework, we distinguish two phases: the learning phase that constructs the model, and the classification phase that applies the model.

Learning: Model Construction. For the construction of the model, we apply the steps described in Algorithm 1. Basically, Carma starts with discovering class association rules for each class c contained in the dataset. Using subgroup discovery (line 5, calling procedure SubgroupDiscovery that needs to be instantiated with an appropriate subgroup discovery algorithm), we collect a set of class association rules for the specific class, considering a maximal length of the concerned patterns. After that, we apply a Boolean ruleset assessment function a (line 6) in order to check if the quality of the ruleset is good enough. If the outcome of this test is positive, we continue with the next class (lines 7-8). Otherwise, we increase the maximal length of a rule (up to a certain user-definable threshold T, line 12).
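The following minimal sketch outlines this per-class adaptive loop; Algorithm 1 below gives the precise procedure. The median-confidence assessment anticipates the instantiation described in Section 6, and subgroup_discovery stands for any top-k subgroup discovery implementation; all names are illustrative assumptions, not the code used in our experiments.

```python
from statistics import median

def assess(rules: list, tau_c: float = 0.5) -> bool:
    """Ruleset assessment function a: accept the ruleset if the median
    rule confidence reaches the threshold tau_c (cf. Section 6)."""
    return bool(rules) and median(r.confidence for r in rules) >= tau_c

def mine_class_rules(db, c, selectors, q, k, T, subgroup_discovery, tau_c=0.5):
    """Adaptively mine class association rules for a single class c:
    increase the maximal pattern length until the ruleset is good enough."""
    length = 1
    while True:
        # top-k subgroup discovery with the current maximal pattern length
        candidates = subgroup_discovery(db, c, selectors, q, k, max_length=length)
        if assess(candidates, tau_c):
            return candidates          # good enough: keep these rules
        if length > T:
            return []                  # give up for this class once T is exceeded
        length += 1                    # otherwise retry with longer patterns
```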
After the final set of all class association rules for all classes has been determined, we apply the rule selection function r (line 14) in order to obtain a set of class association rules that optimizes predictive power on the training set. That is, the rule selection function aims to estimate the classification error and should select the rules according to coverage and accuracy of the rules on the training set.

Algorithm 1 CARMA
Require: Set of classes C, k specifying the number of top-k patterns, maxlength T denoting the maximal possible length of a subgroup pattern, quality function q, ruleset assessment function a, rule selection function r.
 1: Patterns P = ∅
 2: for all c ∈ C do
 3:   Current length threshold length = 1
 4:   while true do
 5:     Obtain candidate patterns CP by CP = SubgroupDiscovery(DB, c, S, q, k, length)
 6:     if current candidate patterns are good enough, i.e., a(CP) = true then
 7:       P = P ∪ CP
 8:       break
 9:     else if length > T then
10:       break
11:     else
12:       length = length + 1
13: Add a default pattern (rule) for the most frequent class to P
14: Apply rule selection function: P = r(P)
15: return P   {Model, consisting of the result set of rules}

Classification. For the classification phase, we apply all the rules contained in the model P. For aggregating the predictions of the (matching) rules for an individual (instance) i ∈ I, and for obtaining the final classification, we apply a specific rule combination strategy, see Section 3 for examples.

5 Dataset

We collected a dataset containing a diverse set of activities (classes) split into two categories: (1) activities which demand the direct usage of the device, e.g., holding the device close to the ear, or putting the device in a specific place, and (2) typical walking activities, e.g., walking slowly or normally.

We defined five scenarios that consist of sets of different activities. While doing these activities, the person used a smartphone with a running application. This application recorded the sensor data. The persons used the smartphone actively (e.g., putting the device in the pocket) or passively (e.g., while walking). Another smartphone was used to record the exact start and finish time of each activity. 39 test persons of different sex and age repeated each scenario six times. The resulting dataset consists of a total of 3077 valid single activities. Table 1 shows an overview on the dataset, the specific activities and the class distributions in detail.

Table 1. Activity dataset overview: description of the individual activities (classes), body position, device context, and number of instances (samples) for each activity/class.

ID  Description of Activity/Class                     Body Position  Device Usage  No. of Samples
 1  Put device in right trousers pocket               Sit            Yes            54
 2  Put device in right trousers pocket               Stand          Yes           290
 3  Put device in shirt pocket                        Sit            Yes            54
 4  Put device in shirt pocket                        Stand          Yes           162
 5  Take device from right trousers pocket            Sit            Yes            54
 6  Take device from right trousers pocket            Stand          Yes           290
 7  Take device from shirt pocket                     Sit            Yes            54
 8  Take device from shirt pocket                     Stand          Yes           162
 9  Put device on the table                           Sit            Yes            55
10  Put device on the table                           Stand          Yes           272
11  Take device from the table                        Sit            Yes            55
12  Take device from the table                        Stand          Yes           272
13  Give device to another person                     Sit            Yes           109
14  Give device to another person                     Stand          Yes           163
15  Take device from another person                   Sit            Yes            55
16  Take device from another person                   Stand          Yes           217
17  Hold device near the ear                          Stand          Yes           217
18  Take device away from the ear                     Stand          Yes            54
19  Walk slowly (device in hand)                      -              No             54
20  Walk slowly (device near ear)                     -              No             54
21  Walk normally (device in shirt pocket)            -              No             54
22  Walk normally (device in hand)                    -              No             54
23  Walk normally (device near ear)                   -              No             55
24  Walk normally (device in right trousers pocket)   -              No             55
25  Walk fast (device in hand)                        -              No             54
26  Walk fast (device near ear)                       -              No             54
27  Walk fast (device in right trousers pocket)       -              No             54

Overall, we recorded data from eight different sensors installed on a Samsung Galaxy Nexus device, particularly: (1) accelerometer, (2) magnetometer, (3) gyroscope, (4) light sensor, (5) proximity sensor, (6) rotation vector, (7) gravity sensor, and (8) linear acceleration. Using these, we created a set of features applying window-based techniques. A fixed window size of 1 second was used; this size was already proven to be efficient for walking activities [26]. We created 6 features per window and per sensor as described in Table 2. Zero-crossings describes the number of changes from positive to negative and from negative to positive values, respectively. The 75th percentile represents the lowest value that is greater than or equal to 75% of the values. The other features are the calculated mean, min/max and standard deviation for the given window.

Table 2. Overview on the features generated using the collected sensor data.

Feature                          Sensor
Average/Minimum/Maximum Value    All
Standard Deviation               All
Zero-Crossings                   All without light and proximity sensors
75th Percentile                  All without light and proximity sensors

The features were extracted for every axis of every sensor. The only exception were the light and proximity sensors: Zero-crossings and the 75th percentile were not calculated for these sensors because of the nature of their returned values. Thus, 4 features were obtained for both the light and the proximity sensor, and 18 for each of the other sensors, resulting in a total set of 116 features. In order to use the features for class association rule mining, we employed the discretization technique by Fayyad & Irani [22] for deriving according selectors.

6 Evaluation

Below, we compare an instantiation of the proposed Carma framework against two baselines: the Ripper algorithm [19] as a rule-based learner, and the C4.5 algorithm [38] for learning decision trees. For the subgroup discovery step in the Carma framework, we apply the BSD algorithm [33] using the implementation provided by the VIKAMINE system [9]. Further details are described below when we discuss the experimental setup and results.
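As a concrete illustration of the rule combination strategies compared below (cf. Section 3), the following minimal sketch implements the best-rule and voting schemes of Equations (3) and (4). The rule objects and their attributes (head, confidence, class_counts, matches) are illustrative assumptions and do not correspond to the implementation used in our experiments.

```python
from collections import defaultdict

def laplace(rule, classes) -> float:
    """Laplace-corrected confidence of a rule (Equation 4):
    (covered examples of the rule's class + 1) / (all covered examples + |C|)."""
    covered = sum(rule.class_counts.values())
    return (rule.class_counts.get(rule.head, 0) + 1) / (covered + len(classes))

def classify(instance, rules, classes, strategy="unweighted"):
    """Combine the matching rules into a final prediction (Equation 3)."""
    matching = [r for r in rules if r.matches(instance)]
    if not matching:
        return None  # in the full framework, a default rule would fire here
    if strategy == "best_confidence":
        return max(matching, key=lambda r: r.confidence).head
    if strategy == "best_laplace":
        return max(matching, key=lambda r: laplace(r, classes)).head
    # voting: each matching rule votes for its head, either unweighted or Laplace-weighted
    votes = defaultdict(float)
    for r in matching:
        votes[r.head] += 1.0 if strategy == "unweighted" else laplace(r, classes)
    return max(votes, key=votes.get)
```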
As the basic evaluation measures, we consider (multi-class) model accuracy and model complexity with respect to activity recognition on the 116 features and 27 classes (shown in Table 1), cf. Section 5. Accuracy is defined as the portion of samples that were classified correctly. Furthermore, complexity relates to the size of a model using two parameters: the total number of rules contained in a rule-based model (also corresponding to the number of leaves in a decision tree), and its average complexity (i.e., for a decision tree, the average length of a path from the root to a leaf). All experiments were performed in a standard 10-fold cross-validation setting.

6.1 Baseline Results

We applied both the JRip and J48 algorithms as baseline methods. We compare their results with the described approach and explore the influence of different parameters in terms of accuracy and model complexity.

Table 3. Baseline results using C4.5 (J48) and Ripper (JRip).

Algorithm  Accuracy  No. of Rules  Avg. Complexity
J48        69.02%    1394          6.76
JRip       66.87%    176           3.40

Table 3 shows the performance and complexity of the baseline algorithms. J48 showed a better performance, but built a more complex model with 1394 rules and an average rule complexity of 6.76. JRip's accuracy is 2% lower, but the model is much smaller with only 176 rules and an average rule length of 3.40.

6.2 Results and Discussion

When applying the Carma framework, we need to instantiate several components according to the analytical question. In the context of our experiments, we instantiate these elements as follows:

- For the subgroup discovery algorithm, we selected the BSD algorithm [33].
- For the ruleset assessment function, we simply check if the median of the rules' confidences is above a certain threshold τ_c. In our experiments, we applied a threshold τ_c = 0.5.
- Furthermore, for the rule selection function, we apply an adaptation of the CBA algorithm [35].
- In addition to the basic CBA algorithm, we also implemented a variant, which we call CBA*. This algorithm ensures that there is at least one rule for each class in the derived model, i.e., when estimating classification performance on the training set, it is checked that at least one rule for each class exists in the final classifier. We default to the rule with the highest confidence, if there is none contained in the initial model.
- Since we are interested in easily interpretable rules, we also selected the quality function q_r (adjusted residuals, described above), which directly maps to significance criteria.
- We opted for interpretable patterns with a maximal length of 7 conditions, and set the respective threshold T = 7 accordingly.
- In the evaluation, we used three different TopK values: 100, 200 and 500.
- For the rule combination strategy, we experimented with four strategies: taking the best rule according to confidence and Laplace value, the unweighted voting strategy, and the weighted voting (Laplace) method (see Section 3).

Table 4. Evaluation results: The table shows accuracy and complexity of Carma depending on different choices of k, the rule selection techniques CBA and CBA*, and the following rule combination strategies: UnweightedVote (unweighted voting), LaplaceVote (voting using Laplacian weights), BestLaplace (best rule using Laplace value), and BestConfidence (best rule according to rule confidence), cf. Section 3 for a detailed discussion.

TopK  Strategy        CBA                                          CBA*
                      Accuracy  No. of Rules  Avg. Complexity      Accuracy  No. of Rules  Avg. Complexity
100   UnweightedVote  67.14 %   347.2         2.79 ± 1.00          67.31 %   345.3         2.82 ± 1.04
100   LaplaceVote     66.47 %   347.1         2.80 ± 1.00          66.96 %   345.0         2.81 ± 1.04
100   BestLaplace     59.60 %   349.4         2.81 ± 1.00          59.10 %   345.4         2.79 ± 0.98
100   BestConfidence  63.31 %   349.8         2.82 ± 1.03          62.22 %   346.5         2.81 ± 1.01
200   UnweightedVote  67.82 %   424.9         2.91 ± 1.01          67.99 %   422.5         2.88 ± 1.00
200   LaplaceVote     68.20 %   426.7         2.91 ± 1.02          69.09 %   424.8         2.89 ± 1.00
200   BestLaplace     59.45 %   421.8         2.90 ± 1.01          59.63 %   423.1         2.88 ± 1.01
200   BestConfidence  64.75 %   424.7         2.87 ± 1.01          64.93 %   422.8         2.89 ± 1.04
500   UnweightedVote  69.38 %   517.3         3.05 ± 0.97          70.52 %   522.3         3.05 ± 0.98
500   LaplaceVote     69.95 %   518.4         3.05 ± 0.95          69.96 %   522.1         3.05 ± 0.96
500   BestLaplace     60.95 %   518.3         3.06 ± 1.01          60.60 %   521.7         3.04 ± 0.97
500   BestConfidence  66.80 %   525.4         3.06 ± 0.98          66.80 %   520.6         3.06 ± 0.97

Table 4 shows the results of our experiments. Overall, it is easy to see that the proposed approach outperforms the baselines both in accuracy as well as in complexity; an instantiation with the UnweightedVote or LaplaceVote functions and k = 500 clearly outperforms even the C4.5 baseline. If we especially concentrate on the complexity (or simplicity) of the model, we can observe that Carma demonstrates its advantages, since it clearly generates less complex models than the baselines with a comparable accuracy, e.g., C4.5. If we consider the Ripper algorithm, we can observe that it still has a better average complexity (i.e., a lower average complexity per rule), while Carma clearly outperforms Ripper in terms of accuracy.

Considering the voting functions, we observe that the voting strategies (unweighted voting and weighted Laplace voting) always outperform the rest. In our experiments, using larger values of k indicates a higher accuracy; here also the complexity (in the number of rules) can be tuned. We observe a slight trade-off between accuracy and complexity here. Basically, the parameter k seems to have an influence on the complexity, while the remaining instantiations do not seem to have a strong influence. This can be explained by the fact that the model generation phase mainly depends on k (and the maximum length of the patterns), but not on the applied voting method. CBA and CBA* seem quite close in terms of accuracy and complexity, while we can observe a slight improvement for CBA*. In empirical evaluations it turned out that the difference between CBA* and CBA was even more pronounced for lower values of k, leading to slightly better models for CBA*. However, for our parameter selection, we do not see strong improvements of CBA* compared to CBA.

In summary, the proposed framework always provides a more compact model than the baseline algorithms concerning rule complexity, with simple rules such as:

    IF minProx = (0.5 - 3] ∧ minMagnetY > 34 ∧ zeroCrossAccelX = (0.5 - 1.5] THEN Class = Hold device near the ear.

In our experiments, it is at least in the same range as, or even better than, the baselines concerning accuracy. In particular, considering the best parameter instantiations, the proposed approach is able to outperform both baselines concerning accuracy (see Figures 1-2).

[Figure 1: bar chart of accuracy (50% to 70%) for TopK = 100, 200 and 500, comparing the rule combination strategies UnweightedVote, LaplaceVote, BestLaplace and BestConfidence with the baselines J48 and JRip.]
Fig. 1. Comparison of the accuracy of Carma using the standard CBA method for rule selection, with different rule combination strategies, to the baselines.
[Figure 2: bar chart of accuracy (50% to 70%) for TopK = 100, 200 and 500, comparing the rule combination strategies UnweightedVote, LaplaceVote, BestLaplace and BestConfidence with the baselines J48 and JRip.]
Fig. 2. Comparison of the accuracy of Carma using the (improved) CBA* method for rule selection, with different rule combination strategies, to the baselines.

7 Conclusions

Human activity recognition and interpretable models for classification are prominent research directions, especially considering the ever-increasing amount of available sensor data and social media. In this paper, we presented a unifying view on these topics, proposing a novel approach: adaptive class association rule mining using subgroup discovery. We successfully applied and evaluated this approach in the field of human activity recognition.

The proposed Carma framework is especially suited for generating interpretable rule sets for classification, with a low model complexity. We discussed and analyzed different instantiations of Carma, e.g., for parameter selection and for obtaining the final classifier. For our evaluation, we applied real-world data collected for different activities using mobile phone sensors. Our experiments showed that the proposed approach can clearly outperform the baselines, both in terms of accuracy and complexity of the resulting predictive model.

For future work, we aim to consider more datasets in order to extend the evaluation further. In addition, we aim to analyze the performance of Carma in further domains, e.g., in the medical domain, or for classifying social media. Furthermore, we plan to investigate further rule assessment and rule selection strategies in detail, e.g., [36], in order to perform further algorithmic comparison and assessment. Based on these, we aim to provide guidelines for instantiating the Carma framework for specific contexts, also in semi-automatic scenarios [5].

References

1. Agresti, A.: An Introduction to Categorical Data Analysis. Wiley-Blackwell (2007)
2. Atzmueller, M.: Knowledge-Intensive Subgroup Mining - Techniques for Automatic and Interactive Discovery, DISKI, vol. 307. IOS Press (March 2007)
3. Atzmueller, M.: Data Mining on Social Interaction Networks. JDMDH 1 (2014)
4. Atzmueller, M.: Subgroup Discovery - Advanced Review. WIREs Data Mining and Knowledge Discovery 5(1), 35-49 (2015)
5. Atzmueller, M., Baumeister, J., Hemsing, A., Richter, E.J., Puppe, F.: Subgroup Mining for Interactive Knowledge Refinement. In: Proc. Conf. on Artificial Intelligence in Medicine. pp. 453-462. LNAI 3581, Springer, Heidelberg, Germany (2005)
6. Atzmueller, M., Baumeister, J., Puppe, F.: Quality Measures and Semi-Automatic Mining of Diagnostic Rule Bases. In: Proc. INAP. pp. 65-78. No. 3392 in LNAI, Springer, Heidelberg, Germany (2005)
7. Atzmueller, M., Baumeister, J., Puppe, F.: Semi-Automatic Learning of Simple Diagnostic Scores Utilizing Complexity Measures. Artif Intell Med 37(1) (2006)
8. Atzmueller, M., Lemmerich, F.: Fast Subgroup Discovery for Continuous Target Concepts. In: Proc. ISMIS. LNCS, vol. 5722, pp. 1-15. Springer, Heidelberg, Germany (2009)
9. Atzmueller, M., Lemmerich, F.: VIKAMINE - Open-Source Subgroup Discovery, Pattern Mining, and Analytics. In: Proc. ECML/PKDD. LNAI, vol. 7524. Springer, Heidelberg, Germany (2012)
10. Atzmueller, M., Lemmerich, F.: Exploratory Pattern Mining on Social Media using Geo-References and Social Tagging Information. IJWS 2(1/2) (2013)
11. Atzmueller, M., Lemmerich, F., Krause, B., Hotho, A.: Who are the Spammers? Understandable Local Patterns for Concept Description. In: Proc. 7th CMS Conference. Oprogramowanie Nauko-Techniczne, Krakow, Poland (2009)
12. Atzmueller, M., Mueller, J., Becker, M.: Mining, Modeling and Recommending 'Things' in Social Media, LNAI, vol. 8940, chap. Exploratory Subgroup Analytics on Ubiquitous Data. Springer, Heidelberg, Germany (2015)
13. Atzmueller, M., Puppe, F.: SD-Map - A Fast Algorithm for Exhaustive Subgroup Discovery. In: Proc. ECML/PKDD. LNAI, vol. 4213, pp. 6-17. Springer, Heidelberg, Germany (2006)
14. Atzmueller, M., Puppe, F.: A Case-Based Approach for Characterization and Analysis of Subgroup Patterns. Journal of Applied Intelligence 28(3), 210-221 (2008)
15. Atzmueller, M., Puppe, F., Buscher, H.P.: Exploiting Background Knowledge for Knowledge-Intensive Subgroup Discovery. In: Proc. 19th International Joint Conference on Artificial Intelligence. pp. 647-652. Edinburgh, Scotland (2005)
16. Bao, L., Intille, S.S.: Activity Recognition from User-Annotated Acceleration Data. In: Pervasive Computing, pp. 1-17. Springer (2004)
17. Berchtold, M., Budde, M., Gordon, D., Schmidtke, H., Beigl, M.: ActiServ: Activity Recognition Service for Mobile Phones. In: Proc. ISWC. pp. 1-8 (Oct 2010)
18. Clark, P., Boswell, R.: Rule Induction with CN2: Some Recent Improvements. In: Proc. EWSL. pp. 151-163. Springer, Heidelberg, Germany (1991)
19. Cohen, W.W.: Fast Effective Rule Induction. In: Twelfth International Conference on Machine Learning. pp. 115-123. Morgan Kaufmann (1995)
20. Duivesteijn, W., Knobbe, A., Feelders, A., van Leeuwen, M.: Subgroup Discovery Meets Bayesian Networks - An Exceptional Model Mining Approach. In: Proc. ICDM. pp. 158-167. IEEE, Washington, DC, USA (2010)
21. Fábián, Á., Győrbíró, N., Hományi, G.: Activity Recognition System for Mobile Phones using the MotionBand Device. In: Proc. MOBILWARE. p. 41. ICST (2008)
22. Fayyad, U.M., Irani, K.B.: Multi-Interval Discretization of Continuous-Valued Attributes for Classification Learning. In: IJCAI. pp. 1022-1029 (1993)
23. Foerster, F., Smeja, M., Fahrenberg, J.: Detection of Posture and Motion by Accelerometry: A Validation Study in Ambulatory Monitoring. Computers in Human Behavior 15(5), 571-583 (1999)
24. Gamberger, D., Lavrac, N.: Expert-Guided Subgroup Discovery: Methodology and Application. Journal of Artificial Intelligence Research 17, 501-527 (2002)
25. Herrera, F., Carmona, C., Gonzalez, P., del Jesus, M.: An Overview on Subgroup Discovery: Foundations and Applications. KAIS 29(3), 495-525 (2011)
26. Huynh, T., Schiele, B.: Analyzing Features for Activity Recognition. In: Proc. 2005 Joint Conf. on Smart Objects and Ambient Intelligence. pp. 159-163. ACM (2005)
27. Klösgen, W.: Explora: A Multipattern and Multistrategy Discovery Assistant. In: Advances in Knowledge Discovery and Data Mining, pp. 249-271. AAAI (1996)
28. Klösgen, W.: Handbook of Data Mining and Knowledge Discovery, chap. 16.3: Subgroup Discovery. Oxford University Press, New York (2002)
29. Kwapisz, J.R., Weiss, G.M., Moore, S.A.: Activity Recognition using Cell Phone Accelerometers. SIGKDD Explor. Newsl. 12(2), 74-82 (Dec 2010)
30. Lara, O.D., Labrador, M.A.: A Survey on Human Activity Recognition using Wearable Sensors. Communications Surveys & Tutorials, IEEE 15(3), 1192-1209 (2013)
31. Lavrac, N., Kavsek, B., Flach, P., Todorovski, L.: Subgroup Discovery with CN2-SD. Journal of Machine Learning Research 5, 153-188 (2004)
32. Leman, D., Feelders, A., Knobbe, A.: Exceptional Model Mining. In: Proc. ECML/PKDD. LNCS, vol. 5212, pp. 1-16. Springer (2008)
33. Lemmerich, F., Rohlfs, M., Atzmueller, M.: Fast Discovery of Relevant Subgroup Patterns. In: Proc. FLAIRS. pp. 428-433. AAAI Press, Palo Alto, CA, USA (2010)
34. Li, W., Han, J., Pei, J.: CMAR: Accurate and Efficient Classification Based on Multiple Class-Association Rules. In: Proc. ICDM. pp. 369-376. IEEE (2001)
35. Liu, B., Hsu, W., Ma, Y.: Integrating Classification and Association Rule Mining. In: Proc. KDD. pp. 80-86. AAAI Press (August 1998)
36. Liu, B., Ma, Y., Wong, C.K.: Improving an Association Rule Based Classifier. In: Principles of Data Mining and Knowledge Discovery. pp. 504-509. Springer (2000)
37. Puppe, F., Atzmueller, M., Buscher, G., Huettig, M., Lührs, H., Buscher, H.P.: Application and Evaluation of a Medical Knowledge-System in Sonography (SonoConsult). In: Proc. ECAI. pp. 683-687. IOS Press (2008)
38. Quinlan, R.: C4.5: Programs for Machine Learning. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA (1993)
39. Reddy, S., Mun, M., Burke, J., Estrin, D., Hansen, M., Srivastava, M.: Using Mobile Phones to Determine Transportation Modes. ACM TOSN 6(2), 13:1-13:27 (2010)
40. Sulzmann, J.N., Fürnkranz, J.: A Comparison of Techniques for Selecting and Combining Class Association Rules. In: Proc. LWA. Technical Report, vol. 448, pp. 87-93. Dept. of Computer Science, University of Würzburg, Germany (2008)
41. Thabtah, F.: A Review of Associative Classification Mining. KER 22(1) (2007)
42. Wrobel, S.: An Algorithm for Multi-Relational Discovery of Subgroups. In: Proc. PKDD. LNCS, vol. 1263, pp. 78-87. Springer, Heidelberg, Germany (1997)
43. Yang, J.: Towards Physical Activity Diary: Motion Recognition Using Simple Acceleration Features with Mobile Phones. In: Proc. Intl. Workshop on Interactive Multimedia for Consumer Electronics. pp. 1-10. ACM, New York, NY, USA (2009)
44. Yin, X., Han, J.: CPAR: Classification based on Predictive Association Rules. In: Barbará, D., Kamath, C. (eds.) Proc. SDM. pp. 331-335. SIAM (2003)