Towards Recognizing Abstract Activities: An Unsupervised Approach

Albert HEIN and Thomas KIRSTE
Dept. of Computer Science, University of Rostock, 18059 Rostock, Germany
{albert.hein, thomas.kirste}@uni-rostock.de

Abstract. The recognition of abstract high-level activities using wearable sensors is an important prerequisite for context-aware mobile assistance, especially in AAL and medical care applications. A major difficulty in detecting this type of activity is that different activities often share similar motion patterns. One possible solution is to aggregate these activities from shorter, easier to detect base level actions, but the explicit annotation of these is non-trivial and very time consuming. In this paper we introduce a simple clustering based method for the recognition of compound activities at a high level of abstraction, using k-Means as an unsupervised learning algorithm. A general problem of such methods is that the resulting cluster affiliations are typically not human readable, so some kind of interpretation is needed. To achieve this, we developed a hybrid approach using a generative probabilistic model built on top of the clusterer. We adapted a Hidden Markov Model for mapping the cluster memberships onto high-level activities and successfully evaluated the feasibility of this technique using experimental data from two test runs of a home care scenario, showing higher accuracy and robustness than conventional discriminative methods.

Keywords. Activity Recognition, High-Level Activities, Clustering, Probabilistic Models, AAL

1. Introduction

Activity recognition based on wearable sensors is a growing and fast changing field of research and is widely seen as a major prerequisite for context-aware computing and mobile intelligent assistance systems. In our work we focus on assistive technology supporting elderly people during ageing at home, a field of application often referred to as Ambient Assisted Living (AAL).
But as technical aids cannot satisfy all requirements, at some stage people depend on human medical care. An alternative to stationary treatment is professional ambulant elderly care at home. This kind of service in particular needs accurate documentation of care activities to allow correct accounting for the health insurances. To this day the documentation is usually done manually; it takes up to 40% of the working time and is error-prone and mostly inaccurate because it does not happen in situ but afterwards. Our research tries to automate this documentation process by recognizing these care activities.

Using inertial measurement units consisting of accelerometers, gyroscopes, magnetometers or combinations of them, many research groups have already reported successful recognition of simple, often called "base level", activities [1,2,3,4]. Natural activities like walking, running, sitting, climbing stairs, etc. are characterized by a distinct, often periodic motion pattern. In most cases simple discriminative pattern recognition approaches, i.e. static classifiers like Support Vector Machines or Decision Trees, are sufficient for detecting this kind of activity at a surprisingly high rate. Auxiliary generative models built upon the classifiers are mostly used as a kind of temporal smoothing. Depending on the kind of activities to be detected, usually only parameters like the extracted features and window lengths vary.

Unfortunately the care activities are more complex. The recognition of higher-level activities is still an open research objective. These are generally more abstract and artificially defined compound activity sequences with ambiguous motion patterns, where the described methods fail. Simple base-level actions and motions are repeatedly shared between different activities. An example taken from the care scenario would be the big vs. small morning toilet (washing whole body, brushing teeth, shaving vs. only washing the body).

Probabilistic models are capable of handling noisy, uncertain and incomplete sensor data and are able to include causal and temporal dependencies and prior knowledge in the decision process, which can, besides temporal smoothing, greatly enhance the recognition rate of such complex activities. They are able to represent the compound character at a higher level of abstraction, making it possible to compose a compound activity from several easily distinguishable base level actions, like building a sentence from single words.

If supervised learning is used for the base level activity layer, a very detailed annotation of the underlying training material is essential. Each single action building block like step, crouch, turn around, rotate wrist, lean forward, etc. must be tagged. Selecting actions which are actually important or easily distinguishable (e.g. different types of leaning forward) is a non-trivial, extremely time consuming and, given the available sensor data, mostly unrealistic task which must be repeated for every single application domain.

Unsupervised machine learning techniques, on the other hand, do not need any annotated training data as they aim to find inherent structures. The main disadvantage of these methods for the activity recognition task is that those structures (the clusters) are generally not easily comprehensible, as they do not distinctly map to the anticipated activities. Our approach uses generative probabilistic models for learning cluster memberships as model state emissions in order to map them onto abstract activity classes.

Our key contributions in this paper are: A novel method for recognizing very abstract (arbitrarily defined) compound high-level activities containing shared and repeatedly occurring partial base level activities, first published in this work.
Furthermore we reduced the annotation complexity through the use of unsupervised learning methods needing only coarse annotation of the target activities for the overlying model. We realized the interpretation of cluster affiliations, the inclusion of temporal and causal coherences and of prior knowledge through a probabilistic model which is also capable of handling very different sensor modalities, and the automatic determination of model parameters from training data considering individual linear reliability biases. Finally we evaluated the feasibility of our approach regarding recognition accuracy and robustness, where it outperforms conventional supervised techniques.

[Figure 1: block diagram with the stages Sensors, Feature Extraction, Features, Clustering (k-Means), Clusters, HMM, Activities]

Figure 1. Inference System Overview: For each timestep t a feature vector f_t is calculated from the raw sensor data. The Clusterer assigns each f_t to a single cluster which represents a not explicitly named base level activity. The Hidden Markov Model allocates each cluster sighting to the currently most likely abstract activity class.

2. Technical Challenges

While the recognition of basic low-level activities works quite well using classical pattern recognition techniques, abstract and high-level activities are much more problematic and research on them is still at its beginnings. One can illustrate this problem by imagining a human observer without special domain knowledge.
While simply watching, he can intuitively distinguish between base-level actions due to their characteristic motion patterns, but he would not be able to discriminate different abstract and arbitrarily defined activities, as he does not know about their elementary base-level components. To summarize some general challenges:

• Interleaved or interrupted activities which are not executed sequentially
• Ambiguities between different activities sharing the same motions or gestures
• Variations in the activity performance between single or multiple subjects and distortions by uninvolved persons
• Different levels of complexity between elementary or compound activities
• Different levels of granularity between coarse motion and fine grained gestures

A layered approach would address most of these problems by representing different levels of abstraction by different model layers. Using discriminative supervised methods for the underlying base-level recognition, as they have proven to be successful at this task, brings out a new difficulty: choosing relevant and easily distinguishable actions and annotating them is non-trivial and very time consuming, as already mentioned in the introduction. Utilising unsupervised clustering methods avoids this problem, but no longer produces an interpretable classification result, so some kind of mapping is needed.

3. Algorithms

For detecting abstract compound and high-level activities we follow a hybrid two-level approach reflecting the inherent structure of the given activities. As they consist of multiple base level actions corresponding to relatively easy to detect motion patterns, we use an underlying unsupervised discriminative layer for identifying them. As we are not interested in the actual sequence of these low-level actions but in the high-level activity trajectory, these basic building blocks must be mapped to abstract activities, as mentioned in the previous section.
For this task we assign the outputs of the underlying layer as emissions of single states of a generative model (HMM) representing the compound activities. An abstract overview of the system is given in Fig. 1.

3.1. Related Work

The idea of breaking the task of detecting activities into two steps is not new in general. In 2005 Lester et al. [1] presented a related approach for detecting base level activities which used a discriminative layer (boosted decision stumps) and a second layer of HMMs as temporal smoothers. Wang et al. [5] increased the representational power of the probabilistic model by adding a Dynamic Bayesian Network for including temporal characteristics and RFID-based object detection. Our work extends these approaches by using unsupervised learning for the basic actions and one single overlying HMM for modelling and inferring the abstract activities.

As noted before, the explicit labelling of each base level activity is extremely time consuming and the selection of relevant and easily detectable classes is non-trivial, so an unsupervised learning approach is highly desirable. Initial work of Huỳnh et al. [6,7] (using multiple Eigenspaces) and Nguyen et al. [8] (a modified HMM based approach) indicated that an unsupervised discovery of inherent structure in training data from inertial sensors is feasible. Even simple k-Means clustering already showed good results in grouping undefined activities [9]. In general, however, these inherent structures do not directly correspond to the appropriate activities. Hence Huỳnh et al. [10] extended the k-Means clustering method with histogram based temporal smoothing and statistical classifiers (Nearest Neighbours and SVM) for assigning them to the appropriate activity, achieving recognition rates of about 92% for three high level activities, and compared them to a less successful HMM working on the raw sensor data.

3.2. Algorithmic Details

The individual raw sensor data channels are synchronized and then processed in the feature extraction module. This module calculates 562 different features from half-overlapping windows of 1.28 s, consisting of frequency domain, statistical, curve, physical, correlation and step detection features (for a detailed explanation see [11]).

In this work we also utilize the k-Means clustering algorithm for autonomous identification and detection of simple base level motion patterns, as it showed promising behaviour in prior work [9,10] and is easy to implement and resource-friendly, which is important for mobile devices. As a comparison between supervised and unsupervised methods we also evaluate Naive Bayes, the C4.5 Decision Tree and a Support Vector Machine as statistical classifiers, all of which have proved successful in activity recognition.

Above this layer the Hidden Markov Model is used for the interpretation of the cluster affiliations. Every hidden state represents one high level activity and emits a cluster membership with a certain probability at each timestep. The model therefore keeps an emission probability table containing one multinomial distribution per state. The individual transition probabilities reflect the likelihood of a state transition and implicitly model the duration of the given activities through self-transitions (loops). Together with the prior probabilities this reveals a major problem of HMMs: the huge number of parameters to set, which are normally initialized and then refined using an Expectation Maximization algorithm.

In our approach we determine these parameters analytically from the training data in a preprocessing step. That typically produces a model which is optimal for the training data and therefore overfitted: it represents the training examples best but does not generalize to unseen data. For this reason we use an ad-hoc regularization mechanism inspired by Regularized Discriminant Analysis (see [12], pp. 90-91). We introduced regularization factors λprior for the prior, λtrans for the transition and λobs for the observation (emission) probabilities. Eq. 1 to 3 show how the final prior P(S_t), state transition P(S_t|S_{t-1}) and observation probabilities P(O|S_t) are obtained as a weighted combination of probabilities precalculated from the training data and equally distributed probabilities. This results in weighting factors between λ = 0 (equally distributed / no training data) and λ = 1 (perfect fit to training data), which allows a simple and intuitive manual control over the reliability of the model parameters given the individual uncertainty. This way the model can be made resistant against uncertain, noisy and incomplete data.

\[ P(S_t) = \lambda_{prior} \cdot P(S_t)_{calc} + (1 - \lambda_{prior}) \cdot P(S_t)_{equal} \tag{1} \]

\[ P(S_t \mid S_{t-1}) = \lambda_{trans} \cdot P(S_t \mid S_{t-1})_{calc} + (1 - \lambda_{trans}) \cdot P(S_t \mid S_{t-1})_{equal} \tag{2} \]

\[ P(O \mid S_t) = \lambda_{obs} \cdot P(O \mid S_t)_{calc} + (1 - \lambda_{obs}) \cdot P(O \mid S_t)_{equal} \tag{3} \]

Another advantage (not regarded in this paper, as the focus lies on IMU¹ tracking, but covered in prior work) is the possible inclusion of different sensor modalities on various time bases (e.g. event based), particularly RFID object detection (see [13,5,14,15]). As we suspect the state-conditional distributions of the observations to be non-Gaussian and multimodal in the general case, we rely on a sample based modelling approach. Therefore we use the HMM in conjunction with a particle filter [16].

4. Experimental Setup and Sensors

The following experiment was conducted to evaluate the feasibility of our unsupervised approach for recognizing sufficiently realistic health care activities. The chosen repertory consists only of compound activities at a high level of abstraction and tries to cover all of the challenges specified in section 2. As we were interested in a general proof of concept, we did not run tests outside the lab with authentic subjects inside a nursing home at this early stage.
It is best practice to carry out initial experiments in a controllable environment under optimal observability. Therefore the test runs were additionally accompanied by video and audio surveillance to facilitate later manual annotation of the ground truth.

For our setting we roughly rebuilt the floor plan of an apartment consisting of a bedroom, a bathroom, a living room and a kitchenette in our laboratory. The test runs were performed by professional care personnel (a geriatric nurse) and a student who helped out as a patient. The scene was observed by a fisheye and a ceiling mounted dome camera (for example still frames see Fig. 2). A general preselection of care activities was given, as is common for a care plan. The test agenda and the scenario were developed in close cooperation with a nursing service, which also provided authentic equipment for the tests.

We sampled two runs of an authentic sequence of morning care activities taken from a real person. The activities were directly taken from the service accounting catalogue of the health insurances: "general service" (greeting, fetching newspaper, ...), "big morning toilet" (including washing whole body, brushing teeth), "micturition and defecation", "administration of medications", "bandaging", "preparation of food" and "documentation". We collected 14 minutes (317 MB) and 12 minutes (289 MB) of raw data. Additionally the experimental environment was equipped with RFID tags to support object interaction detection, which is not in the focus of this paper.

To avoid bias, the test subjects were not involved in planning and setting up the experiment or in analyzing the data afterwards in any way. Each subject was instructed to behave as naturally as possible and to try to ignore the attached sensors.

¹ Inertial Measurement Unit

We used three SparkFun IMU 6-DOF v3 sensor boards for our data collection.
These are equipped with a 3-axis Freescale MMA7260Q accelerometer, 3-axis InvenSense IDG300 gyroscopes and a 2-axis Honeywell HMC1043 magnetometer. The LPC2138 ARM7 microcontroller is also capable of preprocessing the raw data onboard. We used the IMU to sample relative motion and rotation at a rate of 50 Hz at a range of 6g to fully capture normal human motion as described in [17]. The raw data was instantaneously transmitted via a class 1 Bluetooth link with a max. operating distance of 30 to 100 m.

Because of its compact size (51x41x23 mm) the board could be attached at unobtrusive positions: at the dominant wrist for recording gestures and object motion, at the chest/upper back and at the hip. These sensor positions have been shown to work well in the literature and in our own prior work [2,3,4]. For our initial tests we recorded raw data with a probably higher number of sensors than actually required for the final application, so that single sensor channels or combinations of subsets can be evaluated afterwards on the original sensor data. All data streams were wirelessly transmitted to a laptop computer where they were immediately formatted, synchronized and saved to disk.

Figure 2. Still frames taken from the surveillance cameras. Fisheye (left) and Dome (right). The test subject is equipped with sensors at the hip, upper back and dominant wrist.

5. Evaluation

Three different aspects were regarded during the evaluation of the clustering/model algorithm. First the best fitting number of underlying clusters was determined, then the accuracy of the clustering approach was compared to conventional statistical classifiers using the same abstract model. Finally the robustness was tested by applying a model trained on the first test run to the second. During this test optimal values for the model regularization factors λobs and λtrans were determined.
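The role of these regularization factors can be made concrete with a small sketch. Assuming a probability row counted from training data (the values below are hypothetical, as is the helper name `regularize`), the following Python snippet blends it with a uniform distribution in the way Eq. 1 to 3 describe. It only illustrates the formula and is not the implementation used in the experiments.

```python
def regularize(calc, lam):
    """Blend a distribution counted from training data with a uniform one:
    lam * calc + (1 - lam) * uniform, the pattern shared by Eq. 1 to 3."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    uniform = 1.0 / len(calc)
    return [lam * p + (1.0 - lam) * uniform for p in calc]


# Hypothetical emission row of one activity over four clusters.
p_calc = [0.7, 0.3, 0.0, 0.0]

# Low trust in the counted probabilities (lam = 0.05) pulls the row
# towards the uniform distribution but keeps it a valid distribution.
p_reg = regularize(p_calc, 0.05)
print(p_reg)
```

With λ = 1 the counted row is returned unchanged, and with λ = 0 every entry becomes 1/n. Clusters never seen for an activity during training still receive a non-zero probability for any λ < 1.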
For all experiments the decoding of the HMM was done using a particle filter with 100000 particles.

5.1. Determining the optimal number of clusters

The main idea behind the clustering approach is that all clusters represent single base level actions which are the building blocks of the abstract activities. These actions are not predefined, so the number of clusters/actions is an important parameter: the k-Means clustering algorithm cannot find an optimal number of clusters itself but needs a preset value for k. An optimal value will maximize the recognition accuracy of the whole model. Finding the best value seems to be a typical task for the expectation maximization algorithm. In this case, however, EM based parameter optimization fails as it leads to massive overfitting. As too high values of k simply allow the model to memorize specific training examples, which produces nearly perfect recognition rates, the EM algorithm simply increases k until nearly each training example is represented by a single cluster. Hence we decided to choose a value for k manually, following the best practice of increasing k stepwise and choosing the first local maximum ("kink", see [12], pp. 461-472) after which the ascent of the recognition rate starts to flatten. This was done for both care test runs, resulting in an optimal value of k = 35 (Tab. 1).

For both test runs the model regularization factors λprior, λtrans and λobs were set to 1, which corresponds to a perfectly fitting model with full trust in the sensor data and model probabilities. This provides theoretically ideal test conditions, as we were only interested in the underlying clustering parameters at this stage. The tests were conducted using a 10-fold stratified cross validation to ensure a realistic estimation of the generalization error. Confusion matrices for k = 35 are shown in Tab. 2. Note that activity 6 ("documentation") was not carried out in the first test run.
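The manual selection rule can be sketched in a few lines of Python. The accuracies are the values reported in Tab. 1; the function name `first_local_maximum` and the exact tie handling are assumptions, and the rule is a simplified reading of the "kink" heuristic rather than the authors' procedure.

```python
def first_local_maximum(ks, accuracies):
    """Return the first k whose accuracy is not below its predecessor's
    and strictly above its successor's (a simple 'first peak' rule)."""
    for i in range(len(ks) - 1):
        rising = i == 0 or accuracies[i] >= accuracies[i - 1]
        if rising and accuracies[i] > accuracies[i + 1]:
            return ks[i]
    return ks[-1]  # accuracy never dropped; fall back to the largest k


# Recognition accuracies in % as reported in Tab. 1.
ks = [5, 10, 15, 20, 25, 30, 35, 40, 50, 60, 80, 100, 150, 200]
care1 = [65.4, 67.2, 77.4, 88.5, 94.0, 97.1, 98.2,
         93.3, 94.2, 98.2, 98.9, 98.2, 98.4, 98.7]
care2 = [44.4, 67.9, 84.7, 86.6, 88.1, 89.0, 96.0,
         94.3, 97.5, 97.7, 95.5, 98.5, 97.7, 98.5]

print(first_local_maximum(ks, care1))  # 35
print(first_local_maximum(ks, care2))  # 35
```

Later, slightly higher accuracies at larger k are deliberately ignored by this rule, which matches the argument that very large k mainly memorizes training examples.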
k        5     10    15    20    25    30    35    40    50    60    80    100   150   200
care1    65.4  67.2  77.4  88.5  94.0  97.1  98.2  93.3  94.2  98.2  98.9  98.2  98.4  98.7
care2    44.4  67.9  84.7  86.6  88.1  89.0  96.0  94.3  97.5  97.7  95.5  98.5  97.7  98.5

Table 1. Finding the optimal number of clusters: The table shows the recognition accuracy in % for our model with increasing number of clusters k for both test runs. λprior, λtrans and λobs were set to 1 (perfect fit).

care2 (rows: truth, columns: estimate)
       0     1     2     3     4     5     6
0      27    0     0     0     0     0     0
1      2     228   6     0     0     0     0
2      0     5     23    0     0     0     0
3      0     0     2     34    0     0     0
4      1     0     0     2     121   0     0
5      1     0     0     0     0     13    0
6      0     0     0     0     0     0     6

care1 (rows: truth, columns: estimate)
       0     1     2     3     4     5
0      23    0     0     0     0     0
1      4     196   0     0     0     0
2      0     0     32    0     0     0
3      0     0     1     33    0     0
4      0     0     0     0     144   1
5      0     0     0     0     2     15

Table 2. Confusion matrices for both test runs at a value of k = 35. The activity classes are 0 "general service" (greeting, fetching newspaper, ...), 1 "big morning toilet" (including washing whole body, brushing teeth), 2 "micturition and defecation", 3 "administration of medications", 4 "bandaging", 5 "preparation of food" and 6 "documentation".

5.2. Comparison of Supervised and Unsupervised Approaches

After estimating the performance of the unsupervised clustering approach, we were interested in how it compares to other conventional methods used in our own and related prior work. Therefore we simply replaced the k-Means clusterer with several relevant supervised learners while keeping the model. The statistical classifiers were trained on the same training data with respect to the given annotated activity classes. Then the model parameters were set in the same way as for the unsupervised approach, again with λprior, λtrans and λobs set to 1, and again cross validated. This procedure ensures direct comparability of the underlying learning algorithms. As statistical classifiers we chose the C4.5 decision tree, a Support Vector Machine (SVM) and Naive Bayes (NB). Table 3 shows the recognition results of all four methods for both test runs. For k-Means, k was set to 35 (see above).
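Keeping the model while exchanging the first layer is possible because the HMM only sees a sequence of discrete symbols, whether these are cluster indices or classifier outputs. As a hedged sketch of how transition and emission tables could be counted from a coarsely annotated sequence, consider the Python snippet below. The function name `count_hmm_tables`, the toy sequences and the uniform fallback for states without observations are assumptions made for illustration; prior probabilities are omitted for brevity.

```python
def count_hmm_tables(states, symbols, n_states, n_symbols):
    """Estimate transition and emission tables by relative frequency.

    states  -- annotated activity index per timestep
    symbols -- first-layer output (cluster or class index) per timestep
    Rows without any count fall back to a uniform distribution.
    """
    def normalize(counts):
        total = sum(counts)
        if total == 0:
            return [1.0 / len(counts)] * len(counts)
        return [c / total for c in counts]

    trans = [[0] * n_states for _ in range(n_states)]
    emit = [[0] * n_symbols for _ in range(n_states)]
    for t, (s, o) in enumerate(zip(states, symbols)):
        emit[s][o] += 1                   # symbol o seen while in activity s
        if t > 0:
            trans[states[t - 1]][s] += 1  # includes self-transitions (loops)
    return [normalize(r) for r in trans], [normalize(r) for r in emit]


# Toy example: two activities, three first-layer symbols.
states = [0, 0, 0, 1, 1, 0]
symbols = [2, 2, 0, 1, 1, 2]
trans, emit = count_hmm_tables(states, symbols, n_states=2, n_symbols=3)
print(trans)
print(emit)
```

Tables counted this way correspond to the "calc" terms of Eq. 1 to 3 before regularization. With all λ set to 1, as in this comparison, they would be used unchanged.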
                          care1   care2
C4.5                      86.9    86.6
Support Vector Machine    85.8    85.6
Naive Bayes               77.2    70.9
k-Means (k = 35)          98.2    96.0

Table 3. Supervised/Unsupervised comparison results: The table shows the specific recognition accuracies in % for both test runs. λprior, λtrans and λobs were again set to 1 (perfect fit).

For both test runs C4.5 and SVM showed a comparable recognition accuracy of about 86%, while Naive Bayes settled at 77% / 71% respectively. By far the best recognition rates were achieved by the embedded clusterer, at around 98% / 96%.

The detailed results of this comparison on a continuous time trace are illustrated in Fig. 3. The left side shows the outputs of the first layer, which are the cluster affiliations (k-Means) or class memberships (statistical classifiers). It can be clearly seen that the k = 35 clusters do not correspond to the anticipated activity classes, while the output of C4.5, SVM and Naive Bayes represents the actual activities, although it is very noisy. On the right the final outputs of the models are shown in comparison to the ground truth.

5.3. Robustness Experiments

The recognition accuracy of a model trained and evaluated on a single experiment does not give any information regarding robustness and general performance on completely unseen activity sequences. We therefore estimated the general recognition rate with a clusterer/model trained on the first test run and applied to the second. This method is rather brute force and can only give a hint of the real generalization performance due to the lack of sufficient training data. For a more realistic estimate it would be necessary to record a higher number of test runs and evaluate the recognition system using leave-one-out cross validation.

For estimating the achievable performance we had to determine the optimal regularization factors λprior, λtrans and λobs, as a model 100% adapted to one test run is unable to explain a differing second one.
Factor λprior was left constant because the prior probabilities in both test cases were the same, so a variation did not have any effect.

[Figure 3: four rows of panels (k-Means, C4.5, SVM, NB). Left column: Clusterer/Classifier output (Cluster or Activity) over Time (seconds). Right column: Model output (Activity) over Time (seconds), Truth vs. Estimate]

Figure 3. Supervised/Unsupervised comparison results for the first test run: The left side shows the outputs of the first layer (cluster affiliations and class memberships) on a continuous time trace. The diagrams on the right side show the individual activity trajectory recognized by the overlying model compared to the ground truth.

The regularization factors λtrans (representing the trust in the transition probabilities of the model) and λobs (representing the trust in the cluster assignments) were gradually altered. The step size was decreased in regions where the changes in performance were more significant. λtrans = 1.00 was not applicable in this test as the model did not allow any temporal changes in the activity trajectory of the test run, which causes all particles of the particle filter to die immediately the first time the observations no longer fit the states.

The actual recognition results are shown in Tab. 4 and Fig. 4. The best performance was achieved for λtrans = 0.99 and λobs = 0.05. We were surprised by the relatively high recognition rate of about 75%. As we observed a very high variance in the performance of the test subject, we were expecting a very low accuracy. A closer look at the regularization factors shows that recognition results are significantly better when the trust in the model state transitions is very high and far away from equally distributed. The optimal trust in the cluster affiliation observations is relatively low and settles around 5%, which indicates two things: First, the implicit knowledge stored in the generative model is of superior importance for the robust recognition of abstract activities. Second, the variability in the execution of the two test runs was very high, resulting in strongly differing cluster affiliation probabilities for the same activities in the two test runs. Although the trust in cluster observations is relatively low, the clustering is an essential step. Letting λobs converge to zero, the recognition rate rapidly falls off, with an activity trajectory ending in a flat line or complete chaos respectively.

[Figure 4: surface plot of Accuracy in % over λtrans and λobs, peak marked at 75.6%]

Figure 4. Robustness experiments: The recognition accuracies for our model trained on the first test run and applied to the data of the second in relation to λtrans and λobs.

                λtrans
λobs     0.99   0.98   0.97   0.95   0.90   0.80   0.50   0.00
1.00     60.9   59.9   59.0   57.1   53.5   50.0   46.5   42.9
0.50     62.6   59.4   58.6   56.9   54.1   50.5   45.2   39.5
0.20     67.1   64.8   63.5   61.1   55.8   50.7   45.4   39.5
0.10     72.6   70.3   66.2   62.8   58.2   51.4   45.2   39.1
0.05     75.6   74.1   73.2   67.5   60.1   52.9   46.3   38.9
0.00     52.6   52.7   52.9   52.9   53.1   52.6   50.3   15.1

Table 4. Robustness experiments: The detailed recognition accuracies in % for our model trained on the first test run and applied to the data of the second in relation to λtrans and λobs.

Besides the high variety in execution, the diagram (Fig. 5) and the confusion matrix (Tab. 5) reveal some further problems: First, the test subject did not do any documentation in the first test run, so the model was not trained on this activity and was unable to detect it. Therefore it interpreted "documentation" (activity 6) as "general service" (activity 0). Second, some activities had only a very small number of training examples, which in itself already causes fundamental problems for machine learning algorithms, especially in a high dimensional feature space, regardless of the applied learning method.
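Both problems become visible when the confusion matrix of Tab. 5 is reduced to per-class recall. The following Python sketch does this using the counts as read from Tab. 5 (rows are truth 0 to 6, columns are estimates 0 to 5); the function names are assumptions made for illustration.

```python
# Confusion matrix of Tab. 5: rows = truth (0..6), columns = estimate (0..5).
confusion = [
    [14, 0, 0, 0, 0, 13],
    [39, 178, 19, 0, 0, 0],
    [0, 0, 13, 15, 0, 0],
    [0, 0, 0, 30, 6, 0],
    [3, 0, 0, 0, 118, 3],
    [11, 0, 0, 0, 0, 3],
    [5, 1, 0, 0, 0, 0],   # "documentation" has no estimate column of its own
]


def diagonal(matrix, i):
    """Correctly recognized samples of class i (0 if the class is never estimated)."""
    return matrix[i][i] if i < len(matrix[i]) else 0


def accuracy(matrix):
    """Overall fraction of correctly recognized samples."""
    correct = sum(diagonal(matrix, i) for i in range(len(matrix)))
    return correct / sum(sum(row) for row in matrix)


def recall_per_class(matrix):
    """Fraction of each true class that was recognized correctly."""
    return [diagonal(matrix, i) / sum(row) for i, row in enumerate(matrix)]


print(round(100 * accuracy(confusion), 1))  # 75.6
print([round(r, 2) for r in recall_per_class(confusion)])
```

The overall value reproduces the reported 75.6%, while the recall of activity 6 is zero and that of activity 5, which has only 14 samples, is low.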
If we again compare the clustering approach to the supervised methods, the clustering method significantly outperforms the conventional methods by more than 40 percentage points (Tab. 6).

[Figure 5: Activity over Time (seconds), Truth vs. Estimate]

Figure 5. Robustness experiments: A continuous time trace of the activities recognized by the model compared to the ground truth. For this detailed view the model was trained on the first test run and applied to the second; parameters were set to k = 35, λprior = 1, λtrans = 0.99 and λobs = 0.05.

(rows: truth, columns: estimate)
       0     1     2     3     4     5
0      14    0     0     0     0     13
1      39    178   19    0     0     0
2      0     0     13    15    0     0
3      0     0     0     30    6     0
4      3     0     0     0     118   3
5      11    0     0     0     0     3
6      5     1     0     0     0     0

Table 5. Confusion matrix for the model trained on the first test run and applied to the second; parameters were set to k = 35, λprior = 1, λtrans = 0.99 and λobs = 0.05. The activity classes are 0 "general service" (greeting, fetching newspaper, ...), 1 "big morning toilet" (including washing whole body, brushing teeth), 2 "micturition and defecation", 3 "administration of medications", 4 "bandaging", 5 "preparation of food" and 6 "documentation".

A surprising exception seems to be the Naive Bayes classifier with a recognition rate of more than 80%. Unfortunately a closer look unveils the "fraud". The output classes of the classifier caused the HMM to simply detect the two most common activities ("big morning toilet" and "bandaging"), which make up about 80% of the recorded data, and to ignore the rest. Hence this approach is completely unusable, although the recognition rate seems exceptionally high.

                          care1 → care2
C4.5                      29.3
Support Vector Machine    31.0
Naive Bayes               80.9
k-Means (k = 35)          75.6

Table 6. Clusterer/Classifier robustness comparison results: The recognition accuracy in % for the Clusterer/Classifiers/Models trained on the first test run and applied to the data of the second run (λtrans = 0.99, λobs = 0.05).

6. Conclusion

The major goal of this paper was to evaluate the feasibility of a simple clustering based method for the recognition of compound activities at a high level of abstraction. As one of the main difficulties in detecting this type of activity is that similar motion patterns are shared between multiple activities, we proceeded to aggregate these activities from shorter, easier to detect base level actions. As the explicit annotation of these is non-trivial and very time consuming, we were looking for an unsupervised learning algorithm. A general problem of these methods is that the resulting cluster affiliations are typically not human readable, so some kind of interpretation is needed. To achieve this, we developed a hybrid method using a generative probabilistic model built on top of the clusterer. We adapted a Hidden Markov Model for mapping the cluster memberships onto high level activities and evaluated this technique using experimental data from two test runs of a home care scenario. A welcome side effect of this type of model is the ability to also include knowledge about temporal and causal dependencies.

It could be shown that the hybrid k-Means/HMM approach significantly outperforms classical hybrid discriminative/generative methods built upon statistical classifiers in terms of accuracy and robustness. The best overall results were obtained with about 35 clusters, which shows that about 35 base level actions are already able to disambiguate the anticipated high level activities successfully. When using the trained model on a completely unseen test run, the recognition accuracy drops noticeably from about 96% to 75%, which is still far above random guessing (14%) and above the conventional approaches (31%, disregarding the impressive but "bogus" Naive Bayes results). We expect this value to improve in future experiments, as we were relying on very little training data, which overemphasized the variability between both test runs.
The poor recognition results of the statistical classifiers suggest that conventional methods which try to handle compound activities in a monolithic manner are inappropriate in principle, as they do not generalize well. As k-Means is a very simple algorithm, we do not expect it to be the method of choice in general; other clusterers may be even more successful. Supervised learning methods can probably also outperform this simple approach, provided that the appropriate base level actions have been annotated in advance, but as mentioned before this would not be practicable.

We also proposed a simple method for automatically determining the necessary model parameters without time consuming learning, using a few intuitive regularization factors comparable to Laplacian correction.

Due to the limited amount of data, our results can only be seen as initial steps towards an approach for recognizing abstract high level activities in a manner requiring much less supervision. The next logical step will be collecting a much larger and more realistic out-of-lab dataset recorded during the day to day work inside a nursing home.

We noticed one general disadvantage of the chosen HMM during the evaluation. As the state transition probabilities encode both the probability of switching to the next state and the average duration of the current state, the transition matrix is somewhat inconvenient to interpret. Hence we have already started to modify our model to explicitly incorporate state durations, disentangled from state switching probabilities. We are also currently investigating the integration of partial order task models into the modelling process.

Acknowledgements

The project MArika [18] is funded by the state of Mecklenburg-Vorpommern, Germany within the scope of the LFS-MA project. The care experiments were supported by Informatik-Forum Rostock e.V.

References

[1] J. Lester, T. Choudhury, N. Kern, G.
Borriello, and B. Hannaford, “A hybrid discriminative/generative approach for modeling human activities,” in IJCAI (L. P. Kaelbling and A. Saffiotti, eds.), pp. 766–772, Professional Book Center, 2005.
[2] J. Parkka, M. Ermes, P. Korpipaa, J. Mantyjarvi, J. Peltola, and I. Korhonen, “Activity classification using realistic data from wearable sensors,” Information Technology in Biomedicine, IEEE Transactions on, vol. 10, no. 1, pp. 119–128, Jan. 2006.
[3] N. Ravi, N. Dandekar, P. Mysore, and M. L. Littman, “Activity recognition from accelerometer data,” in AAAI (M. M. Veloso and S. Kambhampati, eds.), pp. 1541–1546, AAAI Press / The MIT Press, 2005.
[4] L. Bao and S. S. Intille, “Activity recognition from user-annotated acceleration data,” in Pervasive (A. Ferscha and F. Mattern, eds.), vol. 3001 of Lecture Notes in Computer Science, pp. 1–17, Springer, 2004.
[5] S. Wang, W. Pentney, A.-M. Popescu, T. Choudhury, and M. Philipose, “Common sense based joint training of human activity recognizers,” in IJCAI (M. M. Veloso, ed.), pp. 2237–2242, 2007.
[6] T. Huynh and B. Schiele, “Unsupervised discovery of structure in activity data using multiple eigenspaces,” in 2nd International Workshop on Location- and Context-Awareness (LoCA), (Dublin, Ireland), Springer, May 2006.
[7] T. Huynh and B. Schiele, “Towards less supervision in activity recognition from wearable sensors,” in Proceedings of the 10th IEEE International Symposium on Wearable Computing (ISWC), (Montreux, Switzerland), October 2006.
[8] A. Nguyen, D. Moore, and I. McCowan, “Unsupervised clustering of free-living human activities using ambulatory accelerometry,” Engineering in Medicine and Biology Society, 2007. EMBS 2007. 29th Annual International Conference of the IEEE, pp. 4895–4898, 22-26 Aug. 2007.
[9] T. Huynh and B.
Schiele, “Analyzing features for activity recognition,” in Proceedings of the 2005 joint conference on Smart objects and ambient intelligence: innovative context-aware services: usages and technologies, (Grenoble, France), pp. 159–163, ACM Press New York, NY, USA, 2005.
[10] T. Huynh, U. Blanke, and B. Schiele, “Scalable recognition of daily activities with wearable sensors,” in 3rd International Symposium on Location- and Context-Awareness (LoCA), 2007.
[11] A. Hein, “Echtzeitfähige merkmalsgewinnung von beschleunigungswerten und klassifikation von zyklischen bewegungen,” Master’s thesis, University of Rostock, 11 2007.
[12] T. Hastie, R. Tibshirani, and J. H. Friedman, The Elements of Statistical Learning. Springer, August 2001.
[13] D. J. Patterson, D. Fox, H. Kautz, and M. Philipose, “Sporadic state estimation for general activity inference,” tech. rep., University of Washington and Intel Research Seattle, July 2004.
[14] D. J. Patterson, D. Fox, H. Kautz, and M. Philipose, “Fine-grained activity recognition by aggregating abstract object usage,” iswc, vol. 0, pp. 44–51, 2005.
[15] A. Hein and T. Kirste, “Activity recognition for ambient assisted living: Potential and challenges,” in Ambient Assisted Living, pp. 263–268, VDE Verlag, 01 2008.
[16] M. Arulampalam, S. Maskell, N. Gordon, and T. Clapp, “A tutorial on particle filters for online nonlinear/non-gaussian bayesian tracking,” Signal Processing, IEEE Transactions on [see also Acoustics, Speech, and Signal Processing, IEEE Transactions on], vol. 50, no. 2, pp. 174–188, Feb 2002.
[17] C. Bouten, K. Koekkoek, M. Verduin, R. Kodde, and J. Janssen, “A triaxial accelerometer and portable data processing unit for the assessment of daily physical activity,” Biomedical Engineering, IEEE Transactions on, vol. 44, no. 3, pp. 136–147, Mar 1997.
[18] “Landesforschungsschwerpunkt http://marika.lfs-ma.de/,” April 2008.