Semi-supervised learning: predicting activities in Android
                          environment
                                  Alexandre Lopes1, João Mendes-Moreira2,3, João Gama3,4

Abstract.1 Predicting activities from data gathered with sensors                applications of machine learning and statistics, sample size tends to
gained importance over the years with the objective of getting a                be the dominant limitation. The problem of working with data
better understanding of the human body. The purpose of this paper               streams is the arrival rate of the examples. When new examples
is to show that predicting activities on an Android phone is                    arrive at a higher rate than they can be mined, the quantity of
possible. We take into consideration different classifiers, their
                                                                                unused data grows without bounds as time progresses.
accuracy using different approaches (hierarchical and one step
classification) and limitations of the mobile itself like battery and              By building a new Smartphone application we attempt to solve
memory usage. A semi-supervised learning approach is taken in                   problems consistent with previous undertakings, such as: accuracy,
order to compare its results against supervised learning. The                   cost, performance among others. We explore matters like: (1) the
objective is to discover if the application can be adapted to the user          impact of the app on the phone’s battery lifetime; (2) how long
providing a better solution for this problem. The activities                    should the interval to collect samples be in order to guarantee
predicted are the most usual in everyday life: walking, running,                accurate classifications; (3) the time to create a model; and (4) the
standing idle and sitting. An android prototype, embedding the                  memory space needed.
software MOA, was developed to experimentally evaluate the ideas                   All software used is open-source so the experiments can be
proposed here.                                                                  continued and the application can be improved.
                                                                                   The aim of this work will be to create an application that adapt
1         INTRODUCTION                                                          to each new user along time, learning his behavior and becoming
                                                                                more accurate.
Recognizing human activities with sensors next to the body has
become more important over the years, aiming to create or improve
systems in elder care support, health/fitness monitoring, and                   2         RELATED WORK
assisting those with cognitive disorders.                                       Activity recognition is not new. Bao & Intille [1] created a system
   It is important to have systems that are practical for the user and          capable of recognizing twenty activities with bi-axial
that have the possibility to always be with them whilst not feeling             accelerometers positioned in five different locations of the user’s
strange or uncomfortable. Taking this into account we will attempt              person. This work led to an important discovery, which was
to use only one sensor instead of a, less practical but more                    possible to get accurate results predicting activities just using
accurate, system of distributed multi-sensors.                                  acceleration values gathered by a sensor placed on the thigh or
   The new generation of smart phones has incorporated many                     dominant wrist. Despite this work uses twenty activities the most
powerful sensors, such as acceleration sensors (i.e.                            common activities used in other works [2,9,17] are walking,
accelerometers), GPS sensors etc. They give the opportunity to                  running, sitting, standing, up and downstairs.
create a system that can always be next to the user and work in                 Some research exists aiming to create a universal model that can be
real-time. In this work we will focus on the motion sensor of the               applied to any user. The idea is to use it in an Android application
cell phone, accelerometer, in order to predict the activity that the            in order to measure the physical exercise of the user by predicting
user is performing, as was attempted previously by Bao & Intille                his activities [2]. This study uses three classification algorithms
[1].                                                                            from WEKA (decision trees J48, logistic regression and multilayer
   This problem will be treated as a classification problem using               neural networks) to induce models to predict user activities. Other
techniques of semi-supervised learning. This will be done in order              studies, that also use the WEKA toolkit, implement common
to take advantage of existing examples (typically unlabeled) from               algorithms like Naïve Bayes, decision tables, K-nearest neighbors
the current user.                                                               and SVM .
   Knowledge discovery systems are constrained by three main                       The common activities that research tries to predict are walking,
limited resources: time, memory and sample size. In traditional                 running, sitting, standing, up and downstairs.
1
                                                                                   Gu et al. [3] tried to solve the activity recognition problem with
  Faculdade de Engenharia, Universidade do Porto, Rua Dr. Roberto Frias,
                                                                                techniques of semi-supervised learning using a large amount of
  s/n 4200-465 Porto – Portugal, email: alexolopes89@gmail.com
2
  Departamento de Engenharia Informática, Faculdade de Engenharia,              unlabeled data, together with the labeled data, to build better
  Universidade do Porto, Rua Dr. Roberto Frias, s/n 4200-465 Porto –            classifiers. Because semi-supervised learning requires less human
  Portugal, email: jmoreira@fe.up.pt                                            effort and gives higher accuracy, it is of great interest both in
3
  LIAAD – INESC TEC, Rua de Ceuta, 118, 6º; 4050-190 Porto – Portugal           theory and in practice [4].
4
  Faculdade de Economia, Universidade do Porto, Rua Dr. Roberto Frias,
  s/n 4200-464 Porto - Portugal, email: jgama@fep.up.pt


                                                                           38
    One of the most important aspects of the research, in this field,          of major importance since we are dealing with streams in a mobile
is the classifiers’ accuracy and the difficulty of label new instances.        context.
Both Masud et al. [15] and Guan et al. [16] use ensemble methods                   The Naive Bayes algorithm is a classification algorithm based
to increase accuracy in partially labeled data (semi-supervised                on Bayes rule and can often outperform more sophisticated
problems). A common thing in all the works is how they try to find             classification methods. The Naive Bayes algorithm is based on
the more accurate model, testing multiple classifiers with the same            conditional probabilities; it calculates a probability by counting the
data. Authors like Kwapisz et al. [2] showed, when trying to solve             frequency of values and combinations of values in the historical
this classification problem using decision trees, that the most                data. Bayes' Theorem finds the probability of an event occurring
important attribute to differentiate the activities is the acceleration        given the probability of another event that has already occurred. It
they induce on the accelerometer. Domingos et al. [11] showed that             assumes that the attributes X1…Xn are all conditionally
decision trees like C4.5 could be outperformed by Hoeffding trees,             independent of one another, given the target variable Y. The value
and demonstrated their importance when dealing with streams and                of this assumption is that it simplifies dramatically the
limited memory space. The biggest problem of decision trees is                 representation of P(X|Y), and the problem of estimating it from the
that they assume that all training examples can be stored                      training data [12]. An important advantage of this algorithm is the
simultaneously in main memory, and are thus severely limited in                possibility to calculate the required probabilities in one pass over
the number of examples they can learn from. Still, regarding the               the training set. Additionally, it is able to obtain good classification
accuracy, the problem can be solved in a hierarchical way.                     performance even when trained in a small amount of data. We can
Hierarchical classification splits the initial problem into simpler            conclude that this classifier can be trained on an efficient way,
sub-problems. The objective is to have a tree in the end where tests           gathering the probabilities of each attribute
are done in each node. The classes contained in different nodes                    Hoeffding trees [13] operate by collecting, for each leaf node,
from the same level of the tree should be independent [5] so there             sufficient statistics of the training instances each leaf contains.
is no possible uncertainty when choosing the path. It is expected to           Periodically, these leaves are checked to compare the relative
obtain more accurate classifiers by training them in the split data.           merits of each candidate attribute for splitting. The Hoeffding
For activity recognition, this can be done by classifying firstly              bound, or similar metric, is used to determine when a candidate is
whether the activity is motion or motionless and, in a second step,            better than the others. At this point the leaf is split on the best
classifying it in lying, sitting, standing (if it was classified as            attribute, allowing the tree to grow. Typically, information gain is
motionless in the first step) or walking, gentle motion and posture            used to rank the merits of the split candidates, although other
translation (if it was classified as motion in the first step). These          metrics could be used. In the case of discrete attributes, it is
experiments came to the conclusion that rule-based reasoning can               sufficient to collect counts of attribute labels relative to class labels
improve the overall accuracy proving the lustiness of this approach            to compute the information gain afforded by a split. There are
[6].                                                                           some variations of Hoeffding Trees, based on VFDT (Very Fast
    The main drawbacks of using such approaches in a mobile                    Decision Tree learner) which is a high-performance data mining
phone are the limited battery and memory. Experiments were                     system [11]. It is effective in taking advantage of massive numbers
carried out to determine how long the data samples provided by the             of examples by using a very small constant time per example.
cell accelerometer should be in order to obtain accurate                       Since we are working with a mobile phone the biggest advantage is
classification. Some experiments were made and it was discovered               that Hoeffding trees do not store any examples (or parts thereof) in
that at least they need to be captured for 6s and the interval                 main memory, requiring only a space proportional to the size of the
between them can be up to 10s [7]. These results are used in our               tree and the associated sufficient statistics [11].
experiments as described in section 4. Another thing that has                      The novelty of our work is the creation of the Android
impact on the cell phone, more specifically in its memory, is how              application that records data from the accelerometer. It uses a semi-
the data is saved. Not all the data needs to be saved. Using sliding           supervised learning algorithm to process data with a model
windows only the most recent data needs to be available [8]. The               previously learned. This model is used to label the unlabeled data
features of the raw accelerometer data that can be retrieved are the           in real-time. This new labeled data can be used to train future
mean, the standard deviation, the energy and the correlation [9].              models that fit over the user. In the semi supervised approach we
The usefulness of these features has already been demonstrated [1].            defined a threshold of 70% (value that we assumed to be a good
It allows saving both data and memory.                                         percentage of certainty for a classification) which means that we
    In terms of mobile applications, DiaTrace [10] is a system                 add to the training file the instances classified with 70% or more of
developed to aid in sport activities. The authors do not explain how           certainty. We can also define the number of these new instances
they carry out the classification. However they guarantee 95% of               that we need to gather in order to create a new model. The older
accuracy if the mobile phone is used in the trousers front pocket.             instances are deleted in order to maintain the size of the file.
This is an example of how the market demands this type of
applications.
                                                                               4         AN ANDROID PROTOTYPE
                                                                               We have implemented an Android application that records data
3         METHODS
                                                                               from the accelerometer. We use: (1) sequence-based sliding
The tests were made on Naïve Bayes and Hoeffding Trees [11].                   windows [8] in order to save memory; and (2) the method of duty
These two algorithms were chosen because some studies showed                   cycles [7] in order to save battery.
that Naïve Bayes can predict equally as well as decision trees                    In sequence-based sliding windows an amount of data is
(Langley, Iba, & Thomas 1992; Kononenko 1990; Pazzani 1996)                    defined. The file will have only the amount of data that the
and Hoeffding trees can learn in a very small constant time what is            sequence-based sliding window allows. If new data is added it


                                                                          39
replaces the oldest data in order to keep the size stipulated by the          has two classifications: (1) The first one classifies the data into
window.                                                                       Dynamic or Static whether the activities involve motion or not,
   In the duty cycles, 6s of data is needed in order to get enough            respectively (Table 2); (2) Then, in the second classification, a
data so an accurate classification can be achieved. To proceed with           model was built on each category so we could proceed to the
the classification we have 10s before retrieving new data. It means           classification on Walking or Running on the Dynamic category,
that the data from the accelerometer does not need to be fetched all          and Sitting or Standing Idle on the Static one (Table 3).
time, saving battery with less operations of the app running. To              Table 2. Classifiers’ accuracy in the first level of the hierarchical approach.
sum up, we record data for 6s. Then, an instance is created with an           Dynamic vs. Static             Naïve Bayes            Hoeffding Tree
average of the collected values. Finally, it is classified on the next        Accuracy                        82.11 %                    99.85%
10s. This cycle is repeated along time.
                                                                                    Table 3. Classifiers’ accuracy in the second classification of the
                                                                                                          hierarchical approach.
4.1       EXPERIMENTAL SETUP                                                                                  Naïve Bayes           Hoeffding Tree
Before testing the application some decisions had to be made in               Running, walking                  76.25%                   99.05%
order to have a controlled environment so we knew which result                Sitting, standing idle            99.83%                   99.93%
we were expecting for each test done.
    The placement of the mobile was an important issue. Without                  To test the effectiveness of the classification, unlabeled data of a
having the option of placing sensors in different parts of the human          person, which was not used for training the classifier, was used.
body we have chosen the trousers’ front pocket [14] to conduct all            Here are the results for the walking activity – Table 4.
experiments. So there is recorded data with the mobile in a vertical
and horizontal position inside the pocket.                                      Table 4. Accuracy for the walking activity using as test set data from a
    To create the models, data from two persons was used. This data                           person without data on the training sets
contained the average of the values recorded by an accelerometer
for several hours doing, only, activities of walking, running,                                     One-step           Hierarchical       Hierarchical
standing idle and sitting, being the waking activity the one with                                  classification     1st classif.       2nd classif.
more recorded instances. In the total approximately 27 thousand               Naïve Bayes          86,37%             90,17%             84,27%
instances were used.                                                          Hoeffding Tree       67,65%             94,04%             88,09%
    The unlabeled data (files from approximately 16 thousand to 30
thousand instances) was not used to create the model. It belongs to              These results only show that Hoeffding Tree is better than
the two people that contributed with data to create the model.                Naïve Bayes for the walking activity on a hierarchical approach.
There is, also, data from a third person that was not used for                However, Naïve Bayes gives better results on the one-step
learning the models. It was used to evaluate the semi-supervised              approach (Table 4). Further tests were needed for the remaining
learning approach.                                                            activities. Additionally, a semi-supervised approach was also used,
We needed to choose between timestamp and sequence-based                      besides the supervised one described above, in order to evaluate the
sliding windows depending whether the window length is defined                usefulness of using unlabeled data from the user that is being
according to a predefined interval or a predefined amount of data.            tested.
We have chosen sequence-based sliding windows because we                         In order to adapt the model to the normal user of the cell phone
wanted to keep the number of instances controlled and with a time             a threshold of 70% was created, as described in section 3. This
interval that is impossible because the number of data elements in
                                                                              meant that data labeled with at least 70% of certainty would be
the window may vary over time.
                                                                              recorded on the training file of the classifier, so a new model, more
    A threshold of 70% probability is used to proceed with semi-
                                                                              suitable to the user, could be generated. This approach is compared
supervised learning as explained in section 3. This allows creating
                                                                              against the supervised approach (Figure 1). It is easier to check the
new models by appending to previous data the recent labeled data
                                                                              better accuracy when using the semi-supervised approach.
when classified with 70% of certainty, at least.


4.2       EXPERIMENTS AND RESULTS
Previously, labeled data from three different persons was recorded.
The data contained four activities: walking, running, sitting and
standing idle. Using MOA, two different approaches were taken.
   Firstly, models were induced using both Naïve Bayes and
Hoeffding Tree. The classifiers were tested on unlabeled data from
one person (Table 1).

                    Table 1. Classifiers’ accuracy.
                            Naïve Bayes          Hoeffding Tree                 Figure 1 Accuracy of one step classification using both supervised and
   Accuracy                92.00%                  94.78%                                            semi-supervised learning.

   Secondly, a hierarchical approach with two levels was also                    After doing the hierarchical classification (Figure 2 and 3) the
carried out using the same classifiers. The hierarchical approach             labeled data was checked by visual inspection and it was easy to


                                                                         40
observe that Hoeffding Tree tend to label data on the first                      The balance characteristic of Naïve Bayes mentioned before can
classification as Dynamic (probably because the dataset is                   be verified in Figure 5, giving better results when used in the first
unbalanced and the Dynamic class is the majority one: there are              classification. The tendency of Hoeffding Trees to classify, in the
about 15000 Dynamic instances and about 8000 Static ones). Naïve             first step, the data as a Dynamic movement has influence on the
Bayes seems more balanced when labeling new data in the first                second classification where Naïve Bayes has difficulties to label
classification of the hierarchical approach.                                 data because it gets lots of Static labeled data as Dynamic data
                                                                             from the first step. Overall better accuracy is achieved when using
                                                                             the Naïve Bayes classifier on the first classification (Dynamic or
                                                                             Static movement) and Hoeffding Tree on the second classification.


   Figure 2. Classifiers' accuracy on final step of hierarchical
   classification with a supervised learning approach.


                                                                                  Figure 5. Classifiers’ accuracy on final step of hierarchical
                                                                                  classification with a semi-supervised learning approach, using
                                                                                  different classifiers for each step.

                                                                                 The application had also concerns about both the battery and the
                                                                             memory usage. In order to test the battery usage, a stress situation
                                                                             where the app did both the hierarchical classification and the one
                                                                             step classification was created. In order to do it two models were
                                                                             created using the data of about 23.000 lines of labeled data, and
                                                                             doing the classification of 10 unlabeled instances. This experiment
                                                                             told us that the battery usage needs a maximum of 600.0mW for
                                                                             the CPU and between 500mW and 600mW for the LCD, which
                                                                             gives a total between 1100 and 1200mW on hierarchical
  Figure 3. Classifiers’ accuracy on final step of hierarchical              classification. The one step classification only creates one model.
  classification with a semi-supervised learning approach.                   The battery usage needs a maximum of 526mW for the CPU, the
   At last we tested how using the two classifiers together would            LCD needs the same power as the hierarchical approach, of course.
affect the classification (Figure 4).                                        Running the application five times, in a row, we got an energy
                                                                             usage of 120.8J for the CPU in hierarchical classification. However
                                                                             in one step classification we get a total of 110.3J.
                                                                                 Creating models and classifying about 10 instances took almost
                                                                             60s which is a good time since we have only to classify 1 instance
                                                                             every 16s.
                                                                                 In terms of memory, the prototype is about 3Mb, and the files
                                                                             used for training the model having about 23000 lines are 1.466kB
                                                                             each. At most we will have the existence of three files for training
                                                                             (hierarchical approach). These files will grow because we defined a
                                                                             limit of 30000 instances for the training set (sequence-based
                                                                             window), which means that until we reach this limit none of the old
                                                                             training data will be erased and new data is added. When we reach
                                                                             the 30000 instances the sequence-based window will keep the size
  Figure 4. Classifiers’ accuracy on final step of hierarchical              of the file. Whenever new labeled data from the user arrives (using
  classification with a supervised learning approach, using different        the aforementioned 70% threshold) it will substitute the oldest data
  classifiers for each step.                                                 in order to have a semi-supervised learning approach.
                                                                                 The accuracy is not the only indicator of the classifiers’
                                                                             performance. Precision and recall are also important. The
                                                                             technique with higher accuracy might not be the one with the best
                                                                             balance between precision and recall. In our experiments we
                                                                             noticed that Hoeffding Trees have a better balance between
                                                                             precision and recall than Naïve Bayes.


                                                                        41
5         CONCLUSIONS AND FUTURE WORK                                         REFERENCES
The encouraging results of the experiments lead us to affirm that a           [1]    L. Bao & S. S. Intille (2004). “Activity Recognition from User-
step forward has been taken in the study of activity classification.                 Annotated Acceleration Data”, LNCS 3001, Springer, pp. 1-17.
    The most difficult activities to distinguish are walking and              [2]    J. R. Kwapisz, G. M. Weiss, and S. A. Moore, “Activity
running because it is not clear where to draw the line between these                 Recognition using Cell Phone Accelerometers,” SIGKDD
                                                                                     Explorations, vol. 12, no. 2, pp. 74-82, 2010.
two activities.
                                                                              [3]    Tao Gu, Zhanqing Wu, Xianping Tao, Hung Keng Pung, and Jian
    To achieve good results the techniques do not need to be too                     Lu. epSICAR: An Emerging Patterns based Approach to Sequential,
complex, like it was shown using Naïve Bayes. A fair conclusion                      Interleaved and Concurrent Activity Recognition. In Proc. of the 7th
after analyzing the figures is that hierarchical approach gives better               Annual IEEE International Conference on Pervasive Computing and
results with Naïve Bayes doing the first classification and                          Communications (Percom '09), Galveston, Texas, March 9–13,
Hoeffding Tree dealing with the final one. With less complex                         2009.
techniques less power of the mobile is needed, leading to a minor             [4]    X. Zhu, Semi-Supervised Learning Literature Survey, Tech. report,
impact on the classification performance. So, if Naïve Bayes does                    Computer Sciences, University of Wisconsin-Madison, USA, 2005.
not decrease the accuracy it is better to use it in order to save             [5]    A. C. F. Coster, “Classification of basic daily movements using a
memory and battery.                                                                  triaxial accelerometer,” Medical & Biological Engineering, pp. 679-
                                                                                     687, 2000.
    The battery usage confirms that the app can be used non-stop. It
                                                                              [6]    M. Schneider, M. Velten, and J. Haupert, “The ObjectRules
would be thrilling and of greater convenience to create a way that                   Framework - Providing Ad Hoc Context-Dependent Assistance in
could swap classification techniques when the battery was low so it                  Dynamic Environments,” 2010 Sixth International Conference on
could be saved and the application did not have to stop. Changing                    Intelligent Environments, pp. 122-127, Jul. 2010.
from hierarchical classification to one step classification would             [7]    N. S. Y. Wang, J. Lin, M. Annavaram, Q. A. Jacobson, J. Hong, B.
have a maximum impact of 2% on the accuracy using Hoeffding                          Krishnamachari, “A Framework of Energy Efficient Mobile Sensing
tree as classifier.                                                                  for automatic user state recognition”, pp. 179-191, 2009.
    The model used only has to be created when the application                [8]    B. Babcock and M. Datar, “Sampling from a moving window over
starts working. It is used for classifying until the app is shut down.               streaming data,” of the thirteenth annual ACM-SIAM, 2002.
It has only to classify one instance every 16s which is enough to do          [9]    N. Ravi, N. Dandekar, and P. Mysore, “Activity recognition from
                                                                                     accelerometer data,” Proceedings of the National, pp. 1541-1546,
it, so the duty cycles work perfectly.
                                                                                     2005.
    Regarding the memory usage a limit on the training files can be           [10]   G. Bieber, J. Voskamp, and B. Urban, “Activity Recognition for
created, when this limit is reached the older data can be erased and                 Everyday Life on Mobile Phones,” Universal Access in HCI, Part
new data added. This allows the adaptation of the application to                     II, HCII 2009, LNCS 5615, pp. 289-296, 2009.
new users as long as the application is being used by these new               [11]   P. Domingos & G. Hulten (2000). Mining high-speed data streams.
users.                                                                               Proceedings of the sixth ACM SIGKDD international conference on
    The application can be improved by making possible to wear                       Knowledge discovery and data mining - KDD’00, 71-80. New York,
the mobile on other location, testing other classifiers or changing                  New York, USA: ACM Press. doi:10.1145/347090.347107.
the way the data is processed.                                                [12]   T. Mitchell (1997). Machine Learning, McGraw Hill.
    New tests can be made using data from people with mobility                [13]   Holmes, K. Richard, B. Pfahringer (2005). Tie-breaking in
                                                                                     Hoeffding trees. In proceedings of the Second International
constraints. Improving the app so it can adapt to this kind of people
                                                                                     Workshop on Knowledge Discovery from Data Streams, Porto,
can be important if an accurate prediction can be made. Studies of                   Portugal, 2005.
patients with diseases that tend to degrade the ability to move can           [14]   S. Sprager and D. Zazula, “A cumulant-based method for gait
be accomplished to prevent, for example, falls or just to study how                  identification using accelerometer data with principal component
the movements change. This prevention can also be applied to                         analysis and support vector machine,” WSEAS Transactions on
elder people.                                                                        Signal Processing, vol. 5, no. 11, pp. 369–378, 2009.
    With this knowledge, people who practice sport can also                   [15]   M.M. Masud, J. Gao, L. Khan, J. Han and B. Thuraisingham,
benefit. For example, understanding how their body posture can be                    2008. “A practical approach to classify evolving data streams:
corrected in order to achieve better results.                                        Training with limited amount of labeled data”. Proceedings of the
    This is just the beginning of an application that can be expanded                8th International Conference on Data Mining, December 15-19,
                                                                                     2008, Pisa, Italy, pp: 929-934.
in order to provide a better intimate experience between users and
                                                                              [16]   D. Guan , W. Yuan , Y. Lee , A. Gavrilov , S. Lee, “Activity
mobile phones.                                                                       Recognition Based on Semi-supervised Learning”, Proceedings of
                                                                                     the 13th IEEE International Conference on Embedded and Real-
                                                                                     Time Computing Systems and Applications, p.469-475, August 21-
ACKNOWLEDGEMENTS                                                                     24, 2007.
                                                                              [17]   T. Brezmes, J. Gorricho, J. Cotrina, “ Activity Recognition from
                                                                                     accelerometer Data on a Mobile Phone”, Lecture Notes in Computer
This work is funded by the ERDF through the Programme
                                                                                     Science, 2009, Volume 5518, Distributed Computing, Artificial
COMPETE and by the Portuguese Government through FCT -                               Intelligence, Bioinformatics, Soft Computing, and Ambient Assisted
Foundation for Science and Technology, project KDUS ref.                             Living, Pages 796-799
PTDC/EIA-EIA/098355/2008.


                                                                         42