=Paper=
{{Paper
|id=Vol-1724/paper4
|storemode=property
|title=Extending Naive Bayes with Precision-tunable Feature Variables for Resource-efficient Sensor Fusion
|pdfUrl=https://ceur-ws.org/Vol-1724/paper4.pdf
|volume=Vol-1724
|authors=Laura Isabel Galindez Olascoaga,Wannes Meert,Herman Bruyninckx,Marian Verhelst
|dblpUrl=https://dblp.org/rec/conf/ecai/OlascoagaMBV16
}}
==Extending Naive Bayes with Precision-tunable Feature Variables for Resource-efficient Sensor Fusion==
Laura Isabel Galindez Olascoaga (MICAS, Department of Electrical Engineering, KU Leuven, Laura.Galindez@esat.kuleuven.be), Wannes Meert (DTAI, Department of Computer Science, KU Leuven), Herman Bruyninckx (PMA, Department of Mechanical Engineering, KU Leuven) and Marian Verhelst (MICAS, Department of Electrical Engineering, KU Leuven)

Abstract. Resource-constrained ubiquitous sensing devices suffer from the fundamental conflict between their limited hardware resources and the desire to continuously process all incoming sensory data. The data's representation quality has an immediate impact on both aspects. This paper strives to enable resource-aware and resource-tunable inference systems, which are capable of operating at various trade-off points between inference accuracy and resource usage. We present an extension to naive Bayes that is capable of dynamically tuning feature precision as a function of incoming data quality, difficulty of the task and resource availability, and we develop the heuristics that optimize this tunability. We demonstrate how this enables much finer granularity in the resource versus inference accuracy trade-off space, resulting in significant resource efficiency improvements in embedded sensor fusion tasks.

1 INTRODUCTION

The Internet of Things (IoT) paradigm is on the rise, promising an important contribution to tackling societal challenges through improved distributed sensing capabilities. This paradigm is expected to significantly impact application scenarios like e-health and domotics, which already benefit from the widespread availability of embedded sensing devices (i.e., smartphones, activity trackers and service robots) that can reliably gather and process a massive amount of data. The main enabling factor of this vision is the ability to seamlessly integrate several technological solutions, which motivates the desire to run complex inference tasks in-situ and in a distributed manner. Yet, smart embedded devices' processing abilities are held back by their limited resource availability, both in terms of energetic and computational resources [2]. This creates a fundamental conflict between the desire to fuse information from more and more always-on sensors in embedded devices, and the inability of these embedded devices to process all incoming data continuously and at high precision. This conflict is currently circumvented by running most sensor fusion and sensory inference tasks in the cloud [1, 4]. Yet, this has important consequences for the system's latency and the user's privacy [5]. Moreover, it does not solve the excessive power consumption of the always-on sensors and the wireless link [8].

Efficiently running inference tasks on the devices themselves calls for awareness of the real-time embedded platform's resource limitations. This is in sharp contrast with most state-of-the-art inference approaches, which focus on maximizing information gain and inference accuracy (throughout this paper, inference accuracy refers to the percentage of correctly predicted queries from a test set) without taking the actual hardware footprint of online inference into account. To facilitate effective sensory fusion inference tasks inside embedded devices, this paper strives to enable resource-aware and resource-tunable inference systems, which are capable of operating at various trade-off points between inference accuracy and resource usage. Such performance-tunability can be realized by dynamically reallocating resources across sensory features in accordance with the task relevance and complexity. Recent techniques, such as feature-cost aware inference [4, 7], perform hardware-cost aware feature selection to minimize the overall resource cost of feature extraction. Beyond feature cost, one can also adapt known machine learning models such that they run efficiently on embedded systems by preferring integer operators [25], considering the trade-off between number of operations and accuracy [19], reducing the precision [29] or the value range [9] of the features, or decomposing the model and running the inference task distributively on different processing units [14, 16, 10]. In addition, recent efforts have attempted to integrate such machine learning models and techniques under embedded hardware efficient frameworks [14]. These optimization techniques result in a fixed resource usage versus performance trade-off and, as a result, fail to exploit all the resource saving opportunities that the hardware platform can provide. To overcome these limitations, this paper introduces the following innovations:

1. Feature precision-tunability: Instead of only selecting or deselecting a sensory feature, embedded platforms can also tune the precision of sensors and sensory feature extraction (i.e., by changing their resolution or number of bits) in return for hardware resource savings, through techniques such as approximate computing and approximate sensing. This allows them to dynamically trade off feature quality for resource efficiency. In this paper, we extend the naive Bayes fusion model to such feature-precision tunability.
2. Run-time accuracy-resource-awareness: Instead of offline feature or precision selection, a dynamic approach should allow run-time accuracy-resource tunability. This requires the creation of a single fusion model, capturing all feature precision-tunability states, which can be explored and traversed at run-time. The resulting model enables run-time adaptation of feature precision in function of incoming data quality, the difficulty of the inference task, or the instantaneous resource availability in the embedded system.

We present an extension to naive Bayes that is capable of dynamic feature precision tunability, as well as heuristics to optimize this tunability. We demonstrate how this enables much finer granularity in the resource versus inference accuracy trade-off space, resulting in significant resource efficiency improvements in embedded sensor fusion tasks. These performance tuning capabilities are of utmost relevance in data-dense and resource-constricted environments like the IoT. We therefore demonstrate the functionality of the proposed techniques on sensor fusion datasets related to applications relevant to this paradigm.

The paper is structured as follows. In Section 2 we give a theoretical background on naive Bayes classifiers and Bayesian Networks (BN) and we explain how precision tuning enables resource-efficiency in the current context. Section 3 gives the details of the proposed feature precision-tunable BN and explains how parameter learning and inference are performed in it. In Section 4 we propose a heuristic that uses the proposed BN to select optimal operating points in the resource versus inference accuracy space, and we evaluate the trade-off it achieves by performing experiments on four data corpora in Section 5. Finally, we discuss the results as well as our contributions and future work in Section 6.

2 BACKGROUND

2.1 Bayesian Networks

Bayesian Networks (BN) are directed acyclic graphs that compactly encode a joint probability distribution over a set of random variables [24]. Consider a set of random variables U = {F_1, ..., F_n} where each feature F_i may take values from a finite set Val(F_i). A Bayesian Network for a set of random variables U is formally defined as the pair B = <G, Θ>. The first component, G, is the graph which encodes conditional independence assumptions: its nodes represent the variables F_i and its arcs represent the probabilistic dependencies between variables. The second component, Θ, is the set of conditional probability distributions Pr(F_i | Π_{F_i}) that quantify the network, where Π_{F_i} denotes the parents of F_i.

The joint probability distribution defined by the network B is given by

    \Pr(F_1, \ldots, F_n) = \prod_{i=1}^{n} \Pr(F_i \mid \Pi_{F_i}).    (1)

Inference in BNs, Pr(Q | E = e), can be performed by assigning the values e to the observed variables E and summing out the variables U \ (Q ∪ E) that are not part of the query Q.
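As a concrete illustration of Equation 1 and of inference by summing out, consider the following minimal Python sketch of a toy two-feature network; the variables are binary and all CPT numbers are hypothetical stand-ins for quantities that would normally be learned from data:

    # Toy network C -> F1, C -> F2 with binary variables; hypothetical CPTs.
    p_c  = {0: 0.6, 1: 0.4}                                      # Pr(C)
    p_f1 = {(0, 0): 0.9, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.8}  # Pr(F1=f1 | C=c), keyed (c, f1)
    p_f2 = {(0, 0): 0.7, (0, 1): 0.3, (1, 0): 0.5, (1, 1): 0.5}  # Pr(F2=f2 | C=c), keyed (c, f2)

    def joint(c, f1, f2):
        # Equation 1: the product of every node's CPT entry given its parents.
        return p_c[c] * p_f1[(c, f1)] * p_f2[(c, f2)]

    # Query Pr(C | F1=1): fix the evidence and sum out the unobserved variable F2.
    unnorm = {c: sum(joint(c, 1, f2) for f2 in (0, 1)) for c in (0, 1)}
    z = sum(unnorm.values())
    print({c: p / z for c, p in unnorm.items()})   # normalized posterior over C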
2.2 Bayesian Network Classifiers

Classification is the task of assigning a class label C to instances described by a set of features F_1, ..., F_n. Such a task can be tackled by a Bayesian network where one of the random variables, C, is considered the class and the other random variables, F_1, ..., F_n, represent the features. The task is then to find the most likely value for the class variable C:

    c = \arg\max_{c} \Pr(C = c \mid F_1 = f_1, \ldots, F_n = f_n),    (2)

where c is the current class and f_i is the observed value for feature F_i.

A widely used type of Bayesian classifier is the naive Bayes classifier [12]. Its main assumption is that every feature is independent of the other features given that the class is known. The graphical structure of the naive Bayes network is shown in Figure 1. This assumption allows for efficient learning and inference as, by applying the rule of Bayes and conditional independence, it simplifies Equation 2 to

    c = \arg\max_{c} \Pr(F_1 = f_1 \mid C = c) \cdots \Pr(F_n = f_n \mid C = c) \Pr(C = c).

Despite the independence assumption, naive Bayes classifiers perform surprisingly well, which makes them one of the most effective and efficient inductive learning algorithms [33, 6].

[Figure 1. Graphical representation of the naive Bayes classifier: the class variable C is the common parent of the feature variables F_1, ..., F_n.]
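The decision rule above is straightforward to implement. Below is a minimal Gaussian naive Bayes predictor, anticipating the Gaussian likelihoods used in Section 3.2; the class names and all parameter values are hypothetical stand-ins for quantities estimated from training data:

    import math

    def log_gauss(x, mu, sd):
        # log of the normal density N(x; mu, sd^2)
        return -0.5 * ((x - mu) / sd) ** 2 - math.log(sd * math.sqrt(2 * math.pi))

    # Hypothetical per-class parameters: means[c][i], stds[c][i], priors[c].
    means  = {"move": [0.2, 1.1], "turn": [0.9, 0.3]}
    stds   = {"move": [0.3, 0.4], "turn": [0.2, 0.5]}
    priors = {"move": 0.5, "turn": 0.5}

    def predict(x):
        # argmax_c Pr(C=c) * prod_i Pr(F_i = x_i | C=c), computed in the log domain
        return max(priors, key=lambda c: math.log(priors[c]) +
                   sum(log_gauss(xi, mu, sd)
                       for xi, mu, sd in zip(x, means[c], stds[c])))

    print(predict([0.25, 1.0]))   # -> 'move'

Working in the log domain avoids numerical underflow when many feature likelihoods are multiplied.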
2.3 Resource awareness and precision tuning

Embedded hardware platforms have to operate under very scarce resources. First and foremost, their miniaturization results in very limited battery capacity, which motivates the quest for highly energy-efficient designs and methodologies. Moreover, due to size, cooling and cost restrictions, the computational bandwidth of these devices is extremely scarce. This has sparked an enormous amount of research into adaptive hardware over the last decade. Under this paradigm, resource consumption can be tuned at run-time to be lower at the expense of reduced quality sensor streams or computations. This dynamic trade-off is achievable in several ways:

1. Noisy sensors: The amount of noise present in a sensory measurement strongly depends on the amount of energy spent in the sensor front-end (in its filters, amplifiers, etc.). By tolerating more statistical noise on the measurement result, energy can be saved [15, 3].

2. Stochastic computing: The resulting accuracy of digital computations can be traded off against processing resource usage and energy consumption by using stochastic or approximate computing techniques. In stochastic computing, for example, numbers are represented by bit-streams that can be processed by very simple circuits such as standard logic gate arrays. These implementations allow a limited amount of errors (stochastic noise) in the digital embedded calculations in return for a more efficient implementation [21].

3. Reduced precision sensing and computing: Instead of applying the aforementioned stochastic techniques to dynamically trade feature quality for resource savings, significant resource savings are also achievable by simply limiting the number of bits with which sensor values are sampled and digitally processed for feature extraction. Standard hardware platforms digitize sensory values at fixed precision and typically process them with 16-bit resolution. Recent works present the development of precision-tunable digitizers, as well as digital processing platforms capable of dynamically adjusting the precision with which internal computations are performed [23, 20].

Within this paper, we focus on enabling the feature quality versus resource trade-off through the latter technique: computations with variable precision features. The results are transferable to the discussed alternatives, which is left for future work. Under variable precision computations, the extracted features U = {F_1, ..., F_n} are each computed and represented by a tunable number of bits. The number of bits representing the feature directly impacts the set of values a feature F_i can take on, and the feature at precision m will be referred to as F_{i,m}, with m the number of bits. More specifically, when using an m-bit representation, the feature can take |Val(F_{i,m})| = 2^m possible values. As we are interested in studying resource efficiency in sensory applications, we construct these m-bit representations by following a signal processing quantization approach [27]. Mapping of the original feature to an m-bit representation is based on comparisons with decision levels t_k: if the feature value lies between t_k and t_{k+1}, it gets mapped to a quantization level l_k, where the number of levels must be equal to 2^m. In this paper, the decision levels t_k are derived from the feature value range, and the set of lower precision decision levels is a subset of the higher precision decision levels (see also Section 3).
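A minimal sketch of such a quantizer, assuming uniform decision levels over the feature's value range (one concrete way to satisfy the subset property; the function and variable names are ours):

    def quantize(x, lo, hi, m):
        # Map feature value x onto one of 2**m quantization levels over [lo, hi],
        # using uniform decision levels t_k = lo + k * (hi - lo) / 2**m.
        levels = 2 ** m
        idx = int((x - lo) / (hi - lo) * levels)
        return min(max(idx, 0), levels - 1)    # clamp to the feature value range

    x = 0.73
    for m in (5, 3, 2, 1):
        print(m, quantize(x, 0.0, 1.0, m))     # -> 23, 5, 2, 1
    # Nesting property: quantize(x, lo, hi, m) >> 1 == quantize(x, lo, hi, m - 1)

Because the m-bit decision levels contain all of the (m−1)-bit ones, an m-bit index degrades to the (m−1)-bit index by dropping its least significant bit, which is what makes the deterministic links between feature versions in Section 3 possible.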
State-of-the-art implementations show that under such computational precision tuning, the resource cost (here expressed in terms of energy per computational operation) scales more than quadratically with the computational precision, expressed in number of bits [22]. In this paper, we will assume that the feature cost (denoted T_{i,m}) of feature F_i computed with a precision of m bits is equal to

    T_{i,m} = \alpha_i \cdot m^2,    (3)

where α_i is the feature-dependent cost of the nominal precision feature.

3 VARIABLE FEATURE PRECISION NAIVE BAYES

Tuning feature precision enables a wide range of resource dependent operating points. We make the probabilistic relations required by classification tasks explicit in a Bayesian Network (BN) where features can be observed at different precision levels.

3.1 Model Structure

The proposed BN represents a naive Bayes classifier that has multiple versions of the same feature in each leaf, as shown in Figure 2. Feature versions with the highest precision (F_{i,m}) are directly linked to the class variable C. There are no direct links between the lower precision feature versions F_i \ F_{i,m} and the class variable, since the relation between these versions is deterministic. The proposed structure encodes the following joint probability distribution over the multiple feature version sets F_i = {F_{i,m}, ..., F_{i,0}} and the class variable C:

    \Pr(C, F_{1,m}, \ldots, F_{n,0}) = \prod_{i=1}^{n} \left( \Pr(F_{i,m} \mid C) \prod_{b=0}^{m-1} \Pr(F_{i,b} \mid F_{i,b+1}) \right) \cdot \Pr(C).    (4)

[Figure 2. Naive Bayes model extended with multiple feature quality versions: each feature hangs from the class variable C as a chain F_{i,m} → F_{i,m−1} → ... → F_{i,0}, with CPTs Pr(F_{i,m} | C) at the top and Pr(F_{i,b} | F_{i,b+1}) along the chain.]

3.2 Parameter Learning

We assume that the studied features were generated by a continuous distribution [24]; therefore, we model the conditional probabilities between the highest precision features and the classes, Pr(F_{i,m} | C), as Gaussian distributions [18]. Even though the model includes n × (m + 1) parameters θ, only the conditional probabilities between the highest precision features and the classes, Pr(F_{i,m} | C), must be trained, since the conditional probabilities between lower precision features, Pr(F_{i,b} | F_{i,b+1}), are deterministic. Once we have knowledge of the decision levels t_k that generated the lower precision features, we are able to systematically add them to the tails of the proposed naive Bayes structure.

3.3 Inference

At any given time, every feature F_i is observed at only one of the precision options b_i, depending on the current resource consumption desires and constraints. Given the observation o = {f_{1,b_1}, f_{2,b_2}, ..., f_{n,b_n}}, classification is performed by estimating the class posterior probability given by

    \Pr(C \mid o) \propto \prod_{i=1}^{n} \Pr(f_{i,b_i} \mid C) \cdot \Pr(C).    (5)

This implies that, for every observed feature, its lower precision versions are not observed, while its higher precision versions are marginalized.

Consider the example depicted in Figure 3, which may correspond to a robot navigation application, as discussed in Section 5.2. Suppose we obtain sensor readings at 8 bit, 4 bit and 2 bit for sensors S1, S2 and S3, respectively, and we decide to turn off sensor S4. Here, we can estimate the class posterior probability (which can be a location, for example) with the following equation:

    \Pr(C \mid S_{1,8b}, S_{2,4b}, S_{3,2b}) \propto \Pr(S_{1,8b} \mid C) \cdot \left( \sum_{S_{2,8b}} \Pr(S_{2,4b} \mid S_{2,8b}) \Pr(S_{2,8b} \mid C) \right) \cdot \left( \sum_{S_{3,4b}} \Pr(S_{3,2b} \mid S_{3,4b}) \sum_{S_{3,8b}} \Pr(S_{3,4b} \mid S_{3,8b}) \Pr(S_{3,8b} \mid C) \right) \cdot \Pr(C),    (6)

and predict the class c ∈ C with the highest posterior.

[Figure 3. Example of a four feature application where each feature is observed at a different precision (circles with bold edges). Versions with higher precision than the observed one are marginalized (circles with black edges) and versions with lower precision are not observed (circles with gray edges). Note that feature 4 is observed with "0 bit" precision, which is equivalent to pruning it.]
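A minimal sketch of this marginalization for a single feature chain, assuming the uniform bit-dropping quantizer of Section 2.3 (so every deterministic CPT Pr(F_b | F_{b+1}) is the 0/1 matrix of the index >> 1 map; all names and the example numbers are ours):

    import numpy as np

    def drop_bit_cpt(b):
        # Deterministic CPT Pr(F_b | F_{b+1}) of Section 3.2 for the uniform
        # quantizer: each (b+1)-bit index maps to its b-bit prefix (index >> 1).
        cpt = np.zeros((2 ** b, 2 ** (b + 1)))
        for high in range(2 ** (b + 1)):
            cpt[high >> 1, high] = 1.0
        return cpt

    def likelihood(p_top, m, b_obs, f_obs):
        # Pr(F_{b_obs} = f_obs | C = c): start from Pr(F_m | C = c) and sum out
        # every unobserved higher-precision version, as in Equations 5 and 6.
        p = p_top
        for b in range(m - 1, b_obs - 1, -1):
            p = drop_bit_cpt(b) @ p        # marginalize F_{b+1}
        return p[f_obs]

    # Hypothetical Pr(F_3 | C = c) for a 3-bit feature, observed at 1-bit precision.
    p_f3_given_c = np.array([.05, .10, .20, .25, .20, .10, .05, .05])
    print(likelihood(p_f3_given_c, 3, 1, 0))   # Pr(F_1 = 0 | C = c) = 0.60

Repeating this per feature and multiplying the resulting likelihoods with Pr(C), as in Equation 5, yields the unnormalized class posterior.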
4 ACCURACY-RESOURCE TUNABILITY

The proposed model enables multiple resource and accuracy dependent operating points. In this paper we analyze the trade-off induced by the available feature combination choices and we propose a methodology to find the optimal operating points given the system's resource constraints.

We propose an accuracy-resource sensitive algorithm that selects the optimal feature precision across the accuracy-resource usage trade-off space. At each iteration, we select the feature set that optimizes a cost function CF, which is defined according to the desired application and the constraints thereto [13, 17]. In this paper we maximize the cost function given by

    CF = \log\left(\frac{\Delta \mathit{resource}}{\max(\mathit{resource})}\right) - \log(\Delta \mathit{accuracy}),    (7)

where the Δ terms refer to the predicted state difference between time k and time k+1, as detailed in Algorithm 1. The greedy neighborhood search in our heuristic ensures resource reduction, and the cost function further motivates it by explicitly trading off the two terms.

Two aspects of this algorithm distinguish it from conventional state-of-the-art feature selection techniques [7, 28, 31]: 1) we merge accuracy gain and resource usage in a joint cost optimization, hence taking hardware implementation aspects into account from the algorithmic level; 2) in contrast to cost-aware feature selection techniques, which decide whether to use a feature or not, we enable the selection of a variety of feature precision combinations.

Algorithm 1 details the method. We initialize the selected feature set to the highest precision feature combination U_selected = {F_{1,m}, ..., F_{n,m}}. At each iteration, we perform a greedy neighborhood search over n feature combination candidates. In each candidate i, the precision of feature F_i is dropped one level with respect to the current precision. We evaluate the classification accuracy and resource usage of each candidate and select the one that maximizes the cost function CF. The procedure is repeated until the feature combination with the lowest precision is selected (U_selected = {F_{1,0}, ..., F_{n,0}}). Note that the algorithm is able to perform feature pruning if a "null precision" leaf is added to the naive Bayes model (see Figure 3 for an example).

Classification accuracy is computed by estimating the posterior probability Pr(C|k) of every instance k from a testing data-set U_test and comparing the prediction to the instance's label (see Algorithm 2).

Algorithm 1: Feature precision selection algorithm for the accuracy-resource tradeoff

    Input:  U_test, T_{i,m}, Θ, C, CF
    Output: selected feature set U_selected_k = {F_{1,b1}, ..., F_{n,bn}}
    k = 0
    U_selected_k = {F_{1,m}, ..., F_{n,m}}             // initialize with the highest precision feature set
    accuracy_k ← AccuracyEvaluation(Θ, C, U_selected_k)
    resource_k ← Σ T_{i,bi} over all F_{i,bi} ∈ U_selected_k
    while U_selected_k ≠ {F_{1,0}, ..., F_{n,0}} do    // until the lowest precision set is selected
        for i = 1 to n do                              // one candidate combination per feature
            U_candidate_i ← (U_selected_k \ {F_{i,bi}}) ∪ {F_{i,bi−1}}   // drop F_i's precision one level
            accuracy_candidate_i ← AccuracyEvaluation(Θ, C, U_candidate_i)
            resource_candidate_i ← Σ T_{i,b} over all F_{i,b} ∈ U_candidate_i
            Δaccuracy_candidate_i = accuracy_k − accuracy_candidate_i
            Δresource_candidate_i = resource_k − resource_candidate_i
        end
        k = k + 1
        U_selected_k ← argmax_{U ∈ U_candidate} CF(Δaccuracy_candidate, Δresource_candidate)
        accuracy_k ← AccuracyEvaluation(Θ, C, U_selected_k)
        resource_k ← Σ T_{i,bi} over all F_{i,bi} ∈ U_selected_k
    end
    Return: U_selected_k

Algorithm 2: Classification accuracy evaluation algorithm

    Input:  Θ, C, U
    Output: accuracy
    correct = 0
    for k = 1 to N do                                  // N instances in the testing set
        Pr(C|k) ← Pr(k|C) · Pr(C)                      // approximate the class posterior for instance k
        cmax_k = argmax_{c ∈ C} Pr(C|k)                // predict the class with the highest posterior
        if cmax_k == c_k then correct = correct + 1
    end
    accuracy ← correct / N
    Return: accuracy
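A compact executable sketch of Algorithm 1, under stated assumptions: precision levels are integer bit widths down to 0, feature costs follow Equation 3, and accuracy_of is an assumed callback that runs Algorithm 2 on a validation set (all helper names are ours):

    import math

    def resource_of(precisions, alphas):
        # Equation 3 summed over features: T_{i,m} = alpha_i * m**2.
        return sum(a * m ** 2 for a, m in zip(alphas, precisions))

    def cost_fn(d_resource, d_accuracy, max_resource):
        # Equation 7; small floors guard the logs when a delta is zero or negative.
        return math.log(max(d_resource, 1e-9) / max_resource) \
             - math.log(max(d_accuracy, 1e-9))

    def select_precisions(n_bits, alphas, accuracy_of):
        precisions = [n_bits] * len(alphas)        # U_selected = {F_1,m, ..., F_n,m}
        max_resource = resource_of(precisions, alphas)
        acc, res = accuracy_of(precisions), max_resource
        trace = [(tuple(precisions), acc, res)]    # explored operating points
        while any(m > 0 for m in precisions):      # until the all-"null precision" set
            best = None
            for i, m in enumerate(precisions):
                if m == 0:
                    continue
                cand = list(precisions)
                cand[i] = m - 1                    # drop F_i's precision one level
                c_acc, c_res = accuracy_of(cand), resource_of(cand, alphas)
                cf = cost_fn(res - c_res, acc - c_acc, max_resource)
                if best is None or cf > best[0]:
                    best = (cf, cand, c_acc, c_res)
            _, precisions, acc, res = best         # keep the CF-maximizing neighbor
            trace.append((tuple(precisions), acc, res))
        return trace

The returned trace corresponds to the sequence of operating points the heuristic explores, from which a deployed system can pick at run-time according to its instantaneous resource budget.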
5 EXPERIMENTS

We evaluate the resource-accuracy trade-off achieved by our proposal with one synthetic dataset and three data corpora from two real applications relevant to the IoT paradigm.

5.1 Synthetic Data

This dataset consists of 2000 points sampled from 4 four-dimensional Gaussians N(m_i, σ_i), i = {1, 2, 3, 4}. The Gaussians are defined to have different degrees of overlap in every dimension and therefore a varying misclassification risk for different feature combinations. We quantize the dataset at 5, 3, 2 and 1 bits and randomly divide it into a training and a testing set (used for model training and accuracy estimation, respectively). We compute the resource usage with Equation 3, as included in Table 1. To assign the variable α_i, we assume that features that are less likely to cause misclassification would be more expensive to extract and compute in a real application, and thus give them a higher value. We add a "null precision" leaf to enable feature pruning, as shown in the example depicted by Figure 3.

Figure 4 shows the resource versus accuracy trade-off curves achieved by the proposed algorithm and by a typical resource-aware heuristic (one that considers only features at the highest precision F_{i,m} and decides whether to prune them by maximizing CF) in red and blue, respectively. The gray point cloud represents all the possible accuracy-resource trade-off operating points to select from. The proposed heuristic has a richer feature combination space to select from, which prevents accuracy degradation for a resource usage scale-down of up to 20 times. The non-tunable precision heuristic has comparatively very few feature combination options to select from, which leads, in contrast, to a maximum resource scaling of approximately 2 times without accuracy degradation.

[Figure 4. Algorithm performance comparison on the synthetic dataset.]

5.2 Real Datasets

We analyze three sensor based applications that benefit from feature precision tuning. The first is a robot navigation task; the second and third are activity recognition tasks.

Wall-Following Robot Navigation. We analyze a public domain dataset that was collected as a mobile robot navigates through a room following the wall in a clockwise direction, for 4 rounds, using 4 ultrasound sensors positioned on the front, left, right and back of its body [11]. Four states can be identified from the sensor readings: Move-Forward, Slight-Right-Turn, Sharp-Right-Turn or Slight-Left-Turn. The dataset has a precision of 8 bits and we further quantize it at 5, 2 and 1 bits. We assume the four sensors have the same hardware properties, so we set the variable α_i equal to one for all of them and use Equation 3 to generate the resources for the experiments, as shown in Table 1. Furthermore, we add a random number between 0 and 5 to each cost to simulate non-ideal performance conditions.

Figure 5 shows the cost-accuracy trade-off achieved by the proposed precision-tunable heuristic and the trade-off achieved by a cost-aware method in red and blue, respectively. The gray crosses represent all the possible operation points to choose from.
Both heuristics display negligible accuracy degradation when their resource consumption is scaled down by a factor 2 (from 400 to 200). The slight accuracy gain achieved by the non-tunable heuristic can be due to the discretization related improvements discussed in [32] and [29]. For a factor 4 resource scaling (from 400 to 100), the non-tunable heuristic has already pruned 3 out of its four features, which causes the accuracy to degrade from 90% to 75%. The precision-tunable heuristic keeps observing all features, yet reduces their precision, which also produces a factor 4 resource consumption downscale but no accuracy degradation.

[Figure 5. Performance comparison in the robot navigation application.]

USC-HAD. This dataset was designed as a benchmark for Human Activity Detection (HAD) algorithm comparisons [34]. It was collected by an Inertial Measurement Unit (IMU) placed on the subjects' hip, consisting of a 3-axis accelerometer and a 3-axis gyroscope, and it contains measurements for the identification of 12 different low-level daily activities. In accordance with previously performed activity recognition analyses [7, 26], the activities that can be best classified with naive Bayes, and that are therefore used in this experiment, are Walking-Forward, Running-Forward, Sitting and Sleeping.

We tuned the dataset's precision from the original 8 bits to 5, 4, 3, 2 and 1 bits. For resource assignment, we consider that the power consumption of a gyroscope can be up to 10 times that of an accelerometer [34], so we set α_i to 10 for the gyroscope features and to 1 for the accelerometer features, and use Equation 3 to calculate the resource consumption. Like in the previous experiment, we add a random number between 0 and 5 to simulate non-ideal behavior. The resource consumption assignments for this experiment are detailed in Table 1. Again, we enable feature pruning through the addition of the 0 bit leaf to the model.
Figure 6 shows the cost-accuracy trade-off curves achieved by the precision-tunable and the cost-aware-only heuristics in red and blue, respectively. The possible operating points are represented by gray crosses. For a resource consumption downscale of 2.5 (from 2130 to 845), the non-precision-tunable heuristic suffers from an accuracy degradation of 6% (88% to 82%), while there is no accuracy reduction with the precision-tunable method. Although the 6% accuracy loss/2.5x cost saving of the non-tunable strategy could be acceptable in some situations, it is worth noting the limitations imposed by the available number of operating points: beyond the 2.5x resource downscale, only a 30x reduction (from 2130 to 65) is possible, at the expense of the accuracy degrading from 88% to 61%. The precision-tunable strategy has, in contrast, the possibility to choose from approximately 26 operation points, with up to 6x resource savings before the accuracy drops below 80%.

[Figure 6. Trade-off comparison on the Human Activity Detection dataset with gyroscope and accelerometer.]

HAR-RIO. In this dataset, 5 activities (Sitting-Down, Standing-Up, Standing, Walking, and Sitting) can be identified from 8 hours of recordings performed by 4 accelerometers positioned on the waist, left thigh, right ankle and right arm of 4 healthy subjects [30]. The accelerometers are tri-axial, which results in a total of 12 features {x_i, y_i, z_i}, i = {1, 2, 3, 4}. For the experiments in this paper, 9 of those features were selected in accordance with previously performed classification algorithm benchmarking [30], namely {y_1, z_1, x_2, y_2, z_2, x_3, y_3, y_4, z_4}. The dataset's precision is 8 bits; we quantize it at 4, 3, 2 and 1 bits, and we add the "null precision" leaf that enables feature pruning. The resource parameters used in this experiment are listed in Table 1.

Figure 7 shows the results of this experiment with the same color coding as before. The precision-tunable approach's performance is superior, as it achieves up to 12x resource savings (from 620 to 50) for a maximum accuracy degradation of 4% (from 80% to 76%). The non-tunable strategy displays accuracy losses of less than 5% up to a resource consumption scaling of 3x (620 to 200); for any resource downscaling larger than that, the accuracy degrades by more than 10%.

[Figure 7. Trade-off comparison on the Human Activity Recognition dataset with accelerometers.]

Table 1. Feature resources used for experiments (feature cost T_{i,m} = α_i · m² per precision of m bits)

    Dataset              α     m=10   m=8   m=5    m=4   m=3   m=2   m=1
    Synthetic, Feat. 1   1      100    -     25     -     9     4     1
    Synthetic, Feat. 2   0.7     70    -     17.5   -     6.3   2.8   0.7
    Synthetic, Feat. 3   0.9     90    -     22.5   -     8.1   3.6   0.9
    Synthetic, Feat. 4   0.8     80    -     20     -     7.2   3.2   0.8
    Robot                1      100    -     25     -     -     4     1
    USC-HAD, Accel.      1       -     64    25     16    9     4     1
    USC-HAD, Gyro.       10      -     640   250    160   90    40    10
    HAR-RIO              1       -     64    25     16    9     4     1

6 CONCLUSIONS AND DISCUSSION

Our main contribution in this paper was to enable efficient embedded sensor fusion through a resource-aware naive Bayes model, capable of exploiting variable precision features. By encapsulating various precision features within the model structure, we enable the possibility to dynamically tune resource consumption and inference accuracy according to the circumstances and available resources. We propose an algorithm that finds optimal operating points by reducing resource consumption while minimizing accuracy degradation.
We have compared our scheme with a state-of-the-art resource-aware feature selection technique and we conclude that overall our scheme has better cost saving capabilities due to the rich variety of operational points it can choose from. We tested the proposed methodology on one artificial and three public domain data corpora. Accuracy degradation was prevented while achieving resource usage scalings of 20x for the synthetic dataset, 4x for the Robot Navigation application, 6x for the Human Activity Detection application with accelerometers and gyroscopes, and 12x for the Human Activity Recognition application with accelerometers. The non-tunable precision heuristic achieved, in comparison, a resource scaling of 2x for the synthetic dataset, 2x for the Robot Navigation application, 2.5x for the Human Activity Detection application with accelerometers and gyroscopes, and 3x for the Human Activity Recognition application with accelerometers.

There are many ways in which feature quality tuning can improve hardware resource efficiency. We proved this concept by tuning feature precision, but the next step in our work will be to extend the proposed method to other quality tuning paradigms beyond precision tunability, such as varying levels of noisy sensing. This will potentially require the modification of the proposed multiple-level Bayesian Network, as the relationship between nodes of different qualities will not be deterministic anymore. Furthermore, we will explore more complex structures for applications that are not modeled with sufficient accuracy under the naive Bayes independence assumption.

The long term goal is to integrate the optimal feature precision selection scheme in an embedded sensory application, where the system dynamically and autonomously selects features and their precision given the current state of the hardware devices, with limited computational overhead. This scheme could enable the seamless integration of sensory based algorithms into smart environments, which is one of the elements envisioned for the IoT.

REFERENCES

[1] Rajesh Arumugam, Vikas Reddy Enti, Liu Bingbing, Wu Xiaojun, Krishnamoorthy Baskaran, Foong Foo Kong, A. Senthil Kumar, Kang Dee Meng, and Goh Wai Kit, 'DAvinCi: A cloud computing framework for service robots', in Robotics and Automation (ICRA), 2010 IEEE International Conference on, pp. 3084–3089. IEEE, (2010).
[2] Luigi Atzori, Antonio Iera, and Giacomo Morabito, 'The internet of things: A survey', Computer Networks, 54(15), 2787–2805, (2010).
[3] Komail M. H. Badami, Steven Lauwereins, Wannes Meert, and Marian Verhelst, 'A 90 nm CMOS, power-proportional acoustic sensing frontend for voice activity detection', IEEE Journal of Solid-State Circuits, 51(1), 291–302, (2016).
[4] F. H. Bijarbooneh, W. Du, E. C. H. Ngai, X. Fu, and J. Liu, 'Cloud-assisted data fusion and sensor selection for internet of things', IEEE Internet of Things Journal, 3(3), 257–268, (June 2016).
[5] Azzedine Boukerche, Antonio A. F. Loureiro, Eduardo F. Nakamura, Horacio A. B. F. Oliveira, Heitor S. Ramos, and Leandro A. Villas, 'Cloud-assisted computing for event-driven mobile services', Mobile Networks and Applications, 19(2), 161–170, (2014).
[6] Jie Cheng and Russell Greiner, 'Comparing Bayesian network classifiers', in Proceedings of the Fifteenth Conference on Uncertainty in Artificial Intelligence, pp. 101–108. Morgan Kaufmann Publishers Inc., (1999).
[7] Jian Cui and Bin Xu, 'Cost-effective activity recognition on mobile devices', in Proceedings of the 8th International Conference on Body Area Networks, BodyNets '13, pp. 90–96, ICST (Institute for Computer Sciences, Social-Informatics and Telecommunications Engineering), Brussels, Belgium, (2013).
[8] Artem Dementyev, Steve Hodges, Stephen Taylor, and Johan Smith, 'Power consumption analysis of Bluetooth Low Energy, ZigBee and ANT sensor nodes in a cyclic sleep scenario', in Wireless Symposium (IWS), 2013 IEEE International, pp. 1–4. IEEE, (2013).
[9] K. Frank, P. Robertson, S. Fortes Rodriguez, and R. Barco Moreno, 'Faster Bayesian context inference by using dynamic value ranges', in Pervasive Computing and Communications Workshops (PERCOM Workshops), 2010 8th IEEE International Conference on, pp. 50–55, (2010).
[10] K. Frank, M. Röckl, and P. Robertson, 'The bayeslet concept for modular context inference', in Mobile Ubiquitous Computing, Systems, Services and Technologies, 2008. UBICOMM '08. The Second International Conference on, pp. 96–101, (Sept 2008).
[11] A. L. Freire, G. A. Barreto, M. Veloso, and A. T. Varela, 'Short-term memory mechanisms in neural network learning of robot navigation tasks: A case study', in Robotics Symposium (LARS), 2009 6th Latin American, pp. 1–6, (Oct 2009).
[12] Nir Friedman, Dan Geiger, and Moises Goldszmidt, 'Bayesian network classifiers', Machine Learning, 29(2), 131–163, (1997).
[13] Isabelle Guyon and André Elisseeff, 'An introduction to variable and feature selection', Journal of Machine Learning Research, 3, 1157–1182, (March 2003).
[14] Josué Iglesias, Ana M. Bernardos, Paula Tarrío, José R. Casar, and Henar Martín, 'Design and validation of a light inference system to support embedded context reasoning', Personal and Ubiquitous Computing, 16(7), 781–797, (2012).
[15] Peter Kinget and Michiel Steyaert, 'Impact of transistor mismatch on the speed-accuracy-power trade-off of analog CMOS circuits', in Proceedings of the IEEE Custom Integrated Circuits Conference, pp. 333–336, (1996).
[16] N. D. Lane, S. Bhattacharya, P. Georgiev, C. Forlivesi, L. Jiao, L. Qendro, and F. Kawsar, 'DeepX: A software accelerator for low-power deep learning inference on mobile devices', in 2016 15th ACM/IEEE International Conference on Information Processing in Sensor Networks (IPSN), pp. 1–12, (April 2016).
[17] Steven Lauwereins, Wannes Meert, Jort Gemmeke, and Marian Verhelst, 'Ultra-low-power voice-activity-detector through context- and resource-cost-aware feature selection in decision trees', in 2014 IEEE International Workshop on Machine Learning for Signal Processing (MLSP), pp. 1–6, (Sept 2014).
[18] Jonathan Lester, Tanzeem Choudhury, Nicky Kern, Gaetano Borriello, and Blake Hannaford, 'A hybrid discriminative/generative approach for modeling human activities', in IJCAI, volume 5, pp. 766–772, (2005).
[19] Daniel Lowd and Pedro Domingos, 'Learning arithmetic circuits', arXiv preprint arXiv:1206.3271, (2012).
[20] Bert Moons, Bert De Brabandere, Luc Van Gool, and Marian Verhelst, 'Energy-efficient convnets through approximate computing', in 2016 IEEE Winter Conference on Applications of Computer Vision (WACV), pp. 1–8. IEEE, (2016).
[21] Bert Moons and Marian Verhelst, 'Energy and accuracy in multi-stage stochastic computing', in New Circuits and Systems Conference (NEWCAS), 2014 IEEE 12th International, pp. 197–200. IEEE, (2014).
[22] Bert Moons and Marian Verhelst, 'DVAS: Dynamic voltage accuracy scaling for increased energy-efficiency in approximate computing', in Low Power Electronics and Design (ISLPED), 2015 IEEE/ACM International Symposium on, pp. 237–242. IEEE, (2015).
[23] Bert Moons and Marian Verhelst, 'A 40nm CMOS, 35mW to 270mW, precision-scalable CNN processor for run-time scalable embedded vision', in IEEE VLSI Symposium. IEEE, (2016).
[24] Judea Pearl, Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference, Morgan Kaufmann, 1988.
[25] Nico Piatkowski, Sangkyun Lee, and Katharina Morik, 'Integer undirected graphical models for resource-constrained systems', Neurocomputing, 173, 9–23, (2016).
[26] Olga Politi, Iosif Mporas, and Vasileios Megalooikonomou, 'Human motion detection in daily activity tasks using wearable sensors', in Signal Processing Conference (EUSIPCO), 2014 Proceedings of the 22nd European, pp. 2315–2319. IEEE, (2014).
[27] John G. Proakis and Dimitris K. Manolakis, Digital Signal Processing (4th Edition), Pearson, 2006.
[28] Yvan Saeys, Iñaki Inza, and Pedro Larrañaga, 'A review of feature selection techniques in bioinformatics', Bioinformatics, 23(19), 2507–2517, (2007).
[29] Sebastian Tschiatschek and Franz Pernkopf, 'On Bayesian network classifiers with reduced precision parameters', IEEE Transactions on Pattern Analysis and Machine Intelligence, 37(4), 774–785, (2015).
[30] Wallace Ugulino, Débora Cardador, Katia Vega, Eduardo Velloso, Ruy Milidiú, and Hugo Fuks, 'Wearable computing: Accelerometers' data classification of body postures and movements', in Advances in Artificial Intelligence – SBIA 2012, 52–61, Springer, (2012).
[31] Zhixiang Xu, Matt J. Kusner, Kilian Q. Weinberger, Minmin Chen, and Olivier Chapelle, 'Classifier cascades and trees for minimizing feature evaluation cost', Journal of Machine Learning Research, 15(1), 2113–2144, (2014).
[32] Ying Yang and Geoffrey I. Webb, 'Proportional k-interval discretization for naive-Bayes classifiers', in ECML, pp. 564–575. Springer, (2001).
[33] Harry Zhang, 'The optimality of naive Bayes', in Proceedings of FLAIRS, (2004).
[34] Mi Zhang and Alexander A. Sawchuk, 'USC-HAD: A daily activity dataset for ubiquitous activity recognition using wearable sensors', in ACM International Conference on Ubiquitous Computing (UbiComp) Workshop on Situation, Activity and Goal Awareness (SAGAware), Pittsburgh, Pennsylvania, USA, (September 2012).