=Paper=
{{Paper
|id=Vol-1724/paper4
|storemode=property
|title=Extending Naive Bayes with Precision-tunable Feature Variables for Resource-efficient Sensor Fusion
|pdfUrl=https://ceur-ws.org/Vol-1724/paper4.pdf
|volume=Vol-1724
|authors=Laura Isabel Galindez Olascoaga,Wannes Meert,Herman Bruyninckx,Marian Verhelst
|dblpUrl=https://dblp.org/rec/conf/ecai/OlascoagaMBV16
}}
==Extending Naive Bayes with Precision-tunable Feature Variables for Resource-efficient Sensor Fusion==
<pdf width="1500px">https://ceur-ws.org/Vol-1724/paper4.pdf</pdf>
<pre>
      Extending naive Bayes with precision-tunable feature
          variables for resource-efficient sensor fusion
    Laura Isabel Galindez Olascoaga1 and Wannes Meert2 and Herman Bruyninckx3 and Marian Verhelst1


Abstract. Resource-constrained ubiquitous sensing devices suf-              accuracy 4 without taking the actual hardware footprint of online in-
fer from the fundamental conflict between their limited hardware            ference into account. To facilitate effective sensory fusion inference
resources and the desire to continuously process all incoming sen-          tasks inside embedded devices, this paper strives to enable resource-
sory data. The data’s representation quality has an immediate im-           aware and resource-tunable inference systems, which are capable
pact on both aspects. This paper strives to enable resource-aware and       of operating in various trade-off points between inference accuracy
resource-tunable inference systems, which are capable of operating          and resource usage. Such performance-tunability can be realized by
in various trade-off points between inference accuracy and resource         dynamically reallocating resources across sensory features in accor-
usage. We present an extension to naive Bayes that is capable of dy-        dance to the task relevance and complexity. Recent techniques, such
namically tuning feature precision in function of incoming data qual-       as feature-cost aware inference [4, 7], perform hardware-cost aware
ity, difficulty of the task and resource availability. We also develop      feature selection to minimize the overall resource cost of feature
the heuristics that optimize this tunability. We demonstrate how this       extraction. Additionally to feature-cost one can also adapt known
enables much finer granularity in the resource versus inference accu-       machine learning models such that they are efficiently run on em-
racy trade-off space, resulting in significant resource efficiency im-      bedded systems by preferring integer operators [25], considering the
provements in embedded sensor fusion tasks.                                 trade-off between number of operations and accuracy [19], reducing
                                                                            the precision [29] and the value range [9] of the features , or de-
                                                                            composing the model and distributively running the inference task
1 INTRODUCTION                                                              on different processing units [14, 16, 10]. In addition, recent efforts
                                                                            have attempted to integrate such machine learning models and tech-
The Internet of Things (IoT) paradigm is on the rise, promising an          niques under embedded hardware efficient frameworks [14]. These
important contribution to tackling societal challenges through im-          optimization techniques result in a fixed resource usage versus per-
proved distributed sensing capabilities. This paradigm is expected to       formance operating trade-off and, as a result, fail to exploit all the re-
significantly impact application scenarios like e-health and domotics,      source saving opportunities that the hardware platform can provide.
which already benefit from the widespread availability of embed-            To overcome these limitations, this paper introduces the following
ded sensing devices (i.e., smartphones, activity trackers and service       innovations:
robots) that can reliably gather and process a massive amount of data.
                                                                            1. Feature precision-tunability: Instead of only selecting or deselect-
The main enabling factor of this vision is the ability to seamlessly in-
                                                                               ing a sensory feature, embedded platforms can also tune the pre-
tegrate several technological solutions which motivates the desire to
                                                                               cision of sensors and sensory feature extraction (i.e. by changing
run complex inference tasks in-situ and in a distributed manner. Yet,
                                                                               their resolution or number of bits) in return for hardware resource
smart embedded devices’ processing abilities are held back by their
                                                                               savings through techniques such as approximate computing and
limited resource availability, both in terms of energetic as well as
                                                                               approximate sensing. This allows them to dynamically trade-off
computational resources [2]. This creates a fundamental conflict be-
                                                                               feature quality for resource efficiency. In this paper, we will extend
tween the desire to fuse information from more and more always-on
                                                                               the naive Bayes fusion model to such feature-precision tunability.
sensors in embedded devices, and the inability of these embedded
                                                                            2. Run-time accuracy-resource-awareness: Instead of offline feature
devices to process all incoming data continuously and at high preci-
                                                                               or precision selection, a dynamic approach should allow run-time
sion. This conflict is currently circumvented by running most sensor
                                                                               accuracy-resource tunability. This requires the creation of a sin-
fusion and sensory inference tasks in the cloud [1, 4]. Yet, this has
                                                                               gle fusion model, capturing all feature precision-tunability states,
important consequences towards the system’s latency and the user’s
                                                                               which can be explored and traversed at run-time. The resulting
privacy [5]. Moreover, it does not solve the excessive power con-
                                                                               model enables run-time adaptations of feature precision in func-
sumption spent by the always-on sensors and the wireless link [8].
                                                                               tion of incoming data quality, in function of the difficulty of the
   Efficiently running inference tasks on the devices themselves calls
                                                                               inference task, or in function of the instantaneous resource avail-
for awareness of the real-time embedded platform’s resource limita-
                                                                               ability in the embedded system.
tions. This is in sharp contrast with most state-of-the-art inference ap-
proaches, which focus on maximizing information gain and inference          We present an extension to naive Bayes that is capable of dynamic
                                                                            feature precision tunability, as well as heuristics to optimize this tun-
1    MICAS – Department of Electrical engineering, KU Leuven,               ability. We demonstrate how this enables much finer granularity in
    email: Laura.Galindez@esat.kuleuven.be
2 DTAI – Department of Computer Science, KU Leuven                          4 In this paper we refer to inference accuracy as the percentage of correctly
3 PMA – Department of Mechanical Engineering, KU Leuven                       predicted queries from a test set.


                                                                            23
the resource versus inference accuracy trade-off space, resulting in         structure of the naive Bayes network is shown in Figure 1. This as-
significant resource efficiency improvements in embedded sensor fu-          sumption allows for efficient learning and inference as it simplifies
sion tasks. These performance tuning capabilities are of up most             Equation 2 to
relevance in data-dense and resource-constricted environments like
the IoT. Therefore, we demonstrate the functionality of our proposed          c = max Pr(F1 = f1 |C = c) . . . Pr(Fn = fn |C = c) Pr(C = c)
                                                                                     c
techniques in sensor fusion datasets related to applications relevant
to this paradigm.                                                            by applying the rule of Bayes and conditional independence. De-
   The paper is structured as follows. In Section 2 we give a the-           spite the independence assumption, naive Bayes classifiers perform
oretical background of naive Bayes classifiers and Bayesian Net-             surprisingly good. This makes them one of the most effective and
works (BN) and we explain how precision tuning enables resource-             efficient inductive learning algorithms [33, 6].
efficiency in the current context. Section 3 gives the details of the
proposed feature precision-tunable BN and explains how parameter
learning and inference are performed in it. In Section 4 we propose a                                            𝐶
heuristic that uses the proposed BN to select optimal operating points
in the resource versus inference accuracy space and we evaluate the
trade-off it achieves by performing experiments on four data corpora
in Section 5. Finally, we discuss the results as well as our contribu-
tions and future work in Section 6.

                                                                                                𝐹#               𝐹$               𝐹%
2 BACKGROUND
2.1    Bayesian Networks                                                           Figure 1. Graphical representation of the naive Bayes classifier

Bayesian Networks (BN) are directed acyclic graphs that compactly
encode a joint probability distribution over a set of random vari-
ables [24]. Consider a set of random variables U = {F1 , . . . , Fn }
where each feature Fi may take values from a finite set Val (Fi ). A
Bayesian Network for a set of random variables U is formally de-
                                                                             2.3     Resource awareness and precision tuning
fined as the pair B =< G, Θ >. The first component G represents              Embedded hardware platforms have to operate under very scarce re-
the graph which encodes conditional independence assumptions. The            sources. First and foremost, their miniaturization results in very lim-
nodes represent variables Fi and its arcs represent the probabilis-          ited battery capabilities which motivates the quest for high energy
tic dependencies between variables. The second component Θ repre-            efficient designs and methodologies. Moreover, due to size, cooling
sents the set of conditional probability distributions Pr(Fi |ΠFi ) that     and cost restrictions, the computational bandwidth of these devices is
quantify the network where ΠFi denotes the parents of Fi .                   extremely scarce. This has sparked an enormous amount of research
   The joint probability distribution defined by the network B is            into adaptive hardware over the last decade. Under this paradigm,
given by                                                                     resource consumption can be tuned at run-time to be lower at the
                                                                             expense of reduced quality sensor streams or computations. This dy-
                                        n
                                        Y                                    namic trade-off is achievable in several ways:
                 P r(F1 , ..., Fn ) =         P r(Fi |ΠFi ).          (1)
                                        i=1
                                                                             1. Noisy sensors: The amount of noise present in sensory measure-
  Inference in BNs, Pr(Q|E = e), can be performed by assign-                    ment strongly depends on the amount of energy spent in the sensor
ing values e to variables E that are observed and by summing out                front-end (in its filters, amplifiers, etc). By tolerating more statis-
variables U\(Q ∪ E) that are not part of the query Q.                           tical noise on the measurement result, energy can be saved [15, 3].
                                                                             2. Stochastic computing: The resulting accuracy of digital compu-
                                                                                tations can be traded off against processing resource usage and
2.2    Bayesian Network Classifiers                                             energy consumption by using stochastic or approximate comput-
                                                                                ing techniques. In stochastic computing, for example, numbers are
Classification is the task of assigning a class label C to instances de-        represented by bit-streams that can be processed by very simple
scribed by a set of features F1 , ..., Fn . Such a task can be tackled by       circuits such as standard logic gate arrays. These implementations
a Bayesian network where one of the random variables, C, is consid-             allow a limited amount of errors (stochastic noise) in the digital
ered the class and the other random variables, F1 , . . . , Fn , represent      embedded calculations in return for a more efficient implementa-
the features. The task is now to find the most likely value for the class       tion [21].
variable C:                                                                  3. Reduced precision sensing and computing: Instead of applying
                                                                                aforementioned stochastic techniques to dynamically trade fea-
         c = arg max Pr(C = c|F1 = f1 , . . . , Fn = fn ),            (2)       ture quality for resource savings, significant resource savings are
                   c
                                                                                also achievable by simply limiting the amount of bits with which
where c is the current class and fi is the observed value for feature           sensor values are sampled and digitally processed for feature ex-
Fi .                                                                            traction. Standard hardware platforms digitize sensory values at
   A widely used type of Bayesian classifier is the naive Bayes clas-           fixed precision and typically process them with 16-bit resolution.
sifier [12]. The main assumption is that every feature is independent           Recent works present the development precision-tunable digitiz-
of the other features given that the class is known. The graphical              ers, as well as digital processing platforms capable of dynamically


                                                                             24
   adjusting the precision with which internal computations are per-
   formed [23, 20].
                                                                                                                       𝐶
   Within this paper, we focus on enabling the feature quality versus                              Pr(𝐹$,& |𝐶)                 Pr(𝐹',& |𝐶)
resource trade-off through the latter technique: computations with
variable precision features. The results are transferable to the dis-                                                  …
                                                                                                    𝐹$,&                         𝐹',&
cussed alternatives, which is left for future work. Under variable pre-
cision computations, the extracted features U = {F1 , ..., Fn } are
                                                                                      Pr(𝐹$,&)$ 𝐹$,& )                               Pr(𝐹',&)$ |𝐹',& )
each computed and represented by a tunable amount of bits. The
number of bits representing the feature, directly impacts the set of                                                    …
values a feature Fi can take on and will be referred to as Fi,m with                               𝐹$,&)$                       𝐹',&)$
m the number of bits. More specifically, when using a m-bit repre-
sentation, the feature can take |Val (Fi,m )| = 2m possible values.                Pr(𝐹$,&)0 |𝐹$,&)$ )                               Pr(𝐹',&)0 |𝐹',&)$ )
   As we are interested in studying resource efficiency in sensory ap-                                 …                           …
plications we construct these m-bit representations by following a
signal processing quantization approach [27]. Mapping of the origi-                      Pr(𝐹$,* |𝐹$,$ )                             Pr(𝐹',* |𝐹',$ )
nal feature to an m-bit representation is based on comparisons with                                                    …
decision levels tk . If the feature value is between tk and tk+1 it gets                             𝐹$,*                        𝐹',*
mapped to a quantization level lk , where the number of levels must
be equal to 2m . In this paper, the decision levels tk are derived from
                                                                                  Figure 2. Naive Bayes model extended with multiple feature quality
the feature value range and the set of lower precision decision levels
                                                                                                            versions
is a subset of the higher precision decision levels (see also Section 3).
   State-of-the-art implementations show that under such computa-
tional precision tuning, the resource cost (here expressed in terms of
                                                                             3.2      Parameter Learning
energy per computational operation) scales more than quadratically
with the computational precision, expressed in terms of number of
bits [22]. In this paper, we will assume that the feature cost (denoted      We assume that the studied features were generated by a continu-
Ti,m ) of feature Fi computed with precision m-bits is equal to              ous distribution [24], therefore, we model the conditional probabili-
                                                                             ties between features of highest precision and classes P r(Fi,m |C)
                                                                             as Gaussian distributions [18]. Even though the model includes
                            Ti,m = αi · m2 .                          (3)
                                                                             n × (m + 1) parameters θ, only the conditional probabilities be-
where αi is the feature-dependent cost of the nominal precision fea-         tween features of highest precision and classes P r(Fi,m |C), must be
ture.                                                                        trained since the conditional probabilities between lower precision
                                                                             features P r(Fi,b |Fi,b+1 ) are deterministic. Once we have knowl-
                                                                             edge of the decision levels tk that generated the lower precision fea-
3 VARIABLE FEATURE PRECISION NAIVE                                           tures, we are able to systematically add them to the tails of the pro-
  BAYES                                                                      posed naive Bayes structure.

Tuning feature precision enables a wide range of resource depen-
dent operation points.We make the probabilistic relations required           3.3      Inference
by classification tasks explicit in a Bayesian Network (BN) where
features can be observed at different precision levels.
                                                                             At any given time, every feature Fi is observed at only one
                                                                             of the precision options bi depending on the current resource
                                                                             consumption desires and constraints. Given observation o =
3.1    Model Structure
                                                                             {f1,b1 , f2,b2 , ..., fn,bn }, classification is performed by estimating
                                                                             the class posterior probability given by
The proposed BN represents a naive Bayes classifier that has multiple
versions of the same feature in each leaf, as shown by Figure 2.
   Feature versions with the highest precision (Fi,m ) are directly                                         n
                                                                                                            Y
linked to the class variable C. There are no direct links between                          P r(C|o) ∼             P r(f i, bi |C) · P r(C).                (5)
lower precision feature versions Fi \ Fi,m and the class variable                                           i=1
since the relation between these versions is deterministic. The pro-
posed structure encodes the following joint probability distribution            This implies that, for every observed feature, its lower precision
over the multiple feature version sets Fi = {Fi,m , . . . , Fi,0 } and the   versions are not observed, while their higher precision versions are
class variable C                                                             marginalized.
                                                                                Consider the example depicted in Figure 3 which may correspond
                                                                             to a robot navigation application, as discussed in Section 5.2. Sup-
             P r(C, F1,m , ..., Fn,0 ) =                                     pose we obtain sensor readings at 8 bit, 4 bit and 2 bit for sensors
             n m−1
             Y Y                                                             S1, S2 and S3, respectively and we decide to turn off sensor S4.
                       P r(Fi,b |Fi,b+1 ) · P r(Fi,m |C) · P r(C).    (4)    Here, we can estimate the class posterior probability (which can be a
             i=1 b=0                                                         location, for example) with the following equation


                                                                             25
                                                                                neighborhood search in our heuristic ensures resource reduction, and
                                        P r(C|S1,8b , S2,4b , S3,2b ) ∼         the cost function further motivates it by explicitly trading off the two
                                                                                terms.
                                                       P r(S1,8b |C) ·
                               X                                                   Two of this algorithm’s aspects distinguish it from conventional
                                      P r(S2,4b |S2,8b )P r(S2,8b |C) ·         state-of-the-art feature selection techniques [7, 28, 31]: 1) We merge
                              S2,8b                                             accuracy gain and resource usage in a joint cost optimization, hence
X X                                                                             taking hardware implementation aspects into account from the algo-
              P r(S3,2b |S3,4b ) · P r(S3,4b |S3,8b ) · P r(S3,8b |C) ·
S3,4b S3,8b
                                                                                rithmic level. 2) In contrast to cost-aware feature selection techniques
                                                                                which decide whether to use a feature or not, we enable the selection
                                                                P r(C),   (6)   of a variety of feature precision combinations.
  and predict the class c ∈ C with the highest posterior.                          Algorithm 1 details the method. We initialize the selected fea-
                                                                                ture set to the highest precision feature combination Uselected =
                                                                                {F1,m , ..., Fn,m } . At each iteration, we perform a greedy neigh-
                                                                                borhood search over n feature combination candidates. In each can-
                                        𝑪                                       didate i, the precision of feature Fi is dropped one level with re-
                                                                                spect to the current precision. We evaluate the classification accu-
                                                                                racy and resource usage of each candidate and select the one that
                                                                                maximizes the cost function CF . The procedure is repeated un-
                𝑺𝟏,𝟖𝒃        𝑆$,&'          𝑆(,&'       𝑆),&'                   til the feature combination with the lowest precision is selected
                                                                                (Uselected = {F1,0 , ..., Fn,0 }). Note that the algorithm is able to per-
                                                                                form feature pruning if a ”null precision” leaf is added to the Naive
                                                                                Bayes model (see Figure 3 for an example).
                                                                                   Classification accuracy is computed by estimating the posterior
                𝑆0,)'        𝑺𝟐,𝟒𝒃          𝑆(,)'       𝑆),)'                   probability P r(C|k) of every instance k from a testing data-set Utest
                                                                                and comparing the prediction to to the instance’s label (see Algo-
                                                                                rithm 2).

                𝑆0,$'        𝑆$,$'          𝑺𝟑,𝟐𝒃       𝑆),$'                   5      EXPERIMENTS
                                                                                We evaluate the resource-accuracy trade-off achieved by our pro-
                                                                                posal with one synthetic dataset and three data corpora from two real
                𝑆0,2'        𝑆$,2'          𝑆(,2'       𝑺𝟒,𝟎𝒃                   applications relevant to the IoT paradigm.

                                                                                5.1         Synthetic Data
  Figure 3. Example of a four feature application where each of them is
  observed at a different precision (circles with bold edges). Features with    This dataset consists of 2000 points sampled 
                                                                                                                             from 4 Gaussians
   higher precision than the observed are marginalized (circles with black                                                               −1.66
edges) and versions with lower precision are not observed (circles with gray    N (mi , σ i ), i = {1, 2, 3, 4}, where m1 = −0.33 −0.33 , m2 =
   edges). Note that feature 4 is observed with ”0 bit” precision, which is      1.00            3.33          −1.66        −2.00
                                                                                                                                    0.80 
                           equivalent to pruning it.                                                                −1.43
                                                                                   0.5
                                                                                  1.00   , m  3 =   2.00
                                                                                                     0.5  , m 4 =   −0.66  , σ 1 =   1.00 , σ 2 =
                                                                                                                                     1.00
                                                                                  1.00
                                                                                 0.70              0.5
                                                                                                  1.00            1.00 
                                                                                                                    −3.33            1.00

                                                                                  1.00 , σ 3 =            and σ 4 = 1.00 . The Gaussians are de-
                                                                                  1.00             1.00              1.00
                                                                                                   1.00
                                                                                     1.00            1.00                  1.00
4 ACCURACY-RESOURCE TUNABILITY                                                  fined to have different degrees of overlap in every dimension and
                                                                                have therefore a varying miss-classification risk for different feature
The proposed model enables multiple resource and accuracy de-                   combinations. We quantize the data-set at 5, 3, 2 and 1 bits and ran-
pendent operating points. In this paper we analyze the trade-off in-            domly divide it into a training and a testing set (used for model train-
duced by the available feature combination choices and we propose a             ing and accuracy estimation, respectively). We compute the resource
methodology to find the optimal operating points given the system’s             usage with Equation 3, as included in Table 1 . To assign the vari-
resource constraints.                                                           able αi , we assume that features that are less likely to cause miss-
   We propose an accuracy-resource sensitive algorithm that selects             classification would be more expensive to extract and compute in a
the optimal feature precision across the accuracy-resource usage                real application. Thus giving a higher value to them. We add a ”null
trade-off space. At each iteration, we select the feature set that op-          precision” leaf, to enable feature pruning as shown in the example
timizes a cost function CF , which is defined according to the de-              depicted by Figure 3.
sired application and the constraints thereto [13, 17]. In this paper              Figure 4 shows the resource vs accuracy trade-off curve achieved
we maximize the cost function given by                                          by the proposed algorithm and achieved by a typical resource-aware
                                                                                heuristic 5 in red and blue, respectively. The gray point cloud repre-
                                             !
                          ∆resource                                             sents all the possible accuracy-resource trade-off operational points
       CF = log                                  − log(∆accuracy),        (7)   to select from. The proposed heuristic has a richer feature combina-
                         max(resource)
                                                                                tion space to select from, which prevents accuracy degradation for a
   where the term ∆ refers to the predicted state difference between            5 The typical resource-aware heuristic considers only features at the highest

time k and time k+1 as will be detailed in Algorithm 1. The greedy                  precision Fi,m and decides whether to prune them by maximizing CF .


                                                                                26
                                                                                   resource usage scale-down of up to 20 times. The non-tunable pre-
                                                                                   cision heuristic has comparatively very few feature combination op-
 Algorithm 1: Feature precision selection algorithm for accuracy-                  tions to select from, which leads, in contrast, to a maximum resource
 resource tradeoff                                                                 scaling of approximately 2 times without accuracy degradation.
1 Feature precision selection (Utest , Ti,m , Θ, C, CF );
  Input : Utest , Ti,m , Θ, C, CF                                                                              90
                                                                                                                        Accuracy-resource tradeoff on the synthetic dataset

  Output: Selected feature set Uselectedk = {F1,b1 , ..., Fn,bn }
2 k=0;                                                                                                         80

      /* Initialize with the highest precision feature set                    */
                                                                                                               70


                                                                                         Accuracy percentage
3 Uselectedk = {F1,m , ..., Fn,m }
4 accuracyk ← AccuracyEvaluation(Θ,C, Uselectedk )                                                             60
5 resourcek ← Ti,bi ∀ Fi,bi ∈ Uselectedk
                                                                                                               50
      /* while the lowest feature precision has not been selected             */
6 while Uselectedk 6= {F1,0 , ..., Fn,0 }                                                                      40
7 do
8    for i = 1 to n // For each candidate combination                                                          30
                                                                                                                                    Precision-tunable resource-aware selection
9    do                                                                                                                             Resource-aware selection
                /* drop Fi ’s precision one level                            */                                20
                                                                                                                    0     50     100    150     200   250        300     350     400
10              Ucandidatei ← Uselectedk \ Fi,bi ∨ {Fi,bi −1 }                                                                         Resource consumption

11              accuracycandidatei ← AccuracyEvaluation(Θ, C,
                                                                                    Figure 4. Algorithm performance comparison on the synthetic dataset.
                Ucandidatei );
12              resourcecandidatei ← Ti,bcanditatei ∀
                Fi,bcandidatei ∈ Ucandidatei ;
13              ∆accuracycandidatei =
                accuracyk − accuracycandidatei ;
14              ∆resourcecandidatei =                                              5.2                         Real Datasests
                resourcek − resourcecandidatei ;                                   We analyze three sensor based applications that benefit from feature
15          end                                                                    precision tuning. The first is a robot navigation task. The second and
16          update k=k+1;                                                          third dataset are activity recognition tasks.
17          Uselectedk ←
              argmin CF(∆accuracycandidate ,∆resourcecandidate );
            U∈Ucandidate                                                            Wall-Following Robot Navigation We analyze a public domain
18          update accuracyk ← AccuracyEvaluation(Θ,C,                             dataset that was collected as a mobile robot navigates through a room
            Uselectedk )                                                           following the wall in a clockwise direction, for 4 rounds, using 4
19          update resourcek ← Ti,bi ∀ Fi,bi ∈ Uselectedk                          ultrasound sensors positioned on the front, left, right and back of its
            Return: Uselectedk                                                     body [11]. Four states can be identified from the sensor readings:
20    end                                                                          Move-Forward, Slight-Right-Turn, Sharp-Right-Turn or Slight-Left-
                                                                                   Turn. The data-set has a precision of 8 bits and we further quantize it
                                                                                   at 5, 2 and 1 bits. We assume the four sensors have the same hardware
                                                                                   properties, so we set the variable αi equal to one for all of them
                                                                                   and use Equation 3 to generate the resources for the experiments, as
                                                                                   shown in Table 1. Furthermore, we add a random number between 0
     Algorithm 2: Classification accuracy evaluation algorithm                     and 5 to each cost to simulate non ideal performance conditions.
                                                                                      Figure 5 shows the cost-accuracy trade-off achieved by the pro-
1  AccuracyEvaluation (Θ,C,U);
                                                                                   posed precision-tunable heuristic and the trade-off achieved by a cost
  Input : Θ,C,U
                                                                                   aware method in red and blue, respectively. The gray crosses repre-
  Output: accuracy
                                                                                   sent all the possible operation points to choose from. Both heuris-
2 correct = 0;
                                                                                   tics display negligible accuracy degradation when their resource con-
3 for k = 1 to N // With N the number of instances in the testing set
                                                                                   sumption is scaled down a factor 2 (from 400 to 200). The slight ac-
4 do
            /* For each class we approximate the posterior probability given the   curacy gain achieved by the non-tunable heuristic can be due to the
               instance currently analyzed                                    */   discretization related improvements discussed in [32] and [29] . For a
5           P r(C|k) ← P r(k|C) · P r(C)                                           factor 4 resource scaling (from 400 to 100) the non-tunable heuristic
            /* We predict the class with the highest posterior probability    */   has already pruned 3 out of its four features which causes the accu-
6           cmaxk = argmaxP r(C|k)                                                 racy to degrade from 90% to 75%. The precision-tunable heuristic
                          c∈C                                                      keeps observing all features, yet reduces their precision which also
7           if cmaxk == ck then correct=correct+1;                                 produces a factor 4 resource consumption downscale but no accuracy
8     end                                                                          degradation.
9     update accuracy ← correct ÷ N ;
      Return: accuracy                                                              USC-HAD This dataset was designed as a benchmark for Human
                                                                                   Activity Detection (HAD) algorithm comparisons [34]. It was col-
                                                                                   lected by an Inertial Measurement Unit (IMU) placed in the subjects’


                                                                                   27
                               Accuracy-resource tradeoff on the Robot Navigation dataset                                         Accuracy-resource tradeoff on the USC-HAD dataset
                         100                                                                                             90

                          90                                                                                             80

                          80
   Accuracy percentage


                                                                                                                         70


                                                                                                   Accuracy percentage
                          70                                                                                             60

                          60
                                                                                                                         50

                          50
                                                                                                                         40

                          40
                                                 Precision-tunable resource-aware selection                              30
                                                 Resource-aware selection                                                                       Precision-tunable resource-aware selection
                          30                                                                                                                    Resource-aware selection
                               0     50    100       150   200    250    300       350      400                          20
                                                    Resource consumption                                                      0           500              1000          1500         2000
                                                                                                                                                      Resource consumption
  Figure 5. Performance comparison in the robot navigation application
                                                                                                   Figure 6. Trade-off comparison on the Human Activity Detection dataset
                                                                                                                    with gyroscope and accelerometer.

hip consisting of a 3-axis accelerometer and a 3-axis gyroscope and
it contains measurements for the identification of 12 different low-                              leaf that enables feature pruning.The resource parameters used in this
level daily activities. In accordance to previously performed Activity                            experiment are listed in 1.
Recognition analyses [7, 26], the activities that can be best classified                             Figure 7 shows the results from this experiment with the same
with naive Bayes and that are therefore used in this experiment are                               color coding as previous. The precision-tunable approach’s perfor-
Walking-forward, Running-Forward, Sitting and Sleeping.                                           mance is superior, as it achieves up to 12x resource savings (from
   We tuned the dataset’s precision from the original 8 bits to 5,4,3,2                           620 to 50) for a maximum accuracy degradation of 4% (from 80% to
and 1 bits. For resource assignment, we consider that the power con-                              76%). The non-tunable strategy displays accuracy losses of less than
sumption of a gyroscope can be up to 10 times that of an accelerom-                               5% up to a resource consumption scaling of 3x (620 to 200). For any
eter [34] so we set the corresponding αi variables to 1 and 10, re-                               resource down scaling larger than that, the accuracy degrades more
spectively, and use Equation 3 to calculate the resource consumption.                             than 10%.
Like in the previous experiment, we add a random number between
                                                                                                                                  Accuracy-resource tradeoff on the HAR-RIO dataset
0 and 5 to simulate non ideal behavior. The resource consumption                                                         90
assignments for this experiment are detailed in Table 1. Again, we
enable feature pruning through the addition of the 0 bit leaf to he                                                      80
model.
                                                                                                   Accuracy percentage


   Figure 6 shows the cost-accuracy trade-off curves achieved by the                                                     70
precision-tunable and the cost-aware only heuristics in red and blue,
respectively. The possible operating points are represented by gray                                                      60
crosses. For a resource consumption downscale of 2.5 (from 2130
to 845), the non-precision tunable heuristic suffers from an accuracy
                                                                                                                         50
degradation of 6% (88% to 82%), while there is no accuracy reduc-
tion with the precision-tunable method. Although the 6% accuracy
                                                                                                                         40
loss/2x cost saving of the non-tunable strategy could be acceptable                                                                             Precision-tunable resource-aware selection
in some situations, it is worth noting the limitations imposed by the                                                                           Resource-aware selection
                                                                                                                         30
available number of operating points. In addition to the 2.5x re-                                                             0     100         200       300      400     500     600       700
source downscale, only a 30x reduction (from 2130 to 65) is pos-                                                                                      Resource consumption
sible at the expense of accuracy degrading from 88% to 61%. The
                                                                                                                   Figure 7. Trade-off comparison on the Human Activity Recognition
precision-tunable strategy has, in contrast, the possibility to choose                                                               dataset with accelerometers.
from approximately 26 operation points, with up to 6x resource sav-
ings before accuracy is lower than 80%.

 HAR-RIO In this dataset, 5 activities (Sitting-Down, Standing-
                                                                                                  6                       CONCLUSIONS AND DISCUSSION
Up, Standing, Walking, and Sitting) can be identified from 8 hours
of recordings performed by 4 accelerometers positioned in the waist,                              Our main contribution in this paper was to enable efficient embedded
left thigh, right ankle and right arm of 4 healthy subjects [30]. The                             sensor fusion through a resource-aware naive Bayes model, capable
accelerometers are tri-axial which results in a total number of 12                                of exploiting variable precision features. By encapsulating various
features; {xi , yi , zi }, i = {1, 2, 3, 4}. For the experiments in this                          precision features within the model structure, we enable the possibil-
paper, 9 of those features were selected in accordance to previ-                                  ity to dynamically tune resource consumption and inference accuracy
ously performed classification algorithm benchmarking [30], namely                                according to the circumstances and available resources. We propose
{y1 , z1 , x2 , y2 , z2 , x3 , y3 , z4 , y4 , z4 }. The dataset’s precision is 8                  an algorithm that finds optimal operating points by reducing resource
bits, we quantize it at 4,3,2,1 bits and we add the ”null precision”                              consumption and minimizing accuracy degradation.


                                                                                                  28
                                                                                      tend for voice activity detection’, Solid-State Circuits, IEEE Journal of,
               Table 1.   Feature resources used for experiments                      51(1), 291–302, (2016).
                                                                                [4]   F. H. Bijarbooneh, W. Du, E. C. H. Ngai, X. Fu, and J. Liu, ‘Cloud-
                                 m bits feature precision                             assisted data fusion and sensor selection for internet of things’, IEEE
                                                                                      Internet of Things Journal, 3(3), 257–268, (June 2016).
   Dataset           α      10      8       5       4       3      2     1      [5]   Azzedine Boukerche, Antonio AF Loureiro, Eduardo F Nakamura, Ho-
   Synthetic                                                                          racio ABF Oliveira, Heitor S Ramos, and Leandro A Villas, ‘Cloud-
     Feat. 1         1     100      -       25      -        9      4     1           assisted computing for event-driven mobile services’, Mobile Networks
     Feat. 2        0.7    70       -      17.5     -       6.3    2.8   0.7          and Applications, 19(2), 161–170, (2014).
     Feat. 3        0.9    90       -      22.5     -       8.1    3.6   0.9    [6]   Jie Cheng and Russell Greiner, ‘Comparing Bayesian network classi-
     Feat. 4        0.8    80       -       20      -       7.2    3.2   0.8          fiers’, in Proceedings of the Fifteenth conference on Uncertainty in ar-
                                                                                      tificial intelligence, pp. 101–108. Morgan Kaufmann Publishers Inc.,
   Robot             1     100      -       25      -        -     4     1            (1999).
                                                                                [7]   Jian Cui and Bin Xu, ‘Cost-effective activity recognition on mo-
   USC-HAD                                                                            bile devices’, in Proceedings of the 8th International Conference on
    Accel.           1       -      64      25      16       9      4     1           Body Area Networks, BodyNets ’13, pp. 90–96, ICST, Brussels, Bel-
    Gyro.           10       -     640     250     160      90     40    10           gium, Belgium, (2013). ICST (Institute for Computer Sciences, Social-
   HAR-RIO           1       -      64      25      16      9      4     1            Informatics and Telecommunications Engineering).
                                                                                [8]   Artem Dementyev, Steve Hodges, Stephen Taylor, and Johan Smith,
                                                                                      ‘Power consumption analysis of bluetooth low energy, zigbee and ant
                                                                                      sensor nodes in a cyclic sleep scenario’, in Wireless Symposium (IWS),
                                                                                      2013 IEEE International, pp. 1–4. IEEE, (2013).
   We have compared our scheme with a state-of-the-art resource-                [9]   K. Frank, P. Robertson, S. Fortes Rodriguez, and R. Barco Moreno,
aware feature selection technique and we conclude that overall our                    ‘Faster bayesian context inference by using dynamic value ranges’,
scheme has better cost saving capabilities due to the rich variety of                 in Pervasive Computing and Communications Workshops (PERCOM
                                                                                      Workshops), 2010 8th IEEE International Conference on, pp. 50–55,
operational points it can choose from. We tested one artificial and                   (2010).
three public domain data corpora with the proposed methodology.                [10]   K. Frank, M. Rckl, and P. Robertson, ‘The bayeslet concept for modu-
Accuracy degradation was prevented while achieving resource usage                     lar context inference’, in Mobile Ubiquitous Computing, Systems, Ser-
scalings of 20x for the synthetic dataset, 4x for the Robot Naviga-                   vices and Technologies, 2008. UBICOMM ’08. The Second Interna-
                                                                                      tional Conference on, pp. 96–101, (Sept 2008).
tion application, 6x for the Human Activity Detection application              [11]   A. L. Freire, G. A. Barreto, M. Veloso, and A. T. Varela, ‘Short-term
with accelerometers and gyroscopes, and 12x for the Human Ac-                         memory mechanisms in neural network learning of robot navigation
tivity Recognition application with accelerometers. The non-tunable                   tasks: A case study’, in Robotics Symposium (LARS), 2009 6th Latin
precision heuristic achieved, in comparison, a resource scaling of 2x                 American, pp. 1–6, (Oct 2009).
for the synthetic dataset, 2x for the Robot Navigation application,            [12]   Nir Friedman, Dan Geiger, and Moises Goldszmidt, ‘Bayesian network
                                                                                      classifiers’, Journal of Machine Learning, 29(2), 131–163, (1997).
2.5x for the Human Activity Detection application with accelerom-              [13]   Isabelle Guyon and André Elisseeff, ‘An introduction to variable and
eters and gyroscopes, and 3x for the Human Activity Recognition                       feature selection’, J. Mach. Learn. Res., 3, 1157–1182, (March 2003).
application with accelerometers.                                               [14]   Josué Iglesias, Ana M. Bernardos, Paula Tarrı́o, José R. Casar, and
   There are many ways in which feature quality tuning can improve                    Henar Martı́n, ‘Design and validation of a light inference system to
                                                                                      support embedded context reasoning’, Personal and Ubiquitous Com-
hardware resource efficiency. We proved this concept by tuning fea-                   puting, 16(7), 781–797, (2012).
ture precision but the next step in our work will be to extend the             [15]   Peter Kinget and Michiel Steyaert, ‘Impact of transistor mismatch on
proposed method to other quality tuning paradigms beyond preci-                       the speed-accuracy-power trade-off of analog CMOS circuits.’, in Pro-
sion tunability such as varying levels of noisy sensing. This will                    ceedings of the IEEE Custom Integrated Circuits Conference, pp. 333–
potentially require the modification of the proposed multiple-level                   336, (1996).
                                                                               [16]   N. D. Lane, S. Bhattacharya, P. Georgiev, C. Forlivesi, L. Jiao, L. Qen-
Bayesian Network as the relationship between nodes of different                       dro, and F. Kawsar, ‘Deepx: A software accelerator for low-power deep
qualities will not be deterministic anymore. Furthermore, we will                     learning inference on mobile devices’, in 2016 15th ACM/IEEE In-
explore more complex structures for applications that are not mod-                    ternational Conference on Information Processing in Sensor Networks
eled with sufficient accuracy under the naive Bayes independence                      (IPSN), pp. 1–12, (April 2016).
                                                                               [17]   Steven Lauwereins, Wannes Meert, Jort Gemmeke, and Marian Ver-
assumption.                                                                           helst, ‘Ultra-low-power voice-activity-detector through context- and
   The long term goal is to integrate the optimal feature precision se-               resource-cost-aware feature selection in decision trees’, in 2014 IEEE
lection scheme in an embedded sensory application, where the sys-                     International Workshop on Machine Learning for Signal Processing
tem dynamically and autonomously selects features and their pre-                      (MLSP), pp. 1–6, (Sept 2014).
cision given the current state of the hardware devices with limited            [18]   Jonathan Lester, Tanzeem Choudhury, Nicky Kern, Gaetano Borriello,
                                                                                      and Blake Hannaford, ‘A hybrid discriminative/generative approach for
computational overhead. This scheme could enable the seamless in-                     modeling human activities.’, in IJCAI, volume 5, pp. 766–772, (2005).
tegration of sensory based algorithms into smart environments which            [19]   Daniel Lowd and Pedro Domingos, ‘Learning arithmetic circuits’,
is one of the elements envisioned for the IoT.                                        arXiv preprint arXiv:1206.3271, (2012).
                                                                               [20]   Bert Moons, Bert De Brabandere, Luc Van Gool, and Marian Verhelst,
                                                                                      ‘Energy-efficient convnets through approximate computing’, in 2016
REFERENCES                                                                            IEEE Winter Conference on Applications of Computer Vision (WACV),
                                                                                      pp. 1–8. IEEE, (2016).
[1] Rajesh Arumugam, Vikas Reddy Enti, Liu Bingbing, Wu Xiaojun, Kr-           [21]   Bert Moons and Marian Verhelst, ‘Energy and accuracy in multi-stage
    ishnamoorthy Baskaran, Foong Foo Kong, A Senthil Kumar, Kang Dee                  stochastic computing’, in New Circuits and Systems Conference (NEW-
    Meng, and Goh Wai Kit, ‘Davinci: A cloud computing framework for                  CAS), 2014 IEEE 12th International, pp. 197–200. IEEE, (2014).
    service robots’, in Robotics and Automation (ICRA), 2010 IEEE Inter-       [22]   Bert Moons and Marian Verhelst, ‘Dvas: Dynamic voltage accuracy
    national Conference on, pp. 3084–3089. IEEE, (2010).                              scaling for increased energy-efficiency in approximate computing’, in
[2] Luigi Atzori, Antonio Iera, and Giacomo Morabito, ‘The internet of                Low Power Electronics and Design (ISLPED), 2015 IEEE/ACM Inter-
    things: A survey’, Computer networks, 54(15), 2787–2805, (2010).                  national Symposium on, pp. 237–242. IEEE, (2015).
[3] Komail MH Badami, Steven Lauwereins, Wannes Meert, and Marian              [23]   Bert Moons and Marian Verhelst, ‘A 40nm CMOS, 35mW to 270mW,
    Verhelst, ‘A 90 nm CMOS, power-proportional acoustic sensing fron-


                                                                               29
       precision-scalable cnn processor for run-time scalable embedded vi-
       sion’, in IEEE VLSI Symposium. IEEE, (2016).
[24]   Judea Pearl, Probabilistic Reasoning in Intelligent Systems: Networks
       of Plausible Inference, Morgan Kaufmann, 1988.
[25]   Nico Piatkowski, Sangkyun Lee, and Katharina Morik, ‘Integer undi-
       rected graphical models for resource-constrained systems’, Neurocom-
       puting, 173, 9–23, (2016).
[26]   Olga Politi, Iosif Mporas, and Vasileios Megalooikonomou, ‘Human
       motion detection in daily activity tasks using wearable sensors’, in Sig-
       nal Processing Conference (EUSIPCO), 2014 Proceedings of the 22nd
       European, pp. 2315–2319. IEEE, (2014).
[27]   John G. Proakis and Dimitris K Manolakis, Digital Signal Processing
       (4th Edition), Pearson, 2006.
[28]   Yvan Saeys, Iaki Inza, and Pedro Larraaga, ‘A review of feature selec-
       tion techniques in bioinformatics’, Bioinformatics, 23(19), 2507–2517,
       (2007).
[29]   Sebastian Tschiatschek and Franz Pernkopf, ‘On Bayesian network
       classifiers with reduced precision parameters’, Pattern Analysis and
       Machine Intelligence, IEEE Transactions on, 37(4), 774–785, (2015).
[30]   Wallace Ugulino, Débora Cardador, Katia Vega, Eduardo Velloso, Ruy
       Milidiú, and Hugo Fuks, ‘Wearable computing: Accelerometers data
       classification of body postures and movements’, in Advances in Artifi-
       cial Intelligence-SBIA 2012, 52–61, Springer, (2012).
[31]   Zhixiang Xu, Matt J Kusner, Kilian Q Weinberger, Minmin Chen, and
       Olivier Chapelle, ‘Classifier cascades and trees for minimizing feature
       evaluation cost’, The Journal of Machine Learning Research, 15(1),
       2113–2144, (2014).
[32]   Ying Yang and Geoffrey I Webb, ‘Proportional k-interval discretization
       for naive-Bayes classifiers’, in ECML, pp. 564–575. Springer, (2001).
[33]   Harry Zhang, ‘The optimality of naive Bayes’, in Proceedings of
       FLAIRS, (2004).
[34]   Mi Zhang and Alexander A. Sawchuk, ‘USC-HAD: A daily activity
       dataset for ubiquitous activity recognition using wearable sensors’, in
       ACM International Conference on Ubiquitous Computing (Ubicomp)
       Workshop on Situation, Activity and Goal Awareness (SAGAware),
       Pittsburgh, Pennsylvania, USA, (September 2012).


                                                                                   30

</pre>