=Paper= {{Paper |id=None |storemode=property |title=Reactive Bayesian Network Computation using Feedback Control: An Empirical Study |pdfUrl=https://ceur-ws.org/Vol-962/paper06.pdf |volume=Vol-962 |dblpUrl=https://dblp.org/rec/conf/uai/MengshoelIR12 }} ==Reactive Bayesian Network Computation using Feedback Control: An Empirical Study== https://ceur-ws.org/Vol-962/paper06.pdf
    Reactive Bayesian Network Computation using Feedback Control:
                         An Empirical Study



                             Ole J. Mengshoel, Abe Ishihara, and Erik Reed
                                        Carnegie Mellon University
                                         Mo¤ett Field, CA 94035
                          {ole.mengshoel, abe.ishihara, erik.reed}@sv.cmu.edu


                     Abstract                             bility provides new opportunities for AI systems and
                                                          also raises several interesting research questions [21].
                                                          Can BN-based algorithms be implemented, adapted,
     This paper investigates the challenge of inte-
                                                          or “wrapped”such that they reactively (or in soft real-
     grating intelligent systems into varying com-
     putational platforms and application mixes           time) compute meaningful results across a myriad of
     while providing reactive (or soft real-time)         commercially available computers? Can algorithms be
     response. We integrate Bayesian network              optimized such that users may generally continue to
                                                          operate their computers normally while also running
     computation with feedback control, thereby
                                                          AI systems–that is, continue to check email, chat, and
     achieving our reactive objective. As a case
                                                          partake in social networking?
     study we investigate fault diagnosis using
     Bayesian networks. While we consider the             To start answering these questions, this paper devel-
     likelihood weighting and junction tree propa-        ops an architecture that integrates Bayesian network
     gation Bayesian network inference algorithms         computation and feedback control. The architecture
     in some detail, we hypothesize that the tech-        contains two feedback loops, a lower-level inner loop
     niques developed can be broadly applied to           and a higher-level outer loop. What drives the inner
     achieve reactive intelligent systems. In this        loop is the di¤erence between desired completion time
     paper’s empirical study we demonstrate re-           (or setpoint, set to achieve reactivity) and actual com-
     active fault diagnosis for an electrical power       pletion time. The goal of the outer loop is to trade o¤
     system.                                              between higher-level issues such as speed of inference,
                                                          resource allocation, and accuracy.
                                                          While the architecture is general, we have for valida-
1    INTRODUCTION                                         tion purposes used experimental data from an elec-
                                                          trical power network located at the NASA Ames Re-
Substantial progress has been made over the last few
                                                          search Center [30]. We demonstrate our novel ap-
decades in the areas of learning and reasoning using
                                                          proach in the area of fault diagnosis for this electri-
Bayesian networks (BNs) [29]. While most BN in-
                                                          cal power system, and investigate the BN inference
ference problems are computationally hard (NP-hard
                                                          algorithms of likelihood weighting [34], variable elim-
or worse) in the general case [6, 33, 28], e¢ cient al-
                                                          ination [20, 9], junction tree propagation [18, 35],
gorithms have been developed and demonstrated in
                                                          and belief propagation [29, 26]. Results for …ve di¤er-
a wide range of automated reasoning applications in-
                                                          ent computers and three di¤erent operating systems
cluding model-based diagnosis [19, 24, 32, 31].
                                                          are demonstrated. We show system identi…cation re-
There has recently been an explosion in the number        sults for both junction tree propagation and likelihood
and types of computers available, capable of running      weighting, and demonstrate outer loop control by suc-
implementations of sophisticated algorithms including     cessfully changing the setpoint and BN inference algo-
BN-based algorithms. This is a result of the advance-     rithm employed.
ments in both hardware and software platforms now
                                                          By integrating control theoretic techniques and un-
powering computers, which range from smart phones
                                                          certainty processing using BNs, this research aims
and tablet PCs to high-end gaming machines and high-
                                                          to: improve the reactivity and adaptability of uncer-
performance computing (HPC) clusters.
                                                          tainty processing in di¤erent applications; improve the
The proliferation of computers of highly varying capa-    timeliness of computation under dynamic workloads
on varying computational platforms (smart phones,            rant the use of a hard real-time operating system,
GPUs, multi-core CPUs, Hadoop clusters, ...); and re-        which is often used in aerospace and other applica-
duce the e¤ort and cost associated with developing and       tions where safety is paramount. Our approach is
deploying BN-based software applications. Our goals          tailored to soft real-time settings where low-criticality
are similar to those of anytime algorithms [8, 37, 3],       tasks can, if needed, be down-prioritized, suspended
which provide a way to trade o¤ between computation          or terminated in order to provide reactive response for
time and solution quality. However, our feedback ap-         high-criticality tasks.
proach is more general in that it handles both anytime
                                                             The rest of this paper is organized as follows. In
and non-anytime algorithms. In fact, we show exper-
                                                             Section 2 we provide background on Bayesian net-
imentally how our approach enables us to gracefully
                                                             work computation, feedback control, and anytime al-
switch, on-line, from a non-anytime algorithm (e.g.,
                                                             gorithms. Section 3 discusses our architecture inte-
junction tree propagation) to an anytime algorithm
                                                             grating feedback control and BN inference algorithms.
(e.g., likelihood weighting) while achieving real-time
                                                             In Section 4 we present the experimental setting and
response.
                                                             electrical power system data, while Section 5 discusses
We are investigating control theory as it applies to soft-   empirical results. We conclude and outline future re-
ware and our actuators are computational actuators           search in Section 6.
rather than control surfaces on an aircraft or wheels
on a robotic vehicle. For example, our actuators may
change the number of processes being handled by a            2     PRELIMINARIES
computer or the input parameters to C or Java pro-
grams that implement algorithms for BN computation.          2.1   BAYESIAN NETWORK INFERENCE
In contrast, the great majority of previous control the-
ory research has, with several notable exceptions in         BNs represent multivariate probability distributions
computing and software [12, 2], focused on physical          and are used for reasoning and learning under uncer-
systems governed by Newtonian mechanics. In partic-          tainty [29]. Probability theory and graph theory form
ular, we know of no similar work in the area of BN           the basis of BNs: Roughly speaking, random vari-
computation, including computation for diagnosis or          ables are represented as nodes in a directed acyclic
prognosis, where control theory is applied.                  graph (DAG), while conditional dependencies are rep-
                                                             resented as graph edges. Each node is parameterized
The approach developed in this paper applies in sit-         by a conditional probability density or distribution. A
uations where the computational load on a com-               BN is a compact representation of a joint probability
puter running AI (here, BN) applications is a key            distribution if its graph is relatively sparse.
concern and where, simultaneously, we can priori-
tize and control the computational processes. Speci…-        Formally, we let X be the BN nodes, E X the ev-
cally, we partition computational processes into high-       idence nodes, and e the evidence. While the nodes
criticality, medium-criticality, and low-criticality; the    X can represent either discrete and continuous ran-
high-critical processes are reactive. Our experiments        dom variables, we focus in this paper on the discrete
are in the area of electrical power system diagnosis.        case in which a node X 2 X has a …nite number of
However, the techniques are more widely applicable           states (X) = fx1 ; : : : xm g. A BN factorizes a joint
and we now outline a few areas where this approach           distribution Pr(X), and allows for di¤erent probabilis-
may prove powerful. Examples of high-criticality tasks       tic queries to be formulated and supported by e¢ -
in which Bayesian networks (and similar probabilis-          cient algorithms; they all assume that nodes in E are
tic graphical models) can be used and for which one          clamped to values e. Considering the remaining nodes
might want to provide a reactive response include med-       R = X E, the probabilistic queries are with respect
ical monitoring, sensor fusion, speech recognition on        to the posterior distribution P (R j e). Speci…cally,
a smartphone, gesture recognition, and video recogni-        computation of most probable explanations (MPEs)
tion. Examples of low-criticality tasks, which would be      amounts to …nding an MPE over R, or MPE(e). Com-
down-prioritized, suspended or killed by our approach        putation of marginals (or beliefs) amounts to inferring
(depending on circumstances) include virus scanning,         for one or more query nodes Q        R, where Q 2 Q,
backups, disk defragmentation, software updates, and         the posterior probabilities Pr(Q j e) which we denote
volunteer computing (such as Seti@Home and Fold-             BEL(Q; e). In diagnosis using BNs [19, 24, 32, 31], the
ing@Home).                                                   terminology health nodes H, where Q = H, is often
                                                             used. By picking, for each Q 2 Q, a most likely state
What these applications have in common is that reac-         q 2 (Q), we obtain the most likely states MLS(Q; e)
tivity is important for some computational processes.        from BEL(Q; e). Computation of the maximum a pos-
However, reactivity is not important enough to war-          teriori probability (MAP) generalizes MPE computa-
tion and …nds a most probable instantiation over nodes        to achieve some higher level objective such as ensur-
Q R, MAP(Q; e). MAP can be approximated using                 ing a comfortable ride on a passenger transport jet, an
MPE and MLS; these two approximations are of inter-           average production rate on a robotic assembly line, or
est because of the greater computational complexity of        a maximal duration of a credit card transaction.
MAP [28] compared to MPE and BEL [6].
                                                              Recently, control theory has been applied to comput-
Di¤erent BN inference algorithms can be used to per-          ing systems including control of HTTP servers [11],
form the above BEL, MPE, and MAP computations.                email servers [27], quality of service assurance [36], in-
We distinguish between exact and inexact algorithms.          ternet tra¢ c control [13], and load balancing. Typi-
Exact algorithms for BEL and MPE computation                  cally, computing systems are di¤erent from traditional
include belief propagation in singly connected BNs            feedback control applications in robotics and aircraft.
[29], junction tree propagation [18, 35], conditioning        First, modeling of the plant does not typically start
[29, 14], variable elimination [20, 9], and arithmetic cir-   from Newtonian mechanics; rather it often begins with
cuit evaluation [7, 4]. Inexact algorithms such as loopy      a black box approach. Second, actuation can in some
belief propagation1 [29, 26] and likelihood weighting         cases be almost instantaneous, such as ‡ipping a bit, or
[34] have been used to compute marginals; they have           writing a short integer to memory, i.e. specifying the
also been used to compute MPEs [16, 25] and MAPs              maximum number of connections to a server. Third,
[28]. In this paper we focus on computation of mar-           measurements are often non-noisy but delayed. In ana-
ginals; however the framework also applies to other           log sensing, …ltering is used to remove noise, and at the
queries including MPE and MAP.                                same time can introduce signi…cant (and unwanted)
                                                              phase lag. However, in computing systems, a more
2.2    ANYTIME REASONING                                      di¢ cult delay appears at the measurement. In appli-
                                                              cations such as control of an email server, the (dis-
An anytime algorithm improves its solutions accord-           crete) delay is associated with the completion of a Re-
ing to the computational resource allocated to it, and        mote Procedure Call (RPC). This delay is usually not
returns an (approximate) answer if interrupted [8].           known. Finally, disturbances can signi…cantly impact
Fundamentally, this iterative improvement algorithm           performance. Take the example of the IBM Domino
framework provides a way to trade o¤ between compu-           server [12]: Certain combinations of requests, made
tation time and solution quality. Anytime algorithms          independently by di¤erent users impact CPU utiliza-
are a useful tool for real-time system design, and there      tion in a nonlinear manner; this can be regarded as
are also results on the composition of anytime algo-          a stochastic disturbance. These disturbances, which
rithms [37]. Originally, the anytime approach was             may depend on time of day and day of week, can have
developed for single agents, recently it was extended         a signi…cant impact.
to handle multiple agents [3].
Among BN inference algorithms, stochastic local               3   CONTROLLING BAYESIAN
search [16, 28, 22, 25], likelihood weighting [34], loopy         NETWORK COMPUTATION
belief propagation [29, 26], and conditioning [29, 14]
can be considered anytime algorithms.           However,
many important and high-performing algorithms—                Our goal is to support the application of both non-
including junction tree propagation [18, 35], variable        anytime and anytime BN inference algorithms in re-
elimination [20, 9], and arithmetic circuit evaluation        active settings by introducing techniques from feed-
[7, 4]— are unfortunately not anytime algorithms.             back control. A key concept in feedback control is
                                                              the plant. Here, the plant to be controlled is a dy-
                                                              namic system operating in the discrete time domain.
2.3    FEEDBACK CONTROL                                       This de…nition encompasses computers including mo-
                                                              bile phones, tablets, laptops, desktops, and multi-core
Feedback control involves the manipulation of a sys-
                                                              systems. In this paper, the plant is a digital computer
tem in which data, either measured or estimated, is
                                                              performing a high-criticality (or reactive) process. The
transformed and utilized to modify the behavior of the
                                                              reactive process runs, in our case, a BN used for di-
system. This behavior is typically de…ned in terms of
                                                              agnosis. We also assume that on this computer, low-
metrics such as stability, boundedness, response time,
                                                              and medium-criticality (or background ) processes are
or over-shoot. Improving such metrics by means of
                                                              running and are competing for CPU cycles, memory,
feedback control is typically motivated by the desire
                                                              and other computer resources. The impact that these
   1
    Pearl’s main emphasis was originally on exact propa-      background processes have on the reactive process and
gation in singly connected BNs, however he also mentioned     its ultimate outcome (detection of a failure event) de-
inexact propagation in arbitrary BNs [29, Exercise 4.7],      pends on a number of parameters such as operating
system, RAM, processor type and clock-speed, CPU
cache, etc. Since hardware and software vary im-
mensely, this represents a source of signi…cant uncer-
tainty and presents challenges for both modeling and
control.
We introduce the control system framework illustrated
in Figure 1. The framework supports both anytime
(and inexact) as well as non-anytime (and typically ex-
act) algorithms, and distinguishes between inner and
outer control loops because their objectives di¤er.
Inner Loop: The goal of the inner loop, which is
traditional to feedback control, is to make careful ad-
justments according to the parameters set by the outer
loop. These are key parameters in our integrated ap-
proach: r(k) is desired completion time (or Setpoint)
for sample time k. We would like the computational
process to …nish within this time. y(k) is the actual
completion time (or just Actual ) for sample time k.
e(k) = r(k) - y(k) is the error signal at time k; u(k)
is the maximal number of low-criticality processes (or      Figure 1: Our integrated architecture, where we wrap
Max Processes). u(k), also known as the control law,        Bayesian network computation into two feedback con-
is taken to be a proportional-derivative controller [12].   trol loops, a traditional inner loop where the Controller
We partition computational processes into high-             controls a Plant (here, a Computer) and a higher-level
criticality (and having an r(k) value associated with       outer loop. While there are several options, this paper
them), medium-criticality, and low-criticality (subject     uses desired and actual computation time for setpoint
to actuation by control system).       High-criticality     r(k) and output y(k) respectively.
processes are reactive and essential.         Medium-
criticality processes are non-reactive but essential.
Low-criticality processes are non-reactive and non-         particle-based algorithms such as likelihood weighting
essential. The actual (or measured) number of low-          and particle …ltering, p(k)— the number of particles—
criticality processes (or Actual Processes) at sample       is an example of such a parameter. One would like to
time k is denoted v(k).                                     use a very large number of particles for simulation,
                                                            since this improves the accuracy of the estimate of
Our approach uses simple black-box prediction tech-         the posterior P (H(k) j e(k)), however this may take
niques, as detailed below. It is the combination of this    too much time and one needs to carefully restrict the
black-box approach with the use of a desired compu-         number of particles. In most experimental results with
tation time (setpoint r(k) in Figure 1) and an actual       likelihood weighting reported in this paper, we assume
computation time (output y(k) in Figure 1) that makes       that p(k) = p and r(k) = r are …xed (or stationary).2
it work. The actual computation time is just a mea-
surement of how long a BN computation took. The
desired computation time depends on the frequency of        4    METHODS AND DATA
our reactive process and other factors. For example, a
frequency of 10 Hz means that the desired computa-          System health management, which may utilize
tion time is upper-bounded by 100 milliseconds.             Bayesian network, can integrate information from het-
                                                            erogenous sensors and perform computation for the
Outer Loop: The goal of the outer loop is to jointly        purpose of fault diagnosis, prognosis, and mitigation
optimize speed of inference, resource allocation, accu-     [19, 32, 31, 5]. Electrical power system fault diagno-
racy, and other factors. Unlike inexact anytime algo-       sis using Bayesian networks is the main focus of the
rithms, which have a similar objective, our approach        experiments in this paper.
can use exact algorithms. If, for example, the exact
junction tree propagation and variable elimination al-      Data and Bayesian Network: We used experimen-
gorithms are employed, reduced accuracy is not an is-       tal data from a real-world electrical power network,
sue. However, accuracy is in general important, and         known as ADAPT, located at the NASA Ames Re-
accuracy versus computation time trade-o¤s are per-            2
                                                                 However, in Section 5.3 (see Figure 6) we discuss how
formed in the outer loop of the framework. For inexact      and why r(k) can be changed over time.
      Com-         OS       Data    R2      MSE           powerful, due to its many cores and large memories,
      puter                 Set                           while the Old Server is least powerful due to its com-
      Laptop       Win7         1   0.824   0.606
      Laptop       Win7         2   0.906   1.91          paratively small memories. As re‡ected in Table 1,
      Desktop      Win7         1   0.810   0.315         the operating systems Windows 7 (Win7) and several
      Desktop      Win7         2   0.906   0.220         Linux variants–Ubuntu (Ubu9 and Ubu11) and Suse
      High End     Suse11       1   0.778   0.024         (Suse11) were employed in experiments.
      High End     Suse11       2   0.807   0.0051
      Server       Ubu9         1   0.935   1.17          Disturbance Generation: In the following, we
      Server       Ubu9         2   0.946   0.450         model for simplicity low-criticality process distur-
      Old Server   Ubu11        1   0.857   0.067         bances to a computer (including user disturbances) as
      Old Server   Ubu11        2   0.813   0.0188        a Poisson process with rate . In the experiments, the
                                                          low-criticality OS processes are CPU-intensive and ex-
Table 1: Estimation of model parameters for di¤erent
                                                          ecute mathematical operations in a tight loop; there
computers, operating systems (OSs), and data sets.
                                                          is no I/O. This introduces a stochastic delay in the
                                                          actuation of the plant: even if the control input u(k)
search Center [30]. ADAPT contains capabilities for       increases at the next time step, u(k + 1) > u(k), the
power generation, power distribution, and loads rep-      probability of the number of processes actually increas-
resentative for what can be found in aerospace vehi-      ing is low. The Poisson disturbance term is regarded
cles. For the purpose of this paper we focus on a small   as unmodeled dynamics and is not explicitly compen-
part of ADAPT containing the following components:        sated for in the error term of the ARX model (1). It is
EY183 (relay), DC482 (DC load), E181 (voltage sen-        of interest to model this phenomenon by a stochastic
sor), IT181 (current sensor), and ESH183 (relay feed-     delay in the control input to the plant.
back sensor). Scenarios are taken from Tier 2 of the      Note that our approach handles situations in which
DX 2009 competition data set.3 These scenarios con-       the resource consumption of low-criticality processes
sist of nominal runs, with no faults in ADAPT, as well    varies dramatically. For example, there can be many
as runs involving one or more faults in components        low-criticality processes (executing in parallel) that are
or sensors, diagnosed using BNT’s implementation4         mostly idle. On the other hand, even a few resource-
of likelihood weighting (LW) [34], variable elimination   hungry low-criticality processes could be too much to
(VE) [20, 9], junction tree propagation (JTP) [18, 35],   handle. If many “mostly idle” processes are present,
loopy belief propagation (LBP) [26], and Pearl’s belief   our controller automatically sets the “Max number of
propagation (PBP) [29].5 It has previously been es-       processes” parameter u(k) higher than if there are a
tablished that BNs perform very well in this domain,      few resource-hungry low-criticality processes. See Fig-
with the BN-based ProDiagnose system [32, 31] having      ure 6 and Figure 5 for how u(k) is varied by our con-
the best performance in 3 of 4 of the DX international    troller, due to the varying computational load of low-
diagnostic challenges in 2009 and 2010.6                  criticality and Bayesian network processes. The con-
Computational Platforms: Five di¤erent comput-            troller maps, see Figure 1, from a setpoint r(k) (Com-
ers, representing a broad spectrum of computing capa-     putation time) to a control signal u(k) (Max number
bility, were used. The Laptop computer is a 2.40GHz       of processes), and so is able to handle the varying re-
Intel i5 M520 with 4 cores, 8 GB of RAM, and 3 MB         source use of di¤erent processes.
cache memory. The Desktop computer is a 3.20GHz
AMD Phenom II X6 1090T with 6 cores, 8 GB of              5     EMPIRICAL RESULTS
RAM, and 3 MB cache memory. The High End com-
puter is a 2.00GHz Intel Xeon X7550 with 64 cores, 126
                                                          In this section we present experimental results on
GB of RAM, and 18 MB cache memory. The Server
                                                          ADAPT data for our reactive approach, as enabled
computer is a 2.50GHz Intel Core 2 Quad Q8300 with
                                                          by feedback control and summarized in Figure 1.
4 cores, 8GB GB of RAM, and 2 MB cache mem-
ory. The Old Server computer is a 2.80GHz Intel Xeon
with 4 cores, 4 GB of RAM, and 1 MB cache mem-            5.1   COMPUTATION AS A PLANT
ory. Broadly speaking, the High End computer is most
                                                          Control Input: The input to the plant (i.e., the
  3
      http://www.dx-competition.org/                      computer in Figure 1), which ultimately is computed
    4
      http://code.google.com/p/bnt/                       by a control algorithm, is denoted by u(k). This in-
    5
      The   speci…c  BNT    functions   we   use   are
                                                          put determines the number of low-criticality processes
pearl_inf_engine (PBP), belprop_inf_engine (LBP),
likelihood_weighting_inf_engine (LW), jtree_inf_engine    that are allowed to run at any given sample time. De-
(JT), and var_elim_inf_engine (VE).                       note the actual number of low-criticality processes by
    6
      http://www.dx-competition.org/                      v(k) where k denotes a sampling index. In the exam-
            (a) Junction tree propagation (JTP)                              (b) Likelihood weighting (LW)




Figure 2: Results of system identi…cation. Two di¤erent square waves used as input u(k) to system identi…cation,
where we compare Actual completion time y(k) and Model completion time ym (k) on a Laptop for (a) junction
tree propagation versus (b) likelihood weighting.


ples that are used in this paper, if v(k) u(k), then            methods exist to estimate (i) in both batch and on-
processes are terminated until v(k) < u(k).7 The ter-           line modes. Similar ARX models have been used in
mination process is nearly instantaneous with respect           modeling a number of digital processes [12].
to the sampling period.
                                                                Open-loop Modeling and Generalization: Fig-
Plant Modeling: There are several approaches to the             ure 2 shows the result of open-loop system identi…ca-
modeling of computation for control applications. The           tion, speci…cally the performance of a model where the
approach taken here is termed linear Auto-Regressive            parameters were estimated using least-squares (batch-
modeling with eXogenous input (linear ARX). Nonlin-             mode). This …gure compares, for varying sample time
ear approaches, such as discrete time neural networks,          k, the Actual computation time y(k) (see Figure 1)
may also be used. Nonlinear modeling is more complex            with our model’s predicted computation time ym (k)
yet may be able to capture inherent nonlinear behav-            (or just Model ). For both JTP and LW, the rate of
ior otherwise unaccounted for in ARX modeling. On               Poisson process creation is 45 seconds for LW and the
the other hand, linear modeling is generally simpler to         sampling rate is 51 Hz. Figure 2(a) depicts the perfor-
understand and implement.8                                      mance for JTP, while Figure 2(b) shows LW. Overall,
                                                                the …t between the model and actual behavior is quite
Suppose the plant is described by an ARX model with
                                                                good in both cases, perhaps slightly less so for JTP.
an additive noise term. Denote P (i) , y (i) (k), u(k), and
                                                                The following may be the explanation why the JTP
  (k), the plant operator, scalar output, input, and
                                                                graph is more “blocky” and less of a …t than the LW
noise at sample time k, respectively. The integer i
                                                                graph: JTP is signi…cantly faster than LW, thus JTP
denotes a speci…c plant or device, such as a particular
                                                                is more sensitive to the other computational processes.
laptop or desktop. In general, we will assume that we
have a …nite class of plants P (i) for i 2 [1; M ].             Further details on how open-loop system identi…cation
                                                                is performed and generalizes are provided in Figure 3.9
In general, the relationship between these quantities is
                                                                Here, for (1) we simpli…ed y (i) (k) to y(k) and used a
given by:
                                                                …rst order discrete time model
                                  T
       y (i) (k) = P (i) u(k) =       (k   d) (i) + (k)   (1)             y(k) = c1 y(k    1) + c2 u(k   1) + c3       (2)
where T (k d) denotes the regression vector and con-            to model a speci…c plant’s input output behavior. In-
sists of a tapped delay line of input and output mea-           put signals used to generate the training and testing
surements, and (i) denotes a vector of real-valued pa-          data are shown in Figure 3(a), and consist of pseudo-
rameters corresponding to the ith plant. A number of            random u(k) square waves. Figure 3(b) shows the re-
    7
                                                                sult of open-loop system identi…cation for two di¤er-
      Currently, we randomly terminate one of the low-          ent computers. Figure 3(b) shows the performance of
criticality OS processes. Additional information, such as
process priority, can easily be used to guide termination.      the model where the parameters were estimated us-
    8
      The ARX model was chosen for several reasons; most        ing least-squares (batch-mode), and depicts the per-
importantly it gave solid system identi…cation perfor-          formance of the laptop (top) and the server (bottom)
mance. Allowing the plant model itself to be a probabilistic    using the regression parameters obtained via laptop-
graphical model would be interesting, and stochastic ap-        generated data. It is clear from these empirical results
proaches (based on stochastic di¤erential equations) have
                                                                   9
been developed in control theory [10, 15]. In light of its           While not illustrated in Figure 3, we note that closed
performance, we believe that the standard ARX approach          loop system identi…cation is also required and presents ad-
is well-justi…ed and leave stochastic control to future work.   ditional data challenges, for example, data collinearities.
                         (a) Inputs u(k)                    (b) Outputs: from u(k) to y(k) and ym (k)




Figure 3: System identi…cation and generalization. (a) Two di¤erent squarewaves, Squarewave 1 and Squarewave
2, used as input u(k). Squarewave 1 was used for training of ym (k) and Squarewave 2 was used for testing. (b)
Application to two di¤erent computers, comparing Actual completion time y(k) and Model completion time
ym (k): (top) Laptop–good …t and (bottom) Server–decent …t.


that models obtained on one platform (here, laptop)          are real-valued controller gain parameters. H(z) is the
performs best on that platform. The models are likely        transfer function from the error signal e(k) to the input
to perform less well on another platform (here, server),     to the plant u(k). Performance for a …xed setpoint is
but are still of some value.                                 shown in Figure 6(a). Computation time is maintained
                                                             at approximately r(k) = 2 sec, however since this is a
5.2   INNER LOOP CONTROL                                     soft real-time approach there are excursions above this
                                                             setpoint.
The objective of the control system is to minimize the
error signal given by e(k) = r(k) y(k) where r(k)            5.3    OUTER LOOP CONTROL
denotes the reference or desired computational time
required for BN computation. The output, y(k), de-           Our proposed research is based on controlling BN com-
notes the actual time the computer takes to complete         putation in an inner loop as well as in an outer loop
the BN computation and generate the posterior prob-          (see Figure 1); so far we discussed the inner loop. We
abilities BEL(Q; e) as discussed above.                      now brie‡y discuss the outer loop, where the output
                                                             BEL(Q; e) of BN computation, as well as other factors
From the control engineering perspective, there are          external to the inner loop, may change desired comple-
several key issues: (1) Static uncertainty in plant para-    tion time and also plant behavior (sampling frequency,
meters: A tablet PC running Windows Vista will be-           BN computation algorithm, number of particles as-
have di¤erently from a High Performance Computing            suming a simulation algorithm, etc.).
(HPC) cluster running openSUSE Linux. A control
system designed for one computer may not perform             Adaptation to Computational Platform (see
well on another. In other words, the control system pa-      Figure 1(c)): Table 1 summarizes results of run-
rameters are uncertain and need to be estimated. (2)         ning likelihood weighting on the ADAPT BN, using
Stochastic disturbances: The process that determines         …ve di¤erent computers and three di¤erent operating
when a low-criticality process is generated by a user        systems. System identi…cation was performed on all
is, in general, stochastic. This is, in itself, an active    computers using the two input data sets shown in Fig-
area of research [1]. Furthermore, the impact a partic-      ure 3(a). Batch least squares was used to determine
ular low-criticality process has on the high-criticality     the parameters of a …rst order linear ARX model. Ta-
process may vary substantially. (3) Stochastic delay         ble 1 shows the resulting R2 and mean squared error
in the input process: When the control input u(k) in-        (MSE) of the parameter …t for the various computers
creases, the actual number of processes may or may           and operating systems. In most cases, we observed a
not increase. This represents a stochastic delay in the      low MSE.10
actuation of the digital system being controlled.              10
                                                                 It is important to note the underlying variations in
To illustrate how our approach handles the above, we         the ARX model parameters, which are used by the control
                                                             design process. The diverse set of parameters, omitted
optimized a linear controller given by the following Z-      here to save space, illustrate the bene…t of using techniques
transfer function: H(z) = 1 Kz 1 , where K and               from adaptive feedback control, speci…cally learning from
Figure 4: Actual computation time y(k) as a function        Figure 5: Experiment on Laptop using 1 Hz sampling
of actual number of processes v(k) for …ve di¤erent         rate, showing a successful switch of BN inference algo-
BN inference algorithms LBP, VE, JTP, LW, and PBP           rithm from junction tree propagation (JTP) to likeli-
running on High End computer.                               hood weighting (LW) using feedback control.


Optimizing BN Algorithm and Parameters (see
                                                            our knowledge, the …rst demonstration of a successful
Figure 1(b)): In Figure 4, we show the steady state
                                                            on-line switch between two very di¤erent inference al-
computation time results for the …ve BN inference al-
                                                            gorithms (from JTP to LW) while a desired completion
gorithms discussed above. Here, steady state is de…ned
                                                            time is maintained.
loosely as the time k such that y(k + 1) = y(k) = yss
for all k      k . Plugging this into (2) yields yss =      Changing the Setpoint (see Figure 1(a)): There
c2 uss +c3                                                  is both a supply side and a demand side in computing.
   1 c1    , where u(k) = uss 8k k . Thus, the linear
gain coe¢ cient (slopes in Figure 4) for each algorithm     On the supply side, one can control the supply of com-
is given by 1 c2c1 .                                        putational resources, in the form of computers, CPUs,
                                                            CPU threads, or GPU threads. On the demand side,
The results fall into three groups: fast (JTP), medium
                                                            the outer loop can vary the Setpoint r(k), perhaps in
(VE and LW), and slow (LBP and PBP) computa-
                                                            combination with varying other outer loop parameters
tions. Also, the standard error in computation time
                                                            such as sampling frequency fC (k) and number of par-
varies between the algorithms, with JTP again being
                                                            ticles p(k) used in likelihood weighting [34] or particle
the best with a very small standard error.
                                                            …ltering [17].
While JTP is exact and fast, its Achilles’heel is mem-
                                                            One reason for the outer loop to vary the com-
ory consumption [23]. Consequently, it can be neces-
                                                            putational resources allocated to the high-criticality
sary, when running other memory-intensive processes,
                                                            process is illustrated in the following. Suppose, for
to use a less memory-intensive but inexact algorithm
                                                            k < k , that P (H(k) j e(k)) suggested that there was
like LW. How should one switch between two algo-
                                                            one or more faults in the ADAPT electrical power sys-
rithms, say the non-anytime algorithm JTP and the
                                                            tem, while P (H(k ) j e(k )) indicated that this was a
anytime algorithm LW, that have very di¤erent com-
                                                            false alarm. In this case, we may want to be less strin-
putational resource requirements but operate on the
                                                            gent about ensuring that computation of P (H(k + 1)
same BN? Feedback control can help in this regard,
                                                            j e(k + 1)), P (H(k + 2) j e(k + 2)), etc. …nishes in
see Figure 5. From a control perspective, this is consid-
                                                            a timely fashion, in other words it makes sense to put
ered dynamic uncertainty in plant parameters: Given
                                                            r(k + 1) > r(k ). Figure 6(b) shows how this type of
a computer P (i) , a change in the BN algorithm will
                                                            step change, at k     23:28, is supported by our control-
impact the way the plant, P (i) , responds. Here, we
                                                            theoretic framework. The baseline of not varying r(k)
switch from JTP to LW around 12:20, while maintain-
                                                            is shown in Figure 6(a). We here assume that sam-
ing the setpoint (on average) after a transient period
                                                            ple frequency fC (k) = fC is constant, and consider (i)
lasting around 10 seconds.11 This is, to the best of
                                                            r(k 1) = r(k) < 1=fC (as when r(k) = 2 sec in both
                                                            Figure 6(a) and Figure 6(b)) versus (ii) r(k) 1=fC
the computational environment and adapt to changes.
  11
     Note, our approach does not make any hard real-time    (as when r(k) = 4 sec Figure 6(b)). The advantage
guarantees, only soft ones, and consequently the actual     of (i) is that there is a much greater chance that BN
computation time is sometimes greater than the setpoint.    computation …nishes before a new computational cy-
                 (a) Fixed Setpoint r(k)                                 (b) Varying Setpoint r(k)




Figure 6: Outerloop optimization for LW, using: (a) …xed Setpoint r(k) = 2 and (b) change of Setpoint,
approximately at time 23:28, from r(k) = 2 to r(k) = 4.


cle starts, which is essential when EPS faults are more     Bayesian network inference algorithms including both
likely. The advantage of (ii), on the other hand, is that   exact and inexact algorithms. As a consequence, our
more low-criticality processes are allowed to run.          approach enables the use of exact but non-anytime al-
                                                            gorithms (like variable elimination [20, 9], junction tree
The trade-o¤ between fast inference for the high-
                                                            propagation [18, 35], and arithmetic circuit evaluation
criticality process versus running many low-criticality
                                                            [7, 4]) in reactive settings. The bene…t of this is that
processes is illustrated in Figure 6. Figure 6(b)’s
                                                            such exact algorithms often perform very well, how-
bottom panel shows an increase in the number of
                                                            ever they do have limitations and consequently there
processes, on average, as a result of the increase in
                                                            are situations where they are unsuitable. With our ap-
r(k) at k      23:28; no similar increase can be found
                                                            proach, one can use these exact algorithms and then
in Figure 6(a)’s bottom panel.
                                                            switch to an inexact (often anytime) algorithm only if
                                                            needed, rather than having to always use an anytime
6   DISCUSSION AND OUTLOOK                                  algorithm.
                                                            We are in this paper using rather basic control theory
Deploying BN algorithms and other resource-intensive
                                                            ideas. This enables new results and many opportuni-
AI algorithms can be a challenge when there are non-
                                                            ties for future work, both theoretical and experimen-
trivial constraints on computational resources in ap-
                                                            tal, and we invite other researchers to participate in
plications. In this paper, we have focused on support-
                                                            the exploration of this exciting area of reseach. We
ing requirements for reactive response without requir-
                                                            are, for example, developing adaptive control meth-
ing dedicated hard real-time computational resources
                                                            ods that leverage online system identi…cation of the
according to worst-case computational requirements.
                                                            process. There are also many interesting research op-
We have focused on reactive diagnosis using BNs,
                                                            portunities related to the use of BN posteriors as well
motivated by domains with some but uncertain do-
                                                            as their accuracy, and perhaps multiple BNs at dif-
main knowledge (hence probabilistic graphical models,
                                                            ferent levels of detail, in the outer control loop. In
speci…cally BNs) as well as uncertainty with respect to
                                                            this area, there is a strong connection to anytime al-
the computational platform and environment (hence
                                                            gorithms and metareasoning that can be further in-
feedback control).
                                                            vestigated, enabling more reactive and more capable
An alternative approach to achieving reactive response      intelligent systems.
is the use of anytime algorithms. The motivation be-
hind anytime inference— namely the goal of intelligent
and reactive systems— and our work is quite similar.
However, the approaches are very di¤erent. Anytime          Acknowledgements
algorithms are inherently inexact and produce solu-
tions whose quality gradually improve with computa-
tion time [37]. We focus on what can done, on the           This material is based upon work supported, in part,
computing system level, for a broad range of existing       by NSF awards CCF0937044 and ECCS0931978.
References                                                    to enforce policies for interrelated metrics with
                                                              application to the Apache Web server. In Proc. of
 [1] T. Beauvisage. Computer usage in daily life.             Network Operations and Management Symposium
     In Proc. of the 27th International Conference on         (NOMS-02), pages 219–234, 2002.
     Human factors in Computing Systems (CHI-09),
     pages 575–584, Boston, MA, 2009.                     [12] J. Hellerstein, Y. Diao, S. Parekh, and D. M.
                                                               Tilbury. Feedback Control of Computing Systems.
 [2] Y. Brun, G. Di Marzo Serugendo, C. Gacek,                 Wiley, 2004.
     H. Giese, H. Kienle, M. Litoiu, H. Müller,
     M. Pezzè, and M. Shaw. Engineering self-adaptive     [13] C. Hollot, V. Misra, D. Towsley, and W. Gong.
     systems through feedback loops. In B. H. Cheng,           On designing improved controllers for AQM
     R. Lemos, H. Giese, P. Inverardi, and J. Magee,           routers supporting TCP ‡ows. In Proc. of IEEE
     editors, Software Engineering for Self-Adaptive           INFOCOM, pages 1726–1734, 2000.
     Systems, pages 48–70. Springer-Verlag, 2009.
                                                          [14] E. J. Horvitz, H. J. Suermondt, and G. F. Cooper.
 [3] A. Carlin and S. Zilberstein. Decentralized moni-         Bounded conditioning: Flexible inference for de-
     toring of distributed anytime algorithms. In Proc.        cisions under scarce resources. In Proc. of the
     of the Tenth International Conference on Au-              Fifth Conference on Uncertainty in Arti…cial In-
     tonomous Agents and Multiagent Systems, pages             telligence (UAI-89), pages 182–193, Windsor, On-
     157–164, Taipei, Taiwan, 2011.                            tario, 1989.

 [4] M. Chavira and A. Darwiche. Compiling Bayesian       [15] A. K. Ishihara, J. van Doornik, and S. Ben-
     networks using variable elimination. In Proc.             Menahem. Stochastic stability of a neural-net ro-
     of the Twentieth International Joint Conference           bot controller subject to signal-dependent noise in
     on Arti…cial Intelligence (IJCAI-07), pages 2443–         the learning rule. International Journal of Adap-
     2449, Hyderabad, India, 2007.                             tive Control and Signal Processing, 24(6):445–
                                                               466, 2010.
 [5] A. Choi, A. Darwiche, L. Zheng, and O. J. Meng-
     shoel. A tutorial on Bayesian networks for system    [16] K. Kask and R. Dechter. Stochastic local search
     health management. In A. Srivastava and J. Han,           for Bayesian networks. In Proc. of the Seventh
     editors, Data Mining in Systems Health Manage-            International Workshop on Arti…cial Intelligence
     ment: Detection, Diagnostics, and Prognostics.            and Statistics (AISTATS-99), Fort Lauderdale,
     Chapman and Hall/CRC Press, 2011.                         FL, Jan 1999.

 [6] F. G. Cooper. The computational complexity of        [17] D. Koller and U. Lerner. Sampling in Fac-
     probabilistic inference using Bayesian belief net-        tored Dynamic Systems. In A. Doucet, J. F. G.
     works. Arti…cial Intelligence, 42:393–405, 1990.          de Freitas, and N. Gordon, editors, Sequential
                                                               Monte Carlo Methods In Practice, pages 445–464.
 [7] A. Darwiche. A di¤erential approach to infer-             Springer-Verlag, 2001.
     ence in Bayesian networks. Journal of the ACM,
     50(3):280–305, 2003.                                 [18] S. Lauritzen and D. J. Spiegelhalter. Local com-
                                                               putations with probabilities on graphical struc-
 [8] T. Dean and M. S. Boddy. An analysis of                   tures and their application to expert systems
     time-dependent planning. In Proc. of the Sev-             (with discussion). Journal of the Royal Statistical
     enth National Conference on Arti…cial Intelli-            Society series B, 50(2):157–224, 1988.
     gence (AAAI-88), pages 49–54, St. Paul, MN,
     1988.                                                [19] U. Lerner, R. Parr, D. Koller, and G. Biswas.
                                                               Bayesian fault detection and diagnosis in dynamic
 [9] R. Dechter. Bucket elimination: A unifying                systems. In Proc. of the Seventeenth national
     framework for reasoning. Arti…cial Intelligence,          Conference on Arti…cial Intelligence (AAAI-00),
     113(1-2):41–85, 1999.                                     pages 531–537, 2000.

[10] H. Deng, M. Krstic, and R. J. Williams. Stabi-       [20] Z. Li and B. D’Ambrosio. E¢ cient inference in
     lization of stochastic nonlinear systems driven by        Bayes nets as a combinatorial optimization prob-
     noise of unknown covariance. IEEE Transactions            lem. International Journal of Approximate Rea-
     on Automatic Control, 46(8):1237 –53, 2001.               soning, 11(1):55–81, 1994.

[11] Y. Diao, N. Gandhi, J. L. Hellerstein, S. Parekh,    [21] O. J. Mengshoel. Designing resource-bounded
     and D. M. Tilbury. Using MIMO feedback control            reasoners using Bayesian networks: System
    health monitoring and diagnosis. In Proceedings            quality control techniques for fault diagnosis in
    of the 18th International Workshop on Principles           hybrid domains. In Proc. of the Annual Confer-
    of Diagnosis (DX-07), pages 330–337, Nashville,            ence of the PHM Society 2011 (PHM-11), Mon-
    TN, 2007.                                                  treal, Canada, 2011.

[22] O. J. Mengshoel. Understanding the role of noise      [32] B. W. Ricks and O. J. Mengshoel. Methods for
     in stochastic local search: Analysis and experi-           probabilistic fault diagnosis: An electrical power
     ments. Arti…cial Intelligence, 172(8-9):955–990,           system case study. In Proc. of Annual Conference
     2008.                                                      of the PHM Society, 2009 (PHM-09), San Diego,
                                                                CA, 2009.
[23] O. J. Mengshoel. Understanding the scalability
     of bayesian network inference using clique tree       [33] D. Roth. On the hardness of approximate reason-
     growth curves. Arti…cial Intelligence, 174:984–            ing. Arti…cial Intelligence, 82:273–302, 1996.
     1006, 2010.
                                                           [34] R. Shachter and M. Peot. Simulation approaches
[24] O. J. Mengshoel, M. Chavira, K. Cascio, S. Poll,           to general probabilistic inference on belief net-
     A. Darwiche, and S. Uckun. Probabilistic model-            works. In Uncertainty in Arti…cial Intelligence
     based diagnosis: An electrical power system case           5, pages 221–231, Amsterdam, 1990. Elsevier.
     study. IEEE Trans. on Systems, Man, and Cy-
     bernetics, 40(5):874–885, 2010.                       [35] P. P. Shenoy. A valuation-based language for ex-
                                                                pert systems. International Journal of Approxi-
[25] O. J. Mengshoel, D. Roth, and D. C. Wilkins.               mate Reasoning, 5(3):383–411, 1989.
     Portfolios in stochastic local search:     E¢ -
     ciently computing most probable explanations in       [36] C.-Z. Xu, B. Liu, and J. Wei. Model predictive
     Bayesian networks. Journal of Automated Rea-               feedback control for QoS assurance in webservers.
     soning, 46(2):103–160, 2011.                               Computer, 41:66–72, March 2008.
                                                           [37] S. Zilberstein and S. Russell. Optimal composi-
[26] K. P. Murphy, Y. Weiss, and M. I. Jordan. Loopy
                                                                tion of real-time systems. Arti…cial Intelligence,
     belief propagation for approximate inference: An
                                                                82:181–213, 1996.
     empirical study. In Proc. of the Fifteenth Confer-
     ence on Uncertainty in AI (UAI-99), pages 467–
     475, 1999.

[27] S. Parekh, N. Gandhi, J. Hellerstein, D. Tilbury,
     T. Jayram, and J. Bigus. Using control theory
     to achieve service level objectives in performance
     management. Real-Time Systems, 23(1):841–854,
     2002.

[28] J. D. Park and A. Darwiche. Complexity results
     and approximation strategies for MAP explana-
     tions. Journal of Arti…cial Intelligence Research
     (JAIR), 21:101–133, 2004.

[29] J. Pearl. Probabilistic Reasoning in Intelligent
     Systems: Networks of Plausible Inference. Mor-
     gan Kaufmann, San Mateo, CA, 1988.

[30] S. Poll, A. Patterson-Hine, J. Camisa, D. Garcia,
     D. Hall, C. Lee, O. J. Mengshoel, C. Neukom,
     D. Nishikawa, J. Ossenfort, A. Sweet, S. Yen-
     tus, I. Roychoudhury, M. Daigle, G. Biswas, and
     X. Koutsoukos. Advanced diagnostics and prog-
     nostics testbed. In Proc. of the 18th International
     Workshop on Principles of Diagnosis (DX-07),
     pages 178–185, Nashville, TN, 2007.

[31] B. W. Ricks, C. Harrison, and O. J. Mengshoel.
     Integrating probabilistic reasoning and statistical