=Paper= {{Paper |id=None |storemode=property |title=Reactive Bayesian Network Computation using Feedback Control: An Empirical Study |pdfUrl=https://ceur-ws.org/Vol-962/paper06.pdf |volume=Vol-962 |dblpUrl=https://dblp.org/rec/conf/uai/MengshoelIR12 }} ==Reactive Bayesian Network Computation using Feedback Control: An Empirical Study== https://ceur-ws.org/Vol-962/paper06.pdf

Reactive Bayesian Network Computation using Feedback Control:
An Empirical Study

Ole J. Mengshoel, Abe Ishihara, and Erik Reed
Carnegie Mellon University
Mo¤ett Field, CA 94035
{ole.mengshoel, abe.ishihara, erik.reed}@sv.cmu.edu

Abstract bility provides new opportunities for AI systems and
also raises several interesting research questions [21].
Can BN-based algorithms be implemented, adapted,
This paper investigates the challenge of inte-
or “wrapped”such that they reactively (or in soft real-
grating intelligent systems into varying com-
putational platforms and application mixes time) compute meaningful results across a myriad of
while providing reactive (or soft real-time) commercially available computers? Can algorithms be
response. We integrate Bayesian network optimized such that users may generally continue to
operate their computers normally while also running
computation with feedback control, thereby
AI systems–that is, continue to check email, chat, and
achieving our reactive objective. As a case
partake in social networking?
study we investigate fault diagnosis using
Bayesian networks. While we consider the To start answering these questions, this paper devel-
likelihood weighting and junction tree propa- ops an architecture that integrates Bayesian network
gation Bayesian network inference algorithms computation and feedback control. The architecture
in some detail, we hypothesize that the tech- contains two feedback loops, a lower-level inner loop
niques developed can be broadly applied to and a higher-level outer loop. What drives the inner
achieve reactive intelligent systems. In this loop is the di¤erence between desired completion time
paper’s empirical study we demonstrate re- (or setpoint, set to achieve reactivity) and actual com-
active fault diagnosis for an electrical power pletion time. The goal of the outer loop is to trade o¤
system. between higher-level issues such as speed of inference,
resource allocation, and accuracy.
While the architecture is general, we have for valida-
1 INTRODUCTION tion purposes used experimental data from an elec-
trical power network located at the NASA Ames Re-
Substantial progress has been made over the last few
search Center [30]. We demonstrate our novel ap-
decades in the areas of learning and reasoning using
proach in the area of fault diagnosis for this electri-
Bayesian networks (BNs) [29]. While most BN in-
cal power system, and investigate the BN inference
ference problems are computationally hard (NP-hard
algorithms of likelihood weighting [34], variable elim-
or worse) in the general case [6, 33, 28], e¢ cient al-
ination [20, 9], junction tree propagation [18, 35],
gorithms have been developed and demonstrated in
and belief propagation [29, 26]. Results for …ve di¤er-
a wide range of automated reasoning applications in-
ent computers and three di¤erent operating systems
cluding model-based diagnosis [19, 24, 32, 31].
are demonstrated. We show system identi…cation re-
There has recently been an explosion in the number sults for both junction tree propagation and likelihood
and types of computers available, capable of running weighting, and demonstrate outer loop control by suc-
implementations of sophisticated algorithms including cessfully changing the setpoint and BN inference algo-
BN-based algorithms. This is a result of the advance- rithm employed.
ments in both hardware and software platforms now
By integrating control theoretic techniques and un-
powering computers, which range from smart phones
certainty processing using BNs, this research aims
and tablet PCs to high-end gaming machines and high-
to: improve the reactivity and adaptability of uncer-
performance computing (HPC) clusters.
tainty processing in di¤erent applications; improve the
The proliferation of computers of highly varying capa- timeliness of computation under dynamic workloads
on varying computational platforms (smart phones, rant the use of a hard real-time operating system,
GPUs, multi-core CPUs, Hadoop clusters, ...); and re- which is often used in aerospace and other applica-
duce the e¤ort and cost associated with developing and tions where safety is paramount. Our approach is
deploying BN-based software applications. Our goals tailored to soft real-time settings where low-criticality
are similar to those of anytime algorithms [8, 37, 3], tasks can, if needed, be down-prioritized, suspended
which provide a way to trade o¤ between computation or terminated in order to provide reactive response for
time and solution quality. However, our feedback ap- high-criticality tasks.
proach is more general in that it handles both anytime
The rest of this paper is organized as follows. In
and non-anytime algorithms. In fact, we show exper-
Section 2 we provide background on Bayesian net-
imentally how our approach enables us to gracefully
work computation, feedback control, and anytime al-
switch, on-line, from a non-anytime algorithm (e.g.,
gorithms. Section 3 discusses our architecture inte-
junction tree propagation) to an anytime algorithm
grating feedback control and BN inference algorithms.
(e.g., likelihood weighting) while achieving real-time
In Section 4 we present the experimental setting and
response.
electrical power system data, while Section 5 discusses
We are investigating control theory as it applies to soft- empirical results. We conclude and outline future re-
ware and our actuators are computational actuators search in Section 6.
rather than control surfaces on an aircraft or wheels
on a robotic vehicle. For example, our actuators may
change the number of processes being handled by a 2 PRELIMINARIES
computer or the input parameters to C or Java pro-
grams that implement algorithms for BN computation. 2.1 BAYESIAN NETWORK INFERENCE
In contrast, the great majority of previous control the-
ory research has, with several notable exceptions in BNs represent multivariate probability distributions
computing and software [12, 2], focused on physical and are used for reasoning and learning under uncer-
systems governed by Newtonian mechanics. In partic- tainty [29]. Probability theory and graph theory form
ular, we know of no similar work in the area of BN the basis of BNs: Roughly speaking, random vari-
computation, including computation for diagnosis or ables are represented as nodes in a directed acyclic
prognosis, where control theory is applied. graph (DAG), while conditional dependencies are rep-
resented as graph edges. Each node is parameterized
The approach developed in this paper applies in sit- by a conditional probability density or distribution. A
uations where the computational load on a com- BN is a compact representation of a joint probability
puter running AI (here, BN) applications is a key distribution if its graph is relatively sparse.
concern and where, simultaneously, we can priori-
tize and control the computational processes. Speci…- Formally, we let X be the BN nodes, E X the ev-
cally, we partition computational processes into high- idence nodes, and e the evidence. While the nodes
criticality, medium-criticality, and low-criticality; the X can represent either discrete and continuous ran-
high-critical processes are reactive. Our experiments dom variables, we focus in this paper on the discrete
are in the area of electrical power system diagnosis. case in which a node X 2 X has a …nite number of
However, the techniques are more widely applicable states (X) = fx1 ; : : : xm g. A BN factorizes a joint
and we now outline a few areas where this approach distribution Pr(X), and allows for di¤erent probabilis-
may prove powerful. Examples of high-criticality tasks tic queries to be formulated and supported by e¢ -
in which Bayesian networks (and similar probabilis- cient algorithms; they all assume that nodes in E are
tic graphical models) can be used and for which one clamped to values e. Considering the remaining nodes
might want to provide a reactive response include med- R = X E, the probabilistic queries are with respect
ical monitoring, sensor fusion, speech recognition on to the posterior distribution P (R j e). Speci…cally,
a smartphone, gesture recognition, and video recogni- computation of most probable explanations (MPEs)
tion. Examples of low-criticality tasks, which would be amounts to …nding an MPE over R, or MPE(e). Com-
down-prioritized, suspended or killed by our approach putation of marginals (or beliefs) amounts to inferring
(depending on circumstances) include virus scanning, for one or more query nodes Q R, where Q 2 Q,
backups, disk defragmentation, software updates, and the posterior probabilities Pr(Q j e) which we denote
volunteer computing (such as Seti@Home and Fold- BEL(Q; e). In diagnosis using BNs [19, 24, 32, 31], the
ing@Home). terminology health nodes H, where Q = H, is often
used. By picking, for each Q 2 Q, a most likely state
What these applications have in common is that reac- q 2 (Q), we obtain the most likely states MLS(Q; e)
tivity is important for some computational processes. from BEL(Q; e). Computation of the maximum a pos-
However, reactivity is not important enough to war- teriori probability (MAP) generalizes MPE computa-
tion and …nds a most probable instantiation over nodes to achieve some higher level objective such as ensur-
Q R, MAP(Q; e). MAP can be approximated using ing a comfortable ride on a passenger transport jet, an
MPE and MLS; these two approximations are of inter- average production rate on a robotic assembly line, or
est because of the greater computational complexity of a maximal duration of a credit card transaction.
MAP [28] compared to MPE and BEL [6].
Recently, control theory has been applied to comput-
Di¤erent BN inference algorithms can be used to per- ing systems including control of HTTP servers [11],
form the above BEL, MPE, and MAP computations. email servers [27], quality of service assurance [36], in-
We distinguish between exact and inexact algorithms. ternet tra¢ c control [13], and load balancing. Typi-
Exact algorithms for BEL and MPE computation cally, computing systems are di¤erent from traditional
include belief propagation in singly connected BNs feedback control applications in robotics and aircraft.
[29], junction tree propagation [18, 35], conditioning First, modeling of the plant does not typically start
[29, 14], variable elimination [20, 9], and arithmetic cir- from Newtonian mechanics; rather it often begins with
cuit evaluation [7, 4]. Inexact algorithms such as loopy a black box approach. Second, actuation can in some
belief propagation1 [29, 26] and likelihood weighting cases be almost instantaneous, such as ‡ipping a bit, or
[34] have been used to compute marginals; they have writing a short integer to memory, i.e. specifying the
also been used to compute MPEs [16, 25] and MAPs maximum number of connections to a server. Third,
[28]. In this paper we focus on computation of mar- measurements are often non-noisy but delayed. In ana-
ginals; however the framework also applies to other log sensing, …ltering is used to remove noise, and at the
queries including MPE and MAP. same time can introduce signi…cant (and unwanted)
phase lag. However, in computing systems, a more
2.2 ANYTIME REASONING di¢ cult delay appears at the measurement. In appli-
cations such as control of an email server, the (dis-
An anytime algorithm improves its solutions accord- crete) delay is associated with the completion of a Re-
ing to the computational resource allocated to it, and mote Procedure Call (RPC). This delay is usually not
returns an (approximate) answer if interrupted [8]. known. Finally, disturbances can signi…cantly impact
Fundamentally, this iterative improvement algorithm performance. Take the example of the IBM Domino
framework provides a way to trade o¤ between compu- server [12]: Certain combinations of requests, made
tation time and solution quality. Anytime algorithms independently by di¤erent users impact CPU utiliza-
are a useful tool for real-time system design, and there tion in a nonlinear manner; this can be regarded as
are also results on the composition of anytime algo- a stochastic disturbance. These disturbances, which
rithms [37]. Originally, the anytime approach was may depend on time of day and day of week, can have
developed for single agents, recently it was extended a signi…cant impact.
to handle multiple agents [3].
Among BN inference algorithms, stochastic local 3 CONTROLLING BAYESIAN
search [16, 28, 22, 25], likelihood weighting [34], loopy NETWORK COMPUTATION
belief propagation [29, 26], and conditioning [29, 14]
can be considered anytime algorithms. However,
many important and high-performing algorithms— Our goal is to support the application of both non-
including junction tree propagation [18, 35], variable anytime and anytime BN inference algorithms in re-
elimination [20, 9], and arithmetic circuit evaluation active settings by introducing techniques from feed-
[7, 4]— are unfortunately not anytime algorithms. back control. A key concept in feedback control is
the plant. Here, the plant to be controlled is a dy-
namic system operating in the discrete time domain.
2.3 FEEDBACK CONTROL This de…nition encompasses computers including mo-
bile phones, tablets, laptops, desktops, and multi-core
Feedback control involves the manipulation of a sys-
systems. In this paper, the plant is a digital computer
tem in which data, either measured or estimated, is
performing a high-criticality (or reactive) process. The
transformed and utilized to modify the behavior of the
reactive process runs, in our case, a BN used for di-
system. This behavior is typically de…ned in terms of
agnosis. We also assume that on this computer, low-
metrics such as stability, boundedness, response time,
and medium-criticality (or background ) processes are
or over-shoot. Improving such metrics by means of
running and are competing for CPU cycles, memory,
feedback control is typically motivated by the desire
and other computer resources. The impact that these
1
Pearl’s main emphasis was originally on exact propa- background processes have on the reactive process and
gation in singly connected BNs, however he also mentioned its ultimate outcome (detection of a failure event) de-
inexact propagation in arbitrary BNs [29, Exercise 4.7], pends on a number of parameters such as operating
system, RAM, processor type and clock-speed, CPU
cache, etc. Since hardware and software vary im-
mensely, this represents a source of signi…cant uncer-
tainty and presents challenges for both modeling and
control.
We introduce the control system framework illustrated
in Figure 1. The framework supports both anytime
(and inexact) as well as non-anytime (and typically ex-
act) algorithms, and distinguishes between inner and
outer control loops because their objectives di¤er.
Inner Loop: The goal of the inner loop, which is
traditional to feedback control, is to make careful ad-
justments according to the parameters set by the outer
loop. These are key parameters in our integrated ap-
proach: r(k) is desired completion time (or Setpoint)
for sample time k. We would like the computational
process to …nish within this time. y(k) is the actual
completion time (or just Actual ) for sample time k.
e(k) = r(k) - y(k) is the error signal at time k; u(k)
is the maximal number of low-criticality processes (or Figure 1: Our integrated architecture, where we wrap
Max Processes). u(k), also known as the control law, Bayesian network computation into two feedback con-
is taken to be a proportional-derivative controller [12]. trol loops, a traditional inner loop where the Controller
We partition computational processes into high- controls a Plant (here, a Computer) and a higher-level
criticality (and having an r(k) value associated with outer loop. While there are several options, this paper
them), medium-criticality, and low-criticality (subject uses desired and actual computation time for setpoint
to actuation by control system). High-criticality r(k) and output y(k) respectively.
processes are reactive and essential. Medium-
criticality processes are non-reactive but essential.
Low-criticality processes are non-reactive and non- particle-based algorithms such as likelihood weighting
essential. The actual (or measured) number of low- and particle …ltering, p(k)— the number of particles—
criticality processes (or Actual Processes) at sample is an example of such a parameter. One would like to
time k is denoted v(k). use a very large number of particles for simulation,
since this improves the accuracy of the estimate of
Our approach uses simple black-box prediction tech- the posterior P (H(k) j e(k)), however this may take
niques, as detailed below. It is the combination of this too much time and one needs to carefully restrict the
black-box approach with the use of a desired compu- number of particles. In most experimental results with
tation time (setpoint r(k) in Figure 1) and an actual likelihood weighting reported in this paper, we assume
computation time (output y(k) in Figure 1) that makes that p(k) = p and r(k) = r are …xed (or stationary).2
it work. The actual computation time is just a mea-
surement of how long a BN computation took. The
desired computation time depends on the frequency of 4 METHODS AND DATA
our reactive process and other factors. For example, a
frequency of 10 Hz means that the desired computa- System health management, which may utilize
tion time is upper-bounded by 100 milliseconds. Bayesian network, can integrate information from het-
erogenous sensors and perform computation for the
Outer Loop: The goal of the outer loop is to jointly purpose of fault diagnosis, prognosis, and mitigation
optimize speed of inference, resource allocation, accu- [19, 32, 31, 5]. Electrical power system fault diagno-
racy, and other factors. Unlike inexact anytime algo- sis using Bayesian networks is the main focus of the
rithms, which have a similar objective, our approach experiments in this paper.
can use exact algorithms. If, for example, the exact
junction tree propagation and variable elimination al- Data and Bayesian Network: We used experimen-
gorithms are employed, reduced accuracy is not an is- tal data from a real-world electrical power network,
sue. However, accuracy is in general important, and known as ADAPT, located at the NASA Ames Re-
accuracy versus computation time trade-o¤s are per- 2
However, in Section 5.3 (see Figure 6) we discuss how
formed in the outer loop of the framework. For inexact and why r(k) can be changed over time.
Com- OS Data R2 MSE powerful, due to its many cores and large memories,
puter Set while the Old Server is least powerful due to its com-
Laptop Win7 1 0.824 0.606
Laptop Win7 2 0.906 1.91 paratively small memories. As re‡ected in Table 1,
Desktop Win7 1 0.810 0.315 the operating systems Windows 7 (Win7) and several
Desktop Win7 2 0.906 0.220 Linux variants–Ubuntu (Ubu9 and Ubu11) and Suse
High End Suse11 1 0.778 0.024 (Suse11) were employed in experiments.
High End Suse11 2 0.807 0.0051
Server Ubu9 1 0.935 1.17 Disturbance Generation: In the following, we
Server Ubu9 2 0.946 0.450 model for simplicity low-criticality process distur-
Old Server Ubu11 1 0.857 0.067 bances to a computer (including user disturbances) as
Old Server Ubu11 2 0.813 0.0188 a Poisson process with rate . In the experiments, the
low-criticality OS processes are CPU-intensive and ex-
Table 1: Estimation of model parameters for di¤erent
ecute mathematical operations in a tight loop; there
computers, operating systems (OSs), and data sets.
is no I/O. This introduces a stochastic delay in the
actuation of the plant: even if the control input u(k)
search Center [30]. ADAPT contains capabilities for increases at the next time step, u(k + 1) > u(k), the
power generation, power distribution, and loads rep- probability of the number of processes actually increas-
resentative for what can be found in aerospace vehi- ing is low. The Poisson disturbance term is regarded
cles. For the purpose of this paper we focus on a small as unmodeled dynamics and is not explicitly compen-
part of ADAPT containing the following components: sated for in the error term of the ARX model (1). It is
EY183 (relay), DC482 (DC load), E181 (voltage sen- of interest to model this phenomenon by a stochastic
sor), IT181 (current sensor), and ESH183 (relay feed- delay in the control input to the plant.
back sensor). Scenarios are taken from Tier 2 of the Note that our approach handles situations in which
DX 2009 competition data set.3 These scenarios con- the resource consumption of low-criticality processes
sist of nominal runs, with no faults in ADAPT, as well varies dramatically. For example, there can be many
as runs involving one or more faults in components low-criticality processes (executing in parallel) that are
or sensors, diagnosed using BNT’s implementation4 mostly idle. On the other hand, even a few resource-
of likelihood weighting (LW) [34], variable elimination hungry low-criticality processes could be too much to
(VE) [20, 9], junction tree propagation (JTP) [18, 35], handle. If many “mostly idle” processes are present,
loopy belief propagation (LBP) [26], and Pearl’s belief our controller automatically sets the “Max number of
propagation (PBP) [29].5 It has previously been es- processes” parameter u(k) higher than if there are a
tablished that BNs perform very well in this domain, few resource-hungry low-criticality processes. See Fig-
with the BN-based ProDiagnose system [32, 31] having ure 6 and Figure 5 for how u(k) is varied by our con-
the best performance in 3 of 4 of the DX international troller, due to the varying computational load of low-
diagnostic challenges in 2009 and 2010.6 criticality and Bayesian network processes. The con-
Computational Platforms: Five di¤erent comput- troller maps, see Figure 1, from a setpoint r(k) (Com-
ers, representing a broad spectrum of computing capa- putation time) to a control signal u(k) (Max number
bility, were used. The Laptop computer is a 2.40GHz of processes), and so is able to handle the varying re-
Intel i5 M520 with 4 cores, 8 GB of RAM, and 3 MB source use of di¤erent processes.
cache memory. The Desktop computer is a 3.20GHz
AMD Phenom II X6 1090T with 6 cores, 8 GB of 5 EMPIRICAL RESULTS
RAM, and 3 MB cache memory. The High End com-
puter is a 2.00GHz Intel Xeon X7550 with 64 cores, 126
In this section we present experimental results on
GB of RAM, and 18 MB cache memory. The Server
ADAPT data for our reactive approach, as enabled
computer is a 2.50GHz Intel Core 2 Quad Q8300 with
by feedback control and summarized in Figure 1.
4 cores, 8GB GB of RAM, and 2 MB cache mem-
ory. The Old Server computer is a 2.80GHz Intel Xeon
with 4 cores, 4 GB of RAM, and 1 MB cache mem- 5.1 COMPUTATION AS A PLANT
ory. Broadly speaking, the High End computer is most
Control Input: The input to the plant (i.e., the
3
http://www.dx-competition.org/ computer in Figure 1), which ultimately is computed
4
http://code.google.com/p/bnt/ by a control algorithm, is denoted by u(k). This in-
5
The speci…c BNT functions we use are
put determines the number of low-criticality processes
pearl_inf_engine (PBP), belprop_inf_engine (LBP),
likelihood_weighting_inf_engine (LW), jtree_inf_engine that are allowed to run at any given sample time. De-
(JT), and var_elim_inf_engine (VE). note the actual number of low-criticality processes by
6
http://www.dx-competition.org/ v(k) where k denotes a sampling index. In the exam-
(a) Junction tree propagation (JTP) (b) Likelihood weighting (LW)

Figure 2: Results of system identi…cation. Two di¤erent square waves used as input u(k) to system identi…cation,
where we compare Actual completion time y(k) and Model completion time ym (k) on a Laptop for (a) junction
tree propagation versus (b) likelihood weighting.

ples that are used in this paper, if v(k) u(k), then methods exist to estimate (i) in both batch and on-
processes are terminated until v(k) < u(k).7 The ter- line modes. Similar ARX models have been used in
mination process is nearly instantaneous with respect modeling a number of digital processes [12].
to the sampling period.
Open-loop Modeling and Generalization: Fig-
Plant Modeling: There are several approaches to the ure 2 shows the result of open-loop system identi…ca-
modeling of computation for control applications. The tion, speci…cally the performance of a model where the
approach taken here is termed linear Auto-Regressive parameters were estimated using least-squares (batch-
modeling with eXogenous input (linear ARX). Nonlin- mode). This …gure compares, for varying sample time
ear approaches, such as discrete time neural networks, k, the Actual computation time y(k) (see Figure 1)
may also be used. Nonlinear modeling is more complex with our model’s predicted computation time ym (k)
yet may be able to capture inherent nonlinear behav- (or just Model ). For both JTP and LW, the rate of
ior otherwise unaccounted for in ARX modeling. On Poisson process creation is 45 seconds for LW and the
the other hand, linear modeling is generally simpler to sampling rate is 51 Hz. Figure 2(a) depicts the perfor-
understand and implement.8 mance for JTP, while Figure 2(b) shows LW. Overall,
the …t between the model and actual behavior is quite
Suppose the plant is described by an ARX model with
good in both cases, perhaps slightly less so for JTP.
an additive noise term. Denote P (i) , y (i) (k), u(k), and
The following may be the explanation why the JTP
(k), the plant operator, scalar output, input, and
graph is more “blocky” and less of a …t than the LW
noise at sample time k, respectively. The integer i
graph: JTP is signi…cantly faster than LW, thus JTP
denotes a speci…c plant or device, such as a particular
is more sensitive to the other computational processes.
laptop or desktop. In general, we will assume that we
have a …nite class of plants P (i) for i 2 [1; M ]. Further details on how open-loop system identi…cation
is performed and generalizes are provided in Figure 3.9
In general, the relationship between these quantities is
Here, for (1) we simpli…ed y (i) (k) to y(k) and used a
given by:
…rst order discrete time model
T
y (i) (k) = P (i) u(k) = (k d) (i) + (k) (1) y(k) = c1 y(k 1) + c2 u(k 1) + c3 (2)
where T (k d) denotes the regression vector and con- to model a speci…c plant’s input output behavior. In-
sists of a tapped delay line of input and output mea- put signals used to generate the training and testing
surements, and (i) denotes a vector of real-valued pa- data are shown in Figure 3(a), and consist of pseudo-
rameters corresponding to the ith plant. A number of random u(k) square waves. Figure 3(b) shows the re-
7
sult of open-loop system identi…cation for two di¤er-
Currently, we randomly terminate one of the low- ent computers. Figure 3(b) shows the performance of
criticality OS processes. Additional information, such as
process priority, can easily be used to guide termination. the model where the parameters were estimated us-
8
The ARX model was chosen for several reasons; most ing least-squares (batch-mode), and depicts the per-
importantly it gave solid system identi…cation perfor- formance of the laptop (top) and the server (bottom)
mance. Allowing the plant model itself to be a probabilistic using the regression parameters obtained via laptop-
graphical model would be interesting, and stochastic ap- generated data. It is clear from these empirical results
proaches (based on stochastic di¤erential equations) have
9
been developed in control theory [10, 15]. In light of its While not illustrated in Figure 3, we note that closed
performance, we believe that the standard ARX approach loop system identi…cation is also required and presents ad-
is well-justi…ed and leave stochastic control to future work. ditional data challenges, for example, data collinearities.
(a) Inputs u(k) (b) Outputs: from u(k) to y(k) and ym (k)

Figure 3: System identi…cation and generalization. (a) Two di¤erent squarewaves, Squarewave 1 and Squarewave
2, used as input u(k). Squarewave 1 was used for training of ym (k) and Squarewave 2 was used for testing. (b)
Application to two di¤erent computers, comparing Actual completion time y(k) and Model completion time
ym (k): (top) Laptop–good …t and (bottom) Server–decent …t.

that models obtained on one platform (here, laptop) are real-valued controller gain parameters. H(z) is the
performs best on that platform. The models are likely transfer function from the error signal e(k) to the input
to perform less well on another platform (here, server), to the plant u(k). Performance for a …xed setpoint is
but are still of some value. shown in Figure 6(a). Computation time is maintained
at approximately r(k) = 2 sec, however since this is a
5.2 INNER LOOP CONTROL soft real-time approach there are excursions above this
setpoint.
The objective of the control system is to minimize the
error signal given by e(k) = r(k) y(k) where r(k) 5.3 OUTER LOOP CONTROL
denotes the reference or desired computational time
required for BN computation. The output, y(k), de- Our proposed research is based on controlling BN com-
notes the actual time the computer takes to complete putation in an inner loop as well as in an outer loop
the BN computation and generate the posterior prob- (see Figure 1); so far we discussed the inner loop. We
abilities BEL(Q; e) as discussed above. now brie‡y discuss the outer loop, where the output
BEL(Q; e) of BN computation, as well as other factors
From the control engineering perspective, there are external to the inner loop, may change desired comple-
several key issues: (1) Static uncertainty in plant para- tion time and also plant behavior (sampling frequency,
meters: A tablet PC running Windows Vista will be- BN computation algorithm, number of particles as-
have di¤erently from a High Performance Computing suming a simulation algorithm, etc.).
(HPC) cluster running openSUSE Linux. A control
system designed for one computer may not perform Adaptation to Computational Platform (see
well on another. In other words, the control system pa- Figure 1(c)): Table 1 summarizes results of run-
rameters are uncertain and need to be estimated. (2) ning likelihood weighting on the ADAPT BN, using
Stochastic disturbances: The process that determines …ve di¤erent computers and three di¤erent operating
when a low-criticality process is generated by a user systems. System identi…cation was performed on all
is, in general, stochastic. This is, in itself, an active computers using the two input data sets shown in Fig-
area of research [1]. Furthermore, the impact a partic- ure 3(a). Batch least squares was used to determine
ular low-criticality process has on the high-criticality the parameters of a …rst order linear ARX model. Ta-
process may vary substantially. (3) Stochastic delay ble 1 shows the resulting R2 and mean squared error
in the input process: When the control input u(k) in- (MSE) of the parameter …t for the various computers
creases, the actual number of processes may or may and operating systems. In most cases, we observed a
not increase. This represents a stochastic delay in the low MSE.10
actuation of the digital system being controlled. 10
It is important to note the underlying variations in
To illustrate how our approach handles the above, we the ARX model parameters, which are used by the control
design process. The diverse set of parameters, omitted
optimized a linear controller given by the following Z- here to save space, illustrate the bene…t of using techniques
transfer function: H(z) = 1 Kz 1 , where K and from adaptive feedback control, speci…cally learning from
Figure 4: Actual computation time y(k) as a function Figure 5: Experiment on Laptop using 1 Hz sampling
of actual number of processes v(k) for …ve di¤erent rate, showing a successful switch of BN inference algo-
BN inference algorithms LBP, VE, JTP, LW, and PBP rithm from junction tree propagation (JTP) to likeli-
running on High End computer. hood weighting (LW) using feedback control.

Optimizing BN Algorithm and Parameters (see
our knowledge, the …rst demonstration of a successful
Figure 1(b)): In Figure 4, we show the steady state
on-line switch between two very di¤erent inference al-
computation time results for the …ve BN inference al-
gorithms (from JTP to LW) while a desired completion
gorithms discussed above. Here, steady state is de…ned
time is maintained.
loosely as the time k such that y(k + 1) = y(k) = yss
for all k k . Plugging this into (2) yields yss = Changing the Setpoint (see Figure 1(a)): There
c2 uss +c3 is both a supply side and a demand side in computing.
1 c1 , where u(k) = uss 8k k . Thus, the linear
gain coe¢ cient (slopes in Figure 4) for each algorithm On the supply side, one can control the supply of com-
is given by 1 c2c1 . putational resources, in the form of computers, CPUs,
CPU threads, or GPU threads. On the demand side,
The results fall into three groups: fast (JTP), medium
the outer loop can vary the Setpoint r(k), perhaps in
(VE and LW), and slow (LBP and PBP) computa-
combination with varying other outer loop parameters
tions. Also, the standard error in computation time
such as sampling frequency fC (k) and number of par-
varies between the algorithms, with JTP again being
ticles p(k) used in likelihood weighting [34] or particle
the best with a very small standard error.
…ltering [17].
While JTP is exact and fast, its Achilles’heel is mem-
One reason for the outer loop to vary the com-
ory consumption [23]. Consequently, it can be neces-
putational resources allocated to the high-criticality
sary, when running other memory-intensive processes,
process is illustrated in the following. Suppose, for
to use a less memory-intensive but inexact algorithm
k < k , that P (H(k) j e(k)) suggested that there was
like LW. How should one switch between two algo-
one or more faults in the ADAPT electrical power sys-
rithms, say the non-anytime algorithm JTP and the
tem, while P (H(k ) j e(k )) indicated that this was a
anytime algorithm LW, that have very di¤erent com-
false alarm. In this case, we may want to be less strin-
putational resource requirements but operate on the
gent about ensuring that computation of P (H(k + 1)
same BN? Feedback control can help in this regard,
j e(k + 1)), P (H(k + 2) j e(k + 2)), etc. …nishes in
see Figure 5. From a control perspective, this is consid-
a timely fashion, in other words it makes sense to put
ered dynamic uncertainty in plant parameters: Given
r(k + 1) > r(k ). Figure 6(b) shows how this type of
a computer P (i) , a change in the BN algorithm will
step change, at k 23:28, is supported by our control-
impact the way the plant, P (i) , responds. Here, we
theoretic framework. The baseline of not varying r(k)
switch from JTP to LW around 12:20, while maintain-
is shown in Figure 6(a). We here assume that sam-
ing the setpoint (on average) after a transient period
ple frequency fC (k) = fC is constant, and consider (i)
lasting around 10 seconds.11 This is, to the best of
r(k 1) = r(k) < 1=fC (as when r(k) = 2 sec in both
Figure 6(a) and Figure 6(b)) versus (ii) r(k) 1=fC
the computational environment and adapt to changes.
11
Note, our approach does not make any hard real-time (as when r(k) = 4 sec Figure 6(b)). The advantage
guarantees, only soft ones, and consequently the actual of (i) is that there is a much greater chance that BN
computation time is sometimes greater than the setpoint. computation …nishes before a new computational cy-
(a) Fixed Setpoint r(k) (b) Varying Setpoint r(k)

Figure 6: Outerloop optimization for LW, using: (a) …xed Setpoint r(k) = 2 and (b) change of Setpoint,
approximately at time 23:28, from r(k) = 2 to r(k) = 4.

cle starts, which is essential when EPS faults are more Bayesian network inference algorithms including both
likely. The advantage of (ii), on the other hand, is that exact and inexact algorithms. As a consequence, our
more low-criticality processes are allowed to run. approach enables the use of exact but non-anytime al-
gorithms (like variable elimination [20, 9], junction tree
The trade-o¤ between fast inference for the high-
propagation [18, 35], and arithmetic circuit evaluation
criticality process versus running many low-criticality
[7, 4]) in reactive settings. The bene…t of this is that
processes is illustrated in Figure 6. Figure 6(b)’s
such exact algorithms often perform very well, how-
bottom panel shows an increase in the number of
ever they do have limitations and consequently there
processes, on average, as a result of the increase in
are situations where they are unsuitable. With our ap-
r(k) at k 23:28; no similar increase can be found
proach, one can use these exact algorithms and then
in Figure 6(a)’s bottom panel.
switch to an inexact (often anytime) algorithm only if
needed, rather than having to always use an anytime
6 DISCUSSION AND OUTLOOK algorithm.
We are in this paper using rather basic control theory
Deploying BN algorithms and other resource-intensive
ideas. This enables new results and many opportuni-
AI algorithms can be a challenge when there are non-
ties for future work, both theoretical and experimen-
trivial constraints on computational resources in ap-
tal, and we invite other researchers to participate in
plications. In this paper, we have focused on support-
the exploration of this exciting area of reseach. We
ing requirements for reactive response without requir-
are, for example, developing adaptive control meth-
ing dedicated hard real-time computational resources
ods that leverage online system identi…cation of the
according to worst-case computational requirements.
process. There are also many interesting research op-
We have focused on reactive diagnosis using BNs,
portunities related to the use of BN posteriors as well
motivated by domains with some but uncertain do-
as their accuracy, and perhaps multiple BNs at dif-
main knowledge (hence probabilistic graphical models,
ferent levels of detail, in the outer control loop. In
speci…cally BNs) as well as uncertainty with respect to
this area, there is a strong connection to anytime al-
the computational platform and environment (hence
gorithms and metareasoning that can be further in-
feedback control).
vestigated, enabling more reactive and more capable
An alternative approach to achieving reactive response intelligent systems.
is the use of anytime algorithms. The motivation be-
hind anytime inference— namely the goal of intelligent
and reactive systems— and our work is quite similar.
However, the approaches are very di¤erent. Anytime Acknowledgements
algorithms are inherently inexact and produce solu-
tions whose quality gradually improve with computa-
tion time [37]. We focus on what can done, on the This material is based upon work supported, in part,
computing system level, for a broad range of existing by NSF awards CCF0937044 and ECCS0931978.
References to enforce policies for interrelated metrics with
application to the Apache Web server. In Proc. of
[1] T. Beauvisage. Computer usage in daily life. Network Operations and Management Symposium
In Proc. of the 27th International Conference on (NOMS-02), pages 219–234, 2002.
Human factors in Computing Systems (CHI-09),
pages 575–584, Boston, MA, 2009. [12] J. Hellerstein, Y. Diao, S. Parekh, and D. M.
Tilbury. Feedback Control of Computing Systems.
[2] Y. Brun, G. Di Marzo Serugendo, C. Gacek, Wiley, 2004.
H. Giese, H. Kienle, M. Litoiu, H. Müller,
M. Pezzè, and M. Shaw. Engineering self-adaptive [13] C. Hollot, V. Misra, D. Towsley, and W. Gong.
systems through feedback loops. In B. H. Cheng, On designing improved controllers for AQM
R. Lemos, H. Giese, P. Inverardi, and J. Magee, routers supporting TCP ‡ows. In Proc. of IEEE
editors, Software Engineering for Self-Adaptive INFOCOM, pages 1726–1734, 2000.
Systems, pages 48–70. Springer-Verlag, 2009.
[14] E. J. Horvitz, H. J. Suermondt, and G. F. Cooper.
[3] A. Carlin and S. Zilberstein. Decentralized moni- Bounded conditioning: Flexible inference for de-
toring of distributed anytime algorithms. In Proc. cisions under scarce resources. In Proc. of the
of the Tenth International Conference on Au- Fifth Conference on Uncertainty in Arti…cial In-
tonomous Agents and Multiagent Systems, pages telligence (UAI-89), pages 182–193, Windsor, On-
157–164, Taipei, Taiwan, 2011. tario, 1989.

[4] M. Chavira and A. Darwiche. Compiling Bayesian [15] A. K. Ishihara, J. van Doornik, and S. Ben-
networks using variable elimination. In Proc. Menahem. Stochastic stability of a neural-net ro-
of the Twentieth International Joint Conference bot controller subject to signal-dependent noise in
on Arti…cial Intelligence (IJCAI-07), pages 2443– the learning rule. International Journal of Adap-
2449, Hyderabad, India, 2007. tive Control and Signal Processing, 24(6):445–
466, 2010.
[5] A. Choi, A. Darwiche, L. Zheng, and O. J. Meng-
shoel. A tutorial on Bayesian networks for system [16] K. Kask and R. Dechter. Stochastic local search
health management. In A. Srivastava and J. Han, for Bayesian networks. In Proc. of the Seventh
editors, Data Mining in Systems Health Manage- International Workshop on Arti…cial Intelligence
ment: Detection, Diagnostics, and Prognostics. and Statistics (AISTATS-99), Fort Lauderdale,
Chapman and Hall/CRC Press, 2011. FL, Jan 1999.

[6] F. G. Cooper. The computational complexity of [17] D. Koller and U. Lerner. Sampling in Fac-
probabilistic inference using Bayesian belief net- tored Dynamic Systems. In A. Doucet, J. F. G.
works. Arti…cial Intelligence, 42:393–405, 1990. de Freitas, and N. Gordon, editors, Sequential
Monte Carlo Methods In Practice, pages 445–464.
[7] A. Darwiche. A di¤erential approach to infer- Springer-Verlag, 2001.
ence in Bayesian networks. Journal of the ACM,
50(3):280–305, 2003. [18] S. Lauritzen and D. J. Spiegelhalter. Local com-
putations with probabilities on graphical struc-
[8] T. Dean and M. S. Boddy. An analysis of tures and their application to expert systems
time-dependent planning. In Proc. of the Sev- (with discussion). Journal of the Royal Statistical
enth National Conference on Arti…cial Intelli- Society series B, 50(2):157–224, 1988.
gence (AAAI-88), pages 49–54, St. Paul, MN,
1988. [19] U. Lerner, R. Parr, D. Koller, and G. Biswas.
Bayesian fault detection and diagnosis in dynamic
[9] R. Dechter. Bucket elimination: A unifying systems. In Proc. of the Seventeenth national
framework for reasoning. Arti…cial Intelligence, Conference on Arti…cial Intelligence (AAAI-00),
113(1-2):41–85, 1999. pages 531–537, 2000.

[10] H. Deng, M. Krstic, and R. J. Williams. Stabi- [20] Z. Li and B. D’Ambrosio. E¢ cient inference in
lization of stochastic nonlinear systems driven by Bayes nets as a combinatorial optimization prob-
noise of unknown covariance. IEEE Transactions lem. International Journal of Approximate Rea-
on Automatic Control, 46(8):1237 –53, 2001. soning, 11(1):55–81, 1994.

[11] Y. Diao, N. Gandhi, J. L. Hellerstein, S. Parekh, [21] O. J. Mengshoel. Designing resource-bounded
and D. M. Tilbury. Using MIMO feedback control reasoners using Bayesian networks: System
health monitoring and diagnosis. In Proceedings quality control techniques for fault diagnosis in
of the 18th International Workshop on Principles hybrid domains. In Proc. of the Annual Confer-
of Diagnosis (DX-07), pages 330–337, Nashville, ence of the PHM Society 2011 (PHM-11), Mon-
TN, 2007. treal, Canada, 2011.

[22] O. J. Mengshoel. Understanding the role of noise [32] B. W. Ricks and O. J. Mengshoel. Methods for
in stochastic local search: Analysis and experi- probabilistic fault diagnosis: An electrical power
ments. Arti…cial Intelligence, 172(8-9):955–990, system case study. In Proc. of Annual Conference
2008. of the PHM Society, 2009 (PHM-09), San Diego,
CA, 2009.
[23] O. J. Mengshoel. Understanding the scalability
of bayesian network inference using clique tree [33] D. Roth. On the hardness of approximate reason-
growth curves. Arti…cial Intelligence, 174:984– ing. Arti…cial Intelligence, 82:273–302, 1996.
1006, 2010.
[34] R. Shachter and M. Peot. Simulation approaches
[24] O. J. Mengshoel, M. Chavira, K. Cascio, S. Poll, to general probabilistic inference on belief net-
A. Darwiche, and S. Uckun. Probabilistic model- works. In Uncertainty in Arti…cial Intelligence
based diagnosis: An electrical power system case 5, pages 221–231, Amsterdam, 1990. Elsevier.
study. IEEE Trans. on Systems, Man, and Cy-
bernetics, 40(5):874–885, 2010. [35] P. P. Shenoy. A valuation-based language for ex-
pert systems. International Journal of Approxi-
[25] O. J. Mengshoel, D. Roth, and D. C. Wilkins. mate Reasoning, 5(3):383–411, 1989.
Portfolios in stochastic local search: E¢ -
ciently computing most probable explanations in [36] C.-Z. Xu, B. Liu, and J. Wei. Model predictive
Bayesian networks. Journal of Automated Rea- feedback control for QoS assurance in webservers.
soning, 46(2):103–160, 2011. Computer, 41:66–72, March 2008.
[37] S. Zilberstein and S. Russell. Optimal composi-
[26] K. P. Murphy, Y. Weiss, and M. I. Jordan. Loopy
tion of real-time systems. Arti…cial Intelligence,
belief propagation for approximate inference: An
82:181–213, 1996.
empirical study. In Proc. of the Fifteenth Confer-
ence on Uncertainty in AI (UAI-99), pages 467–
475, 1999.

[27] S. Parekh, N. Gandhi, J. Hellerstein, D. Tilbury,
T. Jayram, and J. Bigus. Using control theory
to achieve service level objectives in performance
management. Real-Time Systems, 23(1):841–854,
2002.

[28] J. D. Park and A. Darwiche. Complexity results
and approximation strategies for MAP explana-
tions. Journal of Arti…cial Intelligence Research
(JAIR), 21:101–133, 2004.

[29] J. Pearl. Probabilistic Reasoning in Intelligent
Systems: Networks of Plausible Inference. Mor-
gan Kaufmann, San Mateo, CA, 1988.

[30] S. Poll, A. Patterson-Hine, J. Camisa, D. Garcia,
D. Hall, C. Lee, O. J. Mengshoel, C. Neukom,
D. Nishikawa, J. Ossenfort, A. Sweet, S. Yen-
tus, I. Roychoudhury, M. Daigle, G. Biswas, and
X. Koutsoukos. Advanced diagnostics and prog-
nostics testbed. In Proc. of the 18th International
Workshop on Principles of Diagnosis (DX-07),
pages 178–185, Nashville, TN, 2007.

[31] B. W. Ricks, C. Harrison, and O. J. Mengshoel.
Integrating probabilistic reasoning and statistical