<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>On Approximate Inference of Dynamic Latent Classification Models for Oil Drilling Monitoring</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Shengtong Zhong</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Computer and Information Science Norwegian University of Science and Technology 7491 Trondheim</institution>
          ,
          <country country="NO">Norway</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>We have been working with dynamic data from an oil production facility in the North sea, where unstable situations should be identified as soon as possible. Monitoring in such a complex domain is a challenging task. Not only is such a domain typically volatile and following non-linear dynamics, but sensor input to the monitoring system can also often be high dimensional, making it difficult to model and classify the domain's states. Dynamic latent classification models are dynamic Bayesian networks capable of effective and efficient modeling and classification. An approximate inference algorithm utilizing Gaussian collapse has been tailormade for this family of models, but the approximation's properties have not been fully explored. In this paper we compare alternatives approximate inference methods for the dynamic latent classification model, in particular focusing on traditional sampling techniques. We show that the approximate scheme based on Gaussian collapse is computationally more efficient than sampling, while offering comparable accuracy results.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>
        In the oil drilling, monitor the complex process and
identify the current system state is actually very
difficult. Monitoring the complex process often involves
keeping an eye on hundreds or thousands of sensors to
determine whether or not the process is stable. We
report results on an oil production facility in the North
sea, where unstable situations should be identified as
soon as possible [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]. The oil drilling data that we
are considering, consisting of some sixty variables, is
captured every five seconds. The data is monitored in
real time by experienced engineers, who have a
number of tasks to perform ranging from understanding
the situation on the platform (activity recognition) via
avoiding a number of either dangerous or costly
situations (event detection), to optimization of the drilling
operation. The variables that are collected cover both
topside measurements (like flow rates) and down-hole
measurements (like, for instance, gamma rate). For
the discussions to be concrete, we will tie the
development to the task of activity recognition in this paper.
The drilling of a well is a complex process, which
consists of activities that are performed iteratively as the
length of the well increases, and knowing which
activity is performed at any time is important for the
further event detection.
      </p>
      <p>
        Motivated by this problem setting, a generative
model called dynamic latent classification models (dLCM)
[
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] for dynamic classification in continuous domains
is proposed to help the drilling engineers by
automatically analyzing the data stream and classify the
situation accordingly. Dynamic latent classification models
are Bayesian networks which could model the complex
system process and identify its system state.
A dynamic latent classification model can be seen as
combining a na¨ıve Bayes model with a mixture of
factor analyzers at each time point. The latent variables
of the factor analyzers are used to capture the
statespecific dynamics of the process as well as modeling
dependencies between attributes. As exact inference
for the model is intractable, an approximate inference
scheme based on Gaussian collapse is proposed in our
previous study [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]. Although the previous
experiments demonstrated that the proposed approximate
inference is functioned well the learning of dynamic
latent classification models as well as the classification
work, we further investigate the approximation’s
properties by introducing alternative sampling techniques.
The remaining of the paper is organized as follows. In
Section 2, we introduce the detail of dLCM. The
importance of Gaussian collapse in the inference of dLCM
is discussed in Section 3. Next, alternative sampling
techniques are proposed for dLCM in Section 4. After
the experiment results are illustrated and discussed in
Section 5, the conclusion of the paper is presented in
Section 6.
2
      </p>
    </sec>
    <sec id="sec-2">
      <title>Dynamic Latent Classification</title>
    </sec>
    <sec id="sec-3">
      <title>Models</title>
      <p>
        Dynamic Latent classification models [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] are dynamic
Bayesian networks, which can model the complex
system process and identify its system state. The
complex system process is highly dynamical and complex,
which makes it difficult to model and idetentify with
the static models and standard dynamic model. The
framework of dLCM is specified incrementally by
examining its expressivity relative to the oil drilling data.
The dLCM is established from na¨ıve Bayes model
(NB), which is one of the simplest static models. In the
first step, temporal dynamics of the class variables (a
first order Markov chain) is added as considerable
correlation between the class variable of consecutive time
slices are evidenced from the oil drilling data [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]. This
results in a dynamic version of na¨ıve Bayes, which is
also equivalent to a standard first order hidden Markov
model (HMM) shown in Figure 1, where Ct denotes
the class variable at time t and Yit denotes the i-th
attribute at time t. This model type has a long history
of usage in monitoring, see e.g. [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ].
      </p>
      <p>Ct−1</p>
      <p>C t</p>
      <p>
        This model is described by a prior distribution over
the class variable P (c0), a conditional observation
distribution P (yt|ct), and transition probabilities for the
class variable P (ct|ct−1); we assume that the
model is stationary, i.e., P (yt|ct) = P (yt−1|ct−1) and
P (ct|ct−1) = P (ct+1|ct), for all t. For the continuous
observation vector, the conditional distribution may
be specified by a class-conditional multivariate
Gaussian distribution with mean μct and covariance matrix
Σct , i.e., Y |{Ct = ct} ∼ N (μct , Σct )
In a standard HMM, it assumes that the class
variable and attributes at different time points are
independent given the class variables at the current time,
which is violated in many real world setting. In our
oil drilling data, there is also a strong correlation
between attributes given the class [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]. Modeling the
dependence between attributes is then the next step
in creating the dLM.
      </p>
      <p>
        Following [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], we introduce latent variables to
encode conditional dependence among the attributes.
Specifically, for each time step t we have the vector
Zt = (Z1t, . . . , Zt ) of latent variables that appear as
k
children of the class variable and parents of all the
attributes (see Figure 2). It can be seen as combining
the NB model with a factor analysis model at each
time step.
      </p>
      <p>Ct−1
The latent variable Zt is assigned a multivariate
Gaussian distribution conditional on the class variable and
the attributes Y t are assumed to be linear
multivariate Gaussian distributions conditional on the latent
variables:</p>
      <p>Zt|{Ct = ct} ∼ N (μct , Σct ),</p>
      <p>Y t|{Zt = zt} ∼ N (Lzt + Φ, Θ),
where Σct and Θ are diagonal covariance matrix and L
is the transition matrix, Φ is the offset from the latent
space to attribute space; note that the stationarity
assumption encoded in the model.</p>
      <p>In this model, the latent variables capture the
dependence between the attributes. They are conditionally
independent given the class but marginally
dependent. Furthermore, the same mapping, L, from the latent
space to the attribute space is used for all classes, and
hence, the relation between the class and the attributes
is conveyed by the latent variables only.</p>
      <p>
        At this step, the temporal dynamics of the model is
assumed to be only captured at the class level. When the
state specification of the class variable is coarse, then
this assumption will rarely hold. This assumption does
not hold in our oil drilling data, as the conditional
correlation of the attribute in successive time slices is
evident [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]. we address this by modeling the dynamics of
the system at the level of the latent variables. The
state specific dynamics is encoded by assuming that the
latent variable at the current time slice follows a
linear Gaussian distribution conditioned on previous time
slice. Specifically, we encode the state specific
dynamics by assuming that the multivariate latent variable
Zt follows a linear Gaussian distribution conditioned
on Zt−1, and the transition dynamics between latent
variable is denoted by a diagonal matrix Act :
M t|{Ct = ct} ∼ P (M t|Ct = ct),
      </p>
      <p>Y t|{Zt = zt, M t = mt} ∼ N (Lmt zt + Φmt , Θmt ),
where 1 ≤ mt ≤ |sp (M )| (|sp (M )| denotes the
dimension of variable M space), P (M t = mt|Ct = ct) ≥ 0
and P|mspt(=M1 )| P (M t = mt|Ct = ct) = 1 for all 1 ≤
ct ≤ |sp (C)|, Φmt is the offset from the latent space
to attribute space.</p>
      <p>
        The final model is then called dynamic latent
classification model which is shown in Figure 4. The
dynamic latent classification model is shown to be
effective and efficient through the experiment with our oil
drilling data, and the significant improvement is
also demonstrated when comparing dLCM with static
models (such as NB or decision tree) and HMM [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ].
Zt|{Zt−1 = zt−1, Ct = ct} ∼ N (Act zt−1, Σct )
Ct−1
A graphical representation of the model is given in
Figure 3.
      </p>
      <p>Ct−1
Zt−1
1</p>
      <p>Zt−1
2</p>
      <p>C t
Z t
1</p>
      <p>
        Z t
2
A discrete mixture variable M is further introduced to
the model at each time slice for the purpose of
reducing the computational cost while maintaining the
representational power [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]. Similar situation is done by
[
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] for static domains, and in the dynamic domains can
be seen from [
        <xref ref-type="bibr" rid="ref3 ref6">3, 6</xref>
        ] where a probabilistic model called
switching state-space model is proposed that
combining discrete and continuous dynamics. In this case,
the mixture variable follows a multinomial distribution
conditioned on the class variable. and the attributes
Y t follow a multivariate Gaussian distribution
conditioned on the latent variables and the discrete mixture
variable,
      </p>
    </sec>
    <sec id="sec-4">
      <title>Approximate inference in dLCM</title>
      <p>The exact inference for dLCM is intractable. To make
dLCM applicable and effective in practice,
approximate inference is then proposed.
3.1</p>
      <sec id="sec-4-1">
        <title>Intractability of exact inference in dLCM</title>
        <p>
          Seen from the dLCM in Figure 4, an equivalent
probabilistic model is
p(y1:T , z1:T , m1:T , c1:T ) =
p(y1|z1, m1)p(z1|c1)p(m1|c1)p(c1) ·
T
Y p(yt|zt, mt)p(zt|zt−1, ct)p(mt|ct)p(ct|ct−1).
t=2
In dLCM, exact filtered and smoothed inference is
shown to be intractable (scaling exponentially with
T [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ]) as neither the class variables nor the mixture
variables are observed: At time step 1, p(z1|y1) is
a mixture of |sp (C)| · |sp (M )| Gaussian. At
timestep 2, due to the summation over the classes and
mixture variables, p(z2|y1:2) will be a mixture of
|sp (C)|2·|sp (M )|2 Gaussian; at time-step 3 it will be a
mixture of |sp (C)|3 · |sp (M )|3 Gaussian and so on
until the generation of a mixture of |sp (C)|T · |sp (M )|T
Gaussian at time-point T . To control this explosion
in computational complexity, approximate inference
techniques are adopted to the inference of dLCM.
3.2
        </p>
      </sec>
      <sec id="sec-4-2">
        <title>Approximate inference: Forward pass</title>
        <p>
          The structure of the proposed dLCM is similar to
the linear dynamical system (LDS) [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ], the
standard Rauch-Tung-Striebel (RTS) smoother [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ] and
the expectation correction smoother [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ] for LDS
provide the basis for the approximate inference of
dLCM. As for the RTS, the filtered estimate of dLCM
p(zt, mt, ct|y1:t) is obtained by a forward recursion,
and then following a backward recursion to calculate
the smoothed estimate p(zt, mt, ct|y1:T ). The
inference of dLCM is then achieved by a single forward
recursion and a single backward recursion iteratively.
Gaussian collapse is incorporated into both the
forward recursion and the backward recursion to form
the approximate inference. The Gaussian collapse in
the forward recursion is equivalent to assumed density
filtering [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ], and the Gaussian collapse in the backward
recursion mirrors the smoothed posterior collapse from
[
          <xref ref-type="bibr" rid="ref3">3</xref>
          ].
        </p>
        <p>During the forward recursion of dLCM, the filtered
posterior p(zt, mt, ct|y1:t) is computed with a recursive
form. By a simple decomposition,
p(zt, mt, ct|y1:t) = p(zt, mt, ct, yt|y1:t−1)/p(yt|y1:t−1)
∝ p(zt, mt, ct, yt|y1:t−1).</p>
        <p>
          Dropping the normalization constant p(yt|y1:t−1),
p(zt, mt, ct|y1:t) is proportional to the new joint
probability p(zt, mt, ct, yt|y1:t−1), where
p(zt, mt, ct, yt|y1:t−1) = p(yt, zt|mt, ct, y1:t−1)·
p(mt|ct, y1:t−1)p(ct|y1:t−1).
(1)
To build the forward recursion, a recursive form for
each of the factors in Equation 1 is required.
Given the filtered results of the previous time-step, the
recursive form for each of the factors are shown to
be feasible [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ]. On the way to devise the recursive
form, one term p(zt−1|mt−1, ct−1, y1:t−1) is required,
which can be directly obtained since it is the filtered
probability from the previous time step. However, the
number of mixture components of p(zt−1|yt−1) is
increasing exponentially over time as we discussed
earlier, so is the case for p(zt−1|mt−1, ct−1, y1:t−1). In
our Gaussian collapse implementation [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ], the
term p(zt−1|mt−1, ct−1, y1:t−1) is collapsed into a single
Gaussian, parameterized with mean νmt−1,ct−1 and
covariance Γmt−1,ct−1, and then propagate this collapsed
Gaussian for next time slice. With this approximation,
the recursive computation of the forward pass becomes
tractable.
3.3
        </p>
      </sec>
      <sec id="sec-4-3">
        <title>Approximate inference: Backward pass</title>
        <p>Similar to the forward pass, the backward pass
also relies on a recursion computation of the smoothed
posterior p(zt, mt, ct|y1:T ). In detail, p(zt, mt, ct|y1:T )
is computed from its smoothed result of the previous
step p(zt+1, mt+1, ct+1|y1:T ), together with some
other quantities obtained from forward pass. The first
smoothed posterior is p(zT , mT , cT |y1:T ), which can be
directly obtained as it is also the last filtered posterior
from the forward pass. To compute p(zt, mt, ct|y1:T ),
factorize it as
p(zt, mt, ct|y1:T )
=
=
mt+1,ct+1</p>
        <p>X</p>
        <p>X
mt+1,ct+1
p(zt, mt, ct, mt+1, ct+1|y1:T )
p(zt|mt, ct, mt+1, ct+1, y1:T ) ·
p(mt, ct|mt+1, ct+1, y1:T )p(mt+1, ct+1|y1:T ).
Due to the fact
zt⊥⊥{yt+1:T , mt+1, ct+1}|{zt+1, mt, ct}, the
p(zt|mt, ct, mt+1, ct+1, y1:T ) can be found from
that
term
p(zt|mt, ct, mt+1, ct+1, y1:T )
= Z</p>
        <p>p(zt|zt+1, mt, ct, y1:t) ·
zt+1
p(zt+1|mt, ct, mt+1, ct+1, y1:T )dzt+1.</p>
        <p>
          To complete the backward recursive form, two
essential assumptions are further made in the
backward pass that makes the approximate inference
applicable and effective. The first assumption
is to approximate p(zt+1|mt, ct, mt+1, ct+1, y1:T ) by
p(zt+1|mt+1, ct+1, y1:T ) [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ]. This is due to that
although {mt, ct} ⊥6⊥zt+1|y1:T , the influence of {mt, ct}
on zt+1 through zt is ’weak’ as zt will be mostly
influenced by y1:t. The benefit of this simple assumption
lies in that p(zt+1|mt+1, ct+1, y1:T ) can be directly
obtained from the previous backward recursion.
Meanwhile p(zt+1|mt+1, ct+1, y1:T ) is a Gaussian mixture
whose components increase exponentially in T − t.
The second assumption is also a Gaussian collapse
process. p(zt+1|mt+1, ct+1, y1:T ) is collapsed into a single
Gaussian and then pass this collapsed Gaussian for the
next step. This will guarantee that the back
propagated term p(zt|mt, ct, y1:T ) will be Gaussian mixture
with fixed |sp (C)| · |sp (M )| components at next time
step. With this Gaussian collapse process at each time
slice, a tractable recursion in backward pass is
established.
3.4
        </p>
      </sec>
      <sec id="sec-4-4">
        <title>The importance of approximate inference</title>
        <p>
          The exact inference is not applicable in practise as
its computation cost is increasing exponentially over
time. The approximate inference is then essential to
dLCM. Gaussian collapse is adopted during building
the recursive form for both forward and backward pass.
At the same time, p(zt+1|mt, ct, mt+1, ct+1, y1:T ) is
also approximated by p(zt+1|mt+1, ct+1) in dLCM. As
the approximations are made within the inference, the
quality of the overall learning and inference for
dLCM is rather sensitive to these approximations. Our
experimental results [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ] showed that the overall
performance of dLCM is satisfactory, which indicate that
the chosen approximations are reasonable.
        </p>
        <p>
          Even though the proposed approximate inference in
dLCM is satisfactory, is there any improvement space
with alternative approximation methods? With this
question in mind, we decide to investigate the
approximation’s properties by incorporating new
approximation method. The traditional sampling techniques
(e.g., [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ]) are commonly used in a similar
approximation situation. Next section we will briefly introduce
sampling technique, and then we will explain how it is
integrated in dLCM. Meanwhile the approximation of
p(zt+1|mt, ct, mt+1, ct+1, y1:T ) by p(zt+1|mt+1, ct+1)
is kept unchanged.
        </p>
        <p>In general, we will replace Gaussian collapse by
sampling in the approximate inference of dLCM, and
further investigate the effectiveness and efficiency of this
proposal through a comparison experiment between
original Gaussian collapse based dLCM and sampling
techniques based dLCM.
4
4.1</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Sampling</title>
      <sec id="sec-5-1">
        <title>Background</title>
        <p>The sampling is to select a subset of samples from
within a population to estimate the characteristics of
the original population. There is a commonly known
tradeoff in sampling. When less samples are
selected from within a population, which means the
sampling process takes shorter time, the estimation of the
characteristics to the original population is relatively
worse. On the other hand, if more samples are
selected from within the same population, which of course is
much more time consuming, the characteristics of the
original population is better estimated. The
efficiency (time consuming) and effectiveness (characteristics
estimation) are the essential concerns in the sampling
techniques. In general, more samples should be
selected within the tolerable time, and the better estimation
of characteristics of the population can be expected.
This feature of traditional sampling techniques makes
it attractive to the approximate inference of dLCM,
a balance between efficiency and effectiveness is
expected to be achieved according to application
requirement. Meanwhile sampling can approximate any
distribution as long as the sample number is sufficient.
Sampling is expected to replace the Gaussian collapse
for the approximation in the both forward and
backward pass. We introduce particle filtering next, which
will further motivate our discussion on the utilizations
detail of the sampling in the inference of dLCM.
4.2</p>
      </sec>
      <sec id="sec-5-2">
        <title>Particle Filtering</title>
        <p>
          Particle filtering (PF) [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ] is a technique for
implementing a recursive Bayesian filter by Monte Carlo
simulation, which is an efficient statistical method to
estimate the system state. The Monte Carlo simulation
relies on repeated random sampling techniques.
In particle filtering, let a weighted particle set
{(stn, πnt)}nN=1 at each time t denotes an approximation
of required posterior probability of the system state.
Each of N particles has the state stn and its weight
πnt, the weights are normalized such that Pn πnt = 1.
The particle filtering has three operation stages:
sampling (selection), prediction and observation. In the
sampling stage, N particles are chosen from the prior
probability according to the set {(s(nt−1), πnt−1)}nN=1.
Then predict the state of the chosen particles by the
dynamic model p(st|st−1). In observation stage, the
predicted particles are weighted according to
observation model p(yt|st) . After obtaining the weights of
particles, the state at time t can be estimated based
on the weighted particle set.
4.3
        </p>
      </sec>
      <sec id="sec-5-3">
        <title>Sampling in the dCLM</title>
        <p>
          The sampling process that we required for the
inference of dCLM is similar to the PF. In the forward pass,
we know that the mixture components of p(zt−1|yt−1)
is increasing exponentially over time in the
exact inference. Instead of a recursive approximation
on p(zt−1|mt−1, ct−1, y1:t−1) in the Gaussian collapse
scheme, an recursive approximation on p(zt−1|y1:t−1)
by sampling is adopted. With the obtained
approximated distribution p(zt−1|y1:t−1) at time slice t−1, N
weighted samples {(stn−1, πnt−1)}nN=1 are selected from
this approximated distribution. These selected
samples are propagated to the next time slice t with a
linear transition dynamics Act . As the discrete class
variable Ct has the size of |sp (C)|, then each of the
selected samples will become |sp (C)| new samples. These
|sp (C)|·N propagated samples are further updated by
the observation yt. The updating rule is the same as
the Kalmar filter updating [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ]. Due to the mixture
component has size of |sp (M )|, each of these
propagated samples will become |sp (M )| new samples again.
In general, the N selected samples from time slice t − 1
will become |sp (C)| · |sp (M )| · N samples at time
slice t and its weight are updated accordingly. The
weighted sample set {(fnt , γnt)}|sp(C)|·|sp(M)|·N is then
n=1
the approximation to p(zt|y1:t). For next time step
recursion, a new weighted sample set {(stn, πnt)}nN=1
containing N samples will be selected from the
approximated p(zt|y1:t). The recursive process is summarized
in Algorithm 1.
        </p>
        <p>
          Algorithm 1 Sampling in the forward pass
1: for t = 2 : T do
2: Select N samples from the previous
appxoximated distribution p(zt−1|y1:t−1) to form a
weighted sample set {(stn−1, πnt−1)}nN=1
3: Propagate these selected samples to the next
time slice t by a transition dynamics Act
4: Update the propagated samples by the
observation Y t.
5: Then the updated samples form a new
weighteadn asapmprpolxei mseatti{o(nfnot,fγpt()z}t|n|sy=p1(1C:t))|·|sp(M)|·N , which is
n
6: end for
In the backward pass, with a similar sampling
process as the forward pass, samples are firstly
selected from the approximated distribution p(zt+1|y1:T ).
p(zt+1|y1:T ) is approximated by a weighted sample
set denoted as {(btn+1, ρtn+1)}n=1. Next, the
selected samples are then back-propagated to the previous
time slice t (which is the next step of the backward
pass) with the reverse transition dynamics of Atc. The
back-propagated samples is later updated by the
observation Y t−1 [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ]. Similar to the forward pass, the
approximation to p(zt|y1:T ). p(zt|y1:T ) is a weighted
sample set {(gnt, τnt )}|sp(C)|·|sp(M)|·N . For the recursive
n=1
calculation of next time slice, a new weighted sample
set {(btn, ρtn)}nN=1 containing N samples will be
selected from this approximated distribution. The required
term p(zT |cT , y1:T ) at the beginning of the backward
pass is also the last time step result from the forward
pass, which indicates that {(gnT , τnT )}|sp(C)|·|sp(M)|·N is
n=1
the same sample set as {(fnT , γnT )}|ns=p(1C)|·|sp(M)|·N .
Finally the approximate inference for dLCM is
completely established with sampling technique based scheme.
The number of samples selected from the
approximated distribution at each step is fixed which is
dependent on the time consumption requirement and
estimation quality requirement of corresponding application.
There is a balance need to be addressed according to
the practical application requirement. Generally, the
more samples we select, the more time it costs while
the estimation quality is better.
        </p>
        <p>In the discussion of this section, the approximate
inference of dLCM based on sampling is established by
mimicking the particle filtering process both in the
forward and backward pass. To investigate the
effectiveness and efficiency of sampling based dLCM, a
comparison experiments test will be conducted in the next
section.
5</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>Experiment Results</title>
      <p>In this section, the comparison experiments on
simulation data and oil drilling data are conducted and their
results are discussed.
5.1</p>
      <sec id="sec-6-1">
        <title>Experiments on simulation data</title>
        <p>A set of simulation data is firstly generated from
dLCM, and we investigate the classification accuracy and
time-consumption between Gaussian collapse scheme
and sampling scheme.
5.1.1</p>
      </sec>
      <sec id="sec-6-2">
        <title>Experiments settings on simulation data generation</title>
        <p>The simulation data-set are generated from dLCM
with parameters that is chosen by a “semi random”
process. The model parameters of dLCM have two
parts: model structure and model parameters with
fixed model structure. For each time slice, the model
structure is decided by four factors: the size of class
variable C (activity state), the dimension of
latent variable space Z, the dimension of attribute space
Y and the size of mixture components M . These
values are fixed as described in Table 1. After choosing
the model structure, its associated model parameters
were randomly generated. The above process of
choosing model structure and model parameters together is
the ”semi random” process. We then generate a data
set with this model.</p>
        <p>For convenience we call the model structure and its
parameters as model parameters in the remaining of
the paper. The generated data set and the model
parameters are used as true model for the classification
test purpose next. A comparison experiment between
data |sp (C)| |sp (M )| |sp (Z)| |sp (Y )|
set1 2 1 12 6
set2 2 2 18 9</p>
        <p>Gaussian collapse and sampling is then conducted.
5.1.2</p>
      </sec>
      <sec id="sec-6-3">
        <title>Results and discussion</title>
        <p>The comparison experiment is conducted with both
Gaussian collapse based and sampling based dLCM.
Among sampling based scheme, there are three
chosen sample sizes 40, 200, 1000 respectively. The
classification results on simulation set1 and set2 are
summarized in Table 2. The classification accuracy
results scheme are recorded with the average results of
ten runs of each scheme, and it shows that
Gaussian collapse based dLCM performs better than three
sampling based dLM in both set1 and set2. Among
sampling based scheme, scheme with larger sample
size achieves better classification accuracy in a general
sense.</p>
        <p>scheme (samples) set1/accuracy set2/accuracy
Gaussian collapse 99.60% 99.90%
Sampling (40) 96.75% 95.55%
Sampling (200) 97.60% 97.95%
Sampling (1000) 97.75% 98.40%
After investigating the effectiveness of each scheme,
we continue to discuss the efficiency. The efficiency of
each scheme is evaluated by the average time-cost of
ten run that is required to accomplish the classification
task, and the time-cost detail is shown in Table 3. In
set1, the classification task is accomplished 0.47 second
with Gaussian collapse, whereas the sampling scheme
with 40 samples cost 2.01 second. The larger sample
size in sampling scheme, the more time it costs to
accomplish the classification task. Meanwhile it is clear
that Gaussian collapse requires much less time to
accomplish the classification task.</p>
        <p>Compared to sampling based scheme, Gaussian
collapse based scheme achieves comparable (slightly
better) classification results with much less time on
simulation data test.
5.2</p>
      </sec>
      <sec id="sec-6-4">
        <title>Experiments on oil drilling data</title>
        <p>Next the same comparison experiment is conducted
with the oil drilling data from North sea.</p>
      </sec>
      <sec id="sec-6-5">
        <title>Experiment settings on oil drilling data</title>
        <p>As we mentioned in the introduction section, we will
tie the development to the task of activity recognition
in this paper. In total, there are 5 drilling
acclivities in the dataset used for classification task. These
activities are “drilling”, “connection”, “tripping in”,
“tripping out” and “other”. The original oil drilling
data contains more than 60 variables. Advised by oil
drilling domain expert, 9 variables for the classification
task here. There are two chosen data set, which
contains 80000 and 50000 time slices with all 5 activities
presented respectively.</p>
        <p>
          For classification purpose in this paper, we combine
these 5 activities into 2 activities and conduct the
classification test on the combined data set. Three
activities including “drilling”, “connection” and
“other” activities are combined as one activity, and we do
the similar combination for “tripping in” and
“tripping out” activities. The reason behind is that these
combined actives are physically close and may have
quite similar dynamics. This combination also
simplify our experiments with the oil drilling data, while
maintaining the comparison experiment purpose.
Before we can compare the inference of each scheme
on the oil drilling data set, we learn a dLCM with the
learning method proposed in [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ] with the oil drilling
data set containing 80000 time slices. The model
structure is chosen by experience, with 2 mixture
component and 16 latent variables. After learning its
parameters with the chosen model structure, the dLCM
for further classification experiment is then finalized.
With the learnt dLM, the classification experiment will
be conducted on another oil drilling data set
containing 50000 time slices.
5.2.2
        </p>
      </sec>
      <sec id="sec-6-6">
        <title>Results and discussion</title>
        <p>With the fixed dLCM, the average (by ten runs)
classification accuracy and average time-cost for each
scheme are obtained. There are 4 scheme are
presented, Gaussian collapsed based scheme and sampling
techniques based scheme with 40, 200, 1000 samples
respectively. The experiments results are summarized in
Table 4.</p>
        <p>Among the sampling techniques based scheme, more
samples achieves higher classification accuracy.
However, with more samples in sampling techniques based
scheme, the computation cot for the classification task
is much more expensive. It is clearly shown in the table
that sampling with 1000 samples requires more than
one hour to accomplish the classification task which is
around 40 times than that of Gaussian collapse, and
they achieve a similar classification accuracy. In
general Gaussian collapse still achieves comparable
results (slightly better than Sampling), while keeping the
computation cost in a rather low standard compared
to sampling based scheme.
6</p>
      </sec>
    </sec>
    <sec id="sec-7">
      <title>Conclusion</title>
      <p>In the approximate inference of the dLCM, the
Gaussian collapse is originally adopted as the core of the
approximation method. In this paper, alternatively
sampling technique is proposed to do the
approximation. A process similar to particle filtering, utilizing
sampling as the basis, is then incorporated into the
approximate inference of the dLCM. We then conduct
the comparison experiment results on both simulated
data and real oil drilling data. The experimental
results from both sets show that the approximate scheme
based on Gaussian collapse is computationally more
efficient than sampling, while offering comparable
accuracy results.</p>
      <sec id="sec-7-1">
        <title>Acknowledgements</title>
        <p>I would like to thank Helge Langseth who helped a
lot during the whole process of organizing and writing
this paper.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>Sanjeev</given-names>
            <surname>Arulampalam</surname>
          </string-name>
          , Simon Maskell, Neil Gordon, and
          <string-name>
            <given-names>Tim</given-names>
            <surname>Clapp</surname>
          </string-name>
          .
          <article-title>A tutorial on particle filters for on-line non-linear/non-gaussian bayesian tracking</article-title>
          .
          <source>IEEE Transactions on Signal Processing</source>
          ,
          <volume>50</volume>
          :
          <fpage>174</fpage>
          -
          <lpage>188</lpage>
          ,
          <year>2001</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Yaakov</given-names>
            <surname>Bar-Shalom</surname>
          </string-name>
          and
          <article-title>Xiao-Rong Li. Estimation and Tracking: Principles, Techniques and software</article-title>
          . Artech House Publishers,
          <year>1993</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>David</given-names>
            <surname>Barber</surname>
          </string-name>
          .
          <article-title>Expectation correction for smoothed inference in switching linear dynamical systems</article-title>
          .
          <source>Journal of Machine Learning Research</source>
          ,
          <volume>7</volume>
          :
          <fpage>2515</fpage>
          -
          <lpage>2540</lpage>
          ,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>Xavier</given-names>
            <surname>Boyen</surname>
          </string-name>
          and
          <string-name>
            <given-names>Daphne</given-names>
            <surname>Koller</surname>
          </string-name>
          .
          <article-title>Approximate learning of dynamic models</article-title>
          .
          <source>In Advances in Neural Information Processing Systems</source>
          <volume>12</volume>
          , pages
          <fpage>396</fpage>
          -
          <lpage>402</lpage>
          ,
          <year>1999</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>Stuart</given-names>
            <surname>Geman</surname>
          </string-name>
          and
          <string-name>
            <given-names>Donald</given-names>
            <surname>Geman</surname>
          </string-name>
          .
          <article-title>Stochastic relaxation, Gibbs distribution and the Bayesian restoration of images</article-title>
          .
          <source>IEEE Transactions on Pattern Analysis and Machine Intelligence</source>
          ,
          <volume>6</volume>
          :
          <fpage>721</fpage>
          -
          <lpage>741</lpage>
          ,
          <year>1984</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>Zoubin</given-names>
            <surname>Ghahramani</surname>
          </string-name>
          and
          <string-name>
            <given-names>Geoffrey E.</given-names>
            <surname>Hinton</surname>
          </string-name>
          .
          <article-title>Variational learning for switching state-space models</article-title>
          .
          <source>Neural Computation</source>
          ,
          <volume>12</volume>
          :
          <fpage>963</fpage>
          -
          <lpage>996</lpage>
          ,
          <year>1998</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>Helge</given-names>
            <surname>Langseth</surname>
          </string-name>
          and
          <string-name>
            <given-names>Thomas D.</given-names>
            <surname>Nielsen</surname>
          </string-name>
          .
          <article-title>Latent classification models</article-title>
          .
          <source>Machine Learning</source>
          ,
          <volume>59</volume>
          (
          <issue>3</issue>
          ):
          <fpage>237</fpage>
          -
          <lpage>265</lpage>
          ,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>Uri</given-names>
            <surname>Lerner</surname>
          </string-name>
          .
          <article-title>Hybrid Bayesian networks for reasoning about complex systems</article-title>
          .
          <source>PhD thesis</source>
          , Dept. of Comp. Sci. Stanford University, Stanford,
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>H.E.</given-names>
            <surname>Rauch</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Tung</surname>
          </string-name>
          , and
          <string-name>
            <given-names>C. T.</given-names>
            <surname>Striebel</surname>
          </string-name>
          .
          <article-title>Maximum likelihood estimates of linear dynamic systems</article-title>
          .
          <source>AIAA Journal</source>
          ,
          <volume>3</volume>
          :
          <fpage>1445</fpage>
          -
          <lpage>1450</lpage>
          ,
          <year>1965</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>Padhraic</given-names>
            <surname>Smyth</surname>
          </string-name>
          .
          <article-title>Hidden Markov models for fault detection in dynamic system</article-title>
          .
          <source>Pattern Recognition</source>
          ,
          <volume>27</volume>
          (
          <issue>1</issue>
          ):
          <fpage>149</fpage>
          -
          <lpage>164</lpage>
          ,
          <year>1994</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>Greg</given-names>
            <surname>Welch</surname>
          </string-name>
          and
          <string-name>
            <given-names>Gary</given-names>
            <surname>Bishop</surname>
          </string-name>
          .
          <article-title>An introduction to the kalman filter</article-title>
          .
          <source>Technical report</source>
          , University of North Carolina at Chapel Hill,
          <year>1995</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <surname>Shengtong</surname>
            <given-names>Zhong</given-names>
          </string-name>
          , Helge Langseth, and
          <string-name>
            <given-names>Thomas D.</given-names>
            <surname>Nielsen</surname>
          </string-name>
          .
          <article-title>Bayesian networks for dynamic classification</article-title>
          . Working Paper, http://idi.ntnu.no/~shket/dLCM.pdf,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>