On Approximate Inference of Dynamic Latent Classification Models for Oil Drilling Monitoring

Shengtong Zhong
Department of Computer and Information Science
Norwegian University of Science and Technology
7491 Trondheim, Norway
shket@idi.ntnu.no

Abstract

We have been working with dynamic data from an oil production facility in the North Sea, where unstable situations should be identified as soon as possible. Monitoring in such a complex domain is a challenging task. Not only is such a domain typically volatile and governed by non-linear dynamics, but the sensor input to the monitoring system is often high dimensional, making it difficult to model and classify the domain's states. Dynamic latent classification models are dynamic Bayesian networks capable of effective and efficient modeling and classification. An approximate inference algorithm utilizing Gaussian collapse has been tailor-made for this family of models, but the approximation's properties have not been fully explored. In this paper we compare alternative approximate inference methods for the dynamic latent classification model, in particular focusing on traditional sampling techniques. We show that the approximate scheme based on Gaussian collapse is computationally more efficient than sampling, while offering comparable accuracy results.

1 Introduction

In oil drilling, monitoring the complex process and identifying the current system state is very difficult. Monitoring the process often involves keeping an eye on hundreds or thousands of sensors to determine whether or not the process is stable. We report results on an oil production facility in the North Sea, where unstable situations should be identified as soon as possible [12]. The oil drilling data that we are considering, consisting of some sixty variables, is captured every five seconds. The data is monitored in real time by experienced engineers, who have a number of tasks to perform, ranging from understanding the situation on the platform (activity recognition), via avoiding a number of either dangerous or costly situations (event detection), to optimization of the drilling operation. The variables that are collected cover both topside measurements (like flow rates) and down-hole measurements (like, for instance, gamma rate). To keep the discussion concrete, we tie the development to the task of activity recognition in this paper. The drilling of a well is a complex process, which consists of activities that are performed iteratively as the length of the well increases, and knowing which activity is performed at any time is important for further event detection.

Motivated by this problem setting, a generative model called the dynamic latent classification model (dLCM) [12] for dynamic classification in continuous domains was proposed to help the drilling engineers by automatically analyzing the data stream and classifying the situation accordingly. Dynamic latent classification models are Bayesian networks that can model the complex system process and identify its system state.

A dynamic latent classification model can be seen as combining a naïve Bayes model with a mixture of factor analyzers at each time point. The latent variables of the factor analyzers are used to capture the state-specific dynamics of the process as well as to model dependencies between attributes. As exact inference for the model is intractable, an approximate inference scheme based on Gaussian collapse was proposed in our previous study [12]. Although those experiments demonstrated that the proposed approximate inference works well for both the learning of dynamic latent classification models and the classification task, here we further investigate the approximation's properties by introducing alternative sampling techniques.

The remainder of the paper is organized as follows. In Section 2, we introduce the dLCM in detail. The importance of Gaussian collapse in the inference of dLCM is discussed in Section 3. Next, alternative sampling techniques are proposed for dLCM in Section 4. After the experimental results are presented and discussed in Section 5, the conclusion of the paper is given in Section 6.
2 Dynamic Latent Classification Models

Dynamic latent classification models [12] are dynamic Bayesian networks that can model a complex system process and identify its system state. The system process is highly dynamic and complex, which makes it difficult to model and identify with static models and standard dynamic models. The framework of dLCM is specified incrementally by examining its expressivity relative to the oil drilling data.

The dLCM is built up from the naïve Bayes model (NB), which is one of the simplest static models. In the first step, temporal dynamics of the class variable (a first-order Markov chain) is added, as considerable correlation between the class variables of consecutive time slices is evidenced in the oil drilling data [12]. This results in a dynamic version of naïve Bayes, which is also equivalent to a standard first-order hidden Markov model (HMM), shown in Figure 1, where C^t denotes the class variable at time t and Y_i^t denotes the i-th attribute at time t. This model type has a long history of usage in monitoring, see e.g. [10].

Figure 1: An example of the dynamic version of naïve Bayes with 4 attributes using the 2-time-slice dynamic Bayesian network (2TDBN) representation. Attributes are assumed to be conditionally independent given the class variable.

This model is described by a prior distribution over the class variable P(c^0), a conditional observation distribution P(y^t | c^t), and transition probabilities for the class variable P(c^t | c^{t-1}); we assume that the model is stationary, i.e., P(y^t | c^t) = P(y^{t-1} | c^{t-1}) and P(c^t | c^{t-1}) = P(c^{t+1} | c^t), for all t. For the continuous observation vector, the conditional distribution may be specified by a class-conditional multivariate Gaussian distribution with mean μ_{c^t} and covariance matrix Σ_{c^t}, i.e., Y^t | {C^t = c^t} ∼ N(μ_{c^t}, Σ_{c^t}).
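To make this baseline concrete, the following is a minimal sketch of forward filtering in the dynamic naïve Bayes (HMM) model with class-conditional Gaussian emissions. It is purely illustrative: the function name, parameter layout and use of scipy are our own assumptions and not code from [12].

import numpy as np
from scipy.stats import multivariate_normal

def hmm_filter(y, prior, trans, means, covs):
    """Forward filtering p(c^t | y^{1:t}) for the dynamic naive Bayes model.

    y:     (T, d) observations
    prior: (K,)   p(c^0)
    trans: (K, K) p(c^t = j | c^{t-1} = i) stored in trans[i, j]
    means: (K, d) class-conditional means mu_c
    covs:  (K, d, d) class-conditional covariances Sigma_c
    """
    T, K = y.shape[0], prior.shape[0]
    # Class-conditional observation likelihoods p(y^t | c^t)
    lik = np.array([[multivariate_normal.pdf(y[t], means[k], covs[k])
                     for k in range(K)] for t in range(T)])
    alpha = np.zeros((T, K))
    alpha[0] = prior * lik[0]
    alpha[0] /= alpha[0].sum()
    for t in range(1, T):
        # Predict with the class transition matrix, then correct with the likelihood
        alpha[t] = (alpha[t - 1] @ trans) * lik[t]
        alpha[t] /= alpha[t].sum()
    return alpha  # alpha[t, k] = p(c^t = k | y^{1:t})

The most probable activity at each time slice can then be read off as alpha.argmax(axis=1); the dLCM developed below replaces the simple class-conditional emission with a latent-variable model.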
In a standard HMM it is assumed that the class variables and attributes at different time points are independent given the class variable at the current time, which is violated in many real-world settings. In our oil drilling data, there is also a strong correlation between attributes given the class [12]. Modeling the dependence between attributes is therefore the next step in creating the dLCM.

Following [7], we introduce latent variables to encode conditional dependence among the attributes. Specifically, for each time step t we have the vector Z^t = (Z_1^t, ..., Z_k^t) of latent variables that appear as children of the class variable and parents of all the attributes (see Figure 2). This can be seen as combining the NB model with a factor analysis model at each time step.

Figure 2: An example model obtained by adding latent variables to the dynamic version of naïve Bayes using 2TDBN, where the dimension of the latent space is 2 and the dimension of the attribute space is 4. In each time step, the conditional dependencies between the attributes are encoded by the latent variables (Z_1^t, ..., Z_k^t).

The latent variable Z^t is assigned a multivariate Gaussian distribution conditional on the class variable, and the attributes Y^t are assumed to follow a linear multivariate Gaussian distribution conditional on the latent variables:

Z^t | {C^t = c^t} ∼ N(μ_{c^t}, Σ_{c^t}),
Y^t | {Z^t = z^t} ∼ N(L z^t + Φ, Θ),

where Σ_{c^t} and Θ are diagonal covariance matrices, L is the transition matrix from the latent space to the attribute space, and Φ is the offset from the latent space to the attribute space; note the stationarity assumption encoded in the model.

In this model, the latent variables capture the dependence between the attributes. They are conditionally independent given the class but marginally dependent. Furthermore, the same mapping, L, from the latent space to the attribute space is used for all classes, and hence the relation between the class and the attributes is conveyed by the latent variables only.

At this step, the temporal dynamics of the model is assumed to be captured at the class level only. When the state specification of the class variable is coarse, this assumption will rarely hold. It does not hold in our oil drilling data, as the conditional correlation of the attributes in successive time slices is evident [12]. We address this by modeling the dynamics of the system at the level of the latent variables. The state-specific dynamics is encoded by assuming that the multivariate latent variable Z^t follows a linear Gaussian distribution conditioned on Z^{t-1}, with the transition dynamics between latent variables denoted by a diagonal matrix A_{c^t}:

Z^t | {Z^{t-1} = z^{t-1}, C^t = c^t} ∼ N(A_{c^t} z^{t-1}, Σ_{c^t}).

A graphical representation of the model is given in Figure 3.

Figure 3: An example model obtained by incrementally adding dynamics on the latent variables using 2TDBN, where the dimension of the latent space is 2 and the dimension of the attribute space is 4. The state-specific dynamics are encoded at the level of the latent variables.

A discrete mixture variable M is further introduced into the model at each time slice for the purpose of reducing the computational cost while maintaining the representational power [12]. A similar construction is used by [7] for static domains, and for dynamic domains it can be seen in [3, 6], where a probabilistic model called the switching state-space model is proposed that combines discrete and continuous dynamics. In this case, the mixture variable follows a multinomial distribution conditioned on the class variable, and the attributes Y^t follow a multivariate Gaussian distribution conditioned on the latent variables and the discrete mixture variable:

M^t | {C^t = c^t} ∼ P(M^t | C^t = c^t),
Y^t | {Z^t = z^t, M^t = m^t} ∼ N(L_{m^t} z^t + Φ_{m^t}, Θ_{m^t}),

where 1 ≤ m^t ≤ |sp(M)| (|sp(M)| denotes the size of the state space of M), P(M^t = m^t | C^t = c^t) ≥ 0 and ∑_{m^t=1}^{|sp(M)|} P(M^t = m^t | C^t = c^t) = 1 for all 1 ≤ c^t ≤ |sp(C)|, and Φ_{m^t} is the offset from the latent space to the attribute space.

The final model is called the dynamic latent classification model and is shown in Figure 4. The dynamic latent classification model has been shown to be effective and efficient in experiments with our oil drilling data, and significant improvements were demonstrated when comparing dLCM with static models (such as NB or decision trees) and HMMs [12].

Figure 4: An example of a dynamic latent classification model using 2TDBN, where the dimension of the latent space is 2 and the dimension of the attribute space is 4.
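As a concrete summary of the generative process just defined, the sketch below draws one trajectory from a dLCM using the conditional distributions above. It is an illustrative reconstruction under our own naming and shape conventions, not the implementation from [12].

import numpy as np

rng = np.random.default_rng(0)

def sample_dlcm(T, prior, trans, p_m_given_c, mu, Sigma, A, L, Phi, Theta):
    """Draw one trajectory (c, m, z, y) of length T from a dLCM.

    prior:       (K,)      p(c^1)
    trans:       (K, K)    p(c^t | c^{t-1})
    p_m_given_c: (K, M)    p(m^t | c^t)
    mu, Sigma:   (K, k), (K, k, k)  initial latent Gaussian per class
    A:           (K, k, k) diagonal latent transition matrices A_c
    L, Phi:      (M, d, k), (M, d)  loadings and offsets per mixture component
    Theta:       (M, d, d) diagonal observation covariances
    """
    K, k = mu.shape
    d = Phi.shape[1]
    c = np.zeros(T, dtype=int)
    m = np.zeros(T, dtype=int)
    z = np.zeros((T, k))
    y = np.zeros((T, d))
    for t in range(T):
        c[t] = rng.choice(K, p=prior if t == 0 else trans[c[t - 1]])
        m[t] = rng.choice(p_m_given_c.shape[1], p=p_m_given_c[c[t]])
        if t == 0:
            z[t] = rng.multivariate_normal(mu[c[t]], Sigma[c[t]])
        else:
            # state-specific latent dynamics: z^t ~ N(A_c z^{t-1}, Sigma_c)
            z[t] = rng.multivariate_normal(A[c[t]] @ z[t - 1], Sigma[c[t]])
        # observation: y^t ~ N(L_m z^t + Phi_m, Theta_m)
        y[t] = rng.multivariate_normal(L[m[t]] @ z[t] + Phi[m[t]], Theta[m[t]])
    return c, m, z, y

Classification then amounts to inferring C^t from the observed sequence, which is the inference problem addressed next.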
3 Approximate inference in dLCM

Exact inference for dLCM is intractable. To make dLCM applicable and effective in practice, approximate inference is therefore proposed.

3.1 Intractability of exact inference in dLCM

From the dLCM in Figure 4, an equivalent probabilistic model is

p(y^{1:T}, z^{1:T}, m^{1:T}, c^{1:T}) = p(y^1 | z^1, m^1) p(z^1 | c^1) p(m^1 | c^1) p(c^1) · ∏_{t=2}^{T} p(y^t | z^t, m^t) p(z^t | z^{t-1}, c^t) p(m^t | c^t) p(c^t | c^{t-1}).

In dLCM, exact filtered and smoothed inference is intractable (scaling exponentially with T [8]), as neither the class variables nor the mixture variables are observed: at time step 1, p(z^1 | y^1) is a mixture of |sp(C)| · |sp(M)| Gaussians. At time step 2, due to the summation over the classes and mixture variables, p(z^2 | y^{1:2}) is a mixture of |sp(C)|^2 · |sp(M)|^2 Gaussians; at time step 3 it is a mixture of |sp(C)|^3 · |sp(M)|^3 Gaussians, and so on, until a mixture of |sp(C)|^T · |sp(M)|^T Gaussians is generated at time point T. To control this explosion in computational complexity, approximate inference techniques are adopted for the inference of dLCM.

3.2 Approximate inference: Forward pass

The structure of the proposed dLCM is similar to that of the linear dynamical system (LDS) [2]; the standard Rauch-Tung-Striebel (RTS) smoother [9] and the expectation correction smoother [3] for the LDS therefore provide the basis for the approximate inference of dLCM. As in the RTS smoother, the filtered estimate of dLCM p(z^t, m^t, c^t | y^{1:t}) is obtained by a forward recursion, followed by a backward recursion that calculates the smoothed estimate p(z^t, m^t, c^t | y^{1:T}). The inference of dLCM is thus achieved by a single forward recursion and a single backward recursion. Gaussian collapse is incorporated into both the forward recursion and the backward recursion to form the approximate inference. The Gaussian collapse in the forward recursion is equivalent to assumed density filtering [4], and the Gaussian collapse in the backward recursion mirrors the smoothed posterior collapse from [3].

During the forward recursion of dLCM, the filtered posterior p(z^t, m^t, c^t | y^{1:t}) is computed in a recursive form. By a simple decomposition,

p(z^t, m^t, c^t | y^{1:t}) = p(z^t, m^t, c^t, y^t | y^{1:t-1}) / p(y^t | y^{1:t-1}) ∝ p(z^t, m^t, c^t, y^t | y^{1:t-1}).

Dropping the normalization constant p(y^t | y^{1:t-1}), p(z^t, m^t, c^t | y^{1:t}) is proportional to the joint probability p(z^t, m^t, c^t, y^t | y^{1:t-1}), where

p(z^t, m^t, c^t, y^t | y^{1:t-1}) = p(y^t, z^t | m^t, c^t, y^{1:t-1}) · p(m^t | c^t, y^{1:t-1}) p(c^t | y^{1:t-1}).   (1)

To build the forward recursion, a recursive form for each of the factors in Equation 1 is required. Given the filtered results of the previous time step, a recursive form for each of the factors has been shown to be feasible [12]. In devising the recursive form, the term p(z^{t-1} | m^{t-1}, c^{t-1}, y^{1:t-1}) is required, which can be obtained directly since it is the filtered probability from the previous time step. However, the number of mixture components of p(z^{t-1} | y^{1:t-1}) increases exponentially over time, as discussed earlier, and the same holds for p(z^{t-1} | m^{t-1}, c^{t-1}, y^{1:t-1}). In our Gaussian collapse implementation [12], the term p(z^{t-1} | m^{t-1}, c^{t-1}, y^{1:t-1}) is collapsed into a single Gaussian, parameterized with mean ν_{m^{t-1}, c^{t-1}} and covariance Γ_{m^{t-1}, c^{t-1}}, and this collapsed Gaussian is then propagated to the next time slice. With this approximation, the recursive computation of the forward pass becomes tractable.
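The collapse used here is the standard moment-matching operation: a weighted Gaussian mixture is replaced by the single Gaussian with the same mean and covariance. A minimal, self-contained sketch (our own illustrative code, not the implementation from [12]) is:

import numpy as np

def collapse_gaussian_mixture(weights, means, covs):
    """Collapse a weighted Gaussian mixture into a single moment-matched Gaussian.

    weights: (J,)       mixture weights (normalized inside the function)
    means:   (J, k)     component means
    covs:    (J, k, k)  component covariances
    Returns the mean and covariance of the collapsed Gaussian.
    """
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()
    mean = np.einsum('j,jk->k', weights, means)
    diff = means - mean                      # deviations of the component means
    cov = np.einsum('j,jkl->kl', weights, covs) \
        + np.einsum('j,jk,jl->kl', weights, diff, diff)
    return mean, cov

In the forward recursion this operation would be applied to p(z^{t-1} | m^{t-1}, c^{t-1}, y^{1:t-1}) for every configuration of (m^{t-1}, c^{t-1}), yielding the mean ν_{m^{t-1}, c^{t-1}} and covariance Γ_{m^{t-1}, c^{t-1}} that are propagated to the next slice.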
3.3 Approximate inference: Backward pass

Similar to the forward pass, the backward pass also relies on a recursive computation of the smoothed posterior p(z^t, m^t, c^t | y^{1:T}). In detail, p(z^t, m^t, c^t | y^{1:T}) is computed from the smoothed result of the previous step, p(z^{t+1}, m^{t+1}, c^{t+1} | y^{1:T}), together with some other quantities obtained from the forward pass. The first smoothed posterior is p(z^T, m^T, c^T | y^{1:T}), which can be obtained directly as it is also the last filtered posterior from the forward pass. To compute p(z^t, m^t, c^t | y^{1:T}), factorize it as

p(z^t, m^t, c^t | y^{1:T}) = ∑_{m^{t+1}, c^{t+1}} p(z^t, m^t, c^t, m^{t+1}, c^{t+1} | y^{1:T})
= ∑_{m^{t+1}, c^{t+1}} p(z^t | m^t, c^t, m^{t+1}, c^{t+1}, y^{1:T}) · p(m^t, c^t | m^{t+1}, c^{t+1}, y^{1:T}) p(m^{t+1}, c^{t+1} | y^{1:T}).

Due to the conditional independence z^t ⊥⊥ {y^{t+1:T}, m^{t+1}, c^{t+1}} | {z^{t+1}, m^t, c^t}, the term p(z^t | m^t, c^t, m^{t+1}, c^{t+1}, y^{1:T}) can be found from

p(z^t | m^t, c^t, m^{t+1}, c^{t+1}, y^{1:T}) = ∫_{z^{t+1}} p(z^t | z^{t+1}, m^t, c^t, y^{1:t}) · p(z^{t+1} | m^t, c^t, m^{t+1}, c^{t+1}, y^{1:T}) dz^{t+1}.

To complete the backward recursive form, two essential assumptions are made in the backward pass that make the approximate inference applicable and effective. The first assumption is to approximate p(z^{t+1} | m^t, c^t, m^{t+1}, c^{t+1}, y^{1:T}) by p(z^{t+1} | m^{t+1}, c^{t+1}, y^{1:T}) [3]. This is justified because, although {m^t, c^t} and z^{t+1} are not independent given y^{1:T}, the influence of {m^t, c^t} on z^{t+1} through z^t is 'weak', as z^t is mostly influenced by y^{1:t}. The benefit of this simple assumption is that p(z^{t+1} | m^{t+1}, c^{t+1}, y^{1:T}) can be obtained directly from the previous backward recursion. Meanwhile, p(z^{t+1} | m^{t+1}, c^{t+1}, y^{1:T}) is a Gaussian mixture whose number of components increases exponentially in T − t. The second assumption is again a Gaussian collapse: p(z^{t+1} | m^{t+1}, c^{t+1}, y^{1:T}) is collapsed into a single Gaussian, which is then passed on to the next step. This guarantees that the back-propagated term p(z^t | m^t, c^t, y^{1:T}) is a Gaussian mixture with a fixed number of |sp(C)| · |sp(M)| components at the next time step. With this Gaussian collapse at each time slice, a tractable recursion in the backward pass is established.
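The effect of the collapse steps can be illustrated with a small numerical sketch (purely illustrative): without collapsing, the number of Gaussian components grows as (|sp(C)| · |sp(M)|)^t, whereas the collapsed recursions keep the count fixed.

# Number of Gaussian components in the posterior over z, exact versus collapsed,
# for an illustrative model with |sp(C)| = 2 classes and |sp(M)| = 2 mixtures.
sp_c, sp_m = 2, 2
for t in (1, 2, 3, 10):
    exact = (sp_c * sp_m) ** t      # exponential growth of the exact mixture
    collapsed = sp_c * sp_m         # fixed size after each Gaussian collapse
    print(f"t={t}: exact={exact}, collapsed={collapsed}")   # t=10: 1048576 vs 4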
3.4 The importance of approximate inference

Exact inference is not applicable in practice, as its computation cost increases exponentially over time. Approximate inference is therefore essential to dLCM. Gaussian collapse is adopted when building the recursive forms for both the forward and the backward pass. In addition, p(z^{t+1} | m^t, c^t, m^{t+1}, c^{t+1}, y^{1:T}) is approximated by p(z^{t+1} | m^{t+1}, c^{t+1}, y^{1:T}) in dLCM. As these approximations are made within the inference, the quality of the overall learning and inference for dLCM is rather sensitive to them. Our experimental results [12] showed that the overall performance of dLCM is satisfactory, which indicates that the chosen approximations are reasonable.

Even though the proposed approximate inference for dLCM is satisfactory, is there room for improvement with alternative approximation methods? With this question in mind, we investigate the approximation's properties by incorporating a new approximation method. Traditional sampling techniques (e.g., [5]) are commonly used in similar approximation situations. In the next section we briefly introduce sampling techniques and then explain how they are integrated into dLCM. Meanwhile, the approximation of p(z^{t+1} | m^t, c^t, m^{t+1}, c^{t+1}, y^{1:T}) by p(z^{t+1} | m^{t+1}, c^{t+1}, y^{1:T}) is kept unchanged.

In general, we replace Gaussian collapse by sampling in the approximate inference of dLCM, and we investigate the effectiveness and efficiency of this proposal through a comparison experiment between the original Gaussian collapse based dLCM and the sampling based dLCM.

4 Sampling

4.1 Background

Sampling selects a subset of samples from within a population to estimate the characteristics of the original population. There is a well-known tradeoff in sampling. When fewer samples are selected from within a population, which means the sampling process takes a shorter time, the estimated characteristics of the original population are relatively worse. On the other hand, if more samples are selected from within the same population, which of course is much more time consuming, the characteristics of the original population are better estimated. Efficiency (time consumption) and effectiveness (quality of the estimated characteristics) are the essential concerns in sampling techniques. In general, the more samples that can be selected within the tolerable time, the better the estimation of the characteristics of the population that can be expected.

This feature of traditional sampling techniques makes them attractive for the approximate inference of dLCM, as a balance between efficiency and effectiveness can be struck according to the application requirements. Moreover, sampling can approximate any distribution as long as the number of samples is sufficient. Sampling is therefore expected to replace the Gaussian collapse for the approximation in both the forward and the backward pass. We introduce particle filtering next, which will further motivate our discussion of how sampling is used in the inference of dLCM.

4.2 Particle Filtering

Particle filtering (PF) [1] is a technique for implementing a recursive Bayesian filter by Monte Carlo simulation, which is an efficient statistical method for estimating the system state. The Monte Carlo simulation relies on repeated random sampling.

In particle filtering, a weighted particle set {(s_n^t, π_n^t)}_{n=1}^{N} at each time t denotes an approximation of the required posterior probability of the system state. Each of the N particles has a state s_n^t and a weight π_n^t, and the weights are normalized such that ∑_n π_n^t = 1. Particle filtering has three operation stages: sampling (selection), prediction and observation. In the sampling stage, N particles are chosen from the prior probability according to the set {(s_n^{t-1}, π_n^{t-1})}_{n=1}^{N}. The states of the chosen particles are then predicted by the dynamic model p(s^t | s^{t-1}). In the observation stage, the predicted particles are weighted according to the observation model p(y^t | s^t). After obtaining the weights of the particles, the state at time t can be estimated based on the weighted particle set.
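For completeness, a generic bootstrap particle filter following the three stages described above can be sketched as follows. The transition and observation models are passed in as functions; all names are illustrative assumptions rather than code from the dLCM implementation.

import numpy as np

rng = np.random.default_rng(1)

def particle_filter(y, n_particles, sample_prior, sample_transition, obs_likelihood):
    """Bootstrap particle filter with sampling, prediction and observation stages.

    y:                 (T, d) observations
    sample_prior:      n -> (n, k) draws from the initial state distribution
    sample_transition: states -> one-step-ahead draws from p(s^t | s^{t-1})
    obs_likelihood:    (y_t, states) -> (n,) observation weights p(y^t | s^t)
    """
    particles = sample_prior(n_particles)
    weights = np.full(n_particles, 1.0 / n_particles)
    estimates = []
    for t in range(y.shape[0]):
        # Sampling (selection): choose particles according to their weights
        idx = rng.choice(n_particles, size=n_particles, p=weights)
        particles = particles[idx]
        # Prediction: propagate through the dynamic model p(s^t | s^{t-1})
        particles = sample_transition(particles)
        # Observation: reweight by the observation model p(y^t | s^t)
        weights = obs_likelihood(y[t], particles)
        weights = weights / weights.sum()
        estimates.append(weights @ particles)   # posterior-mean estimate of s^t
    return np.array(estimates)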
4.3 Sampling in the dLCM

The sampling process that we require for the inference of dLCM is similar to PF. In the forward pass, we know that the number of mixture components of p(z^{t-1} | y^{1:t-1}) increases exponentially over time under exact inference. Instead of a recursive approximation of p(z^{t-1} | m^{t-1}, c^{t-1}, y^{1:t-1}) as in the Gaussian collapse scheme, a recursive approximation of p(z^{t-1} | y^{1:t-1}) by sampling is adopted. With the approximated distribution p(z^{t-1} | y^{1:t-1}) obtained at time slice t−1, N weighted samples {(s_n^{t-1}, π_n^{t-1})}_{n=1}^{N} are selected from this approximated distribution. These selected samples are propagated to the next time slice t with the linear transition dynamics A_{c^t}. As the discrete class variable C^t has |sp(C)| states, each of the selected samples becomes |sp(C)| new samples. These |sp(C)| · N propagated samples are further updated by the observation y^t. The updating rule is the same as the Kalman filter update [11]. As the mixture variable has |sp(M)| states, each of these propagated samples again becomes |sp(M)| new samples. In total, the N samples selected at time slice t−1 become |sp(C)| · |sp(M)| · N samples at time slice t, and their weights are updated accordingly. The weighted sample set {(f_n^t, γ_n^t)}_{n=1}^{|sp(C)|·|sp(M)|·N} is then the approximation to p(z^t | y^{1:t}). For the next time step of the recursion, a new weighted sample set {(s_n^t, π_n^t)}_{n=1}^{N} containing N samples is selected from the approximated p(z^t | y^{1:t}). The recursive process is summarized in Algorithm 1.

Algorithm 1 Sampling in the forward pass
1: for t = 2 : T do
2:   Select N samples from the previously approximated distribution p(z^{t-1} | y^{1:t-1}) to form a weighted sample set {(s_n^{t-1}, π_n^{t-1})}_{n=1}^{N}
3:   Propagate these selected samples to the next time slice t by the transition dynamics A_{c^t}
4:   Update the propagated samples by the observation Y^t
5:   The updated samples form a new weighted sample set {(f_n^t, γ_n^t)}_{n=1}^{|sp(C)|·|sp(M)|·N}, which is an approximation of p(z^t | y^{1:t})
6: end for
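A hedged sketch of how Algorithm 1 could be realized is given below. The Kalman-style observation update of each propagated sample is abstracted behind a kalman_update callback, and the names, shapes and simplifications (for example, the discrete prior factors are not folded into the weights) are our own assumptions rather than the exact implementation from [12].

import numpy as np

rng = np.random.default_rng(2)

def sampling_forward_pass(y, init_samples, init_weights, n_samples, n_mixtures,
                          A, kalman_update):
    """Sketch of Algorithm 1: sampling based forward pass of the dLCM.

    y:            (T, d) observations
    init_samples: samples approximating p(z^1 | y^1), shape (N0, k)
    init_weights: (N0,) their normalized weights
    n_samples:    N, the number of samples re-selected at each step
    n_mixtures:   |sp(M)|, the number of mixture components
    A:            (|sp(C)|, k, k) class-specific latent transition matrices A_c
    kalman_update(z_pred, y_t, m) -> (z_upd, lik): observation update of one
        propagated sample under mixture component m, returning the updated
        sample and its likelihood weight.
    """
    n_classes = A.shape[0]
    samples, weights = np.asarray(init_samples), np.asarray(init_weights)
    for t in range(1, y.shape[0]):
        # Step 2: select N samples from the previous approximation of p(z^{t-1} | y^{1:t-1})
        idx = rng.choice(len(samples), size=n_samples, p=weights)
        selected = samples[idx]
        new_samples, new_weights = [], []
        for z in selected:
            for c in range(n_classes):
                # Step 3: propagate by the class-specific transition dynamics A_c
                z_pred = A[c] @ z
                for m in range(n_mixtures):
                    # Step 4: update by the observation y^t; the discrete prior
                    # factors p(c^t | c^{t-1}) and p(m^t | c^t) are omitted here
                    z_upd, lik = kalman_update(z_pred, y[t], m)
                    new_samples.append(z_upd)
                    new_weights.append(lik)
        # Step 5: the |sp(C)| * |sp(M)| * N weighted samples approximate p(z^t | y^{1:t})
        samples = np.array(new_samples)
        weights = np.array(new_weights)
        weights = weights / weights.sum()
    return samples, weights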
In the backward pass, with a sampling process similar to that of the forward pass, samples are first selected from the approximated distribution p(z^{t+1} | y^{1:T}), which is approximated by a weighted sample set denoted {(b_n^{t+1}, ρ_n^{t+1})}_{n=1}^{N}. Next, the selected samples are back-propagated to the previous time slice t (which is the next step of the backward pass) with the reverse of the transition dynamics A_{c^t}. The back-propagated samples are then updated by the observation Y^{t-1} [11]. Similar to the forward pass, the approximation to p(z^t | y^{1:T}) is a weighted sample set {(g_n^t, τ_n^t)}_{n=1}^{|sp(C)|·|sp(M)|·N}. For the recursive calculation at the next time slice, a new weighted sample set {(b_n^t, ρ_n^t)}_{n=1}^{N} containing N samples is selected from this approximated distribution. The term p(z^T | c^T, y^{1:T}) required at the beginning of the backward pass is also the last time step result from the forward pass, which means that {(g_n^T, τ_n^T)}_{n=1}^{|sp(C)|·|sp(M)|·N} is the same sample set as {(f_n^T, γ_n^T)}_{n=1}^{|sp(C)|·|sp(M)|·N}. Finally, the approximate inference for dLCM is completely established with the sampling based scheme.

The number of samples selected from the approximated distribution at each step is fixed, and it depends on the time-consumption requirement and the estimation-quality requirement of the corresponding application. There is a balance to be struck according to the practical application requirements. Generally, the more samples we select, the more time it costs, while the estimation quality improves.

In summary, the approximate inference of dLCM based on sampling is established by mimicking the particle filtering process in both the forward and the backward pass. To investigate the effectiveness and efficiency of the sampling based dLCM, comparison experiments are conducted in the next section.

5 Experiment Results

In this section, comparison experiments on simulation data and on oil drilling data are conducted and their results are discussed.

5.1 Experiments on simulation data

A set of simulation data is first generated from a dLCM, and we investigate the classification accuracy and time consumption of the Gaussian collapse scheme and the sampling scheme.

5.1.1 Experiment settings and simulation data generation

The simulation data sets are generated from a dLCM with parameters chosen by a "semi-random" process. The specification of a dLCM has two parts: the model structure and the model parameters given a fixed model structure. For each time slice, the model structure is decided by four factors: the size of the class variable C (activity state), the dimension of the latent variable space Z, the dimension of the attribute space Y, and the size of the mixture variable M. These values are fixed as described in Table 1. After choosing the model structure, its associated model parameters were randomly generated. This combined process of choosing the model structure and the model parameters is the "semi-random" process. We then generate a data set with this model.

data   |sp(C)|   |sp(M)|   |sp(Z)|   |sp(Y)|   T
set1   2         1         12        6         500
set2   2         2         18        9         1000

Table 1: The simulation data sets with their model structure information.

For convenience, we refer to the model structure together with its parameters as the model parameters in the remainder of the paper. The generated data sets and the model parameters are used as the true model for the classification test below. A comparison experiment between Gaussian collapse and sampling is then conducted.
5.1.2 Results and discussion

The comparison experiment is conducted with both the Gaussian collapse based and the sampling based dLCM. For the sampling based scheme, three sample sizes are chosen: 40, 200 and 1000. The classification results on simulation set1 and set2 are summarized in Table 2. The classification accuracies are recorded as the average over ten runs of each scheme, and they show that the Gaussian collapse based dLCM performs better than the three sampling based dLCMs on both set1 and set2. Among the sampling based schemes, a larger sample size generally achieves better classification accuracy.

scheme (samples)     set1/accuracy   set2/accuracy
Gaussian collapse    99.60%          99.90%
Sampling (40)        96.75%          95.55%
Sampling (200)       97.60%          97.95%
Sampling (1000)      97.75%          98.40%

Table 2: The average classification accuracy on the simulation data sets with the Gaussian collapse based and the sampling based (varying the number of samples) dLCM.

After investigating the effectiveness of each scheme, we turn to efficiency. The efficiency of each scheme is evaluated by the average time over the ten runs required to accomplish the classification task; the details are shown in Table 3. On set1, the classification task is accomplished in 0.47 seconds with Gaussian collapse, whereas the sampling scheme with 40 samples takes 2.01 seconds. The larger the sample size in the sampling scheme, the more time it takes to accomplish the classification task. It is clear that Gaussian collapse requires much less time to accomplish the classification task.

scheme (samples)     set1/time-cost   set2/time-cost
Gaussian Collapse    0.47(s)          1.91(s)
Sampling (40)        2.01(s)          5.97(s)
Sampling (200)       7.31(s)          17.56(s)
Sampling (1000)      36.24(s)         80.70(s)

Table 3: The average time-cost (seconds) on the simulation data sets with both the Gaussian collapse based and the sampling based (varying the number of samples) dLCM.

Compared to the sampling based scheme, the Gaussian collapse based scheme thus achieves comparable (slightly better) classification results in much less time on the simulation data.

5.2 Experiments on oil drilling data

Next, the same comparison experiment is conducted with the oil drilling data from the North Sea.

5.2.1 Experiment settings on oil drilling data

As mentioned in the introduction, we tie the development to the task of activity recognition in this paper. In total, there are 5 drilling activities in the data set used for the classification task. These activities are "drilling", "connection", "tripping in", "tripping out" and "other". The original oil drilling data contains more than 60 variables. Advised by an oil drilling domain expert, 9 variables are chosen for the classification task here. Two data sets are used, containing 80000 and 50000 time slices respectively, with all 5 activities present.

For classification purposes in this paper, we combine these 5 activities into 2 activities and conduct the classification test on the combined data set. The three activities "drilling", "connection" and "other" are combined into one activity, and a similar combination is made for the "tripping in" and "tripping out" activities. The reason is that these combined activities are physically close and may have quite similar dynamics. This combination also simplifies our experiments with the oil drilling data, while maintaining the purpose of the comparison experiment.

Before comparing the inference schemes on the oil drilling data, we learn a dLCM with the learning method proposed in [12] on the oil drilling data set containing 80000 time slices. The model structure is chosen by experience, with 2 mixture components and 16 latent variables. After learning its parameters with the chosen model structure, the dLCM for the classification experiment is finalized. With the learnt dLCM, the classification experiment is conducted on the other oil drilling data set, containing 50000 time slices.

5.2.2 Results and discussion

With the fixed dLCM, the average (over ten runs) classification accuracy and average time-cost for each scheme are obtained. Four schemes are presented: the Gaussian collapse based scheme and the sampling based schemes with 40, 200 and 1000 samples respectively. The experiment results are summarized in Table 4.
scheme (samples)     accuracy   time-cost
Gaussian Collapse    82.9%      110.15(s)
Sampling (40)        69.87%     335.93(s)
Sampling (200)       76.56%     1130.85(s)
Sampling (1000)      79.07%     4469.33(s)

Table 4: The average classification accuracy and time-cost (seconds) on the oil drilling data set with both the Gaussian collapse based and the sampling based (varying the number of samples) dLCM.

Among the sampling based schemes, more samples achieve higher classification accuracy. However, with more samples, the computation cost for the classification task is much higher. The table clearly shows that sampling with 1000 samples requires more than one hour to accomplish the classification task, which is around 40 times that of Gaussian collapse, while achieving a similar classification accuracy. In general, Gaussian collapse still achieves comparable results (slightly better than sampling), while keeping the computation cost rather low compared to the sampling based scheme.

6 Conclusion

In the approximate inference of the dLCM, Gaussian collapse was originally adopted as the core of the approximation method. In this paper, a sampling technique is proposed as an alternative approximation. A process similar to particle filtering, utilizing sampling as its basis, is incorporated into the approximate inference of the dLCM. We then conduct comparison experiments on both simulated data and real oil drilling data. The experimental results from both sets show that the approximate scheme based on Gaussian collapse is computationally more efficient than sampling, while offering comparable accuracy results.

Acknowledgements

I would like to thank Helge Langseth, who helped a lot during the whole process of organizing and writing this paper.

References

[1] Sanjeev Arulampalam, Simon Maskell, Neil Gordon, and Tim Clapp. A tutorial on particle filters for on-line non-linear/non-Gaussian Bayesian tracking. IEEE Transactions on Signal Processing, 50:174–188, 2001.

[2] Yaakov Bar-Shalom and Xiao-Rong Li. Estimation and Tracking: Principles, Techniques and Software. Artech House Publishers, 1993.

[3] David Barber. Expectation correction for smoothed inference in switching linear dynamical systems. Journal of Machine Learning Research, 7:2515–2540, 2006.

[4] Xavier Boyen and Daphne Koller. Approximate learning of dynamic models. In Advances in Neural Information Processing Systems 12, pages 396–402, 1999.

[5] Stuart Geman and Donald Geman. Stochastic relaxation, Gibbs distributions and the Bayesian restoration of images. IEEE Transactions on Pattern Analysis and Machine Intelligence, 6:721–741, 1984.

[6] Zoubin Ghahramani and Geoffrey E. Hinton. Variational learning for switching state-space models. Neural Computation, 12:963–996, 1998.

[7] Helge Langseth and Thomas D. Nielsen. Latent classification models. Machine Learning, 59(3):237–265, 2005.

[8] Uri Lerner. Hybrid Bayesian Networks for Reasoning about Complex Systems. PhD thesis, Dept. of Comp. Sci., Stanford University, Stanford, 2002.

[9] H. E. Rauch, F. Tung, and C. T. Striebel. Maximum likelihood estimates of linear dynamic systems. AIAA Journal, 3:1445–1450, 1965.

[10] Padhraic Smyth. Hidden Markov models for fault detection in dynamic systems. Pattern Recognition, 27(1):149–164, 1994.

[11] Greg Welch and Gary Bishop. An introduction to the Kalman filter. Technical report, University of North Carolina at Chapel Hill, 1995.

[12] Shengtong Zhong, Helge Langseth, and Thomas D. Nielsen. Bayesian networks for dynamic classification. Working paper, http://idi.ntnu.no/~shket/dLCM.pdf, 2012.