
On Approximate Inference of Dynamic Latent Classification Models for Oil Drilling Monitoring

Shengtong Zhong

Department of Computer and Information Science, Norwegian University of Science and Technology, 7491 Trondheim, Norway

We have been working with dynamic data from an oil production facility in the North Sea, where unstable situations should be identified as soon as possible. Monitoring in such a complex domain is a challenging task. Not only is such a domain typically volatile and governed by non-linear dynamics, but the sensor input to the monitoring system is also often high dimensional, making it difficult to model and classify the domain's states. Dynamic latent classification models are dynamic Bayesian networks capable of effective and efficient modeling and classification. An approximate inference algorithm utilizing Gaussian collapse has been tailor-made for this family of models, but the approximation's properties have not been fully explored. In this paper we compare alternative approximate inference methods for the dynamic latent classification model, focusing in particular on traditional sampling techniques. We show that the approximate scheme based on Gaussian collapse is computationally more efficient than sampling, while offering comparable accuracy.

In oil drilling, monitoring the complex process and identifying the current system state is very difficult. Monitoring often involves keeping an eye on hundreds or thousands of sensors to determine whether or not the process is stable. We report results on an oil production facility in the North Sea, where unstable situations should be identified as soon as possible [12]. The oil drilling data that we consider consists of some sixty variables and is captured every five seconds. The data is monitored in real time by experienced engineers, who have a number of tasks to perform, ranging from understanding the situation on the platform (activity recognition), via avoiding a number of dangerous or costly situations (event detection), to optimizing the drilling operation. The collected variables cover both topside measurements (such as flow rates) and down-hole measurements (such as gamma rate). To keep the discussion concrete, we tie the development in this paper to the task of activity recognition. The drilling of a well is a complex process consisting of activities that are performed iteratively as the length of the well increases, and knowing which activity is performed at any time is important for subsequent event detection.

Motivated by this problem setting, a generative model called the dynamic latent classification model (dLCM) [12] was proposed for dynamic classification in continuous domains, to help the drilling engineers by automatically analyzing the data stream and classifying the situation accordingly. Dynamic latent classification models are Bayesian networks that can model the complex system process and identify its system state. A dynamic latent classification model can be seen as combining a naïve Bayes model with a mixture of factor analyzers at each time point. The latent variables of the factor analyzers are used to capture the state-specific dynamics of the process as well as to model dependencies between attributes. As exact inference for the model is intractable, an approximate inference scheme based on Gaussian collapse was proposed in our previous study [12]. Although the previous experiments demonstrated that the proposed approximate inference works well for both learning dynamic latent classification models and classification, we further investigate the approximation's properties by introducing alternative sampling techniques. The remainder of the paper is organized as follows. In Section 2, we introduce the details of dLCM. The importance of Gaussian collapse in the inference of dLCM is discussed in Section 3. Next, alternative sampling techniques are proposed for dLCM in Section 4. The experimental results are presented and discussed in Section 5, and the conclusion of the paper is given in Section 6.

2 Dynamic Latent Classification Models

Dynamic latent classification models [12] are dynamic Bayesian networks that can model the complex system process and identify its system state. The system process is highly dynamic and complex, which makes it difficult to model and identify with static models or standard dynamic models. The framework of dLCM is specified incrementally by examining its expressivity relative to the oil drilling data. The dLCM is built up from the naïve Bayes model (NB), which is one of the simplest static models. In the first step, temporal dynamics of the class variable (a first-order Markov chain) are added, as the oil drilling data shows considerable correlation between the class variables of consecutive time slices [12]. This results in a dynamic version of naïve Bayes, which is equivalent to a standard first-order hidden Markov model (HMM), shown in Figure 1, where $C^t$ denotes the class variable at time $t$ and $Y_i^t$ denotes the $i$-th attribute at time $t$. This model type has a long history of usage in monitoring, see e.g. [10].

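[Figure 1: Dynamic naïve Bayes model (first-order HMM) over the class variables $C^{t-1}$, $C^t$ and their attributes.]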

This model is described by a prior distribution over the class variable $P(c^0)$, a conditional observation distribution $P(y^t|c^t)$, and transition probabilities for the class variable $P(c^t|c^{t-1})$; we assume that the model is stationary, i.e., $P(y^t|c^t) = P(y^{t-1}|c^{t-1})$ and $P(c^t|c^{t-1}) = P(c^{t+1}|c^t)$, for all $t$. For the continuous observation vector, the conditional distribution may be specified by a class-conditional multivariate Gaussian distribution with mean $\mu_{c^t}$ and covariance matrix $\Sigma_{c^t}$, i.e.,

$$Y^t \mid \{C^t = c^t\} \sim \mathcal{N}(\mu_{c^t}, \Sigma_{c^t}).$$

A standard HMM assumes that the class variable and attributes at different time points are independent given the class variable at the current time, an assumption that is violated in many real-world settings. In our oil drilling data, there is also a strong correlation between attributes given the class [12]. Modeling the dependence between attributes is therefore the next step in creating the dLCM.
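As an illustration of how this dynamic naïve Bayes model is used for classification, the following is a minimal sketch of forward filtering, assuming a single scalar attribute with a class-conditional Gaussian. The function names and the numbers in the usage lines are hypothetical and are not taken from [12].

```python
import math

def gaussian_pdf(y, mean, var):
    """Density of a univariate Gaussian N(mean, var) evaluated at y."""
    return math.exp(-0.5 * (y - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def hmm_filter(ys, prior, trans, means, variances):
    """Return the filtered class posteriors P(c^t | y^{1:t}) for every t.

    prior[c]     : distribution of the first class variable
    trans[i][j]  : P(c^t = j | c^{t-1} = i), stationary over time
    means[c], variances[c] : class-conditional Gaussian of the scalar attribute
    """
    n_classes = len(prior)
    belief = list(prior)
    filtered = []
    for t, y in enumerate(ys):
        if t > 0:
            # Prediction: push the previous belief through the class transition.
            belief = [sum(belief[i] * trans[i][j] for i in range(n_classes))
                      for j in range(n_classes)]
        # Correction: weight by the observation likelihood and normalize.
        unnorm = [belief[c] * gaussian_pdf(y, means[c], variances[c])
                  for c in range(n_classes)]
        total = sum(unnorm)
        belief = [u / total for u in unnorm]
        filtered.append(belief)
    return filtered

# Hypothetical two-class example: class 0 centred at 0, class 1 centred at 3.
posteriors = hmm_filter(
    ys=[0.1, 0.0, 2.9, 3.1],
    prior=[0.5, 0.5],
    trans=[[0.9, 0.1], [0.1, 0.9]],
    means=[0.0, 3.0],
    variances=[1.0, 1.0],
)
```

The predicted class at each time step is the index of the largest entry of the corresponding posterior.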

Following [7], we introduce latent variables to encode conditional dependence among the attributes. Specifically, for each time step $t$ we have the vector $Z^t = (Z_1^t, \ldots, Z_k^t)$ of latent variables that appear as children of the class variable and parents of all the attributes (see Figure 2). The model can be seen as combining the NB model with a factor analysis model at each time step.

The latent variable $Z^t$ is assigned a multivariate Gaussian distribution conditional on the class variable, and the attributes $Y^t$ are assumed to follow a linear multivariate Gaussian distribution conditional on the latent variables:

$$Z^t \mid \{C^t = c^t\} \sim \mathcal{N}(\mu_{c^t}, \Sigma_{c^t}),$$

$$Y^t \mid \{Z^t = z^t\} \sim \mathcal{N}(L z^t + \Phi, \Theta),$$

where $\Sigma_{c^t}$ and $\Theta$ are diagonal covariance matrices, $L$ is the transition matrix, and $\Phi$ is the offset from the latent space to the attribute space; note the stationarity assumption encoded in the model.

In this model, the latent variables capture the dependence between the attributes. They are conditionally independent given the class but marginally dependent. Furthermore, the same mapping, L, from the latent space to the attribute space is used for all classes, and hence, the relation between the class and the attributes is conveyed by the latent variables only.
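To make the role of the latent layer concrete, one can integrate out $Z^t$ from the two Gaussians above. The following is the standard linear-Gaussian marginalization, written in the notation of this section as a sketch rather than as a result quoted from [12]:

```latex
\begin{align*}
p(y^t \mid c^t) &= \int \mathcal{N}(y^t;\, L z^t + \Phi,\, \Theta)\;
                        \mathcal{N}(z^t;\, \mu_{c^t},\, \Sigma_{c^t})\, dz^t \\
                &= \mathcal{N}\!\left(y^t;\; L\mu_{c^t} + \Phi,\;
                        L\,\Sigma_{c^t} L^{\top} + \Theta\right).
\end{align*}
```

Even though $\Sigma_{c^t}$ and $\Theta$ are diagonal, the term $L\,\Sigma_{c^t} L^{\top}$ generally has non-zero off-diagonal entries, which is how the attributes become correlated given the class.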

At this step, the temporal dynamics of the model are assumed to be captured at the class level only. When the state specification of the class variable is coarse, this assumption will rarely hold. It does not hold in our oil drilling data, as the conditional correlation of the attributes in successive time slices is evident [12]. We address this by modeling the dynamics of the system at the level of the latent variables. Specifically, we encode the state-specific dynamics by assuming that the multivariate latent variable $Z^t$ follows a linear Gaussian distribution conditioned on $Z^{t-1}$, where the transition dynamics between latent variables are denoted by a diagonal matrix $A_{c^t}$:

$$Z^t \mid \{Z^{t-1} = z^{t-1}, C^t = c^t\} \sim \mathcal{N}(A_{c^t} z^{t-1}, \Sigma_{c^t}).$$

A graphical representation of the model is given in Figure 3.

[Figure 3: Graphical model with class variables $C^{t-1}$, $C^t$ and latent variables $Z_1^{t-1}$, $Z_2^{t-1}$, $Z_1^t$, $Z_2^t$.]

A discrete mixture variable $M$ is further introduced to the model at each time slice for the purpose of reducing the computational cost while maintaining the representational power [12]. A similar construction is used by [7] for static domains; for dynamic domains, see [3, 6], where a probabilistic model called the switching state-space model is proposed that combines discrete and continuous dynamics. In our case, the mixture variable follows a multinomial distribution conditioned on the class variable, and the attributes $Y^t$ follow a multivariate Gaussian distribution conditioned on the latent variables and the discrete mixture variable,

$$M^t \mid \{C^t = c^t\} \sim P(M^t \mid C^t = c^t),$$

$$Y^t \mid \{Z^t = z^t, M^t = m^t\} \sim \mathcal{N}(L_{m^t} z^t + \Phi_{m^t}, \Theta_{m^t}),$$

where $1 \le m^t \le |\mathrm{sp}(M)|$ ($|\mathrm{sp}(M)|$ denotes the size of the state space of $M$), $P(M^t = m^t \mid C^t = c^t) \ge 0$ and $\sum_{m^t=1}^{|\mathrm{sp}(M)|} P(M^t = m^t \mid C^t = c^t) = 1$ for all $1 \le c^t \le |\mathrm{sp}(C)|$, and $\Phi_{m^t}$ is the offset from the latent space to the attribute space.

The final model is called the dynamic latent classification model and is shown in Figure 4. The dynamic latent classification model was shown to be effective and efficient in experiments with our oil drilling data, and it also gave a significant improvement over static models (such as NB or decision trees) and the HMM [12].
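The generative process defined by the distributions above can be sketched as ancestral sampling, one time slice at a time. This is a minimal illustration using only the Python standard library. All parameter values and names (`par`, `simulate_dlcm`) are hypothetical, and the first latent vector is assumed to be drawn from $\mathcal{N}(\mu_c, \Sigma_c)$, which the text does not state explicitly.

```python
import math
import random

def sample_categorical(probs, rng):
    """Draw an index k with probability probs[k]."""
    u, acc = rng.random(), 0.0
    for k, p in enumerate(probs):
        acc += p
        if u < acc:
            return k
    return len(probs) - 1  # guard against floating-point rounding

def simulate_dlcm(T, par, rng):
    """Sample classes, latent vectors and attribute vectors for T time slices.

    Diagonal matrices (A_c, Sigma_c, Theta_m) are stored as lists of their
    diagonal entries. All values in `par` are illustration values only.
    """
    cs, zs, ys = [], [], []
    for t in range(T):
        # Class variable: first-order Markov chain.
        c_probs = par["c_prior"] if t == 0 else par["c_trans"][cs[-1]]
        c = sample_categorical(c_probs, rng)
        # Mixture variable: multinomial conditioned on the class.
        m = sample_categorical(par["m_given_c"][c], rng)
        # Latent vector: N(mu_c, Sigma_c) at t = 0 (assumption),
        # otherwise N(A_c z_prev, Sigma_c) with diagonal A_c.
        if t == 0:
            mean = par["mu"][c]
        else:
            mean = [a * z for a, z in zip(par["A"][c], zs[-1])]
        z = [rng.gauss(mu, math.sqrt(v)) for mu, v in zip(mean, par["Sigma"][c])]
        # Attributes: N(L_m z + Phi_m, Theta_m) with diagonal Theta_m.
        L, phi, theta = par["L"][m], par["Phi"][m], par["Theta"][m]
        y = [sum(L[i][k] * z[k] for k in range(len(z))) + phi[i]
             + rng.gauss(0.0, math.sqrt(theta[i])) for i in range(len(phi))]
        cs.append(c)
        zs.append(z)
        ys.append(y)
    return cs, zs, ys

# Hypothetical model: 2 classes, 2 mixture components, 2 latents, 3 attributes.
par = {
    "c_prior": [0.5, 0.5],
    "c_trans": [[0.95, 0.05], [0.05, 0.95]],
    "m_given_c": [[0.7, 0.3], [0.2, 0.8]],
    "mu": [[0.0, 0.0], [2.0, -2.0]],
    "A": [[0.9, 0.8], [0.5, 0.7]],
    "Sigma": [[0.1, 0.1], [0.2, 0.2]],
    "L": [[[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]],
          [[1.0, 1.0], [0.0, 1.0], [1.0, 0.0]]],
    "Phi": [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]],
    "Theta": [[0.05, 0.05, 0.05], [0.05, 0.05, 0.05]],
}
classes, latents, attributes = simulate_dlcm(100, par, random.Random(0))
```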

3 Approximate inference in dLCM

Exact inference for dLCM is intractable. To make dLCM applicable and effective in practice, approximate inference is therefore used.

3.1 Intractability of exact inference in dLCM

As seen from the dLCM in Figure 4, the equivalent probabilistic model is

$$p(y_{1:T}, z_{1:T}, m_{1:T}, c_{1:T}) = p(y_1|z_1, m_1)\,p(z_1|c_1)\,p(m_1|c_1)\,p(c_1) \cdot \prod_{t=2}^{T} p(y_t|z_t, m_t)\,p(z_t|z_{t-1}, c_t)\,p(m_t|c_t)\,p(c_t|c_{t-1}).$$

In dLCM, exact filtered and smoothed inference is intractable (scaling exponentially with $T$ [8]), as neither the class variables nor the mixture variables are observed. At time step 1, $p(z_1|y_1)$ is a mixture of $|\mathrm{sp}(C)| \cdot |\mathrm{sp}(M)|$ Gaussians. At time step 2, due to the summation over the classes and mixture variables, $p(z_2|y_{1:2})$ is a mixture of $|\mathrm{sp}(C)|^2 \cdot |\mathrm{sp}(M)|^2$ Gaussians; at time step 3 it is a mixture of $|\mathrm{sp}(C)|^3 \cdot |\mathrm{sp}(M)|^3$ Gaussians, and so on, up to a mixture of $|\mathrm{sp}(C)|^T \cdot |\mathrm{sp}(M)|^T$ Gaussians at time step $T$. To control this explosion in computational complexity, approximate inference techniques are adopted for dLCM.
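A short calculation shows how quickly the exact mixture grows. The sketch below simply evaluates the component counts stated above. The choice of 2 classes and 2 mixture components is for illustration, and the helper names are hypothetical.

```python
def exact_components(n_classes, n_mixtures, t):
    """Number of Gaussian components in the exact filtered density at time step t."""
    return (n_classes * n_mixtures) ** t

def collapsed_components(n_classes, n_mixtures):
    """Components kept per time step when each (class, mixture) pair is collapsed."""
    return n_classes * n_mixtures

# Compare exact growth with the fixed size of the collapsed representation.
for t in (1, 2, 3, 10, 20):
    print(t, exact_components(2, 2, t), collapsed_components(2, 2))
```

With these illustrative sizes, the exact density already has more than a million components after ten time steps, while the collapsed representation stays at four.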

3.2 Approximate inference: Forward pass

The structure of the proposed dLCM is similar to that of the linear dynamical system (LDS) [2]; the standard Rauch-Tung-Striebel (RTS) smoother [9] and the expectation correction smoother [3] for LDS therefore provide the basis for approximate inference in dLCM. As in the RTS smoother, the filtered estimate $p(z_t, m_t, c_t|y_{1:t})$ is obtained by a forward recursion, and a backward recursion then computes the smoothed estimate $p(z_t, m_t, c_t|y_{1:T})$. Inference in dLCM thus consists of a single forward recursion followed by a single backward recursion. Gaussian collapse is incorporated into both recursions to form the approximate inference. The Gaussian collapse in the forward recursion is equivalent to assumed density filtering [4], and the Gaussian collapse in the backward recursion mirrors the smoothed posterior collapse from [3].

Dropping the normalization constant $p(y_t|y_{1:t-1})$, the filtered estimate $p(z_t, m_t, c_t|y_{1:t})$ is proportional to the joint probability $p(z_t, m_t, c_t, y_t|y_{1:t-1})$, where

$$p(z_t, m_t, c_t, y_t|y_{1:t-1}) = p(y_t, z_t|m_t, c_t, y_{1:t-1}) \cdot p(m_t|c_t, y_{1:t-1})\,p(c_t|y_{1:t-1}). \qquad (1)$$

To build the forward recursion, a recursive form for each of the factors in Equation 1 is required. Given the filtered results of the previous time step, such recursive forms are shown to be feasible [12]. Deriving them requires the term $p(z_{t-1}|m_{t-1}, c_{t-1}, y_{1:t-1})$, which is directly available since it is the filtered probability from the previous time step. However, the number of mixture components of $p(z_{t-1}|y_{1:t-1})$ increases exponentially over time, as discussed earlier, and the same holds for $p(z_{t-1}|m_{t-1}, c_{t-1}, y_{1:t-1})$. In our Gaussian collapse implementation [12], the term $p(z_{t-1}|m_{t-1}, c_{t-1}, y_{1:t-1})$ is collapsed into a single Gaussian, parameterized by mean $\nu_{m_{t-1},c_{t-1}}$ and covariance $\Gamma_{m_{t-1},c_{t-1}}$, and this collapsed Gaussian is then propagated to the next time slice. With this approximation, the recursive computation of the forward pass becomes tractable.
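The collapse step itself is commonly done by moment matching: the single Gaussian is given the mean and variance of the mixture it replaces. The following is a minimal sketch for a scalar latent variable. It is an illustration of the general technique, not the implementation of [12], and the function name is hypothetical. For vector-valued latents the same formulas apply with outer products in place of squares.

```python
def collapse_mixture(weights, means, variances):
    """Moment-match a 1-D Gaussian mixture with a single Gaussian.

    Returns (mean, variance) of the Gaussian that has the same first and
    second moments as the mixture sum_k w_k N(means[k], variances[k]).
    """
    total = sum(weights)
    w = [x / total for x in weights]  # normalize in case weights do not sum to 1
    mean = sum(wk * mk for wk, mk in zip(w, means))
    # E[z^2] of the mixture is the weighted sum of (variance + mean^2).
    second_moment = sum(wk * (vk + mk * mk) for wk, mk, vk in zip(w, means, variances))
    return mean, second_moment - mean * mean

# Two equally weighted unit-variance components centred at -1 and +1.
nu, gamma = collapse_mixture([0.5, 0.5], [-1.0, 1.0], [1.0, 1.0])
```

In this example the collapsed Gaussian has mean 0 and variance 2: the within-component variance of 1 plus the spread of 1 between the component means.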

3.3 Approximate inference: Backward pass

Similar to the forward pass, the backward pass relies on a recursive computation of the smoothed posterior $p(z_t, m_t, c_t|y_{1:T})$. In detail, $p(z_t, m_t, c_t|y_{1:T})$ is computed from the smoothed result of the previous step, $p(z_{t+1}, m_{t+1}, c_{t+1}|y_{1:T})$, together with some other quantities obtained from the forward pass. The first smoothed posterior is $p(z_T, m_T, c_T|y_{1:T})$, which is directly available as it is also the last filtered posterior from the forward pass. To compute $p(z_t, m_t, c_t|y_{1:T})$, factorize it as

$$\begin{aligned} p(z_t, m_t, c_t|y_{1:T}) &= \sum_{m_{t+1}, c_{t+1}} p(z_t, m_t, c_t, m_{t+1}, c_{t+1}|y_{1:T}) \\ &= \sum_{m_{t+1}, c_{t+1}} p(z_t|m_t, c_t, m_{t+1}, c_{t+1}, y_{1:T}) \cdot p(m_t, c_t|m_{t+1}, c_{t+1}, y_{1:T})\,p(m_{t+1}, c_{t+1}|y_{1:T}). \end{aligned}$$

Since $z_t \perp\!\!\!\perp \{y_{t+1:T}, m_{t+1}, c_{t+1}\} \mid \{z_{t+1}, m_t, c_t\}$, the term $p(z_t|m_t, c_t, m_{t+1}, c_{t+1}, y_{1:T})$ can be found from

$$p(z_t|m_t, c_t, m_{t+1}, c_{t+1}, y_{1:T}) = \int_{z_{t+1}} p(z_t|z_{t+1}, m_t, c_t, y_{1:t}) \cdot p(z_{t+1}|m_t, c_t, m_{t+1}, c_{t+1}, y_{1:T})\,dz_{t+1}.$$

To complete the backward recursion, two further assumptions are made in the backward pass to make the approximate inference applicable and effective. The first assumption is to approximate $p(z_{t+1}|m_t, c_t, m_{t+1}, c_{t+1}, y_{1:T})$ by $p(z_{t+1}|m_{t+1}, c_{t+1}, y_{1:T})$ [3]. The justification is that, although $\{m_t, c_t\} \not\perp\!\!\!\perp z_{t+1} \mid y_{1:T}$, the influence of $\{m_t, c_t\}$ on $z_{t+1}$ through $z_t$ is 'weak', as $z_t$ will be mostly influenced by $y_{1:t}$. The benefit of this simple assumption is that $p(z_{t+1}|m_{t+1}, c_{t+1}, y_{1:T})$ can be obtained directly from the previous backward recursion. However, $p(z_{t+1}|m_{t+1}, c_{t+1}, y_{1:T})$ is a Gaussian mixture whose number of components increases exponentially in $T - t$. The second assumption is therefore again a Gaussian collapse: $p(z_{t+1}|m_{t+1}, c_{t+1}, y_{1:T})$ is collapsed into a single Gaussian, and this collapsed Gaussian is passed on to the next step. This guarantees that the back-propagated term $p(z_t|m_t, c_t, y_{1:T})$ is a Gaussian mixture with a fixed number of $|\mathrm{sp}(C)| \cdot |\mathrm{sp}(M)|$ components at the next time step. With this Gaussian collapse at each time slice, a tractable recursion for the backward pass is established.

3.4 The importance of approximate inference

Exact inference is not applicable in practice, as its computational cost increases exponentially over time. Approximate inference is therefore essential to dLCM. Gaussian collapse is adopted when building the recursive form of both the forward and the backward pass. In addition, $p(z_{t+1}|m_t, c_t, m_{t+1}, c_{t+1}, y_{1:T})$ is approximated by $p(z_{t+1}|m_{t+1}, c_{t+1}, y_{1:T})$ in dLCM. As these approximations are made within the inference, the quality of the overall learning and inference for dLCM is rather sensitive to them. Our experimental results [12] showed that the overall performance of dLCM is satisfactory, which indicates that the chosen approximations are reasonable.

Even though the proposed approximate inference in dLCM is satisfactory, is there room for improvement with alternative approximation methods? With this question in mind, we investigate the approximation's properties by incorporating a new approximation method. Traditional sampling techniques (e.g., [5]) are commonly used in similar approximation settings. In the next section we briefly introduce sampling and then explain how it is integrated in dLCM. The approximation of $p(z_{t+1}|m_t, c_t, m_{t+1}, c_{t+1}, y_{1:T})$ by $p(z_{t+1}|m_{t+1}, c_{t+1}, y_{1:T})$ is kept unchanged.

In short, we replace Gaussian collapse by sampling in the approximate inference of dLCM, and investigate the effectiveness and efficiency of this proposal through a comparison experiment between the original Gaussian collapse based dLCM and the sampling based dLCM.

4 Sampling

4.1 Background

Sampling selects a subset of samples from a population in order to estimate the characteristics of the whole population. There is a well-known tradeoff in sampling. When fewer samples are selected, the sampling process takes less time, but the estimate of the population's characteristics is relatively worse. If more samples are selected from the same population, which is more time consuming, the characteristics of the population are estimated better. Efficiency (time consumption) and effectiveness (quality of the estimate) are thus the essential concerns in sampling techniques. In general, as many samples as possible should be selected within the tolerable time, so that a better estimate can be expected. This feature makes traditional sampling techniques attractive for the approximate inference of dLCM, as a balance between efficiency and effectiveness can be struck according to the requirements of the application. Moreover, sampling can approximate any distribution as long as the number of samples is sufficient. Sampling is intended to replace the Gaussian collapse in both the forward and the backward pass. We introduce particle filtering next, which motivates our discussion of how sampling is used in the inference of dLCM.

4.2 Particle Filtering

Particle filtering (PF) [1] is a technique for implementing a recursive Bayesian filter by Monte Carlo simulation, and is an efficient statistical method for estimating the system state. Monte Carlo simulation relies on repeated random sampling. In particle filtering, a weighted particle set $\{(s_n^t, \pi_n^t)\}_{n=1}^{N}$ at each time $t$ denotes an approximation of the required posterior probability of the system state. Each of the $N$ particles has a state $s_n^t$ and a weight $\pi_n^t$, and the weights are normalized such that $\sum_n \pi_n^t = 1$. Particle filtering has three stages: sampling (selection), prediction and observation. In the sampling stage, $N$ particles are chosen from the prior probability according to the set $\{(s_n^{t-1}, \pi_n^{t-1})\}_{n=1}^{N}$. In the prediction stage, the state of the chosen particles is predicted by the dynamic model $p(s_t|s_{t-1})$. In the observation stage, the predicted particles are weighted according to the observation model $p(y_t|s_t)$. After the weights of the particles have been obtained, the state at time $t$ can be estimated from the weighted particle set.
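The three stages can be sketched for a scalar state as follows. This is a minimal bootstrap-style particle filter under assumptions that are not part of the paper: linear dynamics $s_t = a\,s_{t-1} + \text{noise}$, a direct noisy observation $y_t = s_t + \text{noise}$, and hypothetical parameter names `a`, `q_var` and `r_var`.

```python
import math
import random

def particle_filter_step(particles, weights, y, a, q_var, r_var, rng):
    """One sampling / prediction / observation cycle for a scalar state.

    Assumed model (illustration only):
      dynamics     s_t = a * s_{t-1} + N(0, q_var)
      observation  y_t = s_t + N(0, r_var)
    """
    n = len(particles)
    # Sampling (selection): draw N particles according to the previous weights.
    chosen = rng.choices(particles, weights=weights, k=n)
    # Prediction: move every chosen particle through the dynamic model.
    predicted = [a * s + rng.gauss(0.0, math.sqrt(q_var)) for s in chosen]
    # Observation: weight each predicted particle by the observation likelihood.
    lik = [math.exp(-0.5 * (y - s) ** 2 / r_var) for s in predicted]
    total = sum(lik)
    if total == 0.0:
        new_weights = [1.0 / n] * n  # fall back to uniform if all weights underflow
    else:
        new_weights = [l / total for l in lik]
    # State estimate: weighted mean of the particle set.
    estimate = sum(w * s for w, s in zip(new_weights, predicted))
    return predicted, new_weights, estimate

rng = random.Random(1)
particles = [rng.gauss(0.0, 1.0) for _ in range(500)]
weights = [1.0 / 500] * 500
for y in [5.0] * 20:  # hypothetical constant observation sequence
    particles, weights, estimate = particle_filter_step(
        particles, weights, y, a=1.0, q_var=1.0, r_var=1.0, rng=rng)
```

Increasing the number of particles reduces the Monte Carlo error of `estimate` at a proportional increase in run time, which is the tradeoff discussed in Section 4.1.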

4.3 Sampling in the dLCM

The sampling process that we require for inference in dLCM is similar to PF. In the forward pass, we know that the number of mixture components of $p(z_{t-1}|y_{1:t-1})$ increases exponentially over time in exact inference. Instead of the recursive approximation of $p(z_{t-1}|m_{t-1}, c_{t-1}, y_{1:t-1})$ used in the Gaussian collapse scheme, a recursive approximation of $p(z_{t-1}|y_{1:t-1})$ by sampling is adopted. Given the approximated distribution $p(z_{t-1}|y_{1:t-1})$ at time slice $t-1$, $N$ weighted samples $\{(s_n^{t-1}, \pi_n^{t-1})\}_{n=1}^{N}$ are selected from it. These selected samples are propagated to the next time slice $t$ with the linear transition dynamics $A_{c_t}$. As the discrete class variable $C_t$ has $|\mathrm{sp}(C)|$ states, each of the selected samples becomes $|\mathrm{sp}(C)|$ new samples. These $|\mathrm{sp}(C)| \cdot N$ propagated samples are then updated by the observation $y_t$. The updating rule is the same as the Kalman filter update [11]. As the mixture variable has $|\mathrm{sp}(M)|$ states, each of the propagated samples in turn becomes $|\mathrm{sp}(M)|$ new samples. In total, the $N$ selected samples from time slice $t-1$ become $|\mathrm{sp}(C)| \cdot |\mathrm{sp}(M)| \cdot N$ samples at time slice $t$, and their weights are updated accordingly. The weighted sample set $\{(f_n^t, \gamma_n^t)\}_{n=1}^{|\mathrm{sp}(C)| \cdot |\mathrm{sp}(M)| \cdot N}$ is then the approximation to $p(z_t|y_{1:t})$. For the next step of the recursion, a new weighted sample set $\{(s_n^t, \pi_n^t)\}_{n=1}^{N}$ containing $N$ samples is selected from the approximated $p(z_t|y_{1:t})$. The recursive process is summarized in Algorithm 1.

Algorithm 1 Sampling in the forward pass
1: for t = 2 : T do
2:   Select $N$ samples from the previous approximated distribution $p(z_{t-1}|y_{1:t-1})$ to form a weighted sample set $\{(s_n^{t-1}, \pi_n^{t-1})\}_{n=1}^{N}$
3:   Propagate these selected samples to the next time slice $t$ by the transition dynamics $A_{c_t}$
4:   Update the propagated samples by the observation $Y^t$
5:   The updated samples form a new weighted sample set $\{(f_n^t, \gamma_n^t)\}_{n=1}^{|\mathrm{sp}(C)| \cdot |\mathrm{sp}(M)| \cdot N}$, which is an approximation of $p(z_t|y_{1:t})$
6: end for

In the backward pass, a sampling process similar to that of the forward pass is used. Samples are first selected from the approximated distribution $p(z_{t+1}|y_{1:T})$, which is represented by a weighted sample set denoted $\{(b_n^{t+1}, \rho_n^{t+1})\}_{n=1}^{N}$. The selected samples are then back-propagated to the previous time slice $t$ (which is the next step of the backward pass) with the reverse of the transition dynamics $A_{c_t}$. The back-propagated samples are subsequently updated by the observation $Y^{t-1}$ [11]. Similar to the forward pass, the approximation to $p(z_t|y_{1:T})$ is a weighted sample set $\{(g_n^t, \tau_n^t)\}_{n=1}^{|\mathrm{sp}(C)| \cdot |\mathrm{sp}(M)| \cdot N}$. For the recursive calculation at the next time slice, a new weighted sample set $\{(b_n^t, \rho_n^t)\}_{n=1}^{N}$ containing $N$ samples is selected from this approximated distribution. The term $p(z_T|c_T, y_{1:T})$ required at the beginning of the backward pass is also the result of the last time step of the forward pass, which means that $\{(g_n^T, \tau_n^T)\}_{n=1}^{|\mathrm{sp}(C)| \cdot |\mathrm{sp}(M)| \cdot N}$ is the same sample set as $\{(f_n^T, \gamma_n^T)\}_{n=1}^{|\mathrm{sp}(C)| \cdot |\mathrm{sp}(M)| \cdot N}$.

This completes the sampling based approximate inference scheme for dLCM. The number of samples selected from the approximated distribution at each step is fixed, and depends on the time budget and the required estimation quality of the application at hand. A balance must be struck according to the requirements of the practical application. Generally, the more samples we select, the higher the time cost and the better the estimation quality.
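The bookkeeping of one iteration of Algorithm 1 can be sketched for scalar $z$ and $y$ as follows. This is a simplified illustration, not the scheme used in the experiments. Three departures are assumptions made here: the update in step 4 is replaced by plain likelihood weighting rather than the Kalman filter update, each sample carries its class label so that the class transition probability can enter the weight, and all parameter values are hypothetical.

```python
import math
import random

def normal_pdf(x, mean, var):
    """Density of a univariate Gaussian N(mean, var) evaluated at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def forward_sampling_step(samples, weights, y, par, n_keep, rng):
    """Expand n_keep selected samples into |sp(C)| * |sp(M)| * n_keep weighted samples.

    samples : list of (z_prev, c_prev) pairs (carrying c_prev is an assumption)
    weights : weights of the samples, summing to one
    """
    n_c, n_m = len(par["A"]), len(par["L"])
    # Step 2: select n_keep samples from the previous approximation.
    chosen = rng.choices(samples, weights=weights, k=n_keep)
    expanded, new_w = [], []
    for z_prev, c_prev in chosen:
        for c in range(n_c):
            # Step 3: propagate with the class-specific dynamics A_c.
            z = par["A"][c] * z_prev + rng.gauss(0.0, math.sqrt(par["Sigma"][c]))
            for m in range(n_m):
                # Step 4 (simplified): weight by the likelihood of y under (z, m).
                lik = normal_pdf(y, par["L"][m] * z + par["Phi"][m], par["Theta"][m])
                expanded.append((z, c))
                new_w.append(par["c_trans"][c_prev][c] * par["m_given_c"][c][m] * lik)
    # Step 5: normalize to obtain the new weighted sample set.
    total = sum(new_w)
    if total == 0.0:
        new_w = [1.0 / len(new_w)] * len(new_w)
    else:
        new_w = [w / total for w in new_w]
    return expanded, new_w

# Hypothetical scalar model with 2 classes and 2 mixture components.
par = {
    "A": [0.9, 0.5], "Sigma": [0.1, 0.2],
    "L": [1.0, 2.0], "Phi": [0.0, 1.0], "Theta": [0.5, 0.5],
    "c_trans": [[0.95, 0.05], [0.05, 0.95]],
    "m_given_c": [[0.7, 0.3], [0.2, 0.8]],
}
rng = random.Random(0)
samples = [(rng.gauss(0.0, 1.0), rng.randrange(2)) for _ in range(40)]
weights = [1.0 / 40] * 40
samples, weights = forward_sampling_step(samples, weights, 0.3, par, 40, rng)
```

Starting from 40 selected samples, the returned set has 2 x 2 x 40 = 160 weighted samples, from which 40 are selected again at the next time slice.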

In this section, approximate inference for dLCM based on sampling was established by mimicking the particle filtering process in both the forward and the backward pass. To investigate the effectiveness and efficiency of the sampling based dLCM, comparison experiments are conducted in the next section.

5 Experiment Results

In this section, comparison experiments on simulated data and oil drilling data are conducted and their results are discussed.

5.1 Experiments on simulation data

A set of simulated data is first generated from dLCM, and we compare the classification accuracy and time consumption of the Gaussian collapse scheme and the sampling scheme.

5.1.1 Experiment settings on simulation data generation

The simulated data sets are generated from dLCM with parameters chosen by a "semi-random" process. The specification of a dLCM has two parts: the model structure, and the model parameters given a fixed model structure. For each time slice, the model structure is determined by four factors: the number of states of the class variable $C$ (activity state), the dimension of the latent variable space $Z$, the dimension of the attribute space $Y$, and the number of mixture components $M$. These values are fixed as described in Table 1. After the model structure has been chosen, its associated model parameters are randomly generated. This combination of a fixed model structure and randomly generated model parameters is the "semi-random" process. We then generate a data set with this model.

For convenience, we refer to the model structure together with its parameters as the model parameters in the remainder of the paper. The generated data set and the model parameters are used as the true model for the classification test. A comparison experiment between Gaussian collapse and sampling is then conducted.

Table 1: Model structure of the simulated data sets.

data    |sp(C)|    |sp(M)|    |sp(Z)|    |sp(Y)|
set1    2          1          12         6
set2    2          2          18         9

5.1.2 Results and discussion

The comparison experiment is conducted with both the Gaussian collapse based and the sampling based dLCM. For the sampling based scheme, three sample sizes are used: 40, 200 and 1000. The classification results on simulated set1 and set2 are summarized in Table 2. The reported classification accuracy of each scheme is the average over ten runs, and it shows that the Gaussian collapse based dLCM performs better than all three sampling based dLCMs on both set1 and set2. Among the sampling based schemes, a larger sample size generally achieves better classification accuracy.

Table 2: Classification accuracy on the simulated data (average of ten runs).

scheme (samples)     set1/accuracy    set2/accuracy
Gaussian collapse    99.60%           99.90%
Sampling (40)        96.75%           95.55%
Sampling (200)       97.60%           97.95%
Sampling (1000)      97.75%           98.40%

After investigating the effectiveness of each scheme, we turn to efficiency. The efficiency of each scheme is evaluated by the average time cost, over ten runs, required to accomplish the classification task; the details are shown in Table 3. On set1, the classification task is accomplished in 0.47 seconds with Gaussian collapse, whereas the sampling scheme with 40 samples takes 2.01 seconds. The larger the sample size in the sampling scheme, the more time it takes to accomplish the classification task. It is clear that Gaussian collapse requires much less time to accomplish the classification task.

Compared to the sampling based scheme, the Gaussian collapse based scheme achieves comparable (slightly better) classification results in much less time on the simulated data.

5.2 Experiments on oil drilling data

Next, the same comparison experiment is conducted with the oil drilling data from the North Sea.

5.2.1 Experiment settings on oil drilling data

As mentioned in the introduction, we tie the development in this paper to the task of activity recognition. In total, there are 5 drilling activities in the data set used for the classification task: "drilling", "connection", "tripping in", "tripping out" and "other". The original oil drilling data contains more than 60 variables. On the advice of an oil drilling domain expert, 9 variables are used for the classification task here. Two data sets are chosen, containing 80000 and 50000 time slices respectively, with all 5 activities present in each.

For the classification in this paper, we combine these 5 activities into 2 activities and conduct the classification test on the combined data set. The three activities "drilling", "connection" and "other" are combined into one activity, and "tripping in" and "tripping out" are combined into another. The reason is that the combined activities are physically close and may have quite similar dynamics. This combination also simplifies our experiments with the oil drilling data while preserving the purpose of the comparison. Before comparing the inference of each scheme on the oil drilling data, we learn a dLCM with the learning method proposed in [12], using the oil drilling data set containing 80000 time slices. The model structure is chosen by experience, with 2 mixture components and 16 latent variables. After its parameters have been learned with the chosen model structure, the dLCM used for the classification experiment is fixed. With the learned dLCM, the classification experiment is conducted on the other oil drilling data set, containing 50000 time slices.

5.2.2 Results and discussion

With the fixed dLCM, the average (over ten runs) classification accuracy and average time cost of each scheme are obtained. Four schemes are compared: the Gaussian collapse based scheme and the sampling based scheme with 40, 200 and 1000 samples. The experimental results are summarized in Table 4.

Among the sampling based schemes, more samples achieve higher classification accuracy. However, with more samples, the computational cost of the classification task is much higher. The table clearly shows that sampling with 1000 samples requires more than one hour to accomplish the classification task, which is around 40 times the time required by Gaussian collapse, while the two achieve similar classification accuracy. Overall, Gaussian collapse again achieves comparable results (slightly better than sampling), while keeping the computational cost low compared to the sampling based scheme.

6 Conclusion

In the approximate inference of the dLCM, Gaussian collapse was originally adopted as the core of the approximation method. In this paper, sampling is proposed as an alternative approximation. A process similar to particle filtering, with sampling as its basis, is incorporated into the approximate inference of the dLCM. We then conduct comparison experiments on both simulated data and real oil drilling data. The experimental results on both show that the approximate scheme based on Gaussian collapse is computationally more efficient than sampling, while offering comparable accuracy.

Acknowledgements

I would like to thank Helge Langseth who helped a lot during the whole process of organizing and writing this paper.

[1] Sanjeev Arulampalam, Simon Maskell, Neil Gordon, and Tim Clapp. A tutorial on particle filters for on-line non-linear/non-Gaussian Bayesian tracking. IEEE Transactions on Signal Processing, 50:174-188, 2001.

[2] Yaakov Bar-Shalom and Xiao-Rong Li. Estimation and Tracking: Principles, Techniques and Software. Artech House Publishers, 1993.

[3] David Barber. Expectation correction for smoothed inference in switching linear dynamical systems. Journal of Machine Learning Research, 7:2515-2540, 2006.

[4] Xavier Boyen and Daphne Koller. Approximate learning of dynamic models. In Advances in Neural Information Processing Systems 12, pages 396-402, 1999.

[5] Stuart Geman and Donald Geman. Stochastic relaxation, Gibbs distribution and the Bayesian restoration of images. IEEE Transactions on Pattern Analysis and Machine Intelligence, 6:721-741, 1984.

[6] Zoubin Ghahramani and Geoffrey E. Hinton. Variational learning for switching state-space models. Neural Computation, 12:963-996, 1998.

[7] Helge Langseth and Thomas D. Nielsen. Latent classification models. Machine Learning, 59(3):237-265, 2005.

[8] Uri Lerner. Hybrid Bayesian networks for reasoning about complex systems. PhD thesis, Dept. of Comp. Sci., Stanford University, Stanford, 2002.

[9] H. E. Rauch, Tung, and C. T. Striebel. Maximum likelihood estimates of linear dynamic systems. AIAA Journal, 3:1445-1450, 1965.

[10] Padhraic Smyth. Hidden Markov models for fault detection in dynamic system. Pattern Recognition, 27(1):149-164, 1994.

[11] Greg Welch and Gary Bishop. An introduction to the Kalman filter. Technical report, University of North Carolina at Chapel Hill, 1995.

[12] Shengtong Zhong, Helge Langseth, and Thomas D. Nielsen. Bayesian networks for dynamic classification. Working Paper, http://idi.ntnu.no/~shket/dLCM.pdf, 2012.