On Approximate Inference of Dynamic Latent Classification Models for Oil Drilling Monitoring

Shengtong Zhong
Department of Computer and Information Science
Norwegian University of Science and Technology
7491 Trondheim, Norway
shket@idi.ntnu.no

Abstract

We have been working with dynamic data from an oil production facility in the North Sea, where unstable situations should be identified as soon as possible. Monitoring in such a complex domain is a challenging task. Not only is such a domain typically volatile and governed by non-linear dynamics, but the sensor input to the monitoring system is often high dimensional, making it difficult to model and classify the domain's states. Dynamic latent classification models are dynamic Bayesian networks capable of effective and efficient modeling and classification. An approximate inference algorithm utilizing Gaussian collapse has been tailor-made for this family of models, but the approximation's properties have not been fully explored. In this paper we compare alternative approximate inference methods for the dynamic latent classification model, in particular focusing on traditional sampling techniques. We show that the approximate scheme based on Gaussian collapse is computationally more efficient than sampling, while offering comparable accuracy results.

1 Introduction

In oil drilling, monitoring the complex process and identifying the current system state is very difficult. Monitoring the process often involves keeping an eye on hundreds or thousands of sensors to determine whether or not the process is stable. We report results on an oil production facility in the North Sea, where unstable situations should be identified as soon as possible [12]. The oil drilling data that we are considering, consisting of some sixty variables, is captured every five seconds. The data is monitored in real time by experienced engineers, who have a number of tasks to perform, ranging from understanding the situation on the platform (activity recognition), via avoiding a number of either dangerous or costly situations (event detection), to optimization of the drilling operation. The variables that are collected cover both topside measurements (like flow rates) and down-hole measurements (like, for instance, gamma rate). To keep the discussion concrete, we tie the development to the task of activity recognition in this paper. The drilling of a well is a complex process, which consists of activities that are performed iteratively as the length of the well increases, and knowing which activity is performed at any time is important for further event detection.

Motivated by this problem setting, a generative model called the dynamic latent classification model (dLCM) [12] for dynamic classification in continuous domains was proposed to help the drilling engineers by automatically analyzing the data stream and classifying the situation accordingly. Dynamic latent classification models are Bayesian networks that can model the complex system process and identify its system state.

A dynamic latent classification model can be seen as combining a naïve Bayes model with a mixture of factor analyzers at each time point. The latent variables of the factor analyzers are used to capture the state-specific dynamics of the process as well as to model dependencies between attributes. As exact inference for the model is intractable, an approximate inference scheme based on Gaussian collapse was proposed in our previous study [12]. Although those experiments demonstrated that the proposed approximate inference works well for both the learning of dynamic latent classification models and the classification task, here we further investigate the approximation's properties by introducing alternative sampling techniques.

The remainder of the paper is organized as follows. In Section 2, we introduce the dLCM in detail. The importance of Gaussian collapse in the inference of dLCM is discussed in Section 3. Next, alternative sampling techniques are proposed for dLCM in Section 4. After the experimental results are presented and discussed in Section 5, the conclusion of the paper is given in Section 6.
2 Dynamic Latent Classification Models

Dynamic latent classification models [12] are dynamic Bayesian networks that can model a complex system process and identify its system state. The system process is highly dynamic and complex, which makes it difficult to model and identify with static models and standard dynamic models. The framework of dLCM is specified incrementally by examining its expressivity relative to the oil drilling data.

The dLCM is built up from the naïve Bayes model (NB), which is one of the simplest static models. In the first step, temporal dynamics of the class variable (a first-order Markov chain) is added, as considerable correlation between the class variables of consecutive time slices is evidenced in the oil drilling data [12]. This results in a dynamic version of naïve Bayes, which is also equivalent to a standard first-order hidden Markov model (HMM), shown in Figure 1, where C^t denotes the class variable at time t and Y_i^t denotes the i-th attribute at time t. This model type has a long history of usage in monitoring, see e.g. [10].

Figure 1: An example of the dynamic version of naïve Bayes with 4 attributes using the 2-time-slice dynamic Bayesian network (2TDBN) representation. Attributes are assumed to be conditionally independent given the class variable.

This model is described by a prior distribution over the class variable P(c^0), a conditional observation distribution P(y^t | c^t), and transition probabilities for the class variable P(c^t | c^{t-1}); we assume that the model is stationary, i.e., P(y^t | c^t) = P(y^{t-1} | c^{t-1}) and P(c^t | c^{t-1}) = P(c^{t+1} | c^t), for all t. For the continuous observation vector, the conditional distribution may be specified by a class-conditional multivariate Gaussian distribution with mean μ_{c^t} and covariance matrix Σ_{c^t}, i.e., Y^t | {C^t = c^t} ∼ N(μ_{c^t}, Σ_{c^t}).
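To make this baseline concrete, the following is a minimal sketch of forward filtering in the dynamic naïve Bayes (HMM) model with class-conditional Gaussian emissions. It is purely illustrative: the function name, parameter layout and use of scipy are our own assumptions and not code from [12].

import numpy as np
from scipy.stats import multivariate_normal

def hmm_filter(y, prior, trans, means, covs):
    """Forward filtering p(c^t | y^{1:t}) for the dynamic naive Bayes model.

    y:     (T, d) observations
    prior: (K,)   p(c^0)
    trans: (K, K) p(c^t = j | c^{t-1} = i) stored in trans[i, j]
    means: (K, d) class-conditional means mu_c
    covs:  (K, d, d) class-conditional covariances Sigma_c
    """
    T, K = y.shape[0], prior.shape[0]
    # Class-conditional observation likelihoods p(y^t | c^t)
    lik = np.array([[multivariate_normal.pdf(y[t], means[k], covs[k])
                     for k in range(K)] for t in range(T)])
    alpha = np.zeros((T, K))
    alpha[0] = prior * lik[0]
    alpha[0] /= alpha[0].sum()
    for t in range(1, T):
        # Predict with the class transition matrix, then correct with the likelihood
        alpha[t] = (alpha[t - 1] @ trans) * lik[t]
        alpha[t] /= alpha[t].sum()
    return alpha  # alpha[t, k] = p(c^t = k | y^{1:t})

The most probable activity at each time slice can then be read off as alpha.argmax(axis=1); the dLCM developed below replaces the simple class-conditional emission with a latent-variable model.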
In a standard HMM it is assumed that the class variables and attributes at different time points are independent given the class variable at the current time, which is violated in many real-world settings. In our oil drilling data, there is also a strong correlation between attributes given the class [12]. Modeling the dependence between attributes is therefore the next step in creating the dLCM.

Following [7], we introduce latent variables to encode conditional dependence among the attributes. Specifically, for each time step t we have the vector Z^t = (Z_1^t, ..., Z_k^t) of latent variables that appear as children of the class variable and parents of all the attributes (see Figure 2). This can be seen as combining the NB model with a factor analysis model at each time step.

Figure 2: An example model obtained by adding latent variables to the dynamic version of naïve Bayes using 2TDBN, where the dimension of the latent space is 2 and the dimension of the attribute space is 4. In each time step, the conditional dependencies between the attributes are encoded by the latent variables (Z_1^t, ..., Z_k^t).

The latent variable Z^t is assigned a multivariate Gaussian distribution conditional on the class variable, and the attributes Y^t are assumed to follow a linear multivariate Gaussian distribution conditional on the latent variables:

Z^t | {C^t = c^t} ∼ N(μ_{c^t}, Σ_{c^t}),
Y^t | {Z^t = z^t} ∼ N(L z^t + Φ, Θ),

where Σ_{c^t} and Θ are diagonal covariance matrices, L is the transition matrix from the latent space to the attribute space, and Φ is the offset from the latent space to the attribute space; note the stationarity assumption encoded in the model.

In this model, the latent variables capture the dependence between the attributes. They are conditionally independent given the class but marginally dependent. Furthermore, the same mapping, L, from the latent space to the attribute space is used for all classes, and hence the relation between the class and the attributes is conveyed by the latent variables only.

At this step, the temporal dynamics of the model is assumed to be captured at the class level only. When the state specification of the class variable is coarse, this assumption will rarely hold. It does not hold in our oil drilling data, as the conditional correlation of the attributes in successive time slices is evident [12]. We address this by modeling the dynamics of the system at the level of the latent variables. The state-specific dynamics is encoded by assuming that the multivariate latent variable Z^t follows a linear Gaussian distribution conditioned on Z^{t-1}, with the transition dynamics between latent variables denoted by a diagonal matrix A_{c^t}:

Z^t | {Z^{t-1} = z^{t-1}, C^t = c^t} ∼ N(A_{c^t} z^{t-1}, Σ_{c^t}).

A graphical representation of the model is given in Figure 3.

Figure 3: An example model obtained by incrementally adding dynamics on the latent variables using 2TDBN, where the dimension of the latent space is 2 and the dimension of the attribute space is 4. The state-specific dynamics are encoded at the level of the latent variables.

A discrete mixture variable M is further introduced into the model at each time slice for the purpose of reducing the computational cost while maintaining the representational power [12]. A similar construction is used by [7] for static domains, and for dynamic domains it can be seen in [3, 6], where a probabilistic model called the switching state-space model is proposed that combines discrete and continuous dynamics. In this case, the mixture variable follows a multinomial distribution conditioned on the class variable, and the attributes Y^t follow a multivariate Gaussian distribution conditioned on the latent variables and the discrete mixture variable:

M^t | {C^t = c^t} ∼ P(M^t | C^t = c^t),
Y^t | {Z^t = z^t, M^t = m^t} ∼ N(L_{m^t} z^t + Φ_{m^t}, Θ_{m^t}),

where 1 ≤ m^t ≤ |sp(M)| (|sp(M)| denotes the size of the state space of M), P(M^t = m^t | C^t = c^t) ≥ 0 and ∑_{m^t=1}^{|sp(M)|} P(M^t = m^t | C^t = c^t) = 1 for all 1 ≤ c^t ≤ |sp(C)|, and Φ_{m^t} is the offset from the latent space to the attribute space.

The final model is called the dynamic latent classification model and is shown in Figure 4. The dynamic latent classification model has been shown to be effective and efficient in experiments with our oil drilling data, and significant improvements were demonstrated when comparing dLCM with static models (such as NB or decision trees) and HMMs [12].

Figure 4: An example of a dynamic latent classification model using 2TDBN, where the dimension of the latent space is 2 and the dimension of the attribute space is 4.
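As a concrete summary of the generative process just defined, the sketch below draws one trajectory from a dLCM using the conditional distributions above. It is an illustrative reconstruction under our own naming and shape conventions, not the implementation from [12].

import numpy as np

rng = np.random.default_rng(0)

def sample_dlcm(T, prior, trans, p_m_given_c, mu, Sigma, A, L, Phi, Theta):
    """Draw one trajectory (c, m, z, y) of length T from a dLCM.

    prior:       (K,)      p(c^1)
    trans:       (K, K)    p(c^t | c^{t-1})
    p_m_given_c: (K, M)    p(m^t | c^t)
    mu, Sigma:   (K, k), (K, k, k)  initial latent Gaussian per class
    A:           (K, k, k) diagonal latent transition matrices A_c
    L, Phi:      (M, d, k), (M, d)  loadings and offsets per mixture component
    Theta:       (M, d, d) diagonal observation covariances
    """
    K, k = mu.shape
    d = Phi.shape[1]
    c = np.zeros(T, dtype=int)
    m = np.zeros(T, dtype=int)
    z = np.zeros((T, k))
    y = np.zeros((T, d))
    for t in range(T):
        c[t] = rng.choice(K, p=prior if t == 0 else trans[c[t - 1]])
        m[t] = rng.choice(p_m_given_c.shape[1], p=p_m_given_c[c[t]])
        if t == 0:
            z[t] = rng.multivariate_normal(mu[c[t]], Sigma[c[t]])
        else:
            # state-specific latent dynamics: z^t ~ N(A_c z^{t-1}, Sigma_c)
            z[t] = rng.multivariate_normal(A[c[t]] @ z[t - 1], Sigma[c[t]])
        # observation: y^t ~ N(L_m z^t + Phi_m, Theta_m)
        y[t] = rng.multivariate_normal(L[m[t]] @ z[t] + Phi[m[t]], Theta[m[t]])
    return c, m, z, y

Classification then amounts to inferring C^t from the observed sequence, which is the inference problem addressed next.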
3 Approximate inference in dLCM

Exact inference for dLCM is intractable. To make dLCM applicable and effective in practice, approximate inference is therefore proposed.

3.1 Intractability of exact inference in dLCM

From the dLCM in Figure 4, an equivalent probabilistic model is

p(y^{1:T}, z^{1:T}, m^{1:T}, c^{1:T}) = p(y^1 | z^1, m^1) p(z^1 | c^1) p(m^1 | c^1) p(c^1) · ∏_{t=2}^{T} p(y^t | z^t, m^t) p(z^t | z^{t-1}, c^t) p(m^t | c^t) p(c^t | c^{t-1}).

In dLCM, exact filtered and smoothed inference is intractable (scaling exponentially with T [8]), as neither the class variables nor the mixture variables are observed: at time step 1, p(z^1 | y^1) is a mixture of |sp(C)| · |sp(M)| Gaussians. At time step 2, due to the summation over the classes and mixture variables, p(z^2 | y^{1:2}) is a mixture of |sp(C)|^2 · |sp(M)|^2 Gaussians; at time step 3 it is a mixture of |sp(C)|^3 · |sp(M)|^3 Gaussians, and so on, until a mixture of |sp(C)|^T · |sp(M)|^T Gaussians is generated at time point T. To control this explosion in computational complexity, approximate inference techniques are adopted for the inference of dLCM.

3.2 Approximate inference: Forward pass

The structure of the proposed dLCM is similar to that of the linear dynamical system (LDS) [2]; the standard Rauch-Tung-Striebel (RTS) smoother [9] and the expectation correction smoother [3] for the LDS therefore provide the basis for the approximate inference of dLCM. As in the RTS smoother, the filtered estimate of dLCM p(z^t, m^t, c^t | y^{1:t}) is obtained by a forward recursion, followed by a backward recursion that calculates the smoothed estimate p(z^t, m^t, c^t | y^{1:T}). The inference of dLCM is thus achieved by a single forward recursion and a single backward recursion. Gaussian collapse is incorporated into both the forward recursion and the backward recursion to form the approximate inference. The Gaussian collapse in the forward recursion is equivalent to assumed density filtering [4], and the Gaussian collapse in the backward recursion mirrors the smoothed posterior collapse from [3].

During the forward recursion of dLCM, the filtered posterior p(z^t, m^t, c^t | y^{1:t}) is computed in a recursive form. By a simple decomposition,

p(z^t, m^t, c^t | y^{1:t}) = p(z^t, m^t, c^t, y^t | y^{1:t-1}) / p(y^t | y^{1:t-1}) ∝ p(z^t, m^t, c^t, y^t | y^{1:t-1}).

Dropping the normalization constant p(y^t | y^{1:t-1}), p(z^t, m^t, c^t | y^{1:t}) is proportional to the joint probability p(z^t, m^t, c^t, y^t | y^{1:t-1}), where

p(z^t, m^t, c^t, y^t | y^{1:t-1}) = p(y^t, z^t | m^t, c^t, y^{1:t-1}) · p(m^t | c^t, y^{1:t-1}) p(c^t | y^{1:t-1}).   (1)

To build the forward recursion, a recursive form for each of the factors in Equation 1 is required. Given the filtered results of the previous time step, a recursive form for each of the factors has been shown to be feasible [12]. In devising the recursive form, the term p(z^{t-1} | m^{t-1}, c^{t-1}, y^{1:t-1}) is required, which can be obtained directly since it is the filtered probability from the previous time step. However, the number of mixture components of p(z^{t-1} | y^{1:t-1}) increases exponentially over time, as discussed earlier, and the same holds for p(z^{t-1} | m^{t-1}, c^{t-1}, y^{1:t-1}). In our Gaussian collapse implementation [12], the term p(z^{t-1} | m^{t-1}, c^{t-1}, y^{1:t-1}) is collapsed into a single Gaussian, parameterized with mean ν_{m^{t-1}, c^{t-1}} and covariance Γ_{m^{t-1}, c^{t-1}}, and this collapsed Gaussian is then propagated to the next time slice. With this approximation, the recursive computation of the forward pass becomes tractable.
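The collapse used here is the standard moment-matching operation: a weighted Gaussian mixture is replaced by the single Gaussian with the same mean and covariance. A minimal, self-contained sketch (our own illustrative code, not the implementation from [12]) is:

import numpy as np

def collapse_gaussian_mixture(weights, means, covs):
    """Collapse a weighted Gaussian mixture into a single moment-matched Gaussian.

    weights: (J,)       mixture weights (normalized inside the function)
    means:   (J, k)     component means
    covs:    (J, k, k)  component covariances
    Returns the mean and covariance of the collapsed Gaussian.
    """
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()
    mean = np.einsum('j,jk->k', weights, means)
    diff = means - mean                      # deviations of the component means
    cov = np.einsum('j,jkl->kl', weights, covs) \
        + np.einsum('j,jk,jl->kl', weights, diff, diff)
    return mean, cov

In the forward recursion this operation would be applied to p(z^{t-1} | m^{t-1}, c^{t-1}, y^{1:t-1}) for every configuration of (m^{t-1}, c^{t-1}), yielding the mean ν_{m^{t-1}, c^{t-1}} and covariance Γ_{m^{t-1}, c^{t-1}} that are propagated to the next slice.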
3.3 Approximate inference: Backward pass

Similar to the forward pass, the backward pass also relies on a recursive computation of the smoothed posterior p(z^t, m^t, c^t | y^{1:T}). In detail, p(z^t, m^t, c^t | y^{1:T}) is computed from the smoothed result of the previous step, p(z^{t+1}, m^{t+1}, c^{t+1} | y^{1:T}), together with some other quantities obtained from the forward pass. The first smoothed posterior is p(z^T, m^T, c^T | y^{1:T}), which can be obtained directly as it is also the last filtered posterior from the forward pass. To compute p(z^t, m^t, c^t | y^{1:T}), factorize it as

p(z^t, m^t, c^t | y^{1:T}) = ∑_{m^{t+1}, c^{t+1}} p(z^t, m^t, c^t, m^{t+1}, c^{t+1} | y^{1:T})
= ∑_{m^{t+1}, c^{t+1}} p(z^t | m^t, c^t, m^{t+1}, c^{t+1}, y^{1:T}) · p(m^t, c^t | m^{t+1}, c^{t+1}, y^{1:T}) p(m^{t+1}, c^{t+1} | y^{1:T}).

Due to the conditional independence z^t ⊥⊥ {y^{t+1:T}, m^{t+1}, c^{t+1}} | {z^{t+1}, m^t, c^t}, the term p(z^t | m^t, c^t, m^{t+1}, c^{t+1}, y^{1:T}) can be found from

p(z^t | m^t, c^t, m^{t+1}, c^{t+1}, y^{1:T}) = ∫_{z^{t+1}} p(z^t | z^{t+1}, m^t, c^t, y^{1:t}) · p(z^{t+1} | m^t, c^t, m^{t+1}, c^{t+1}, y^{1:T}) dz^{t+1}.

To complete the backward recursive form, two essential assumptions are made in the backward pass that make the approximate inference applicable and effective. The first assumption is to approximate p(z^{t+1} | m^t, c^t, m^{t+1}, c^{t+1}, y^{1:T}) by p(z^{t+1} | m^{t+1}, c^{t+1}, y^{1:T}) [3]. This is justified because, although {m^t, c^t} and z^{t+1} are not independent given y^{1:T}, the influence of {m^t, c^t} on z^{t+1} through z^t is 'weak', as z^t is mostly influenced by y^{1:t}. The benefit of this simple assumption is that p(z^{t+1} | m^{t+1}, c^{t+1}, y^{1:T}) can be obtained directly from the previous backward recursion. Meanwhile, p(z^{t+1} | m^{t+1}, c^{t+1}, y^{1:T}) is a Gaussian mixture whose number of components increases exponentially in T − t. The second assumption is again a Gaussian collapse: p(z^{t+1} | m^{t+1}, c^{t+1}, y^{1:T}) is collapsed into a single Gaussian, which is then passed on to the next step. This guarantees that the back-propagated term p(z^t | m^t, c^t, y^{1:T}) is a Gaussian mixture with a fixed number of |sp(C)| · |sp(M)| components at the next time step. With this Gaussian collapse at each time slice, a tractable recursion in the backward pass is established.
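The effect of the collapse steps can be illustrated with a small numerical sketch (purely illustrative): without collapsing, the number of Gaussian components grows as (|sp(C)| · |sp(M)|)^t, whereas the collapsed recursions keep the count fixed.

# Number of Gaussian components in the posterior over z, exact versus collapsed,
# for an illustrative model with |sp(C)| = 2 classes and |sp(M)| = 2 mixtures.
sp_c, sp_m = 2, 2
for t in (1, 2, 3, 10):
    exact = (sp_c * sp_m) ** t      # exponential growth of the exact mixture
    collapsed = sp_c * sp_m         # fixed size after each Gaussian collapse
    print(f"t={t}: exact={exact}, collapsed={collapsed}")   # t=10: 1048576 vs 4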
3.4 The importance of approximate inference

Exact inference is not applicable in practice, as its computation cost increases exponentially over time. Approximate inference is therefore essential to dLCM. Gaussian collapse is adopted when building the recursive forms for both the forward and the backward pass. In addition, p(z^{t+1} | m^t, c^t, m^{t+1}, c^{t+1}, y^{1:T}) is approximated by p(z^{t+1} | m^{t+1}, c^{t+1}, y^{1:T}) in dLCM. As these approximations are made within the inference, the quality of the overall learning and inference for dLCM is rather sensitive to them. Our experimental results [12] showed that the overall performance of dLCM is satisfactory, which indicates that the chosen approximations are reasonable.

Even though the proposed approximate inference for dLCM is satisfactory, is there room for improvement with alternative approximation methods? With this question in mind, we investigate the approximation's properties by incorporating a new approximation method. Traditional sampling techniques (e.g., [5]) are commonly used in similar approximation situations. In the next section we briefly introduce sampling techniques and then explain how they are integrated into dLCM. Meanwhile, the approximation of p(z^{t+1} | m^t, c^t, m^{t+1}, c^{t+1}, y^{1:T}) by p(z^{t+1} | m^{t+1}, c^{t+1}, y^{1:T}) is kept unchanged.

In general, we replace Gaussian collapse by sampling in the approximate inference of dLCM, and we investigate the effectiveness and efficiency of this proposal through a comparison experiment between the original Gaussian collapse based dLCM and the sampling based dLCM.

4 Sampling

4.1 Background

Sampling selects a subset of samples from within a population to estimate the characteristics of the original population. There is a well-known tradeoff in sampling. When fewer samples are selected from within a population, which means the sampling process takes a shorter time, the estimated characteristics of the original population are relatively worse. On the other hand, if more samples are selected from within the same population, which of course is much more time consuming, the characteristics of the original population are better estimated. Efficiency (time consumption) and effectiveness (quality of the estimated characteristics) are the essential concerns in sampling techniques. In general, the more samples that can be selected within the tolerable time, the better the estimation of the characteristics of the population that can be expected.

This feature of traditional sampling techniques makes them attractive for the approximate inference of dLCM, as a balance between efficiency and effectiveness can be struck according to the application requirements. Moreover, sampling can approximate any distribution as long as the number of samples is sufficient. Sampling is therefore expected to replace the Gaussian collapse for the approximation in both the forward and the backward pass. We introduce particle filtering next, which will further motivate our discussion of how sampling is used in the inference of dLCM.

4.2 Particle Filtering

Particle filtering (PF) [1] is a technique for implementing a recursive Bayesian filter by Monte Carlo simulation, which is an efficient statistical method for estimating the system state. The Monte Carlo simulation relies on repeated random sampling.

In particle filtering, a weighted particle set {(s_n^t, π_n^t)}_{n=1}^{N} at each time t denotes an approximation of the required posterior probability of the system state. Each of the N particles has a state s_n^t and a weight π_n^t, and the weights are normalized such that ∑_n π_n^t = 1. Particle filtering has three operation stages: sampling (selection), prediction and observation. In the sampling stage, N particles are chosen from the prior probability according to the set {(s_n^{t-1}, π_n^{t-1})}_{n=1}^{N}. The states of the chosen particles are then predicted by the dynamic model p(s^t | s^{t-1}). In the observation stage, the predicted particles are weighted according to the observation model p(y^t | s^t). After obtaining the weights of the particles, the state at time t can be estimated based on the weighted particle set.
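For completeness, a generic bootstrap particle filter following the three stages described above can be sketched as follows. The transition and observation models are passed in as functions; all names are illustrative assumptions rather than code from the dLCM implementation.

import numpy as np

rng = np.random.default_rng(1)

def particle_filter(y, n_particles, sample_prior, sample_transition, obs_likelihood):
    """Bootstrap particle filter with sampling, prediction and observation stages.

    y:                 (T, d) observations
    sample_prior:      n -> (n, k) draws from the initial state distribution
    sample_transition: states -> one-step-ahead draws from p(s^t | s^{t-1})
    obs_likelihood:    (y_t, states) -> (n,) observation weights p(y^t | s^t)
    """
    particles = sample_prior(n_particles)
    weights = np.full(n_particles, 1.0 / n_particles)
    estimates = []
    for t in range(y.shape[0]):
        # Sampling (selection): choose particles according to their weights
        idx = rng.choice(n_particles, size=n_particles, p=weights)
        particles = particles[idx]
        # Prediction: propagate through the dynamic model p(s^t | s^{t-1})
        particles = sample_transition(particles)
        # Observation: reweight by the observation model p(y^t | s^t)
        weights = obs_likelihood(y[t], particles)
        weights = weights / weights.sum()
        estimates.append(weights @ particles)   # posterior-mean estimate of s^t
    return np.array(estimates)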
4.3 Sampling in the dLCM

The sampling process that we require for the inference of dLCM is similar to PF. In the forward pass, we know that the number of mixture components of p(z^{t-1} | y^{1:t-1}) increases exponentially over time under exact inference. Instead of a recursive approximation of p(z^{t-1} | m^{t-1}, c^{t-1}, y^{1:t-1}) as in the Gaussian collapse scheme, a recursive approximation of p(z^{t-1} | y^{1:t-1}) by sampling is adopted. With the approximated distribution p(z^{t-1} | y^{1:t-1}) obtained at time slice t−1, N weighted samples {(s_n^{t-1}, π_n^{t-1})}_{n=1}^{N} are selected from this approximated distribution. These selected samples are propagated to the next time slice t with the linear transition dynamics A_{c^t}. As the discrete class variable C^t has |sp(C)| states, each of the selected samples becomes |sp(C)| new samples. These |sp(C)| · N propagated samples are further updated by the observation y^t. The updating rule is the same as the Kalman filter update [11]. As the mixture variable has |sp(M)| states, each of these propagated samples again becomes |sp(M)| new samples. In total, the N samples selected at time slice t−1 become |sp(C)| · |sp(M)| · N samples at time slice t, and their weights are updated accordingly. The weighted sample set {(f_n^t, γ_n^t)}_{n=1}^{|sp(C)|·|sp(M)|·N} is then the approximation to p(z^t | y^{1:t}). For the next time step of the recursion, a new weighted sample set {(s_n^t, π_n^t)}_{n=1}^{N} containing N samples is selected from the approximated p(z^t | y^{1:t}). The recursive process is summarized in Algorithm 1.

Algorithm 1 Sampling in the forward pass
1: for t = 2 : T do
2:   Select N samples from the previously approximated distribution p(z^{t-1} | y^{1:t-1}) to form a weighted sample set {(s_n^{t-1}, π_n^{t-1})}_{n=1}^{N}
3:   Propagate these selected samples to the next time slice t by the transition dynamics A_{c^t}
4:   Update the propagated samples by the observation Y^t
5:   The updated samples form a new weighted sample set {(f_n^t, γ_n^t)}_{n=1}^{|sp(C)|·|sp(M)|·N}, which is an approximation of p(z^t | y^{1:t})
6: end for
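A hedged sketch of how Algorithm 1 could be realized is given below. The Kalman-style observation update of each propagated sample is abstracted behind a kalman_update callback, and the names, shapes and simplifications (for example, the discrete prior factors are not folded into the weights) are our own assumptions rather than the exact implementation from [12].

import numpy as np

rng = np.random.default_rng(2)

def sampling_forward_pass(y, init_samples, init_weights, n_samples, n_mixtures,
                          A, kalman_update):
    """Sketch of Algorithm 1: sampling based forward pass of the dLCM.

    y:            (T, d) observations
    init_samples: samples approximating p(z^1 | y^1), shape (N0, k)
    init_weights: (N0,) their normalized weights
    n_samples:    N, the number of samples re-selected at each step
    n_mixtures:   |sp(M)|, the number of mixture components
    A:            (|sp(C)|, k, k) class-specific latent transition matrices A_c
    kalman_update(z_pred, y_t, m) -> (z_upd, lik): observation update of one
        propagated sample under mixture component m, returning the updated
        sample and its likelihood weight.
    """
    n_classes = A.shape[0]
    samples, weights = np.asarray(init_samples), np.asarray(init_weights)
    for t in range(1, y.shape[0]):
        # Step 2: select N samples from the previous approximation of p(z^{t-1} | y^{1:t-1})
        idx = rng.choice(len(samples), size=n_samples, p=weights)
        selected = samples[idx]
        new_samples, new_weights = [], []
        for z in selected:
            for c in range(n_classes):
                # Step 3: propagate by the class-specific transition dynamics A_c
                z_pred = A[c] @ z
                for m in range(n_mixtures):
                    # Step 4: update by the observation y^t; the discrete prior
                    # factors p(c^t | c^{t-1}) and p(m^t | c^t) are omitted here
                    z_upd, lik = kalman_update(z_pred, y[t], m)
                    new_samples.append(z_upd)
                    new_weights.append(lik)
        # Step 5: the |sp(C)| * |sp(M)| * N weighted samples approximate p(z^t | y^{1:t})
        samples = np.array(new_samples)
        weights = np.array(new_weights)
        weights = weights / weights.sum()
    return samples, weights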
In the backward pass, with a sampling process similar to that of the forward pass, samples are first selected from the approximated distribution p(z^{t+1} | y^{1:T}), which is approximated by a weighted sample set denoted {(b_n^{t+1}, ρ_n^{t+1})}_{n=1}^{N}. Next, the selected samples are back-propagated to the previous time slice t (which is the next step of the backward pass) with the reverse of the transition dynamics A_{c^t}. The back-propagated samples are then updated by the observation Y^{t-1} [11]. Similar to the forward pass, the approximation to p(z^t | y^{1:T}) is a weighted sample set {(g_n^t, τ_n^t)}_{n=1}^{|sp(C)|·|sp(M)|·N}. For the recursive calculation at the next time slice, a new weighted sample set {(b_n^t, ρ_n^t)}_{n=1}^{N} containing N samples is selected from this approximated distribution. The term p(z^T | c^T, y^{1:T}) required at the beginning of the backward pass is also the last time step result from the forward pass, which means that {(g_n^T, τ_n^T)}_{n=1}^{|sp(C)|·|sp(M)|·N} is the same sample set as {(f_n^T, γ_n^T)}_{n=1}^{|sp(C)|·|sp(M)|·N}. Finally, the approximate inference for dLCM is completely established with the sampling based scheme.

The number of samples selected from the approximated distribution at each step is fixed, and it depends on the time-consumption requirement and the estimation-quality requirement of the corresponding application. There is a balance to be struck according to the practical application requirements. Generally, the more samples we select, the more time it costs, while the estimation quality improves.

In summary, the approximate inference of dLCM based on sampling is established by mimicking the particle filtering process in both the forward and the backward pass. To investigate the effectiveness and efficiency of the sampling based dLCM, comparison experiments are conducted in the next section.

5 Experiment Results

In this section, comparison experiments on simulation data and on oil drilling data are conducted and their results are discussed.

5.1 Experiments on simulation data

A set of simulation data is first generated from a dLCM, and we investigate the classification accuracy and time consumption of the Gaussian collapse scheme and the sampling scheme.

5.1.1 Experiment settings and simulation data generation

The simulation data sets are generated from a dLCM with parameters chosen by a "semi-random" process. The specification of a dLCM has two parts: the model structure and the model parameters given a fixed model structure. For each time slice, the model structure is decided by four factors: the size of the class variable C (activity state), the dimension of the latent variable space Z, the dimension of the attribute space Y, and the size of the mixture variable M. These values are fixed as described in Table 1. After choosing the model structure, its associated model parameters were randomly generated. This combined process of choosing the model structure and the model parameters is the "semi-random" process. We then generate a data set with this model.

data   |sp(C)|   |sp(M)|   |sp(Z)|   |sp(Y)|   T
set1   2         1         12        6         500
set2   2         2         18        9         1000

Table 1: The simulation data sets with their model structure information.

For convenience, we refer to the model structure together with its parameters as the model parameters in the remainder of the paper. The generated data sets and the model parameters are used as the true model for the classification test below. A comparison experiment between Gaussian collapse and sampling is then conducted.
5.1.2 Results and discussion

The comparison experiment is conducted with both the Gaussian collapse based and the sampling based dLCM. For the sampling based scheme, three sample sizes are chosen: 40, 200 and 1000. The classification results on simulation set1 and set2 are summarized in Table 2. The classification accuracies are recorded as the average over ten runs of each scheme, and they show that the Gaussian collapse based dLCM performs better than the three sampling based dLCMs on both set1 and set2. Among the sampling based schemes, a larger sample size generally achieves better classification accuracy.

scheme (samples)     set1/accuracy   set2/accuracy
Gaussian collapse    99.60%          99.90%
Sampling (40)        96.75%          95.55%
Sampling (200)       97.60%          97.95%
Sampling (1000)      97.75%          98.40%

Table 2: The average classification accuracy on the simulation data sets with the Gaussian collapse based and the sampling based (varying the number of samples) dLCM.

After investigating the effectiveness of each scheme, we turn to efficiency. The efficiency of each scheme is evaluated by the average time over the ten runs required to accomplish the classification task; the details are shown in Table 3. On set1, the classification task is accomplished in 0.47 seconds with Gaussian collapse, whereas the sampling scheme with 40 samples takes 2.01 seconds. The larger the sample size in the sampling scheme, the more time it takes to accomplish the classification task. It is clear that Gaussian collapse requires much less time to accomplish the classification task.

scheme (samples)     set1/time-cost   set2/time-cost
Gaussian Collapse    0.47(s)          1.91(s)
Sampling (40)        2.01(s)          5.97(s)
Sampling (200)       7.31(s)          17.56(s)
Sampling (1000)      36.24(s)         80.70(s)

Table 3: The average time-cost (seconds) on the simulation data sets with both the Gaussian collapse based and the sampling based (varying the number of samples) dLCM.

Compared to the sampling based scheme, the Gaussian collapse based scheme thus achieves comparable (slightly better) classification results in much less time on the simulation data.

5.2 Experiments on oil drilling data

Next, the same comparison experiment is conducted with the oil drilling data from the North Sea.

5.2.1 Experiment settings on oil drilling data

As mentioned in the introduction, we tie the development to the task of activity recognition in this paper. In total, there are 5 drilling activities in the data set used for the classification task. These activities are "drilling", "connection", "tripping in", "tripping out" and "other". The original oil drilling data contains more than 60 variables. Advised by an oil drilling domain expert, 9 variables are chosen for the classification task here. Two data sets are used, containing 80000 and 50000 time slices respectively, with all 5 activities present.

For classification purposes in this paper, we combine these 5 activities into 2 activities and conduct the classification test on the combined data set. The three activities "drilling", "connection" and "other" are combined into one activity, and a similar combination is made for the "tripping in" and "tripping out" activities. The reason is that these combined activities are physically close and may have quite similar dynamics. This combination also simplifies our experiments with the oil drilling data, while maintaining the purpose of the comparison experiment.

Before comparing the inference schemes on the oil drilling data, we learn a dLCM with the learning method proposed in [12] on the oil drilling data set containing 80000 time slices. The model structure is chosen by experience, with 2 mixture components and 16 latent variables. After learning its parameters with the chosen model structure, the dLCM for the classification experiment is finalized. With the learnt dLCM, the classification experiment is conducted on the other oil drilling data set, containing 50000 time slices.

5.2.2 Results and discussion

With the fixed dLCM, the average (over ten runs) classification accuracy and average time-cost for each scheme are obtained. Four schemes are presented: the Gaussian collapse based scheme and the sampling based schemes with 40, 200 and 1000 samples respectively. The experiment results are summarized in Table 4.
scheme (samples)     accuracy   time-cost
Gaussian Collapse    82.9%      110.15(s)
Sampling (40)        69.87%     335.93(s)
Sampling (200)       76.56%     1130.85(s)
Sampling (1000)      79.07%     4469.33(s)

Table 4: The average classification accuracy and time-cost (seconds) on the oil drilling data set with both the Gaussian collapse based and the sampling based (varying the number of samples) dLCM.

Among the sampling based schemes, more samples achieve higher classification accuracy. However, with more samples, the computation cost for the classification task is much higher. The table clearly shows that sampling with 1000 samples requires more than one hour to accomplish the classification task, which is around 40 times that of Gaussian collapse, while achieving a similar classification accuracy. In general, Gaussian collapse still achieves comparable results (slightly better than sampling), while keeping the computation cost rather low compared to the sampling based scheme.

6 Conclusion

In the approximate inference of the dLCM, Gaussian collapse was originally adopted as the core of the approximation method. In this paper, a sampling technique is proposed as an alternative approximation. A process similar to particle filtering, utilizing sampling as its basis, is incorporated into the approximate inference of the dLCM. We then conduct comparison experiments on both simulated data and real oil drilling data. The experimental results from both sets show that the approximate scheme based on Gaussian collapse is computationally more efficient than sampling, while offering comparable accuracy results.

Acknowledgements

I would like to thank Helge Langseth, who helped a lot during the whole process of organizing and writing this paper.

References

[1] Sanjeev Arulampalam, Simon Maskell, Neil Gordon, and Tim Clapp. A tutorial on particle filters for on-line non-linear/non-Gaussian Bayesian tracking. IEEE Transactions on Signal Processing, 50:174–188, 2001.

[2] Yaakov Bar-Shalom and Xiao-Rong Li. Estimation and Tracking: Principles, Techniques and Software. Artech House Publishers, 1993.

[3] David Barber. Expectation correction for smoothed inference in switching linear dynamical systems. Journal of Machine Learning Research, 7:2515–2540, 2006.

[4] Xavier Boyen and Daphne Koller. Approximate learning of dynamic models. In Advances in Neural Information Processing Systems 12, pages 396–402, 1999.

[5] Stuart Geman and Donald Geman. Stochastic relaxation, Gibbs distributions and the Bayesian restoration of images. IEEE Transactions on Pattern Analysis and Machine Intelligence, 6:721–741, 1984.

[6] Zoubin Ghahramani and Geoffrey E. Hinton. Variational learning for switching state-space models. Neural Computation, 12:963–996, 1998.

[7] Helge Langseth and Thomas D. Nielsen. Latent classification models. Machine Learning, 59(3):237–265, 2005.

[8] Uri Lerner. Hybrid Bayesian Networks for Reasoning about Complex Systems. PhD thesis, Dept. of Comp. Sci., Stanford University, Stanford, 2002.

[9] H. E. Rauch, F. Tung, and C. T. Striebel. Maximum likelihood estimates of linear dynamic systems. AIAA Journal, 3:1445–1450, 1965.

[10] Padhraic Smyth. Hidden Markov models for fault detection in dynamic systems. Pattern Recognition, 27(1):149–164, 1994.

[11] Greg Welch and Gary Bishop. An introduction to the Kalman filter. Technical report, University of North Carolina at Chapel Hill, 1995.

[12] Shengtong Zhong, Helge Langseth, and Thomas D. Nielsen. Bayesian networks for dynamic classification. Working paper, http://idi.ntnu.no/~shket/dLCM.pdf, 2012.