Uncertain Dynamic Process Monitoring Using Moving Window PCA for Interval-Valued Data

M-Faouzi HARKAT ∗, Tarek AIT-IZEM ∗∗,∗∗∗, Frédéric KRATZ ∗∗∗, Majdi Mansouri ∗∗∗∗, Mohamed Nounou ∗∗∗∗, Hazem Nounou ∗

∗ Chemical Engineering Program, Texas A&M University at Qatar, Doha, Qatar
∗∗ Automatics and Signals Laboratory of Annaba (LASA), Department of Electronics, Badji Mokhtar University, P.O. Box 12, 23000 Annaba, Algeria
∗∗∗ Laboratoire PRISME, INSA Centre Val-de-Loire, 88 boulevard Lahitolle, 18020 Bourges Cedex, France
∗∗∗∗ Electrical and Computer Engineering Program, Texas A&M University at Qatar, Doha, Qatar

Abstract: In this paper, we present a new process monitoring approach for uncertain, or highly noisy, systems, based on the well-known Moving Window Principal Component Analysis (MWPCA) extended to the interval case. We propose to use the Midpoints-Radii PCA (MRPCA) for modelling, which exploits two independent PCAs on the center and radius matrices of the system's interval-valued sensor data. Furthermore, by changing the size and the shift of the window, both center and radius model parameters are updated on-line, thus deriving a new Moving Window Midpoints-Radii PCA (MWMRPCA) approach. Based on the updated MWMRPCA model, an interval SPE statistic and its control limit are calculated and updated through time, and are used for monitoring the state of the process. The performance of the proposed approach is illustrated by an application to fault detection on the Tennessee Eastman Process (TEP).

Keywords: Dynamic systems, moving window PCA, interval data, midpoints-radii PCA, fault detection, SPE statistic.

1. INTRODUCTION

Multivariate statistical approaches based on principal component analysis (PCA) have been widely applied for fault diagnosis to improve process quality and productivity. The key idea of PCA is to extract linear structure from high-dimensional data by finding new principal axes. In other words, PCA provides a statistical model of the system while reducing the dimensionality of the used dataset. For process monitoring, PCA uses several statistics to detect changes in the system behaviour. An abnormal situation causes the corresponding statistic to exceed its control limits (Qin, 2003), (Harkat et al., 2006).

A major limitation of PCA-based monitoring is that the PCA model, once built from the data, is time-invariant, while most real industrial processes are time-varying. The time-varying characteristics of industrial processes include changes in the mean, changes in the variance, changes in the correlation structure among variables, and even changes in the number of significant principal components (PCs). When a static PCA model is used to monitor time-varying processes, false alarms often occur, which significantly compromises the reliability of the monitoring system. Thus, a dynamic variant of PCA, such as the moving window PCA (MWPCA) (Wang et al., 2005) or the recursive PCA (RPCA) (Li et al., 2000), has to be used to handle these types of processes.

PCA-based diagnosis strategies have been developed for the analysis of single-valued variables. But in real life, there are many situations in which the use of these single-valued variables may cause a severe loss of information, and thus compromise the reliability of the approach. In this case, a more robust strategy can be achieved by describing the process measurements by interval-valued data. Therefore, an interval-valued PCA model has to be used to handle this new nature of data.

In the last two decades or so, many researchers investigated the possibility to extend PCA to interval-valued data. (Cazes et al., 1997) and (Chouakria, 1998) proposed the first interval-valued PCA approaches, known as the centers PCA (CPCA) and the vertices PCA (VPCA) methods. The centers method relies on the interval centers to compute the principal components (PCs), while the vertices method computes the PCs using the vertices of the observed hyper-rectangles. Another approach is the midpoints-radii PCA (MRPCA) introduced by (Palumbo and Lauro, 2003), which treats midpoints and interval ranges as two separate variables and builds two separate models of these variables. (D'Urso and Giordani, 2004) introduced another approach using least squares for MRPCA, and (Gioia and Lauro, 2006) put forward an analytical interval PCA based on an interval-valued covariance matrix. (Le-Rademacher and Billard, 2012) later proposed a symbolic covariance PCA with visualization for interval-valued data.

2. PCA MODELLING AND FAULT DETECTION

Given a standardized data matrix $X$, the principal components are obtained through the linear transformation

$$T = P^T X \qquad (2)$$

where $P$ is the eigenvector matrix of the data covariance matrix, i.e. the coefficients of the linear transformation. The projection can be reversed back to an estimate $\hat{X}$ of the data using the retained components, and significant deviations between the measurements and this estimate are considered as faults. Fault detection is usually carried out with the squared prediction error (SPE) statistic:

$$SPE(k) = \|x(k) - \hat{x}(k)\|^2 = \|e(k)\|^2 \qquad (18)$$

where $e(k)$ is the residual vector given by the difference between the measurement vector and its estimate from the PCA model. A faulty condition is declared if the $SPE$ index exceeds its control limit $\delta^2_\alpha$, determined statistically (Jackson and Mudholkar, 1979), (Nomikos and MacGregor, 1995) for a significance level $\alpha$.

For the interval-valued PCA case, several indices have been proposed as extensions of the classical $SPE$ to interval-valued data (Ait-Izem et al., 2017), (Benaicha et al., 2013). A first calculation method treats the two bounds of the residuals separately in computing the interval SPE, which we denote $[SPE]$:

$$\underline{SPE}(k) = \|\underline{e}(k)\|^2 = \underline{e}(k)^T \underline{e}(k), \qquad \overline{SPE}(k) = \|\overline{e}(k)\|^2 = \overline{e}(k)^T \overline{e}(k) \qquad (19)$$

where $[e(k)] = [\underline{e}(k), \overline{e}(k)]$ is the interval-valued residual computed from the MRPCA model. In the presence of faults, decisions are made when both bounds in Eq. (19) exceed their detection threshold.

Another index introduced in (Ait-Izem et al., 2017), denoted $ISPE$, is given by

$$ISPE(k) = \|[e(k)]\|^2 = \sum_{j=1}^{m} \|[e_j(k)]\|^2 \qquad (20)$$
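To make the detection indices concrete, the following minimal Python/NumPy sketch computes the classical SPE of Eq. (18) from a PCA model, the bound-wise interval SPE of Eq. (19), and the ISPE of Eq. (20). It is an illustration rather than the paper's implementation: the function names and synthetic data are invented, the control limit is replaced by an empirical quantile standing in for $\delta^2_\alpha$, and the per-variable interval norm in Eq. (20) is assumed to be the interval magnitude max(|a|, |b|).

```python
import numpy as np

def fit_pca(X, n_components):
    """Eigen-decompose the covariance of the (already standardized) data X and
    return the retained loading matrix P_l, so that x_hat = P_l P_l^T x."""
    eigvals, eigvecs = np.linalg.eigh(np.cov(X, rowvar=False))
    order = np.argsort(eigvals)[::-1]              # eigenvalues in decreasing order
    return eigvecs[:, order[:n_components]]

def spe(x, P_l):
    """Classical squared prediction error of Eq. (18)."""
    e = x - P_l @ (P_l.T @ x)
    return float(e @ e)

def interval_spe(e_low, e_up):
    """Bound-wise interval SPE of Eq. (19): one SPE value per residual bound."""
    return float(e_low @ e_low), float(e_up @ e_up)

def ispe(e_low, e_up):
    """ISPE of Eq. (20), using the interval magnitude |[a, b]| = max(|a|, |b|)
    as the per-variable interval norm (an assumed convention of this sketch)."""
    return float(np.sum(np.maximum(e_low**2, e_up**2)))

# Illustrative usage with synthetic standardized data and toy residual bounds.
rng = np.random.default_rng(0)
X = rng.standard_normal((500, 6))
P_l = fit_pca(X, n_components=3)
limit = np.quantile([spe(xi, P_l) for xi in X], 0.99)   # empirical stand-in for delta_alpha^2
x = X[0]
e = x - P_l @ (P_l.T @ x)
e_low, e_up = e - 0.05, e + 0.05                        # toy interval residual bounds
print(spe(x, P_l) > limit, interval_spe(e_low, e_up), ispe(e_low, e_up))
```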
3. DYNAMIC PCA

In dynamic PCA algorithms, new measurements are used to update the PCA model on-line. The updating scheme should include the mean, the covariance, the principal components and the confidence limit of the monitoring statistic. Several algorithms for dynamic PCA can be found in the literature. In this paper, we investigate the possibility to extend the Moving Window PCA (MWPCA) algorithm to the interval-valued data case.

3.1 Moving window PCA

In MWPCA, a data window of fixed length is moved in real time to update the PCA model once a new normal sample is available. Thus, for a window of size $w$, the treated data matrix at time sample $k$ is $X_k = [x(k-w+1), x(k-w+2), \ldots, x(k)]$, and at time $k+1$ it becomes $X_{k+1} = [x(k-w+2), x(k-w+3), \ldots, x(k+1)]$. The MWPCA algorithm then updates various parameters, including the normalization parameters, the PCA model parameters, and the control thresholds of the monitoring statistics. The overall strategy for time-varying monitoring using MWPCA is given as follows (a code sketch follows the list):

(1) Collect training data from the process under normal operating conditions (NOC). Normalize the training data using its mean and standard deviation. Then compute the covariance matrix and carry out an eigen-decomposition to obtain the PCA model of the process. Choose the number of components. Finally, determine the control limits for the used monitoring statistic.
(2) Obtain a new testing sample and normalize it using the scaling parameters from the moving data window.
(3) Evaluate the monitoring statistic for the normalized testing sample using the PCA model obtained in Step 1, and check it against the corresponding control limit from Step 1. If the limit is not exceeded, the measurement is considered normal.
(4) If the measurement is normal, update the moving window by shifting it by one time sample (add the new measurement and delete the oldest one), and repeat from Step 2. Otherwise, the measurement is faulty; stop the update after three consecutive faulty measurements.
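The MWPCA strategy above can be summarized by the following sketch, under simplifying assumptions: the number of components is chosen to explain roughly 95% of the variance, the SPE control limit is an empirical quantile of the window SPE values rather than the statistical limit, and the model is simply not updated on alarmed samples. Function names and the alarm handling details are illustrative, not taken from (Wang et al., 2005).

```python
import numpy as np
from collections import deque

def fit_window_model(W, var_explained=0.95, quantile=0.99):
    """Step 1: scale the window data, eigen-decompose its covariance, keep
    enough PCs to explain ~95% of the variance, and derive an empirical SPE
    control limit from the window itself."""
    mean, std = W.mean(axis=0), W.std(axis=0)
    Z = (W - mean) / std
    vals, vecs = np.linalg.eigh(np.cov(Z, rowvar=False))
    order = np.argsort(vals)[::-1]
    vals, vecs = vals[order], vecs[:, order]
    n_pc = int(np.searchsorted(np.cumsum(vals) / vals.sum(), var_explained)) + 1
    P_l = vecs[:, :n_pc]
    spe_train = np.sum((Z - Z @ P_l @ P_l.T) ** 2, axis=1)
    return mean, std, P_l, float(np.quantile(spe_train, quantile))

def mwpca_monitor(train, stream, w=150):
    """Steps 2-4: check each new sample against the current model, and
    shift/refit the window only when the sample is declared normal."""
    window = deque(train[-w:], maxlen=w)
    mean, std, P_l, limit = fit_window_model(np.asarray(window))
    alarms = []
    for x in stream:
        z = (x - mean) / std
        spe = float(np.sum((z - P_l @ (P_l.T @ z)) ** 2))
        alarm = spe > limit
        alarms.append(alarm)
        if not alarm:                      # normal sample: slide window and update model
            window.append(x)
            mean, std, P_l, limit = fit_window_model(np.asarray(window))
    return np.array(alarms)

# Illustrative usage on synthetic data with a drifting mean.
rng = np.random.default_rng(1)
data = rng.standard_normal((1000, 6)) + np.linspace(0, 2, 1000)[:, None]
print(mwpca_monitor(data[:150], data[150:]).mean())   # fraction of alarmed samples
```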
3.2 Moving Window MRPCA for interval-valued data

Dynamic variants of PCA, such as MWPCA, were developed to take into account the dynamics or variations of the monitored system, but they remain insufficiently robust to the effect of measurement uncertainties. To overcome this problem, we propose in this section a dynamic PCA approach for interval-valued data for diagnosis purposes. The idea is to combine the MRPCA model for interval-valued data with the well-known moving window PCA algorithm.

First, let us recall the normalization procedure for interval-valued data: each interval-valued measurement is normalized as follows,

$$\frac{[x_j(k)] - \bar{X}^c_j}{\sqrt{VAR([X_j])}} = \left[\frac{\underline{x}_j(k) - \bar{X}^c_j}{\sqrt{VAR([X_j])}},\ \frac{\overline{x}_j(k) - \bar{X}^c_j}{\sqrt{VAR([X_j])}}\right] \qquad (24)$$

where $\bar{X}^c_j$ is the centers mean of the $j$-th interval-valued variable, and $VAR([X_j])$ is the interval variance defined as follows (Wang et al., 2012):

$$VAR([X_j]) = \frac{1}{3n}\sum_{k=1}^{n}\left(\underline{x}_j^2(k) + \underline{x}_j(k)\,\overline{x}_j(k) + \overline{x}_j^2(k)\right) \qquad (25)$$
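As an illustration, the sketch below builds interval-valued data from point measurements with a 5% radius (the convention used for the TEP data in Section 4), computes the interval variance of Eq. (25), and applies the standardization of Eq. (24). The helper names are invented, and the variance is evaluated on the mean-centered bounds, which is an assumption of this sketch.

```python
import numpy as np

def make_intervals(X, rel_radius=0.05):
    """Turn point measurements into intervals [x - d, x + d] with a radius
    d equal to 5% of |x| (illustrative, mirroring the TEP setup of Section 4)."""
    d = rel_radius * np.abs(X)
    return X - d, X + d                      # lower-bound and upper-bound matrices

def interval_variance(A, B):
    """Eq. (25): VAR([X_j]) = 1/(3n) * sum_k ( a_j(k)^2 + a_j(k) b_j(k) + b_j(k)^2 ),
    evaluated column-wise for the bound matrices A (lower) and B (upper)."""
    n = A.shape[0]
    return (A**2 + A * B + B**2).sum(axis=0) / (3.0 * n)

def standardize_intervals(X_low, X_up):
    """Eq. (24): subtract the centers mean from both bounds and divide by the
    square root of the interval variance (computed here on the centered bounds)."""
    m_c = 0.5 * (X_low + X_up).mean(axis=0)  # centers mean, m^c
    A, B = X_low - m_c, X_up - m_c
    D = np.sqrt(interval_variance(A, B))     # interval scaling parameters, D
    return A / D, B / D, m_c, D

# Illustrative usage on synthetic point data.
rng = np.random.default_rng(2)
X_low, X_up = make_intervals(5 + rng.standard_normal((200, 4)))
Z_low, Z_up, m_c, D = standardize_intervals(X_low, X_up)
```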
Since the MRPCA model is constituted of two sub-models, one for the centers and one for the radii, we propose to apply a moving window scheme to both models in parallel. The proposed MWMRPCA algorithm is detailed below (a sketch of the online loop is given after the listing).

Offline mode

(1) Construct the center ($X^c$) and radius ($X^r$) matrices from the initial interval-valued data matrix $[X]$;
(2) Obtain the initial values of the normalization parameters, i.e. the centers mean $m^c_0$ and the interval variance $D_0$;
(3) Compute, from the training data block, the initial model parameters for the centers and the radii: the covariance matrices $\Sigma$, the eigenvector matrices $P^c_0$ and $P^r_0$, the eigenvalue matrices $\Lambda^c_0$ and $\Lambda^r_0$, the rotation matrix $A_0$, and the numbers of principal components $\ell^c_0$ and $\ell^r_0$ of the two sub-models;
(4) Compute the interval residuals and the $ISPE_0$ index, then obtain the control threshold $ISPE_{lim,0}$;
(5) Determine the initial size $w$ of the moving window.

Online mode

At sample time $k$, use the values from the previous time sample $(k-1)$ of $m^c_{k-1}$, $D_{k-1}$, $\Lambda^c_{k-1}$, $\Lambda^r_{k-1}$, $P^c_{k-1}$, $P^r_{k-1}$, $A_0$, $ISPE_{lim,k-1}$, $\ell^c_0$ and $\ell^r_0$:

(1) Compute the interval residuals and the $ISPE_k$ index for the new sample $[x(k)]$, after standardizing the interval using $m^c_{k-1}$ and $D_{k-1}$;
(2) If $ISPE_k > ISPE_{lim,k-1}$ go to step 3, else go to step 5;
(3) Check whether the new sample is an outlier: if $ISPE_{k-t} > ISPE_{lim,k-t-1}$ for $t = 1, 2$ (i.e. if three consecutive out-of-control signals have been generated), the new sample is not an outlier; otherwise, it is an outlying sample;
(4) If it is an outlier, go to step 5. Otherwise, consider the current condition to be abnormal. Typically, the process condition cannot be determined by examining one sample; in this case, the model is retained without updating and the current sample is stored. Then, if the process condition is subsequently proven to be normal, the model can be updated in a block-wise manner;
(5) Recompute the residuals and $ISPE_k$;
(6) Shift the window and compute the new normalization and model parameters for the new window (update);
(7) Update the numbers of principal components to retain for the two sub-models, and the threshold $ISPE_{lim,k}$.
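A condensed sketch of the proposed scheme is given below. It is not the authors' implementation: the two MRPCA sub-models are approximated by independent PCAs on the centered centers and radii (the rotation matrix $A_0$ is omitted), the bound-wise residuals are formed as $e^c \mp e^r$ from the center and radius residuals, the ISPE uses the max-magnitude convention of the earlier sketch, the threshold is an empirical quantile, and the outlier/fault distinction of steps (3) and (4) is left out. All names are illustrative.

```python
import numpy as np
from collections import deque

def fit_sub_pca(M, var_explained=0.95):
    """Fit one PCA sub-model (used for both the center and the radius matrix);
    the number of PCs follows a ~95% cumulative-variance rule, cf. Eq. (26)."""
    vals, vecs = np.linalg.eigh(np.cov(M, rowvar=False))
    order = np.argsort(vals)[::-1]
    vals, vecs = vals[order], vecs[:, order]
    n_pc = int(np.searchsorted(np.cumsum(vals) / vals.sum(), var_explained)) + 1
    return vecs[:, :n_pc]

def ispe_sample(x_low, x_up, m_c, m_r, Pc, Pr):
    """Center/radius residuals e_c and e_r, bound-wise residuals e_c -/+ e_r of
    the reconstructed interval, and an ISPE value with the max-magnitude convention."""
    c = 0.5 * (x_low + x_up) - m_c
    r = 0.5 * (x_up - x_low) - m_r
    e_c, e_r = c - Pc @ (Pc.T @ c), r - Pr @ (Pr.T @ r)
    e_lo, e_up = e_c - e_r, e_c + e_r
    return float(np.sum(np.maximum(e_lo**2, e_up**2)))

def fit_mwmrpca(W_low, W_up, quantile=0.99):
    """Offline steps (1)-(4): center/radius matrices and means, the two
    sub-models, and an empirical ISPE threshold computed on the window."""
    C, R = 0.5 * (W_low + W_up), 0.5 * (W_up - W_low)
    m_c, m_r = C.mean(axis=0), R.mean(axis=0)
    Pc, Pr = fit_sub_pca(C - m_c), fit_sub_pca(R - m_r)
    train = [ispe_sample(a, b, m_c, m_r, Pc, Pr) for a, b in zip(W_low, W_up)]
    return m_c, m_r, Pc, Pr, float(np.quantile(train, quantile))

def mwmrpca_monitor(train_low, train_up, stream_low, stream_up, w=150):
    """Simplified online loop: samples declared normal slide the window and
    trigger a full model/threshold update; alarmed samples are never added to
    the window (the outlier/fault distinction of steps (3)-(4) is omitted)."""
    win = deque(zip(train_low[-w:], train_up[-w:]), maxlen=w)
    m_c, m_r, Pc, Pr, limit = fit_mwmrpca(*map(np.asarray, zip(*win)))
    alarms = []
    for a, b in zip(stream_low, stream_up):
        alarm = ispe_sample(a, b, m_c, m_r, Pc, Pr) > limit
        alarms.append(alarm)
        if not alarm:                        # normal sample: shift window and update
            win.append((a, b))
            m_c, m_r, Pc, Pr, limit = fit_mwmrpca(*map(np.asarray, zip(*win)))
    return np.array(alarms)

# Illustrative usage with synthetic interval data (radius = 5% of |x|).
rng = np.random.default_rng(3)
X = 5 + rng.standard_normal((1000, 6))
d = 0.05 * np.abs(X)
alarms = mwmrpca_monitor((X - d)[:150], (X + d)[:150], (X - d)[150:], (X + d)[150:])
print(alarms.mean())
```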
4. APPLICATION

The performance of the proposed monitoring scheme, using a MWMRPCA model, is illustrated through an application on a classical example: the Tennessee Eastman Process (TEP), whose scheme is shown in Figure 1. This process was developed by the Eastman Chemical Company to provide a simulation of a real industrial process for the testing of control and/or monitoring methods. There are five major units in the TEP simulation (as shown in Fig. 1): a reactor, a separator, a stripper, a condenser, and a compressor. The process has 12 manipulated variables, 22 continuous process measurements, and 19 composition measurements sampled less frequently, all of which are affected by Gaussian noise. Corresponding to different production rates, there are six modes of process operation.

Fig. 1. Tennessee Eastman Process

The TEP process was run for 1 hour, and 1000 samples of 22 variables were collected. Considering a measurement uncertainty $\delta x_i$ of the order of 5% of the measurement for each variable, we construct the new interval-valued data matrix of the process, with $\delta x_i$ being the radius of the data intervals. The first 150 samples were used to construct the initial MWMRPCA model, while the rest of the samples are processed one by one in the testing phase (online monitoring), by sliding the window at each step. The model parameters are updated for each normal sample, and the number of components is selected according to the cumulative percentage variance (CPV) criterion, so that the explained variance represents approximately 95% of the total variance:

$$CPV(\ell) = 100\ \frac{\sum_{j=1}^{\ell}\lambda_j}{\sum_{j=1}^{m}\lambda_j} \qquad (26)$$

In order to compare the performance of the new MWMRPCA monitoring strategy, a classical dynamic PCA model using the moving window PCA (MWPCA) approach (Wang et al., 2005), as well as a static classical PCA model, were developed on the same set of data. Subsequently, two types of outliers, affecting two different variables at two different moments, were simulated (a sketch of the fault injection follows the list):

(1) An uncertainty $\zeta_{x_9}$ in the reactor temperature, given by variable $x_9$, simulated as a bias of amplitude smaller than the radius ($\zeta_{x_9} < \delta_{x_9}$), from time sample 200 to 300.
(2) A fault $f_{x_3}$ in the E feed rate, given by variable $x_3$, simulated as a relatively large bias compared with the uncertainty/radius ($f_{x_3} > \delta_{x_3}$), from time sample 700 to the end.
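The two scenarios can be reproduced by a simple additive bias injected into the point data before the intervals are built. The sketch below is only illustrative: the bias amplitudes (0.5 and 3 times the radius) are invented values chosen to respect the stated relations to the 5% radius, and the column indices assume the paper's 1-based numbering of $x_3$ and $x_9$.

```python
import numpy as np

def inject_scenarios(X, rel_radius=0.05):
    """Add the two simulated events of Section 4 to the point data X (1000 x 22):
    a small bias on x9 (below the interval radius) over samples 200-300, and a
    large bias on x3 (above the radius) from sample 700 onward."""
    X = X.copy()
    delta = rel_radius * np.abs(X)              # interval radii, 5% of the clean data
    # uncertainty zeta_x9 < delta_x9, samples 200 to 300 (0-based slicing)
    X[199:300, 8] += 0.5 * delta[199:300, 8]
    # fault f_x3 > delta_x3, from sample 700 to the end
    X[699:, 2] += 3.0 * delta[699:, 2]
    return X - delta, X + delta                 # interval bounds around the biased centers

# usage: X_low, X_up = inject_scenarios(X_tep)  # X_tep: the 1000 x 22 point data matrix
```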
An interesting property of PCA for interval-valued data, applied to diagnosis, is that the defined radius of the data acts as a safe zone. In other words, the model is insensitive to outliers with an amplitude smaller than the radius (the uncertainty), because they are considered as a normal variation of the process. Outliers with an amplitude greater than the radius, on the other hand, are normally detected as faults.

Fig. 2. Fault detection using the $[SPE]$ indicator with the MWMRPCA model for the TEP data

Fig. 3. Fault detection using the $ISPE$ indicator with the MWMRPCA model for the TEP data

Indeed, by inspecting Figures 2 and 3, which represent the detection indices $[SPE]$ and $ISPE$ for the MWMRPCA model, we can clearly notice the absence of the bias $\zeta_{x_9}$ representing the uncertainty (present between time samples 200 and 300), as well as the detection of the fault $f_{x_3}$ simulated from time sample 700. This is due to the nature of the interval model, which considers the bias $\zeta_{x_9}$ as a normal variation of the process. The MWMRPCA model thus manages to follow the dynamics of the system and to detect faults, while providing more robustness with respect to measurement uncertainties. Note that the $[SPE]$ index depends on two thresholds, one for each of its bounds, while the $ISPE$ depends only on a single threshold. The latter performs better with regard to the number of false alarms and in distinguishing the fault.

Fig. 4. Fault detection using the $SPE$ indicator with the MWPCA model for the TEP data

Fig. 5. Fault detection using the $SPE$ indicator with the classical PCA model for the TEP data

According to the $SPE$ criterion calculated for the MWPCA model, represented in Figure 4, we can notice that the uncertainty $\zeta_{x_9}$ present in variable $x_9$ has misled the dynamic MWPCA model. More precisely, the simulated bias $\zeta_{x_9}$ is considered as a fault by the MWPCA model, thus resulting in a stop of the model update. The impact is a huge rate of false detections from the moment the uncertainty appears (from time sample 200). A conventional dynamic PCA monitoring strategy, MWPCA in this case, can thus be insufficient for highly noisy systems. Figure 5 represents the $SPE$ index combined with static PCA, which demonstrates its total incapacity to detect faults in a time-varying system.

5. CONCLUSION

In this work, a new algorithm based on moving window PCA is presented for the modelling and monitoring of dynamic processes. The model used is an interval variant of PCA, called MRPCA, which treats separately the matrices of the centers and the radii extracted from the interval-valued data matrix. Applied on-line, the proposed algorithm checks each sample for the presence or absence of faults using interval statistical indices ($[SPE]$ and $ISPE$); the model and the normalization parameters are then updated. The presented approach not only improves the robustness toward measurement uncertainties, but also provides a confidence zone, defined by the radius of the interval, making it possible to consider any additional amount of information within this zone as a part of the normal operation of the system. Applied to the Tennessee Eastman Process, the proposed algorithm demonstrates good performance compared with the static PCA model and with the MWPCA model with sliding window.

REFERENCES

Ait-Izem, T., Harkat, M.F., Djeghaba, M., and Kratz, F. (2015). Fault detection and isolation using interval principal component analysis methods. IFAC-PapersOnLine, 48(21), 1402-1407. doi:10.1016/j.ifacol.2015.09.721.
Ait-Izem, T., Harkat, M.F., Djeghaba, M., and Kratz, F. (2017). Sensor fault detection based on principal component analysis for interval-valued data. Quality Engineering. doi:10.1080/08982112.2017.1391288.
Benaicha, A., Mourot, G., Benothman, K., and Ragot, J. (2013). Fault detection and isolation with interval principal component analysis. In Proceedings Engineering and Technology PET, International Conference on Control, Engineering and Information Technology, 162-167.
Box, G. (1954). Some theorems on quadratic forms applied in the study of analysis of variance problems, I. Effect of inequality of variance in the one-way classification. The Annals of Mathematical Statistics, 25(2), 290-302.
Cazes, P., Chouakria, A., Diday, E., and Schektman, Y. (1997). Extension de l'analyse en composantes principales à des données de type intervalle. Revue de Statistique Appliquée, 45(3), 5-24.
Chouakria, A. (1998). Extension des méthodes d'analyse factorielle à des données de type intervalle. Ph.D. thesis, Université Paris-Dauphine.
D'Urso, P. and Giordani, P. (2004). A least squares approach to principal component analysis for interval valued data. Chemometrics and Intelligent Laboratory Systems, 70(2), 179-192.
Gioia, F. and Lauro, C. (2006). Principal component analysis on interval data. Computational Statistics, 21, 343-363.
Harkat, M.F., Mourot, G., and Ragot, J. (2006). An improved PCA scheme for sensor FDI: Application to an air quality monitoring network. Journal of Process Control, 16(6), 625-634. doi:10.1016/j.jprocont.2005.09.007.
Jackson, J.E. and Mudholkar, G.S. (1979). Control procedures for residuals associated with principal component analysis. Technometrics, 21(3), 341-349. doi:10.1080/00401706.1979.10489779.
Le-Rademacher, J. and Billard, L. (2012). Symbolic covariance principal component analysis and visualization for interval-valued data. Journal of Computational and Graphical Statistics, 21(2), 413-432. doi:10.1080/10618600.2012.679895.
Li, W., Yue, H., Valle-Cervantes, S., and Qin, S. (2000). Recursive PCA for adaptive process monitoring. Journal of Process Control, 10(5), 471-486. doi:10.1016/S0959-1524(00)00022-6.
Nomikos, P. and MacGregor, J.F. (1995). Multivariate SPC charts for monitoring batch processes. Technometrics, 37(1), 41-59.
Palumbo, F. and Lauro, N. (2003). A PCA for interval-valued data based on midpoints and radii. In New Developments in Psychometrics, eds. H. Yanai, A. Okada, K. Shigemasu, Y. Kano, and J. Meulman, 641-648. Tokyo.
Qin, S.J. (2003). Statistical process monitoring: basics and beyond. Journal of Chemometrics, 17(8-9), 480-502. doi:10.1002/cem.800.
Wang, H., Guan, R., and Wu, J. (2012). CIPCA: Complete-information-based principal component analysis for interval-valued data. Neurocomputing, 86, 158-169. doi:10.1016/j.neucom.2012.01.018.
Wang, X., Kruger, U., and Irwin, G.W. (2005). Process monitoring approach using fast moving window PCA. Industrial & Engineering Chemistry Research, 44(15), 5691-5702. doi:10.1021/ie048873f.