<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>International Student's Scientific
Conference</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Multi-PCA Driven Approach for Fault Detection and Root Cause Analysis of Process Equipment</article-title>
      </title-group>
      <contrib-group>
        <aff id="aff0">
          <label>0</label>
          <institution>ABB Ability Innovation Centre; ABB Power Generation; BMS College of Engineering</institution>
          ,
          <addr-line>Bangalore</addr-line>
          ,
          <country country="IN">India</country>
          ;
          <addr-line>Cleveland</addr-line>
          ,
          <country>USA</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2009</year>
      </pub-date>
      <volume>6</volume>
      <fpage>23</fpage>
      <lpage>25</lpage>
      <abstract>
        <p>Principal Component Analysis (PCA) is quite popular for fault detection and diagnosis in industrial applications. PCA assumes linear relationships among the features and represents them as a linear combination. However, a typical industrial application can exhibit non-linearity due to operation at multiple operating regions or inherent non-linear relationships among the features. This paper proposes a novel clustering-based Multi-PCA approach which divides the overall non-linearity into simpler linear regions that can subsequently be modelled by multiple PCA models. The clustering uses domain knowledge, specifically the fact that operation of an asset at different operating points can lead to a multimodal distribution of the variables. The proposed approach is structured systematically with the following steps: 1) feature set selection, 2) Hierarchical Density Based Spatial Clustering (HDBSCAN), and 3) fitting a PCA model in each cluster. The proposed approach retains the computational simplicity of PCA compared to models based on other non-linear modelling approaches such as neural-network-based autoencoders. Finally, the paper also proposes a simplified Root Cause Analysis (RCA) algorithm for identifying the cause of the fault.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1 Introduction</title>
      <p>Industrial assets such as motors, pumps, fans, turbines etc.
are subject to faults and failures due to operation at excess
load conditions or due to aging effects. Identifying that an
industrial asset is drifting towards an abnormal condition is
the key to avoid unplanned downtime of an industry due
to asset failure. In the literature, there are two important
approaches to tackle the challenge of detecting abnormal
asset health. The first approach is based on detailed know-how
of the physics of the asset, while the second treats the asset
as a black box. The first approach works well for simpler
assets such as a motor, as the underlying physics is well
established. However, this approach is not easily scalable,
since it requires one to develop physics-based models for
every asset. Additionally, as industrial assets become complex,
such an approach is difficult to implement. Hence, there is a
significant shift towards data-driven approaches for asset
health monitoring. Due to the availability of low-cost sensors
and digital technology, large amounts of data can be collected
from an industrial asset, and machine-learning approaches can
be applied to learn a model of the asset in a semi-automated
manner. Such an approach is easily scalable and can be
applied to a variety of machines.</p>
      <p>
        Some of the earliest fault detection techniques were
model based. One such popular algorithm was based around
analytic redundancy, wherein a comparison between the
inputs of the monitored system and the output obtained from
an analytical mathematical model was carried out to detect
the presence of a fault
        <xref ref-type="bibr" rid="ref12">(M. Frank 1990)</xref>
        . However, this
comparison was a naive estimate and failed to capture faulty
conditions in high-dimensional spaces. Subsequently, several
approaches based on multivariate statistical process control
methods
        <xref ref-type="bibr" rid="ref14">(MacGregor and Kourti 1995)</xref>
        ;
        <xref ref-type="bibr" rid="ref10">(Kresta and
Marlin 1991)</xref>
        ;
        <xref ref-type="bibr" rid="ref15">(Macgregor 1994)</xref>
        were presented for the
diagnosis of complex physical processes. The usage of state
observers, by modelling faults as state-variable changes
(Isermann 2005), provided a better strategy for aberration
detection than statistical processes, albeit at a higher
computational cost.
      </p>
      <p>
        A pressing need to capture and localize abnormalities
at reduced computational cost brought about the usage of
Principal Component Analysis (PCA). PCA defines a new
outlook to the data and aims to capture hidden structure
underneath data redundancy and noise
        <xref ref-type="bibr" rid="ref17">(Pearson 1901)</xref>
        . An
abundant list of algorithms based around the PCA is
evident in literature. One such approach involves using the Q
and T2 statistic
        <xref ref-type="bibr" rid="ref22">(Villegas, Fuente, and Rodríguez 2010)</xref>
        for
fault detection. This methodology was subsequently
simulated for fault detection in a waste water treatment plant
        <xref ref-type="bibr" rid="ref5">(Garcia-Alvarez 2009)</xref>
        wherein the authors showcase results
which capture local linear structure only. An improvement to
the conventional PCA model was brought about by
introducing the dynamic PCA (DPCA)
        <xref ref-type="bibr" rid="ref19">(Russell, Chiang, and Braatz
2000)</xref>
        which is established by considering the dependency
of current observations on previous time instances as well.
A non-linear modification to the PCA involved the
combination of the Kronecker product, wavelet decomposition
and a sliding median filter for fault determination in
nonlinear data-sets
        <xref ref-type="bibr" rid="ref23">(Zhang, Li, and Hu 2012)</xref>
        .
      </p>
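The Q- and T2-statistic monitoring referenced above can be sketched in a few lines of linear algebra. The following NumPy sketch (all function and variable names are our own, not taken from the cited works) computes both indices for a single sample against a PCA model of steady-state data:

```python
import numpy as np

def pca_t2_q(X_train, x_new, n_components=2):
    """Illustrative T^2 and Q (squared prediction error) statistics for a
    PCA fault monitor. X_train is standardized steady-state data (n x m);
    x_new is one sample (m,), standardized with the training statistics."""
    # Eigendecomposition of the sample covariance matrix.
    C = np.cov(X_train, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(C)
    order = np.argsort(eigvals)[::-1]           # sort descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    P = eigvecs[:, :n_components]               # loading matrix
    lam = eigvals[:n_components]
    t = x_new @ P                               # scores in the PCA subspace
    T2 = np.sum(t ** 2 / lam)                   # Hotelling's T^2
    residual = x_new - t @ P.T                  # part the model cannot explain
    Q = residual @ residual                     # Q statistic (SPE)
    return T2, Q
```

A sample lying in the retained subspace yields a small Q, while a sample orthogonal to it yields a large Q; T2 instead flags unusually large excursions within the subspace.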
      <p>
        All of these methods suffer from the inability of
Hotelling’s T2 to identify and isolate the responsible feature
(Hotelling’s T2 is simply a multivariate counterpart
of the t-test). Further, the fault detection index used is
extremely sensitive to anomalies, making these methods susceptible to
false positives. To fix this problem, a new fault
detection index, based on the sum of the squares of the last few
principal components weighted by the inverse of their
variances, was developed, which yielded a good detection rate in
the dependent as well as independent variables
        <xref ref-type="bibr" rid="ref2">(Benaicha et
al. 2010)</xref>
        . This work also points out that hierarchical
contribution plots provide sufficient partitioning to localize any
anomaly, provided the stochastic nature of block size is
evaluated by a definite formula. A recent development pertinent to
the PCA involves decomposing variables using the
Empirical Mode Decomposition (EMD)
        <xref ref-type="bibr" rid="ref3">(Du and Du 2018)</xref>
        . Fault
detection is subsequently carried out by applying PCA to
the decomposed variables and detecting small shifts in the data
using a cumulative sum control chart (CUSUM).
      </p>
      <p>
        Kernel PCA
        <xref ref-type="bibr" rid="ref20">(Schölkopf, Smola, and Müller 1996)</xref>
        extends the idea to the non-linear case, wherein the kernel trick
is used to learn a linear representation in a non-linear space.
While these models have been deployed with significant
success to capture non-linearities, their success depends on
assumptions about the data-generating process. Most
often, an RBF kernel is used so as to encode maximum
uncertainty about the data-generating distribution, and justifiably
so, because the central limit theorem suggests that sums of
several random variables tend to be Gaussian distributed.
However, this may not be the case for several real-world
processes. Therefore, selecting the correct kernel may prove to
be an exhaustive process that scales poorly with
more data and, even then, may produce poor results. Our
method circumvents both these problems while remaining
inexpensive.
      </p>
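For contrast with the method proposed here, the kernel trick itself fits in a short NumPy sketch. The following is a minimal, illustrative RBF kernel PCA (the gamma value and all names are our own choices, not from the cited work):

```python
import numpy as np

def rbf_kernel_pca(X, gamma=1.0, n_components=2):
    """Project training points onto the leading kernel principal
    components, using an RBF kernel (illustrative sketch)."""
    # Pairwise squared Euclidean distances.
    sq = np.sum(X ** 2, axis=1)
    D2 = sq[:, None] + sq[None, :] - 2 * X @ X.T
    K = np.exp(-gamma * D2)                      # RBF kernel matrix
    # Double-center K: PCA in the implicit feature space.
    n = K.shape[0]
    one = np.ones((n, n)) / n
    Kc = K - one @ K - K @ one + one @ K @ one
    eigvals, eigvecs = np.linalg.eigh(Kc)
    idx = np.argsort(eigvals)[::-1][:n_components]
    # Scale eigenvectors so columns are the projected coordinates.
    return eigvecs[:, idx] * np.sqrt(np.clip(eigvals[idx], 0, None))
```

The kernel matrix grows quadratically with the number of samples, which is one reason kernel selection and fitting become expensive on large industrial datasets.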
      <p>
        Artificial Neural Nets (ANN) have also been employed
for aberration detection. One such work carries out anomaly
detection and root cause analysis using a Bayesian network
        <xref ref-type="bibr" rid="ref1">(Amin 2018)</xref>
        . A similar analysis, which borrows the
technique of decomposing the T2 statistic, proved to be
extremely effective for non-linear fault diagnosis
        <xref ref-type="bibr" rid="ref21">(Verron, Li,
and Tiplica 2010)</xref>
        .
      </p>
      <p>
        While most of the proposals in the literature are promising,
the need to balance computational simplicity against
sufficient sensitivity for fault detection remains an unsolved
problem. The simple yet robust solution that PCA
offers is limited in scope due to its linear nature
        <xref ref-type="bibr" rid="ref22">(Villegas, Fuente, and Rodríguez 2010)</xref>
        ,
        <xref ref-type="bibr" rid="ref5">(Garcia-Alvarez 2009)</xref>
        ,
        <xref ref-type="bibr" rid="ref19">(Russell, Chiang, and Braatz 2000)</xref>
        .
      </p>
      <p>Our main contribution is an extension of the classical
PCA framework for non-linear systems. We propose a
systematic approach to capture non-linearity through several
linear models (by applying several PCA models on chunks
of localized data) while retaining the computational
simplicity of the single PCA. This paper successfully demonstrates
the proposed approach on an industrial asset having several
years of historical data.</p>
      <p>
        The Multi-PCA (Fig.1) offers a simple solution:
break the non-linearity through clustering and then build
a PCA model for each cluster. This approach results in a
framework that can detect faults with reasonable
accuracy. The concept proposed by Liling Ma et al.
        <xref ref-type="bibr" rid="ref13">(Ma et al.
2004)</xref>
        presents a similar idea of using multiple PCA
models. In this case however, process monitoring is achieved by
weighing each of the sub-PCA models using the K-means
clustering technique and creating a decision boundary based
on Hotelling’s T2 statistic. K-means clustering is biased
towards local data points because it splits the space into
Voronoi cells. Moreover, it performs poorly when tasked
with finding clusters of varying densities and is acutely
affected by the choice of K. K-means clustering also does not
identify noise prevalent in the data and assigns noisy points
to a cluster regardless of their influence. As opposed to the
naive clustering process adopted in
        <xref ref-type="bibr" rid="ref13">(Ma et al. 2004)</xref>
        , the Multi-PCA approach proposed in this
paper employs the Hierarchical Density Based Spatial
Clustering (HDBSCAN) algorithm
        <xref ref-type="bibr" rid="ref16 ref4">(McInnes, Healy, and Astels
2017)</xref>
        , which is computationally inexpensive and robust. The elbow
method, applied to the mean squared error, serves as
a practical criterion for setting the hyperparameters of
HDBSCAN, following which the Multi-PCA modelling
approach is applied to the clustered space. While in
        <xref ref-type="bibr" rid="ref13">(Ma et al.
2004)</xref>
        the SOFM (self-organizing feature map) neural
network
        <xref ref-type="bibr" rid="ref8">(Kohonen and Honkela 2007)</xref>
        calculates fault
thresholds using the multiple PCA components, we showcase that
determining thresholds from reconstructions of projected
data provides similar results at a fraction of the
computational cost.
      </p>
      <p>We also present a novel feature selection strategy to select
essential features for clustering. It is to be noted that our
model can only detect known fault states that the Multi-PCA
model encounters during training. Hence it is necessary to
provide a wide array of fault cases to the model.</p>
    </sec>
    <sec id="sec-2">
      <title>2 Preliminaries</title>
      <p>This section explains Multiple Principal Component Analysis
(Multi-PCA), describes its algorithmic flow chart for fault
detection, and provides an overview of the Hierarchical
Density Based Spatial Clustering algorithm along with an
illustration of the feature selection methodology used for
clustering.</p>
      <sec id="sec-2-1">
        <title>2.1 Multiple Principal Component Analysis</title>
        <p>
          The Principal Component Analysis (PCA)
          <xref ref-type="bibr" rid="ref17">(Pearson 1901)</xref>
          is
an orthogonal transformation that carries out dimensionality
reduction by converting a multivariate space into a subspace
which preserves the maximum variance of the original space in
a minimum number of dimensions. PCA can be thought of as
looking at data from its most informative viewpoint in the
transformed space.
        </p>
        <p>The Multi-PCA borrows this characteristic and extends it
to non-linear data by clustering the data space into several
clusters and applying an independent PCA on each (Fig.1).
The essence of clustering the data space is to account for
several operating regions prevalent in the steady state data
of the plant.</p>
        <p>To formally describe the process of fault detection
using the Multi-PCA, consider a standardized (zero mean
and unit variance) data matrix X ∈ R^(n×m) (representing
the steady state model of the plant), where n indicates
the number of samples and m denotes the number of
feature variables. Clustering the data space X leads to clusters
x_1, x_2, x_3, ..., x_q ⊂ X, where q is equal to the number of
clusters. Assuming each cluster x_i (i = 1, 2, ..., q) to
be independent of the others, the covariance matrix C_i of
x_i, for all i = 1, 2, ..., q, describing the variance between the
features, can be constructed as:</p>
        <p>C_i = (1 / (n_i − 1)) x_i^T x_i (1)</p>
        <p>The singular value decomposition of C_i ∈ R^(m×m) is given as:</p>
        <p>W_i = C_i C_i^T = V_i Σ_i V_i^T, for all i = 1, 2, ..., q (2)</p>
        <p>where the columns of V_i are the eigenvectors of C_i. The
transformation matrix for each cluster, P_i, is formulated by
choosing the a eigenvectors (columns of V_i) corresponding
to the a largest eigenvalues:</p>
        <p>P_i = [v_1, v_2, ..., v_a] (3)</p>
        <p>T_i = x_i P_i, for all i = 1, 2, ..., q (4)</p>
        <p>Equation 4 describes the transformation of each cluster to
a reduced dimension, where P_i denotes the transformation
matrix for its respective cluster. A standard measure used to
calculate a, i.e. the number of principal components, based on
the desired variance is the cumulative percent
variance (CPV) formulation:</p>
        <p>CPV(a) = 100 × (Σ_{j=1}^{a} λ_j) / trace(C_i) (5)</p>
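The per-cluster model fit translates almost directly into code. A minimal NumPy sketch (naming is our own; the eigendecomposition of the symmetric covariance matrix stands in for the SVD, and the 90% CPV target is an illustrative choice):

```python
import numpy as np

def fit_cluster_pca(x_i, cpv_target=90.0):
    """Fit a PCA model to one cluster and pick the number of components a
    by cumulative percent variance (illustrative sketch)."""
    C = np.cov(x_i, rowvar=False)               # cluster covariance matrix
    eigvals, eigvecs = np.linalg.eigh(C)
    order = np.argsort(eigvals)[::-1]           # descending eigenvalues
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # CPV: smallest a whose cumulative variance reaches the target.
    cpv = 100.0 * np.cumsum(eigvals) / np.trace(C)
    a = int(np.searchsorted(cpv, cpv_target) + 1)
    P = eigvecs[:, :a]                          # transformation matrix P_i
    return P, cpv
```

Each cluster gets its own P_i, so the number of retained components can differ between operating regions.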
      </sec>
      <sec id="sec-2-2">
        <title>2.2 Sequence of Events for Multi-PCA Fault Detection</title>
        <p>We use the trends of Hotelling’s T2 and Q statistics to
analyze abnormality in the data and tailor them for fault
detection in the case of Multi-PCA.</p>
        <p>Steady state sensor data collected from an industrial
asset is normalized and treated as training data. Fault data
recorded during the malfunction of the industrial asset
serves as test case to detect anomalies using the Multi-PCA
model.</p>
        <p>The model is formulated based on the training data alone,
consider X to be the training data and Xf to be the fault data.
X is clustered into different operating regions by the
Hierarchical Density Based Spatial Clustering algorithm
(subsection 2.3), yielding clusters with unique cluster IDs. The
K-Nearest Neighbors (KNN) classifier is then employed to
classify test data points into one of the cluster IDs by
majority voting over each point’s K nearest neighbors. KNN is
based on the Euclidean distance and provides a simple,
inexpensive and robust way to designate data points into
clusters.</p>
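The KNN assignment step needs no library support. A hedged NumPy sketch (names and the choice of k are illustrative):

```python
import numpy as np

def knn_assign(train_X, train_labels, test_X, k=5):
    """Assign each test point the majority cluster ID among its k nearest
    training neighbours (Euclidean distance), as in the Multi-PCA flow."""
    out = np.empty(len(test_X), dtype=int)
    for idx, x in enumerate(test_X):
        d = np.linalg.norm(train_X - x, axis=1)     # distances to training set
        nearest = train_labels[np.argsort(d)[:k]]   # labels of k closest points
        vals, counts = np.unique(nearest, return_counts=True)
        out[idx] = vals[np.argmax(counts)]          # majority vote
    return out
```

In practice a tree-based nearest-neighbour index would replace the brute-force distance computation for large training sets.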
        <p>Following this, principal components are determined
independently for each cluster in the training data (Eq. 4).
Let j index the output clusters from KNN for the test data;
j ranges over a subset of the q training clusters, i.e. the test
data may fall into a few or all of the clusters of the training
data set X. Equation 4 can now be extended as:</p>
        <p>T_jf = y_jf P_j, for all j = 1, 2, ..., q or fewer (6)</p>
        <p>where y_jf are the test data clusters classified by KNN and
P_j is the steady state transformation matrix of the matching
training cluster. Equation 6 transforms the data onto the
new space based on the steady state transformation matrices
P_i. Inverse transformations are applied to revert the training
and the fault data back to the m-dimensional space:</p>
        <p>ŷ_jf = T_jf P_j^T, for all j = 1, 2, ..., q or fewer (7)</p>
        <p>The inverse transform of Eq. 7 carries with it the error of
projection. This error is expected to be large for the faulty
case and is computed for each cluster as:</p>
        <p>E_f = Y_jf − Ŷ_jf, for all j = 1, 2, ..., q (8)</p>
        <p>The threshold is set based on the validation set. KNN is
used to classify the validation data points into clusters; each
of these clusters is projected onto its respective steady
state PCA model and reconstructed back. The
reconstructed data is compared with the original validation data
set to produce a sample error. A threshold of 3 standard
deviations from the mean of the sample error is set, which serves
as the decision boundary to detect anomalies. Fig.2 depicts
the algorithm.</p>
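The projection, reconstruction and thresholding steps above can be sketched together. A minimal NumPy illustration, assuming an orthonormal loading matrix P per cluster (names are our own):

```python
import numpy as np

def reconstruction_errors(X, P):
    """Project onto a cluster's PCA model and return the per-sample
    squared reconstruction error (sketch of Eqs. 6-8)."""
    T = X @ P                                   # scores (Eq. 6)
    X_hat = T @ P.T                             # back to m dimensions (Eq. 7)
    return np.sum((X - X_hat) ** 2, axis=1)     # per-sample error (Eq. 8)

def fault_threshold(val_errors, n_sigma=3.0):
    """Decision boundary: mean plus 3 standard deviations of the
    validation-set sample error."""
    return val_errors.mean() + n_sigma * val_errors.std()
```

A test sample whose error exceeds the threshold of its assigned cluster is flagged as anomalous.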
      </sec>
      <sec id="sec-2-3">
        <title>2.3 Hierarchical Density Based Spatial Clustering Algorithm (HDBSCAN)</title>
        <p>
          The concept of Multi-PCA requires clustering the data into
different operating zones (Fig. 1). The literature presents several
algorithms for clustering; however, each of these has a
trade-off in terms of computational cost and data size.
Figure 3, presented in
          <xref ref-type="bibr" rid="ref16 ref4">(McInnes, Healy, and Astels 2017)</xref>
          , depicts the superior performance of HDBSCAN over
current state-of-the-art algorithms.
        </p>
        <p>
          HDBSCAN transforms an N-dimensional space according
to the density of the data by defining a new distance metric.
Its hyperparameters include the minimum cluster size and
minimum samples, which were decided through the elbow
technique. Using the distance matrix thus obtained, it constructs
a minimum spanning tree based on Prim’s algorithm
          <xref ref-type="bibr" rid="ref18">(Prim
1957)</xref>
          . A dendrogram is formed by arranging the edges of
the spanning tree in increasing order of their distance,
thereby creating clusters for each edge group. The important
clusters are retained by a measure of λ = 1/distance, giving an
indication of how long the clusters persist.
        </p>
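The density-aware metric behind this transformation is the mutual-reachability distance. A small NumPy sketch of that first step (min_samples plays the role of the core-distance neighbour count; this is an illustration of the idea, not the hdbscan library's implementation):

```python
import numpy as np

def mutual_reachability(X, min_samples=5):
    """Mutual-reachability distance matrix: the metric HDBSCAN builds
    its minimum spanning tree on (illustrative sketch)."""
    # Pairwise Euclidean distances.
    diff = X[:, None, :] - X[None, :, :]
    D = np.sqrt(np.sum(diff ** 2, axis=-1))
    # Core distance: distance to the min_samples-th nearest neighbour
    # (each row's sorted distances include the point itself at index 0).
    core = np.sort(D, axis=1)[:, min_samples]
    # d_mreach(a, b) = max(core(a), core(b), d(a, b)).
    return np.maximum(D, np.maximum(core[:, None], core[None, :]))
```

Points in sparse regions get inflated distances to everything, which is what lets the subsequent spanning-tree stage separate them out as noise.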
        <p>
          HDBSCAN scales well to large datasets and is effective
at global clustering. The algorithm also detects outliers in
the data and classifies them as noise. These outliers are data
points that arise from sensor faults and signify noise in the
dataset. For example, a faulty tachometer may output a
negative speed or an improbably large value. HDBSCAN was
found to identify such stray data points, and these points were
removed. HDBSCAN thus provided a way to account for
sensor-related noise and drift.
        </p>
        <p>
          Clustering of a multidimensional dataset requires feature
selection. Correlated features do not aid the clustering
methodology; they increase computational time without improving
cluster quality. The need to present only important features
to the clustering methodology has led to several algorithms
in the literature. Michael Fop and Thomas Brendan Murphy
          <xref ref-type="bibr" rid="ref4">(Fop and Murphy 2017)</xref>
          present several approaches
involving Gaussian mixture models and latent class analysis
models. Dirichlet process mixture models were also proposed for
variable selection
          <xref ref-type="bibr" rid="ref6">(Kim, Tadesse, and Vannucci 2006)</xref>
          . As
opposed to finding a common feature subset that is relevant
to all clusters, Yuanhong Li et al.
          <xref ref-type="bibr" rid="ref11">(Li, Dong, and Hua 2008)</xref>
          developed a localized feature selection method for
clustering.
        </p>
        <p>The proposed method of feature selection is based on
inspecting the density plot of each variable and looking for
features exhibiting distinct operating regions (multi-modal
distributions). This method provides sufficient
simplification and serves as a robust criterion for selecting distinct
variables for clustering. The idea behind such a feature
selection method is that if an asset operates in N distinct
operating regimes, then one can expect N distinct peaks in its
density plot.</p>
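One illustrative way to automate this visual inspection is to count peaks in a smoothed histogram. A rough NumPy sketch (the bin count and smoothing width are our own choices, not from the paper):

```python
import numpy as np

def count_density_peaks(x, bins=50, smooth=3):
    """Rough peak count for a variable's density plot: histogram,
    moving-average smoothing, then a count of interior local maxima.
    Variables with >= 2 peaks (multi-modal, i.e. multiple operating
    regions) would be kept for clustering."""
    hist, _ = np.histogram(x, bins=bins, density=True)
    kernel = np.ones(smooth) / smooth
    h = np.convolve(hist, kernel, mode="same")      # smooth the histogram
    interior = (h[1:-1] > h[:-2]) & (h[1:-1] > h[2:])
    return int(np.sum(interior))
```

A kernel density estimate would give a smoother curve than a histogram, at slightly higher cost; either way the peak count is only a heuristic and borderline cases still warrant a look at the actual density plot.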
        <p>Variables with two or more operating regions are chosen
for clustering, while the rest are rendered redundant in this
particular analysis. An illustration is shown in Fig.4
3</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3 Case Study</title>
      <p>We assess the performance of the proposed Multi-PCA
through a comparison with the single PCA approach,
employing the aforementioned methodology.</p>
      <sec id="sec-3-1">
        <title>Data Description</title>
        <p>Gas turbine data from a power generation plant, comprising
forty-five features sampled at one-minute intervals, was
the dataset used for this case study. The training data used
to develop the steady state model X is a matrix of
dimensions 37200×45, i.e. 37200 samples each with 45
variables. The data also includes the dates on
which the faults were reported. Six test files were prepared as
test cases to detect anomalies corresponding to the six faults.
Each fault file contains data from the 24 hours prior to the fault
reporting time. Hence each fault file acts as a test case for
the proposed algorithm, which ideally should detect possible
anomalies.</p>
        <p>We demonstrate the effectiveness of the proposed
Multi-PCA algorithm over a single PCA algorithm. Table 1
summarizes the test case results for both algorithms. In
some test cases, both the single PCA and Multi-PCA algorithms
detect faults, whereas in others (Cases 1, 3, 5, and
6) only the Multi-PCA approach was able to detect the fault.
Notably, Multi-PCA detected the fault in every test case.</p>
        <p>To illustrate the point further, fault case F5 is
analyzed in detail. For F5, both the single PCA and the
Multi-PCA models are created based on the training data, and the
projected test data is reconstructed back. Fig.8 depicts the
actual and reconstructed signal of a variable called turbine
speed: Fig.8a is the original turbine speed signal, Fig.8b
the signal reconstructed using the single PCA model, and
Fig.8c the speed signal reconstructed using the Multi-PCA
model. It is clear from Fig.8 that the Multi-PCA approach
reconstructs the signal very well and closely matches the
training data signal. This is possible because the Multi-PCA
divides the data into multiple regimes, whereas a single PCA
fits the entire data distribution. Also, feature selection (section
2.4) plays the role of a naive correlation detector and, in the
case of F5, identifies seven variables out of forty-five as
uncorrelated.</p>
        <p>The Hierarchical clustering algorithm (section 2.3) uses
only these seven variables to cluster the data into two
clusters, while simultaneously detecting outliers prevalent in the
data; Fig.5 depicts clustering of the Turbine Flow Speed
variable. We use the uncorrelated features to ensure that
redundant features do not interfere with the process of
clustering. Once the single PCA and Multi-PCA models are built,
F5 data is projected onto its respective principal
components, reconstructed back, and the MSE per sample is
computed. The results are shown in Fig.6 and Fig.7. As per
Fig.6, the mean squared errors (MSE) vary in the range
0–600, indicating that the single PCA is not able to capture all
the variation in the data. The MSE threshold is set at 100 (based
on the validation data) to decide if a data point is normal or
not. In the case of F5, using the single PCA model,
majority of the sample errors are within the threshold, providing
an incorrect indication that the gas turbine is in normal
operation. Therefore, the single PCA model is not confident
to mark the data set as faulty. Whereas, in the case of the
Multi-PCA approach (Fig.7), a good fit to the data in both
clusters (clustered by HDBSCAN) is achieved. This snug fit
places the majority of the sample errors above the calculated
validation threshold (different from that of the single PCA),
conclusively indicating a fault in F5.
In order to test the proposed algorithm’s ability to detect
normal operation of the gas turbine, a new test file was
prepared using 24 hours of data from normal operation of
the gas turbine. This data set was not used during
training of the Multi-PCA algorithm. Results for this case
are shown in Fig.9: the test file was found to contain three
clusters when clustered with a set of ten uncorrelated
features, and the reconstructed sample errors for each cluster
indicated normal operation of the gas turbine, as the majority of
the test samples were well below the threshold values of the
corresponding cluster. This result showcases the lack of bias
of the Multi-PCA model towards faults while
demonstrating its ability to classify anomalies well.</p>
        <p>Another experiment was conducted to test the
Multi-PCA against sensor bias faults. A bias was
deliberately added to two variables in the steady state dataset, which
is representative of the normal operation of the gas turbine.
The Multi-PCA algorithm was used to detect a fault in this
data. The results of this test are shown in Fig.10. The
Multi-PCA algorithm indicates abnormal behavior, as the MSE per
sample is greater than the threshold value for the majority of
the data points. The bias level that triggers the
fault was found to be approximately three percent above the
steady state value of the variables.</p>
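The bias experiment can be sketched as follows. This is a hypothetical helper, not the authors' code; the PCA loading matrix P (assumed orthonormal) and the fault threshold are taken as given:

```python
import numpy as np

def bias_detect_rate(X_steady, P, threshold, cols, pct=3.0):
    """Sketch of the sensor-bias experiment: add a pct% multiplicative
    bias to chosen columns of steady-state data, then report the fraction
    of samples whose squared reconstruction error crosses the threshold."""
    X_bias = X_steady.copy()
    X_bias[:, cols] = X_bias[:, cols] * (1.0 + pct / 100.0)
    # Project, reconstruct, and measure per-sample error.
    err = np.sum((X_bias - (X_bias @ P) @ P.T) ** 2, axis=1)
    return float(np.mean(err > threshold))
```

Because the PCA model encodes the correlations of normal operation, biasing one variable of a correlated pair pushes samples off the learned subspace and inflates the reconstruction error.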
        <p>Table 1 summarizes the per-test-case sample errors of the
single PCA (SPCA): F1: 168.361; F2: 349.667; F3: 95.498;
F4: 1252.578; F5: 121.629; F6: 71.819.</p>
        <p>Root cause analysis (RCA) provides insight into which particular
variable is contributing to the anomaly. RCA produces a
contribution plot which indicates how much each variable
contributes in magnitude to the anomaly. Fault F5 is used to
demonstrate the root cause analysis performed using the Multi-PCA
approach, and the results are depicted in Fig.11.</p>
        <p>The scatter plot (top left corner) is a plot of the sample
error for F5. The bar graph indicates the magnitude of each
variable’s contribution to the fault and ranks them in
decreasing order. The top five variables contributing to the
fault are identified. (Note that the contributing variables are
those of the raw dataset (R^m) and not of the principal
components; all aspects of fault detection are carried out in the
raw, untransformed space itself.) In order to provide further
insight to the subject matter expert (SME), the steady state
signal and the aberrant signal are compared, as shown in the
bottom section of Fig.11. RCA presents the user with a tool
to interactively detect the variables contributing to a fault.</p>
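The contribution ranking behind such a plot reduces to sorting per-variable squared reconstruction errors. A minimal sketch (function name and interface are our own, not the paper's):

```python
import numpy as np

def rca_contributions(x, x_hat, feature_names):
    """Rank raw variables by their contribution to the anomaly: the
    squared reconstruction error of each untransformed variable,
    in decreasing order (illustrative sketch of the contribution plot)."""
    contrib = (x - x_hat) ** 2
    order = np.argsort(contrib)[::-1]
    return [(feature_names[i], float(contrib[i])) for i in order]
```

The first few entries of the returned ranking correspond to the bars at the left of the contribution plot, i.e. the variables a subject matter expert would inspect first.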
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4 Conclusion</title>
      <p>This paper has successfully demonstrated the superiority of
the Multi-PCA approach over the single PCA in an
industrial case study of a gas turbine. The Multi-PCA approach is
able to detect all six faults of the gas turbine, and is also
able to detect sensor bias issues in a dataset.
This novel approach of feature selection and data clustering,
followed by PCA model building, was found to be quite robust
for industrial applications.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <string-name>
            <surname>Amin</surname>
            ,
            <given-names>M. T.</given-names>
          </string-name>
          <year>2018</year>
          .
          <article-title>Fault Detection and Root Cause Diagnosis using Dynamic Bayesian Network</article-title>
          .
          <source>Ph.D. Dissertation</source>
          , Memorial University of Newfoundland.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <string-name>
            <surname>Benaicha</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Guerfel</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Bouguila</surname>
          </string-name>
          , N.; and
          <string-name>
            <surname>Benothman</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          <year>2010</year>
          .
          <article-title>New Pca-Based Methodology for Sensor Fault Detection and Localization</article-title>
          .
          <source>In International Conference of Modeling Simulation MOSIM</source>
          (Vol.
          <volume>10</volume>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <string-name>
            <surname>Du</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Du</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          <year>2018</year>
          .
          <article-title>Fault Detection using Empirical Mode Decomposition based PCA and CUSUM with Application to the Tennessee Eastman Process</article-title>
          .
          <source>IFACPapersOnLine</source>
          <volume>51</volume>
          (
          <issue>18</issue>
          ):
          <fpage>488</fpage>
          -
          <lpage>493</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <string-name>
            <surname>Fop</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Murphy</surname>
            ,
            <given-names>T. B.</given-names>
          </string-name>
          <year>2017</year>
          .
          <article-title>Variable Selection Methods for Model-based Clustering</article-title>
          . arXiv
          <volume>12</volume>
          :
          <fpage>1</fpage>
          -
          <lpage>48</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          <string-name>
            <surname>Garcia-Alvarez</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          <year>2009</year>
          .
          <article-title>Fault detection using Principal Component Analysis (PCA) in a Wastewater Treatment Plant (WWTP)</article-title>
          .
        </mixed-citation>
        <mixed-citation>
          <string-name>
            <surname>Isermann</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          <year>2005</year>
          .
          <article-title>Model-based fault-detection and diagnosis - status and applications</article-title>
          .
          <source>Annual Reviews in Control</source>
          <volume>29</volume>
          :
          <fpage>71</fpage>
          -
          <lpage>85</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          <string-name>
            <surname>Kim</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Tadesse</surname>
            ,
            <given-names>M. G.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Vannucci</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          <year>2006</year>
          .
          <article-title>Variable selection in clustering via Dirichlet process mixture models</article-title>
          .
          <source>Biometrika</source>
          <volume>93</volume>
          (
          <issue>4</issue>
          ):
          <fpage>877</fpage>
          -
          <lpage>893</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          <string-name>
            <surname>Kohonen</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Honkela</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          <year>2007</year>
          .
          <article-title>Kohonen network</article-title>
          .
          <source>Scholarpedia</source>
          <volume>2</volume>
          (
          <issue>1</issue>
          ):
          <fpage>1568</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          <string-name>
            <surname>Kresta</surname>
            ,
            <given-names>J. V.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Marlin</surname>
            ,
            <given-names>T. E.</given-names>
          </string-name>
          <year>1991</year>
          .
          <article-title>Multivariate statistical monitoring of process operating performance</article-title>
          .
          <source>The Canadian Journal of Chemical Engineering</source>
          <volume>69</volume>
          (
          <issue>1</issue>
          ):
          <fpage>35</fpage>
          -
          <lpage>47</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Dong</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Hua</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <year>2008</year>
          .
          <article-title>Localized feature selection for clustering</article-title>
          .
          <source>Pattern Recognition Letters</source>
          <volume>29</volume>
          (
          <issue>1</issue>
          ):
          <fpage>10</fpage>
          -
          <lpage>18</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          <string-name>
            <surname>Frank</surname>
            ,
            <given-names>P. M.</given-names>
          </string-name>
          <year>1990</year>
          .
          <article-title>Fault diagnosis in dynamic systems using analytical and knowledge-based redundancy: A survey and some new results</article-title>
          .
          <source>Automatica</source>
          <volume>26</volume>
          :
          <fpage>459</fpage>
          -
          <lpage>474</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          <string-name>
            <surname>Ma</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Jiang</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Gao</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          <year>2004</year>
          .
          <article-title>Multi-PCA models for process monitoring and fault diagnosis</article-title>
          .
          <source>IFAC Proceedings Volumes</source>
          <volume>37</volume>
          :
          <fpage>667</fpage>
          -
          <lpage>672</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          <string-name>
            <surname>MacGregor</surname>
            ,
            <given-names>J. F.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Kourti</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          <year>1995</year>
          .
          <article-title>Statistical process control of multivariate processes</article-title>
          .
          <source>Control Engineering Practice</source>
          <volume>3</volume>
          :
          <fpage>403</fpage>
          -
          <lpage>414</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          <string-name>
            <surname>MacGregor</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <year>1994</year>
          .
          <article-title>Statistical process control of multivariate processes</article-title>
          .
          <source>IFAC Postprint</source>
          :
          <fpage>427</fpage>
          -
          <lpage>437</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          <string-name>
            <surname>McInnes</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Healy</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Astels</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          <year>2017</year>
          .
          <article-title>hdbscan: Hierarchical density based clustering</article-title>
          .
          <source>J. Open Source Software</source>
          <volume>2</volume>
          (
          <issue>11</issue>
          ):
          <fpage>205</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          <string-name>
            <surname>Pearson</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          <year>1901</year>
          .
          <article-title>LIII. On lines and planes of closest fit to systems of points in space</article-title>
          .
          <source>The London, Edinburgh, and Dublin Philosophical Magazine and Journal of Science</source>
          <volume>2</volume>
          (
          <issue>11</issue>
          ):
          <fpage>559</fpage>
          -
          <lpage>572</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          <string-name>
            <surname>Prim</surname>
            ,
            <given-names>R. C.</given-names>
          </string-name>
          <year>1957</year>
          .
          <article-title>Shortest connection networks and some generalizations</article-title>
          .
          <source>Bell System Technical Journal</source>
          <volume>36</volume>
          (
          <issue>6</issue>
          ):
          <fpage>1389</fpage>
          -
          <lpage>1401</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          <string-name>
            <surname>Russell</surname>
            ,
            <given-names>E. L.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Chiang</surname>
            ,
            <given-names>L. H.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Braatz</surname>
            ,
            <given-names>R. D.</given-names>
          </string-name>
          <year>2000</year>
          .
          <article-title>Fault detection in industrial processes using canonical variate analysis and dynamic principal component analysis</article-title>
          .
          <source>Chemometrics and Intelligent Laboratory Systems</source>
          <volume>51</volume>
          (
          <issue>1</issue>
          ):
          <fpage>81</fpage>
          -
          <lpage>93</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          <string-name>
            <surname>Schölkopf</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Smola</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Müller</surname>
            ,
            <given-names>K.-R.</given-names>
          </string-name>
          <year>1996</year>
          .
          <article-title>Nonlinear component analysis as a kernel eigenvalue problem</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          <string-name>
            <surname>Verron</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Tiplica</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          <year>2010</year>
          .
          <article-title>Fault detection and isolation of faults in a multivariate process with Bayesian network</article-title>
          .
          <source>Journal of Process Control</source>
          <volume>20</volume>
          (
          <issue>8</issue>
          ):
          <fpage>902</fpage>
          -
          <lpage>911</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          <string-name>
            <surname>Villegas</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Fuente</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Rodríguez</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          <year>2010</year>
          .
          <article-title>Principal component analysis for fault detection and diagnosis, experience with a pilot plant</article-title>
          .
          <source>Proceedings of the 9th WSEAS International Conference on Computational Intelligence, Man-Machine Systems and Cybernetics</source>
          ,
          <fpage>147</fpage>
          -
          <lpage>152</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Hu</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          <year>2012</year>
          .
          <article-title>Improved multi-scale kernel principal component analysis and its application for fault detection</article-title>
          .
          <source>Chemical Engineering Research and Design</source>
          <volume>90</volume>
          (
          <issue>9</issue>
          ):
          <fpage>1271</fpage>
          -
          <lpage>1280</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>