=Paper= {{Paper |id=None |storemode=property |title=Using Tree Augmented Naive Bayesian Classifiers to Improve Engine Fault Models |pdfUrl=https://ceur-ws.org/Vol-818/paper12.pdf |volume=Vol-818 }} ==Using Tree Augmented Naive Bayesian Classifiers to Improve Engine Fault Models== https://ceur-ws.org/Vol-818/paper12.pdf
    Using Tree Augmented Naïve Bayes Classifiers to Improve Engine
                           Fault Models



    Daniel L.C. Mack             Gautam Biswas              Xenofon D. Koutsoukos          Dinkar Mylaraswamy
        EECS Dept.                  EECS Dept.                    EECS Dept.                Honeywell Aerospace
    Vanderbilt University       Vanderbilt University         Vanderbilt University         1985 Douglas Drive N
    Nashville, TN 37212         Nashville, TN 37212           Nashville, TN 37212          Golden Valley, MN 55422


Abstract

Online fault diagnosis is critical for detecting and mitigating adverse events that arise in complex systems such as aircraft, automobiles, and industrial processes. A typical fault diagnosis system consists of a reference model that mathematically links diagnostic monitors providing partial evidence to potential fault hypotheses. A reasoning algorithm operating on this model uses a set-covering scheme to establish likely fault candidates and their rankings. However, incompleteness in the reference model and simplifying assumptions affect the accuracy of the reasoning algorithms. In this paper, we describe a Tree Augmented Naïve Bayes Classifier (TAN) approach to systematically extend a reference model structure using data from system operations. We compare the performance of the TAN models against a typical reference model, and demonstrate that the TAN improves classification accuracy by finding new causal links among the system monitors.

1     Introduction

Aircraft are complex systems containing several interacting components and subsystems such as propulsion, electrical, flight management, avionics, and bleed subsystems. Smooth and integrated operation of these subsystems is essential to keep the aircraft operating safely. However, any operating system degrades over time, and monitoring the system online to detect the onset of unfavorable conditions and intrinsic faults is essential for increasing aviation safety.

The current state of online fault diagnosis is focused on installing a variety of sensors onboard an aircraft along with reasoning software to automatically interpret the evidence generated to explore the presence of faults. One such state-of-the-art system is the Aircraft Diagnostic and Maintenance System (ADMS) (Spitzer, 2007) that is used on the Boeing B777. The ADMS uses an expert-derived fault propagation model, called the system reference model, that captures the interactions between aircraft components under various operating modes. Generation of this reference model is a manual process, and this step often results in incompleteness and inaccuracies in the development and deployment of an ADMS.

Some of the incompleteness and inaccuracies can be overcome as the engineering teams acquire additional knowledge from an operating fleet, but this typically produces heuristics rather than a systematic upgrade to the original reference model. In other words, a gap exists for systematic upgrades and increments to the reference model even though a vast amount of operational data is collected by operating airlines. Closing this gap using advances in data mining methods is the focus of this paper. We describe a specific data mining approach for augmenting an existing aircraft engine reference model as an alternative to ad hoc approaches. We demonstrate the effectiveness of our work on data generated from a realistic aircraft engine simulator.

Statistical analysis and the design of classifiers for discovering knowledge from real-world data have been studied extensively. For example, Witten and Frank (Witten & Frank, 1999) describe several data mining approaches for producing black box models. Unfortunately, such models are very difficult to verify, making them almost impossible to certify for airworthiness. Further, the lack of transparency in these models makes it difficult to append this new knowledge to existing ADMS reference models. For practical purposes, data mining approaches for aircraft reference models have to “build upon” existing model structures rather than create something new, which would incur considerable engineering overhead cost.

The proposed approach to combining data mining with
fault models is somewhat unique. The data mining does not start from a clean slate, but builds up from an existing ADMS reference model structure. In Section 2, we describe a typical reference model structure along with the reasoning algorithm (called the W-algorithm). Next, we systematically enumerate the missing or partially correct information in this state-of-the-art reference model. These gaps formalize the data mining problem described in Section 3. We discuss the use of Tree-Augmented Naïve Bayes Networks (TANs) as a data driven modeling structure for diagnosis with causal probabilistic models in Section 4. The data mining approach is illustrated using data from a high fidelity simulator. Section 5 discusses the CMAPS-S simulator and the data selection task for our experiments. Section 6 describes the experimental results using the CMAPS-S data set, with a comparison of a Naïve Bayes classifier that replicates a system reference model against a TAN classifier model derived from a learning algorithm. Metrics are defined for evaluating classifier performance, and a number of different experiments are run to examine the addition of evidence to these models. Section 7 presents a summary of our approach, and outlines our directions for future work for diagnostic and prognostic reasoning using the data mining algorithms.

2   Reference Models and Reasoning

Model-based strategies for diagnosing large, complex, real-world systems rely on domain experts to craft the reference models used for monitoring and isolating faults. The complexity of the system makes it almost impossible to create complete physics-based models with reasonable resources. A more pragmatic solution is to rely on expert-generated cause-effect models. In simple terms, the reference model of the system being monitored can be represented as a bipartite graph consisting of two types of nodes: failure modes and evidence. The set $F$ defines all distinct failure modes defined for the system under consideration. A failure mode $fm_i \in F$ may be present or absent in the system. This is defined as the state of the failure mode. In the primary model, we allow only binary (occurring or not-occurring) states for the failure mode. We use the following shorthand notations regarding these assertions.

$$
\begin{aligned}
fm_i = 0 &\Leftrightarrow \text{The failure mode is not present} \\
fm_i = 1 &\Leftrightarrow \text{The failure mode is present}
\end{aligned}
\tag{1}
$$

Every failure mode has an a priori probability of occurring in the system. This probability is denoted by $P(fm_i = 1)$. A failure mode $fm_k$ can occur (or not occur) independently of another failure mode $fm_j$ occurring, that is, $P(fm_k = 1 \mid fm_j = 1) = P(fm_k = 1)$.

To isolate and disambiguate failure modes, the model also defines an entity called “evidence”. The $j$th evidence is denoted by $e_j$ and the set $E$ denotes all distinct monitors defined for the system under consideration. The diagnostic monitor associated with the $i$th evidence can either indict or exonerate a subset of failure modes called its ambiguity group. The monitor $m_i$ can take three mutually exclusive values, allowing a monitor to express indicting, exonerating, or unknown support for the failure modes in its ambiguity group. The notations are described in equation (2).

$$
\begin{aligned}
m_i = 0 &\Leftrightarrow \text{Exonerating evidence} \\
m_i = 1 &\Leftrightarrow \text{Indicting evidence} \\
m_i = -1 &\Leftrightarrow \text{Unknown evidence}
\end{aligned}
\tag{2}
$$

Ideally, we want a monitor associated with evidence $e_i$ to fire only when the failure modes in its ambiguity group are occurring. Given the fact that the $i$th failure mode is occurring in the system, $d_{ji}$ denotes the probability that monitor $m_j$ provides indicting evidence under this condition.

$$
d_{ji} = P(m_j = 1 \mid fm_i = 1),
\tag{3}
$$

$d_{ji}$ is called the detection probability of monitor $m_j$ with respect to the $i$th failure mode $fm_i$. A monitor may fire when there is no failure mode present in the system. The false alarm probability is the probability that an indicting monitor is present when there are no failure modes occurring in the system. That is,

$$
\epsilon_i = P(m_i = 1 \mid fm_j = 0,\ \forall fm_j \in F)
\tag{4}
$$

A reference model describes the relation between failure modes and monitors. The reference model is a 5-tuple defined as $[E, F, D, Pr, \epsilon]$, where $E$ is the evidence set, $F$ is the failure mode set, $D$ is the set of detection probabilities, $Pr$ is the a priori probability of the failure modes, and $\epsilon$ is the false alarm rate for the monitors.

Figure 1 illustrates an example reference model graphically, with fault modes (hypotheses) as nodes on the left, and diagnostic monitors (DM) on the right. Each link has an associated detection probability, i.e., the conditional probability $P(m_j = 1 \mid fm_i = 1)$. In addition, the fault nodes on the left contain the a priori probability of fault occurrence, i.e., $P(fm_i)$. Probabilities on the DM nodes indicate the likelihood that a particular monitor would indicate a fault in a nominal system, which as defined above is $\epsilon_i$. Bayesian methods are employed to combine the evidence provided by multiple monitors to estimate the most likely fault candidates.
                                          Figure 1: Example Reference Model
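
The Bayesian combination of monitor evidence described above can be illustrated with a minimal Python sketch. It is not the ADMS reasoner; it only assumes single-fault hypotheses and conditionally independent monitors, and it ranks failure modes by a normalized posterior. The function name, the dictionary layout, and all numeric values are hypothetical. One further assumption, not stated in the text, is that a monitor with no link to a failure mode fires only at its false alarm rate.

```python
def rank_failure_modes(priors, detection, false_alarm, observations):
    """Rank single-fault hypotheses by normalized posterior probability.

    priors:       {failure_mode: P(fm_i = 1)}
    detection:    {(monitor, failure_mode): d_ji = P(m_j = 1 | fm_i = 1)}
    false_alarm:  {monitor: false alarm probability of that monitor}
    observations: {monitor: 1 (indicting), 0 (exonerating), -1 (unknown)}
    """
    scores = {}
    for fm, prior in priors.items():
        likelihood = 1.0
        for monitor, value in observations.items():
            if value == -1:
                continue  # unknown evidence contributes no information
            # Assumption: an unlinked monitor fires only at its false alarm rate.
            p_fire = detection.get((monitor, fm), false_alarm[monitor])
            likelihood *= p_fire if value == 1 else 1.0 - p_fire
        scores[fm] = prior * likelihood

    total = sum(scores.values())
    if total == 0.0:
        return [(fm, 0.0) for fm in priors]
    ranked = [(fm, score / total) for fm, score in scores.items()]
    return sorted(ranked, key=lambda item: item[1], reverse=True)


# Hypothetical two-fault, three-monitor reference model.
priors = {"fm1": 0.01, "fm2": 0.02}
detection = {("m1", "fm1"): 0.9, ("m1", "fm2"): 0.8, ("m2", "fm1"): 0.7}
false_alarm = {"m1": 0.05, "m2": 0.01, "m3": 0.02}

# m1 and m2 indict, m3 has not reported.
observations = {"m1": 1, "m2": 1, "m3": -1}
ranking = rank_failure_modes(priors, detection, false_alarm, observations)
for fm, posterior in ranking:
    print(f"{fm}: {posterior:.3f}")
```

In this made-up example fm1 ranks first even though its prior is lower, because monitor m2 is linked only to fm1 and its firing is very unlikely under fm2.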


The reasoner algorithm (called the W-algorithm) combines an abductive reasoning algorithm with a forward propagation algorithm to generate and rank possible failure modes. This algorithm operates in two steps: (1) Abductive reasoning step: Associated with each DM is an ambiguity set, $AG = \{fm_1, fm_2, \cdots, fm_k\}$. This step assumes that the firing of the DM implies at least one of the faults in the ambiguity set has occurred; and (2) Forward reasoning step: For each $fm_i$ belonging to the AG, we extract other DMs that support $fm_i$. We call this set the supporting DMs, or the monitors of interest, i.e., $S\text{-}DM_i$ for $fm_i$. As these additional monitors fire, any $fm_i$ without that monitor in $S\text{-}DM_i$ is removed from the AG. Over time, as the monitors fire, the AG reduces in size, ideally to a single $fm_i$. Additional details about the reasoning algorithm are described in (Honeywell, 2010).

The reasoning algorithm generates multiple single fault hypotheses, each hypothesis asserting the occurrence of exactly one failure mode in the system. The basic probability update rules assume independence of monitor firing events. In other words, $P(m_j, m_k \mid fm_i) = P(m_j \mid fm_i)\, P(m_k \mid fm_i)$ for all monitors $m_j$ and $m_k$. The two independence assumptions, on (1) fault modes and (2) monitors, imply that the reasoning algorithm treats the reference model as a set of Naïve Bayes classifiers. The direct correspondence between the reference model for diagnosis and the simple Bayesian structure provides opportunities to use a class of generative Bayesian model algorithms to build these model structures from data and enhance the existing structures produced by a domain expert.

This reasoning algorithm assumes the DMs used in the reference model are strictly binary. The DMs are often derived by applying a threshold to other real valued features known as condition indicators (CIs). These CIs are built as functions of sensors to provide more information about the health of the system. The thresholds applied to create DMs are selected by a domain expert. Data collected from these systems more often contain raw sensors and the CIs rather than the DMs. This creates an issue when trying to examine structures built with data and comparing them to the expert crafted models. Our approach utilizes the idea of the abstracted CIs when constructing models from data. Models built with data and containing CIs or other select sensors are only missing the thresholding, and as such, when the probabilities are calculated, a Naïve Bayesian model is in essence approximating the reasoning algorithm above. No fault modes are removed from consideration, but the probabilistic ranking of all failure modes will render many with a probability at or near 0. The inference used in Bayesian networks is calculated in the context of discretized values (Conditional Probability Tables). Any necessary discretization of these values provides a thresholding that acts similarly to the reasoning algorithm on an expert model. We believe these similarities are enough to warrant comparisons in the analysis of our results. We utilize this similarity in the computation of learned models and their metrics for evaluation.

3   The Data Mining Problem

The reasoning algorithm may not reduce the ambiguity group to a single fault element. For example, all of the evidence (i.e., DMs) required to isolate the single
fault may not fire, leaving the size of the ambiguity set greater than 1. In this case, the reference model is incomplete. This gap can be addressed by employing heuristic rules or by systematically discovering new diagnostic monitors from the vast amount of historical data.

A second source of error arises from the “independence assumptions”. The assumption of independence between (1) different pieces of evidence and (2) different fault modes may lead to certain hypotheses being assigned higher likelihood than the evidence truly implies. This assumption is made primarily because causality (or correlation) between evidence in the system is not easily discernible while the system is being designed and assembled. Furthermore, deriving conditional probability tables with joint probabilities, such as when nodes have multiple parents, is a difficult task for human experts, whereas such tables can be derived from data. Therefore, the knowledge required to overcome the simplifying (but erroneous) assumptions of independence is best derived by analyzing data from an operating fleet.

As implied above, the reference model that does not make the simplifying independence assumptions can be interpreted as a Noisy-OR classifier, which is a simplified form of a standard Bayes Network. A number of Machine Learning techniques for building Bayes networks from data have been reported in the literature (Friedman, Geiger, & Goldszmidt, 1997). We have studied a number of these approaches in the framework of diagnostic and prognostic reasoning. Among the important considerations has been the notion of independence among the monitors that support the diagnostic reasoning. Our choice for a Bayesian model and for the data mining algorithms that build these models has been guided by:

 1. The data mining algorithms should be designed to provide information that supplements existing expert-generated reference models. It is very important that the experts be able to interpret the results of the data mining algorithms, and characterize them as:

     (a) new relations between monitors and fault hypotheses that will improve the reference model;
     (b) additional monitors (both simple and advanced) that help differentiate and provide support for specific diagnostic hypotheses;
     (c) refinements to the conditional probability values between hypotheses and monitors.

 2. The computational complexity of the data mining algorithms should be manageable, so that they can be used as exploratory analysis tools by the domain experts. We envision a successive refinement process, where the expert requests a sequence of experimental runs, each built from their observations and interpretations of previous results generated by the algorithms. They can interpret the causal relations between faults and monitors, and discover the dependence among the monitors for different fault situations. The expert may also consider different analysis scenarios to estimate methods for increasing the accuracy (while reducing false positives) in the diagnostic reasoner.

After considering these factors and staying within the Bayes net paradigm, we selected Tree Augmented Naïve Bayes (TAN), a model that could address the factors in a reasonable fashion, as well as challenge the independence assumption in limited ways.

4   Data Mining with Tree Augmented Naïve Bayes Networks

The choice of the data driven techniques to apply to a particular class of problems is very much a function of the nature of the data and the problem(s) to be solved using the data. For example, using data we can systematically test and relax the independence assumptions employed in the reference model, especially if it is useful for diagnosis. There are several interesting alternatives, but one that fits well with our reference model structure is the Tree Augmented Naïve Bayes (TAN) Method (Friedman et al., 1997). The TAN structure is a simple extension of the Naïve Bayes network. As in Naïve Bayes, the root node is the class node, which corresponds to one or more fault modes and is causally connected to every evidence (monitor) node. In addition, the TAN structure relaxes the assumption of independence between the evidence nodes, and allows most evidence nodes to have a second parent, which can be a related evidence node. This maintains the directed acyclic graph requirements and produces a tree that captures relationships among the monitors. Generation of this structure is not as computationally expensive as a general Bayesian network.

An example TAN structure is illustrated in Figure 2. The class node is the fault hypothesis under consideration. The other nodes represent supporting evidence for the particular fault hypotheses. In this structure, the only node whose sole parent is the class node is the root observational node. Dependencies among the monitors are captured as additional causal links in the TAN structure.

          Figure 2: Example TAN Structure

The TAN structure can be generated in several different ways, including (1) a greedy search with the constraint that illegal edges (i.e., a node having more than one parent from the evidence nodes) are disallowed (Cohen, Goldszmidt, Kelly, Symons, & Chase, 2004); and (2) a Minimum Weighted Spanning Tree (MWST) approach that builds a minimum spanning tree to capture the dependencies among monitors, and then connects the class (fault mode) to all of the monitor nodes (Friedman et al., 1997). In either case, a decision has to be made about the monitor node to use as the observational root node in the derived tree structure. The derived TAN structure is static, i.e., it does not include explicit temporal information through causality.

A standard algorithm (e.g., Kruskal’s algorithm (Kruskal, 1956)) is applied to generate the MWST. The edge weights of the MWST structure are a log likelihood function, e.g., the Bayesian value (Chickering, Heckerman, & Meek, 1997) or the Bayesian Information Criterion (BIC) (Schwarz, 1978). The Bayesian likelihood metric is preferred for discrete data, whereas the BIC measure works better for continuous distributions. The algorithm we use calculates the BIC value for every pair of evidence nodes (note that directionality matters; therefore, for nodes A and B, two BIC values are computed, one from A to B and one from B to A). The values are stored in a matrix, which facilitates the application of Kruskal’s algorithm to generate the MWST.

The MWST version of this algorithm is implemented in the data mining toolkit called Weka (Hall, Eibe, Holmes, Reutemann, & Witten, 2009). It does not handle continuous features, and instead uses a discretization algorithm to bin each of the features into sets that best discriminate among classes. This produces better classifiers, but it may create very fine splits for features that result in excessive binning (thus building very large conditional probability tables).

5   The CMAPS-S Data

The CMAPS-S data set is generated from a simulator developed at NASA’s Glenn Research Center (Frederick, DeCastro, & Litt, 2007). The engine simulator takes into account the wear and tear on a turbine engine over multiple flights, and it can produce data for a number of sensors for climb, cruise, and descent modes of operation. The simulator parameters can be set to run in nominal and faulty modes of operation.

As a first step, we select appropriate sensor measurements as features and transform them into a sequence of values for the data mining task. Since the reference model structure and the reasoner do not directly include temporal information, the data is separated into the different modes of operation. For this study, all of the data for fault analysis was extracted from the cruise mode of operation. In this mode, most sensor values remain steady, except for measurement noise. Therefore, for this study each flight was represented as a data point consisting of a vector of sensor values, and the entire dataset was made up of n data points corresponding to n flights.

Table 1 shows the different features in the CMAPS-S data set. Some features are marked as a “condition indicator” (CI), which is a term for complex features that can be used to indicate when an engine is experiencing abnormal behavior. A threshold on these values would produce the health indicator (HI, also called a diagnostic monitor, DM) that a reference model would relate to a fault mode.

The reference model as defined above is in terms of DMs, which in this data would be HIs. Since the data contains only the CIs for the engine and an expert crafted reference model was unavailable, we used a Naïve Bayes structure based on CIs as the “base” reference model. This represents an approximation, but the approximation is a good one. As mentioned, experts avoid complex relationships in these models (such as between monitors and faults); they often implicitly assume independence. We find a close approximation of this as a Naïve Bayes classifier.

The rest of the features extracted from the data represent the sensors, and thus, features that would most likely be available in data from other complex systems of this nature. These features are selectively added to determine if the reasoner can generate more accurate results with the added information and the refined structures that the learning algorithm generates.

The CMAPS-S data was generated in a way that the
 Sensor                  Notes                              erence models, we have conducted and evaluated a set
 Altitude                R, unit is feet                    of experiments using the data from the CMAPS-S en-
 Mach Number             R, the unit is Mach                gine system to establish whether the TAN-based model
 Throttle Angle          R, measured in degrees             produces a better diagnostic classifier than a reference
 Fuel Flow               R, measure in percent              model that is implemented as a Naı̈ve Bayes Classifier.
 Stall Margin of         CI                                 Our experiments compare the performance results of
 HPC                                                        the Naı̈ve Bayes versus the TAN models.
 Stall Margin of         CI                                 In the CMAPS-S data, we utilize two feature sets.
 LPC                                                        The first experiment uses the feature set defined as
 Stall Margin of Fan     CI                                 the baseline reference model(only CIs), and extracts
 Temp.      of High      R, measured in Centi-              a classifier structure by running our machine learning
 Pressure Turbine        grade                              algorithms. The next experiment adds additional sen-
 Temp. of the Fan        R, measured in centigrade          sors to the baseline that are not conditional indicators,
 Inlet                                                      to see if using these sensors can improve diagnostic ac-
 Temp. of the Low        R, measured in centigrade          curacy while reducing false alarms.
 Pressure Turbine
 Pressure of Fan In-     R, measured in PSI                 A systematic study of the performance of the algo-
 let                                                        rithms requires running of n-Fold Cross Validation
 Phys. Fan Speed         R, measured in RPM                 experiments. Dividing the data into n equally sized
 Phys. Core Speed        R, measured in RPM                 and distinct sets of samples, each with the balance of
                                                            classes maintained as in the original set allows for the
Table 1: Sensor values and Monitors (Conditional In-        creation of n − 1 training sets with the last set be-
dicators) for the CMAPS-S Engine Data                       ing held out as the test set. This is done n times,
                                                            and the metrics generated are then averaged over each
                                                            of the n runs. This experimental style helps test the
fault(s) and their time of introduction was known, so       robustness of the classifier and keeps the metrics from
it was easy to assign nominal and faulty labels for each    being overly optimistic or pessimistic depending on the
data stream. The CMAPS-S data models three faults:          random construction of one hold out set. The
(1) a fan fault (Fan), (2) a High Pressure Compressor       experiments include: (1) derivation of models for the
fault (HPC), and (3) a High Pressure Turbine fault          individual faults, and (2) derivation of a model for the
(HPT). The reference model for the three faults could       multi-fault case. The metrics reported in Tables 2 and
be constructed in different ways. For example, one          3 are the average of 10-Fold Cross Validation runs.
could construct three different models – each model
defining a classifier that differentiated a fault condi-    6.1     Experimental Results
tion from nominal behavior. Another possibility was
to treat the model building as a multi class learning       The data generated for the experimental study in-
problem. The result would be a single classifier struc-     cluded the three faults discussed previously, and the
ture that distinguished between four hypotheses that        analysis was conducted in the cruise mode with the air-
included the three faults and nominal operations. This      craft flying at an altitude of 35,000 feet. The data min-
structure as the model would likely produce insights on     ing algorithms were run to derive individual models
how to differentiate between several fault hypotheses.      for the three single fault modes, as well as a combined
Given that we were adopting an exploratory framework        model with all three faults. Tables 2 and 3 summarize
to study the effectiveness of different classifier          our experimental results in terms of the accuracy
models, it made sense to compare between different          metrics, i.e., overall accuracy (Acc), false positives (FP),
classifier structures and analyze the discriminating evi-   and false negatives (FN).
dence provided by each model. Furthermore, the avail-
ability of the CMAPS-S simulator facilitated this ap-       6.1.1    Experiment 1
proach, since in real situations it may be hard to col-
lect sufficient amounts of fault data to build robust       The Naı̈ve Bayes model with only the CIs represents
classifiers that include multiple fault hypotheses.         the reference model for analysis of core engine anoma-
                                                            lies. The TAN structure with additional causal rela-
                                                            tions results in a model with better accuracy. The
6   Experiments                                             results in Tables 2 and 3 demonstrate that the TAN
                                                            Structure for the FAN Fault and the multi-fault clas-
To initially evaluate the ability of the data mining        sifier have higher accuracy. Their superior perfor-
techniques to improve over the Naı̈ve Bayes based ref-      mance shows that even with a small number of fea-
                              Fan                    HPC                    HPT                 All Three
                         Acc     FP     FN      Acc     FP     FN      Acc     FP     FN      Acc     FP     FN
 Naïve Bayes Network    67.9   15.4   36.7     71.4      0   35.3     94.2      0    9.3     82.1   15.5   19.6
 TAN                    99.4    0.4    0.7     80.8   36.7      0     94.7    8.9    2.9     97.4    1.1    3.8

                        Table 2: Cruise Mode: Model with Only Conditional Indicators
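
As a rough illustration of the Acc, FP, and FN columns in Tables 2 and 3, the sketch below computes the three metrics from true and predicted labels. The denominators are not spelled out in this section, so the sketch assumes that FP is the percentage of nominal samples flagged as faulty and FN is the percentage of faulty samples labeled nominal. The function name `fault_metrics` and the label strings are hypothetical.

```python
def fault_metrics(y_true, y_pred, nominal="nominal"):
    """Return (accuracy, false-positive rate, false-negative rate) in percent.

    Assumed definitions (not stated in the paper):
      FP = nominal samples classified as any fault / all nominal samples
      FN = faulty samples classified as nominal   / all faulty samples
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal length")
    pairs = list(zip(y_true, y_pred))
    correct = sum(t == p for t, p in pairs)
    nominal_total = sum(t == nominal for t in y_true)
    faulty_total = len(y_true) - nominal_total
    false_pos = sum(t == nominal and p != nominal for t, p in pairs)
    false_neg = sum(t != nominal and p == nominal for t, p in pairs)
    acc = 100.0 * correct / len(y_true)
    fp = 100.0 * false_pos / nominal_total if nominal_total else 0.0
    fn = 100.0 * false_neg / faulty_total if faulty_total else 0.0
    return acc, fp, fn


# Hypothetical labels for a single-fault (Fan vs. nominal) classifier.
truth = ["nominal", "nominal", "nominal", "nominal", "Fan", "Fan", "Fan", "Fan"]
guess = ["nominal", "nominal", "nominal", "Fan", "Fan", "Fan", "Fan", "nominal"]
acc, fp, fn = fault_metrics(truth, guess)
print(acc, fp, fn)
```

Under these assumed definitions, a faulty sample assigned to the wrong fault class in the multi-fault case lowers Acc without counting as either an FP or an FN.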

                              Fan                    HPC                    HPT                 All Three
                         Acc     FP     FN      Acc     FP     FN      Acc     FP     FN      Acc     FP     FN
 Naïve Bayes Network    68.8   12.5   49.5     72.9      0   56.7     93.8    3.6    9.9     84.9    1.1   23.2
 TAN                    99.8      0    0.4    87.96   23.0      0     96.6    5.4    0.5     98.0    0.8    0.7

              Table 3: Cruise Mode: Model with Conditional Indicators + Sensor Measurements
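
The n-fold cross validation procedure behind the averages in Tables 2 and 3 can be sketched as follows. This is a generic illustration, not the authors' tooling: the helper names `stratified_folds` and `cross_validate`, the fixed seed, and the toy majority-class "classifier" standing in for the Naïve Bayes and TAN learners are all assumptions made for the example.

```python
import random
from collections import defaultdict


def stratified_folds(labels, n_folds, seed=0):
    """Split sample indices into n_folds disjoint folds, keeping class balance."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    rng = random.Random(seed)
    folds = [[] for _ in range(n_folds)]
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Deal each class round-robin so every fold gets a near-equal share.
        for position, idx in enumerate(indices):
            folds[position % n_folds].append(idx)
    return folds


def cross_validate(samples, labels, n_folds, train_fn, score_fn):
    """Average score_fn over n_folds runs; each fold is the test set once."""
    scores = []
    for held_out in stratified_folds(labels, n_folds):
        test_idx = set(held_out)
        train = [(samples[i], labels[i])
                 for i in range(len(labels)) if i not in test_idx]
        test = [(samples[i], labels[i]) for i in held_out]
        model = train_fn(train)          # fit on the other n - 1 folds
        scores.append(score_fn(model, test))
    return sum(scores) / len(scores)


def train_majority(train):
    """Toy stand-in learner: always predict the most frequent training label."""
    counts = defaultdict(int)
    for _, label in train:
        counts[label] += 1
    return max(sorted(counts), key=counts.get)


def accuracy(model, test):
    """Percentage of test samples whose label equals the predicted label."""
    return 100.0 * sum(label == model for _, label in test) / len(test)


labels = ["nominal"] * 60 + ["Fan"] * 40
samples = list(range(len(labels)))
folds = stratified_folds(labels, 10)
mean_acc = cross_validate(samples, labels, 10, train_majority, accuracy)
print(mean_acc)
```

Dealing each class round-robin keeps the class proportions of the full data set in every fold, which is the balance condition described for the experiments.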


tures (3), the introduction of two new causal links improved the results considerably (67.9%
to 99.4% for the Fan fault and 82.1% to 97.4% for the multi-fault classifier). Figure 3 shows
the representative TAN used in the multi-fault scenario (the NB model on the left is for
comparison). The CI corresponding to the stall margin of the Low Pressure Compressor provided
the best discriminating evidence between different faults when conditioned only by the class
variable. For the single fault classifiers, the Fan and HPC TANs outperformed the Naïve Bayes
models, but the HPT TAN provided minimal improvement. The HPT fault seems to require only a
simple classifier, since both models achieved over 90% accuracy. The classifiers for the HPC
fault were the lowest performing set. Although the TAN did better than the NB classifier by
over 8%, the low accuracy of both models would indicate that the reference model for the
engine may not be able to detect and isolate this fault, particularly from cruise data.

Figure 3: NB Model on the left and the TAN Model on the right for the Multi-Fault Scenario with Only CI

6.1.2   Experiment 2

For the second set of experiments, we consider the additional sensors. Table 3 shows an
improvement in the accuracy numbers for all of the TAN models. This is highlighted by the HPC
fault scenario, which was problematic in the first experiment but whose accuracy increased
significantly. The False Positive rate improved, while the corresponding false negative metric
did not increase. The additional information improved the TAN significantly over its Naïve
Bayes counterpart as well as over the models in the first experiment. This improvement without
a negative cost to the error rates holds for the TAN models across all scenarios. An
interesting observation is that the additional information seems to have had a small negative
impact in a few cases of the Naïve Bayes models. In summary, the additional information
provided an advantage to the TANs, which were able to generate additional causal relations
and information to improve diagnostic accuracy.

Figure 4 displays the TAN model structure generated for the HPC scenario. This TAN model with
additional features has an accuracy of 88%, compared to 80.8% for the original TAN model. The
Naïve Bayes model using the additional sensors improved to 72.9% from 71.4% for the original
Naïve Bayes model. The accuracy results indicate that (1) additional sensor information
increases diagnostic accuracy and (2) switching from a Naïve Bayes to a TAN model improves
diagnostic accuracy.

This improvement can be examined visually in Figure 4, where, in place of the three CIs, the
Mach Number sensor becomes the observational root node. The new causal structure captured in
Figure 4 shows the Fuel Flow sensor as a parent to two of the CIs. Network structures such as
the one for the HPC fault explicitly illustrate how additional sensor information can be
included to enhance the accuracy of the reference model. In general, the new causal relations
suggested can be examined by a domain expert, who in turn can construct new and improved
indicators to use in a reference model. The results generated by these data driven models can
provide numbers on how the new information can improve the accuracy of the diagnoser, and how
it may impact the error rates.

             Figure 4: TAN Model for HPC Scenario with Conditional Indicators and Extra Sensors

7     Conclusions and Future Work

The results on experiments conducted with the CMAPS-S data illustrate the promise of the
methodology and process we have been developing. To further validate our work, we have
identified a number of directions and tasks we need to pursue as we move forward in this
project.

    • The Naïve Bayes Classifier is an approximation to the expert built reference models. We
      would like to perform a more thorough experiment and use actual models constructed by
      domain experts.

    • Simulation systems, such as CMAPS-S, study particular systems, like the core engine
      functions, in greater detail than any information that can be derived from sensors and
      monitors in current aircraft configurations. We are looking to develop methods by which
      detailed simulation data may be combined with actual aircraft flight data to carry out
      extensive analyses of diagnostic and prognostic events and their propagation through
      the aircraft system.

Acknowledgements
This project has been supported by NASA NRA NNL09AD44T.

References
Chickering, D. M., Heckerman, D., & Meek, C. (1997). A Bayesian approach to learning Bayesian
     networks with local structure. In Proceedings of the Thirteenth Conference on
     Uncertainty in Artificial Intelligence. Morgan Kaufmann.
Cohen, I., Goldszmidt, M., Kelly, T., Symons, J., & Chase, J. S. (2004). Correlating
     instrumentation data to system states: a building block for automated diagnosis and
     control. In Proceedings of the 6th conference on Symposium on Operating Systems Design
     & Implementation - Volume 6 (pp. 16–16). Berkeley, CA, USA: USENIX Association.
Frederick, D., DeCastro, J., & Litt, J. (2007). Users Guide For the Commercial Modular
     Aero-Propulsion System Simulator (Tech. Rep.). NASA.
Friedman, N., Geiger, D., & Goldszmidt, M. (1997). Bayesian Network Classifiers. Machine
     Learning, 29, 131–163.
Hall, M., Frank, E., Holmes, G., Pfahringer, B., Reutemann, P., & Witten, I. H. (2009). The
     WEKA Data Mining Software: An Update. SIGKDD Explorations, 11(1), pp. 10-18.
Honeywell. (2010). Vehicle Integrated Prognostic Reasoner. NASA Contractor Report to appear,
     NNL09AD44T.
Kruskal, J. B. (1956). On the Shortest Spanning Subtree of a Graph and the Traveling Salesman
     Problem. Proceedings of the American Mathematical Society, 7(1), pp. 48-50.
Schwarz, G. (1978). Estimating the Dimension of a Model. Annals of Statistics, 6.
Spitzer, C. (2007). Honeywell Primus Epic Aircraft Diagnostic and Maintenance System. Digital
     Avionics Handbook (2), pp. 22-23.
Witten, I., & Frank, E. (1999). Data Mining: Practical Machine Learning Tools and Techniques
     with Java Implementations.