=Paper= {{Paper |id=Vol-1507/dx15paper29 |storemode=property |title=The Case for a Hybrid Approach to Diagnosis: A Railway Switch |pdfUrl=https://ceur-ws.org/Vol-1507/dx15paper29.pdf |volume=Vol-1507 |dblpUrl=https://dblp.org/rec/conf/safeprocess/MateiGHK15 }} ==The Case for a Hybrid Approach to Diagnosis: A Railway Switch== https://ceur-ws.org/Vol-1507/dx15paper29.pdf
                        Proceedings of the 26th International Workshop on Principles of Diagnosis




            The Case for a Hybrid Approach to Diagnosis: A Railway Switch

              Ion Matei and Anurag Ganguli and Tomonori Honda and Johan de Kleer
                         Palo Alto Research Center, Palo Alto, California, USA
                          e-mail: {imatei,aganguli,thonda,dekleer}@parc.com



Abstract

Behavioral models are at the core of Fault Detection and Isolation (FDI) and Model-Based Diagnosis (MBD) methods. In some practical applications, however, building and validating such models may not always be possible, or only partially validated models can be obtained. In this paper we present a diagnosis solution when only a partially validated model is available. The solution uses a fault-augmented physics-based model to extract meaningful behavioral features corresponding to the normal and abnormal behavior. These features together with experimental training data are used to build a data-driven statistical model used for classifying the behavior of the system based on observations. We apply this approach to a railway switch diagnosis problem.

1 Introduction

Consider the case of developing diagnostic software for a complex system (for this paper our example is a railway switch). The task is to determine from operational data whether the switch is operating correctly or in one of a fixed number of fault modes. We are given the following very limiting (but all too common) conditions: (a) very limited resources to complete the project (a few man months); (b) limited number of sensors; (c) unavailability of the model of the system; (d) unavailability of the system itself (would require an instrumented private rail system); (e) unavailability of the parameters of the system components; (f) limited nominal data; (g) extremely limited fault data (supplied as time series); (h) highly non-linear multi-physics system having multiple operating modes. Broadly speaking there are three approaches to this type of problem: Model-Based Diagnosis (MBD), Fault Detection and Isolation (FDI) and Machine Learning (ML). None of these approaches is adequate for this task. MBD and FDI require models and parameters which are unavailable. ML approaches will require a large amount of training data, and most approaches would require extensive feature engineering. In this paper we will demonstrate a hybrid approach to this task which was ultimately fully satisfactory for the train company. Many real-world diagnostic tasks have similar limitations and we believe our approach is one that yields good diagnostic algorithms for many cases.

At a high level our approach is as follows. First we build by hand an approximate model in Modelica (our switch model ultimately has 56 continuous-time states and more than 2000 time-varying variables). We require this model to contain the key mechanisms which comprise a switch mechanism. Under the limiting conditions, building an accurate model of the system proved to be impractical and therefore we used simplified models for the system’s components. For example, we model the controller as a PID controller while the actual mechanism surely has a more complex one. The Modelica model is fault augmented [Minhas et al., 2014], including parameters which represent the fault amounts for wear, etc. Second, we develop ML classifiers to detect and diagnose faults by running the Modelica model repeatedly with various fault amounts. We mix noise in the simulation to avoid over-fitting. The ML classifier requires a set of features for the signal. Each time series is segmented at defined conditions and a set of features is designed (e.g., mean in segment, max in segment). Several ML techniques can be used to build a classifier; the best we found are based on random forests. Third, we throw away the model: it was only needed to develop the features and the classifier. We now use the classifiers developed for the synthetic data on the real data. We were able to detect faults with a high level of accuracy, but were only partially successful in identifying the correct fault mode (or nominal) for the operating system. Independently, we showed that given enough data for the various fault modes, using the same set of features, an ML classifier can be designed that also achieves a high diagnostic accuracy. The latter effort is not the subject of the paper. Overall, the customer was very satisfied with the results of the project. In the rest of the paper we describe this procedure in detail.

1.1 FDI and MBD

In model-based approaches (FDI and MBD), the diagnosis engine is provided with a model of the system, values of the parameters of the model and values of some of its inputs and outputs. Its main goal is to determine from only this information whether the system is malfunctioning, which components might be faulty and what additional information needs to be gathered (if any) to identify the faulty components with relative certainty. The distinguishing features of the MBD [de Kleer et al., 1992] approach are an emphasis on general diagnostic reasoning engines that perform a variety of diagnostic tasks via on-line reasoning, and inference of a system’s global behavior from the automatic combination of physical components. Hence, MBD models are compositional: the model of a combination of two systems
is directly constructed from the models of the constituent systems. FDI methods can work with both physics-based and empirical models. The physics-based models are usually flattened, that is, the structure of components and sub-components is lost in an overall behavioral model. Often, the faults are seen as separate inputs that need to be computed by the diagnosis engine. The disadvantage of this approach is that the physical semantics of the faults is ignored. In addition, treating the faults as exogenous inputs ignores the fact that the abnormal behavior may in fact depend on the variables of the system. However, many FDI techniques were shown to be effective in diagnosing dynamical systems [Gertler, 1998; Isermann, 1997; 2005; Patton et al., 2000].

The above discussion emphasizes the need for a model when using either an FDI or MBD approach. As we will see later in the paper, there are cases when such a model is very difficult to obtain and (more importantly) validate, or only a partial model is available. Naturally, neither FDI nor MBD approaches would fare well in such a scenario. When no model is available, data-driven methods can be used to learn the behavior of the system and use this knowledge to predict the system behavior. Such methods require experimental data corresponding to the normal and abnormal behavior for classification purposes; this data is used to extract features representative of the system’s behavior. The set of features together with observations of the system (output measurements) are used to learn a data-driven statistical model that is further used to classify the current observed behavior. Namely, when new data is available it is fed into the data-driven model, which in turn provides a “best guess” as to which class of behavior (normal or abnormal) the data corresponds. It is well recognized that in data-driven approaches, the effectiveness of the classification is highly dependent on the quality of the features used for learning.

In this paper, we begin to bridge the gap between pure model-based and data-driven methods with a more hybrid approach. We propose the use of a partially validated model to help us determine a set of features that are representative of the normal and abnormal behavior. In this approach we build a physics-based model of the system, emphasizing its components and sub-components. Due to the lack of sufficient technical specifications and measurement data, only partial validation is achieved. By this we mean that only a subset of the variables of interest match their counterparts in the experimental data. The rest of the variables, although not completely matching the real data, exhibit similar characteristics, e.g., same number of maxima, minima, or common regions of increasing/decreasing values, etc. In other words they are qualitatively equivalent. The physics-based model is further extended to include behaviors under different fault operating modes. In particular, physics-based models for the faults are included in the nominal model. The fault-augmented model is then used to generate synthetic simulated normal and abnormal (including multiple faults) behavior and extract representative features that are used in a data-driven approach. Note that although ideally we would like to execute the feature extraction step automatically, in this paper it is performed manually, as automatic feature extraction is a challenging problem in its own right. The diagnosis procedure described above is pictorially presented in Figure 1.

The rest of the paper is organized as follows: in Section 2 we motivate and describe the railway switch diagnosis problem. Sections 3 and 4 present the physics-based model, its fault-augmented version and the partial validation of the system. Section 5 describes the diagnosis solution under a partially validated physics-based model while Section 6 puts our solution in the context of existing work on railway switch diagnostics.

2 Problem Description

Railway signaling equipment (including switches) accounts for approximately 60% of the failure statistics related to traffic disruptions due to signaling problems. As a consequence more and more attention is paid to railway safety and optimal railway maintenance. As a result of the rapid technological advances in microelectronics and communication technologies in the past decades, it has become possible to add sensing and communication capabilities to railway equipment such as switches, to detect equipment failure and therefore to enhance the quality of the railway service. Although these sensing capabilities allow for easy detection of faults in the electrical components of the equipment, a significant number of faults related to the mechanical components affect parameters whose monitoring would be difficult either due to cost or impracticality of sensor placement.

The rail switch assembly considered in this paper is shown in Figure 2. The component responsible for moving the switch blades is the point machine. The point machine has two sub-components: a servo-motor (generates rotational motion) and a gear-cam mechanism (amplifies the torque generated by the motor and transforms the rotational motion into a translational motion).

The adjuster transfers the motion from the point machine to the load (switch blades) through a drive rod. In particular, by adjusting two bolts, the adjuster controls the time when the switch blades start moving relative to the time when the drive rod commences moving. The switch blades are supported by a set of rolling bearings to minimize motion friction. The manufacturer of the point machine endowed the equipment with a series of sensors that can measure the motor’s angular velocity and torque, and the cam’s angle and stroke (linear position). These sensors log data in real time which is then sent to a central station for analysis. These sensors were installed by design on the point machine to monitor its safety. Although the operator of the railway switch is also interested in the diagnosis of the point machine, other possible faults are of interest as well. The faults considered in this paper are as follows: loose lock-pin fault (at the connection between the drive rod and the point machine), adjuster bolts misalignment (the bolts move away from their nominal position), missing bearings and the presence of an obstacle preventing the completion of the switch blades motion. Adding new sensors measuring forces applied to the switch blades or the position of the switch blades may facilitate immediate detection of such faults. However, due to the sheer number and possible configurations of switches in the railway transportation network, this is not a scalable solution. Therefore, the challenge is to diagnose the aforementioned faults using only the available measurements.

3 System Modeling

This section presents the fault-augmented physics-based model of the railway switch assembly, together with some
model validation results. Such models provide deeper insight into the behavior of the physical system. Simulated behavior helps with learning of normal and abnormal behavior patterns. The abnormal patterns are especially useful when not enough experimental data describing the abnormal behavior is available. The modeling process consists of decomposing the system into its main components, building physical models and combining them into an overall model of the system. To construct the model we used the Modelica language, a non-proprietary, object-oriented, equation-based language for modeling complex physical systems [Tiller, 2001]. Models for the three main components of the railway switch, the point machine, the adjuster and the switch blades, are presented in what follows.

Figure 1: Diagnosis procedure with partially validated model

Figure 2: Diagnosis procedure with partially validated model

3.1 Point machine

The point machine is the component of the railway switch system that is responsible for moving the switch blades and locking them in the final position until a new motion action is initiated. It is composed of two sub-components: servo-motor and gear-cam mechanism. The electrical motor transforms electrical energy into mechanical energy and generates a rotational motion. The gear-cam mechanism scales down the angular velocity of the motor and amplifies the torque generated by the motor. In addition, it transforms the rotational motion into a translational motion.

Servomotor

No technical details were provided on this component, such as type of motor or type of controller. Values for technical parameters (e.g., armature resistance, motor shaft inertia) were not available either. This information was not available to the switch operator either. Therefore, as a result of a literature review on the type of motors used in railway switches, a DC-permanent motor was chosen to be the most likely candidate. The dynamical model for this component is given by

$$L_a \frac{di(t)}{dt} = -R_a i(t) - K_e \omega(t) + v(t),$$

$$J \frac{d\omega(t)}{dt} = K_t i(t) - B \omega(t) - \tau(t),$$

where v(t) acts as input signal, ω(t) is the angular velocity at the motor flange that acts as output, τ(t) is the torque load of the motor and i(t) is the current through the armature. Generic motor parameters from the literature were also chosen [Zattoni, 2006]. One question that may arise is whether an empirical model can be estimated. Unfortunately, since only the output ω(t) is measured and no voltage measurements are available, an empirical model based on system identification cannot be estimated. No information on the type of controller was available to us either. As a consequence, we used a PID controller for the feedback loop. Based on the observed profile of the motor output we determined that the controlled variable is the angular velocity ω(t). Indeed, Figure 3 shows the motor’s angular velocity¹ that is maintained at a constant value by the controller. To compute the parameters of the PID controller we estimated metrics corresponding to the transient component of the output (angular velocity), such as rise time and overshoot; metrics that are formulated in .

¹ The angular velocity profile shown in the graph is similar but not exactly the observed one, due to proprietary information restrictions.
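To make the servomotor model more concrete, the following minimal Python sketch integrates the two motor equations with forward Euler under a PID speed loop and then computes rise time and overshoot of the simulated angular velocity. All numerical values (motor constants, PID gains, reference speed, step size) and the function names are illustrative assumptions; they are not the parameters of the switch model, which are not reported here.

```python
def simulate_dc_motor_pid(ref=10.0, t_end=5.0, dt=1e-3,
                          La=0.5, Ra=1.0, Ke=0.01, Kt=0.01, J=0.01, B=0.1,
                          Kp=100.0, Ki=200.0, Kd=10.0, load=lambda t: 0.0):
    """Forward-Euler simulation of a DC motor under PID speed control.

    All default values are hypothetical placeholders chosen only so that
    the closed loop is stable; `load` is the torque load tau(t).
    Returns (times, speeds).
    """
    i, w = 0.0, 0.0            # armature current, angular velocity
    integ = 0.0                # integral of the speed error
    prev_err = ref - w         # initialised to avoid a derivative kick
    times, speeds = [], []
    for k in range(int(round(t_end / dt))):
        t = k * dt
        err = ref - w
        integ += err * dt
        deriv = (err - prev_err) / dt
        prev_err = err
        v = Kp * err + Ki * integ + Kd * deriv    # PID output = motor voltage
        di = (-Ra * i - Ke * w + v) / La          # La di/dt = -Ra i - Ke w + v
        dw = (Kt * i - B * w - load(t)) / J       # J dw/dt = Kt i - B w - tau
        i += dt * di
        w += dt * dw
        times.append(t + dt)
        speeds.append(w)
    return times, speeds


def transient_metrics(times, speeds, ref):
    """10%-90% rise time and relative overshoot for a positive reference."""
    t10 = next(t for t, w in zip(times, speeds) if w >= 0.1 * ref)
    t90 = next(t for t, w in zip(times, speeds) if w >= 0.9 * ref)
    overshoot = max(0.0, max(speeds) / ref - 1.0)
    return t90 - t10, overshoot


times, speeds = simulate_dc_motor_pid()
rise_time, overshoot = transient_metrics(times, speeds, ref=10.0)
```

In a tuning loop, the gains would be adjusted until the simulated rise time and overshoot are close to those estimated from the measured angular velocity profile.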




Figure 3: Motor angular velocity

The Gear-Cam mechanism

As mentioned earlier, the gear-cam mechanism amplifies the torque generated by the motor and transforms the rotational motion into a translational motion. The technical details provided to us confirmed only the presence of the cam, but not of the gear. We inferred the presence of the latter by comparing the angular velocity of the motor with the cam’s angular velocity, estimated from the measured cam angle. This allowed us to estimate the ratio between the two velocities, and therefore the gear ratio. The cam diagram is shown in Figure 4, where a wheel rotates as a result of the torque transmitted through the gear and acts on a lever that pushes the drive rod. Using the geometry of the cam, the relation between the rotational motion and the linear motion (that is, the relation between the angle and the stroke) is given by

$$\text{stroke} = R \times \sin(\text{angle}),$$

where R denotes the radius of the cam. In addition, the map between the applied torque and the generated force is

$$\text{force} = \frac{1}{R} \times \text{torque} \times \cos(\text{angle}).$$

As both the cam angle and the stroke were included in the available measurements, we used a least squares method to estimate the radius of the cam.

Figure 4: Cam schematics

3.2 Adjuster

The adjuster links the drive rod connected to the point machine to the switch blades, and hence it is responsible for transferring the translational motion. There is a delay between the time instants the drive rod and the switch blades start moving. This delay is controlled by setting the positions of two bolts on the drive rod. A tighter bolt setting means a smaller delay, while a looser bolt setting produces a larger delay. The high-level diagram of the adjuster is depicted in Figure 5. The most challenging part in constructing the adjuster was modeling the non-sticking contact between the drive rod and the adjuster extremes. Stiff contact between two bodies is usually modeled using a spring-damper component with very large values for the elasticity and damping constants. However, under this approach once contact takes place, it is permanent. To solve this challenge, we built a custom component that models the non-sticking contact.

Figure 5: Adjuster diagram

3.3 Switch blades

The adjuster is connected to two switch blades that are moved from left to right or right to left, depending on the traffic needs. We treat a switch blade as a flexible body and use an approximation method for modeling beams, namely the lumped parameter approximation. This method assumes that beam deflection is small and in the linear regime. The lumped parameter approach approximates a flexible body as a set of rigid bodies coupled with springs and dampers. It can be implemented by a chain of alternating bodies and joints. The springs and dampers act on the bodies or the joints. The spring stiffness and damping coefficients are functions of the material properties and the geometry of the flexible elements. Parameters such as rail length, mass and mass moment of inertia were provided to us through technical documentation. To model the effect of the rail moving on rolling bearings, we included a friction component that accounts for energy loss due to friction. Although the component can implement different friction models, the default model is Coulomb friction.

3.4 Fault augmentation

In this section we describe the modeling artifacts that were used to include in the behavior of the system the four fault operating modes: loose lock-pin, misaligned adjuster bolts, obstacle and missing bearings.

Loose lock-pin

The lock-pin referred to in this fault mode connects the point machine with the drive rod that transfers the motion to the switch blades. More precisely, it locks the drive rod to the point machine. When this lock-pin becomes loose due to wear, it introduces a slackness in the way the motion is transferred to the switch blades. The lock-pin fault affects the stability of the connection point between the drive rod and the point machine. In time, if not fixed, this can lead to a complete failure of the pin, and therefore the point machine can no longer act upon the blades. A custom component, whose main characteristic is that it implements non-sticking pushing and pulling between two rods, was built to model the effects of this fault. The impact between the two rods is assumed to be elastic, that is, we use a spring-damper assembly with large values for its parameters to model the contact. There are two types of contact: contact of the rods with the boundaries of the locking mechanism and contact between the rods. Both these types of contact must exhibit non-sticking pushing and pulling properties.
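The idea of non-sticking pushing and pulling can be illustrated with a short Python sketch. It is not the custom Modelica component itself, whose internals are not described here; it is one common way to obtain the behavior, assuming a stiff spring-damper that acts only beyond a free play of half-width half_gap and whose force is clamped so that it can never hold the two rods together. The function name, the sign convention and the stiffness and damping values are hypothetical.

```python
def play_contact_force(rel_pos, rel_vel, half_gap, k=1.0e6, c=1.0e3):
    """Force transmitted from the driving rod to the driven rod.

    rel_pos  : position of the driving rod minus that of the driven rod
    rel_vel  : time derivative of rel_pos
    half_gap : half-width of the free play (slack); larger means looser
    k, c     : large stiffness and damping constants (assumed values)

    Inside the play no force is transmitted. Beyond it a stiff
    spring-damper acts, but the result is clamped so that a pushing
    contact never pulls and a pulling contact never pushes.
    """
    if rel_pos > half_gap:                        # pushing side engaged
        force = k * (rel_pos - half_gap) + c * rel_vel
        return max(force, 0.0)
    if rel_pos < -half_gap:                       # pulling side engaged
        force = k * (rel_pos + half_gap) + c * rel_vel
        return min(force, 0.0)
    return 0.0                                    # rods move freely


# Free play: no force. Engaged and at rest: the spring force only.
free = play_contact_force(0.0, 1.0, half_gap=0.002)
push = play_contact_force(0.003, 0.0, half_gap=0.002)
```

Under this sketch, a looser lock-pin or a misplaced adjuster bolt corresponds to a different half_gap, which shifts the instant at which force starts to be transmitted.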

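Returning to the cam relation of Section 3.1, stroke = R × sin(angle), the least squares estimate of the radius has a closed form: minimizing the sum of squared residuals over R gives R = Σ stroke_k sin(angle_k) / Σ sin²(angle_k). The Python sketch below implements this formula; the function name and the sample data are hypothetical and only illustrate the computation.

```python
import math


def estimate_cam_radius(angles_rad, strokes):
    """Least squares estimate of R in stroke = R * sin(angle).

    angles_rad : measured cam angles in radians
    strokes    : measured strokes, same length as angles_rad
    """
    num = sum(s * math.sin(a) for a, s in zip(angles_rad, strokes))
    den = sum(math.sin(a) ** 2 for a in angles_rad)
    if den == 0.0:
        raise ValueError("angles carry no information about the radius")
    return num / den


# Synthetic, noise-free data generated with a made-up radius of 0.05.
angles = [0.1 * k for k in range(1, 11)]
strokes = [0.05 * math.sin(a) for a in angles]
radius = estimate_cam_radius(angles, strokes)
```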



Misaligned adjuster bolts
In this fault mode the bolts of the adjuster deviate from their
nominal position. As a result, the instant at which the drive
rod meets the adjuster (and therefore the instant at which the
the switch rail starts moving) happens either earlier or later.
For example in a left-to-right motion, if the left bolt moves
to the right, the contact happens earlier. The reason is that
since the distance between the two bolts decreases, the left
bolt reaches the adjuster faster. As a result, when the drive
rod reaches its final position, there may be a gap between
the right switch blade and the right stock rail. In contrast, if
the left bolt moves to the left the contact happens later. The
model of the adjuster includes parameters that can set the
positions of the bolts, and therefore the effects of this fault
mode can be modeled without difficulty.
                                                                         Figure 6: Motor torque with its five operating zones
Obstacle
In this fault mode, an obstacle prevents the switch blades
reach their final nominal position, and therefore a gap be-          where the drive rod catches up again with switch blades an
tween the switch blades and the stock rail appears. The ef-          pushes them to their final position. Finally, in Zone 5 the
fect on the motor torque is a sudden increase in value, as the       switch blades are pushed against the stock rails for a short
motor tries to overcome the obstacle. To model this fault            period of time, hence the increase in torque. In support of
we included a component that implements a hard stop for              the validation of these five operating zone, a set of movies
the position of the switch blades. This component has two            depicting the motion of the switch blades were used. With
parameters for setting the left and right limits within motion       respect to the fault operating modes, we managed to gener-
of the switch blades is allowed. By changing the values of           ate similar effects in the simulated data, as the ones observed
these parameters, the presence of an obstacle can be simu-           in the measured data. Figure 7 shows the effect of the mis-
lated.                                                               aligned bolts fault, and in particular the case where the left
                                                                     bolt moves to the left. The effect is a delay applied on the
Missing bearings                                                     time instant the drive rod reaches the switch blades. In ad-
To minimize friction, the rails are supported by a set of            dition, Zone 5 is also affected since due to the decreased
rolling bearings. When they become stuck or lost, the en-            distance, the switch blades are no longer pushed against the
ergy losses due to friction increase. As mentioned in the            stock rails. In the case of an obstacle, the switch blades (and
section describing the switch blades modeling, a component
was included to account for friction. This component has a
parameter that sets the value for the friction coefficient. By
increasing the value of this parameter, the effect of the miss-
ing bearings fault can be simulated.

4 Model Validation
Motor angular velocity, cam angle and stroke, together with
the motor torque were used in the validation process. To
these measurements, we added the rail position that was
estimated from a set of movies depicting the rail motion,
to which image processing techniques were applied. We
achieved partial validation of the model. The simulated mo-
tor angular velocity, cam angle and stroke closely match
the measured data. The simulated motor torque however
matches in a qualitative sense its measured counterpart. The
main reason is the fact that we had to make assumptions on
the type controller motor and controller, without no way to          Figure 7: Motor torque in the normal and misaligned bolts
validate these assumptions. In addition, the available mea-          fault modes
surements did not allowe for the estimating the parameters
in the assumed models, as this problem is ill posed. Figure 6        hence the drive rod) push against an obstacle that does not
depicts the simulated torque, emphasizing the five operating         allow the completion of the motion. Therefore, the electric
zones. In Zone 1, the motor rotates the cam and the drive rod        motor develops the maximum allowable torque as seen in
moves freely. No contact with the switch blades takes place          Figure 8. In the case of the missing bearing fault mode, the
in this zone, and the (small) energy loss is due to friction in      motion friction of the switch blades increases, and hence
the mechanical components. Zone 2 corresponds to the case            the torque generated by the motor must accommodate this
where the drive rod pushes the two switch blades. The                increase. We obtained this effect in simulation as shown in
elasticity in the switch blades can be noticed in the torque profile Figure 9. Finally, Figure 10 shows the effects of the
in this zone. In Zone 3, the switch blades accelerate (as they       lock-pin fault. The slackness introduced by the looseness of the
drop off the rolling bearings) and again the drive rod moves         pin induces a delay in the rail motion which also affects the
freely (note the drop in torque). Zone 4 depicts the case            behavior in Zone 5. In terms of the changes in the five op-






                                                                   effects in simulation. The choice of features described in the
                                                                   next section was supported by this understanding.

                                                                   5 Fault Detection and Diagnosis
                                                                   In the case of a railway switch, our measurements include
                                                                   the motor torque and motor angular velocity. As the switch
                                                                   moves from one extreme position to the other, these quan-
                                                                   tities are measured at a fixed sampling rate. Thus, we
                                                                   obtain a time series for each of the measurements. Let
                                                                   $\{\tau(t_1), \ldots, \tau(t_N)\}$ denote the torque measured
                                                                   at time instants $\{t_1, \ldots, t_N\}$. Likewise, let
                                                                   $\{\omega(t_1), \ldots, \omega(t_N)\}$ denote the angular
                                                                   velocity. For simplicity, we denote the two time series of
                                                                   measurements by $X$. The diagnosis objective is to determine
                                                                   the underlying condition of the system from these time
                                                                   series. In other words, the objective is to determine a
                                                                   classifier $f : X \to \{N, F_1, F_2, F_3, F_4, F_5\}$, where
                                                                   $N$ is the class label corresponding to the normal condition
                                                                   and $F_1$, $F_2$, $F_3$, $F_4$ and $F_5$ denote the class
                                                                   labels loose bolt, tight bolt, loose lock-pin, missing
                                                                   bearings, and obstacle, respectively.
Figure 8: Motor torque in the normal and obstacle fault modes
                                                                       We adopt a machine learning approach to constructing the
                                                                   above mentioned classifier. The two main steps in building
                                                                   a machine learning classifier are feature selection and clas-
                                                                   sifier type selection. These two steps are discussed next.

                                                                   5.1 Feature selection
                                                                   As seen in Figure 6, the motor torque profile shows five dis-
                                                                   tinct operating zones. Moreover, we notice from Figures 7,
                                                                   8, 9 and 10 that a given fault’s impact on the torque pro-
                                                                   file seems limited to only some of the five zones. With this
                                                                   observation, our feature selection strategy is as follows.
                                                                     1. Identify the approximate time instants that define the
                                                                        boundaries of the five zones. For example, Zone 1 is
                                                                        defined to be between times 0.8 seconds and 2 seconds,
                                                                        Zone 2 is defined to be between times 2 seconds and 4.1
                                                                        seconds, and so on.
                                                                     2. Within each zone, compute a set of measures. An example
                                                                        of a measure is the total energy dissipated within the
                                                                        zone, computed as the instantaneous power integrated
                                                                        over the duration of the zone. The instantaneous power
                                                                        is the product of the instantaneous torque and angular
                                                                        velocity. Other examples of features include the
                                                                        maximum and minimum torque values within the zone. The
                                                                        full set of measures used cannot be disclosed at this
                                                                        time for proprietary reasons. The features are
                                                                        normalized to have zero mean and unit standard
                                                                        deviation.
Figure 9: Motor torque in the normal and missing bearings fault modes
                                                                   Note that it might be possible to combine one or more zones
                                                                   into one for feature selection.
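
The per-zone measures named above (dissipated energy, maximum and minimum torque) and the normalization step can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `zone_features` and `standardize`, the use of the trapezoidal rule for the integral, and the use of the population standard deviation are all assumptions made for illustration, and the undisclosed measures are not reproduced. The sketch also assumes each zone contains at least one sample.

```python
from statistics import mean, pstdev


def zone_features(t, torque, omega, t_start, t_end):
    """Return [energy, max torque, min torque] for samples with t_start <= t <= t_end."""
    idx = [i for i, ti in enumerate(t) if t_start <= ti <= t_end]
    # Instantaneous power is the product of torque and angular velocity.
    power = [torque[i] * omega[i] for i in idx]
    # Trapezoidal rule (an assumption) approximates the integral of power over the zone.
    energy = sum(
        0.5 * (power[k] + power[k + 1]) * (t[idx[k + 1]] - t[idx[k]])
        for k in range(len(idx) - 1)
    )
    zone_torque = [torque[i] for i in idx]
    return [energy, max(zone_torque), min(zone_torque)]


def standardize(rows):
    """Scale each feature column to zero mean and unit standard deviation."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    # A constant column has zero deviation; dividing by 1.0 leaves it at zero.
    sds = [pstdev(c) or 1.0 for c in cols]
    return [[(x - m) / s for x, m, s in zip(row, mus, sds)] for row in rows]
```

In use, `zone_features` would be called once per zone with that zone's boundary times, the resulting lists concatenated into one feature vector per switch movement, and `standardize` applied across all movements.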

                                                                   5.2 Classifier selection
                                                                   To map the features to the classes $\{N, F_1, F_2, F_3, F_4, F_5\}$,
                                                                   we use machine learning. Examples of types of classifiers
                                                                   commonly used include k-nearest neighbors, support vector
Figure 10: Motor torque in the normal and lock-pin fault           machines, neural networks and decision trees. We chose
modes                                                              Random Forest, an ensemble classifier, because of its ro-
                                                                   bustness to overfitting. For a more detailed discussion on
                                                                   the advantages of Random Forest, we refer the reader to
erating zones, the simulated behavior showed similar char-         [Breiman, 2001]. In addition, we also developed a binary
acteristics as in the case of the real data. The understanding     classifier for fault detection based on Alternating Decision
of these behaviors comes as a result of building the model,        Tree (AD Tree). The advantage of AD Tree is that the
augmenting the model with fault modes, and analyzing their         results are human interpretable.






5.3 Results                                                          primarily due to confusion between missing bearings and
For each fault type, we introduce varying magnitudes of              normal. Figure 12 shows part of the fault detection AD
fault and simulate the switch model described earlier. The           Tree. A pink oval represents a feature node. Depending
fault magnitude is parameterized by a factor k which is              on the value of the feature, one of two branches is followed
varied over a prespecified range. A value of k equal to zero         until a leaf node is reached. Each edge that is traversed
corresponds to the normal case. Higher values of k correspond        results in a score shown within the blue rectangles. For every
to the faulty cases. In addition, we also add representative         root to leaf traversal, the total score is the sum of the scores
noise to the measurements. Figure 11 shows some example              accumulated on each edge. For a given data sample, mul-
torque profiles generated by the simulation.                         tiple root to leaf paths may be traversed. In that case, the
                                                                     final score is the sum of the scores accumulated over all the
                                                                     paths. If the final score is negative, the decision is normal;
                                                                     otherwise the decision is abnormal.
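
The scoring rule described for the AD Tree (sum the scores along every traversed path, then decide by sign) can be sketched as follows. This is an illustrative sketch only: the class names `Prediction` and `Splitter`, the function names, and the tree used in the test are hypothetical, and none of the thresholds or scores come from the trained tree in Figure 12.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Prediction:
    """A score node; it may lead to several feature tests (splitters)."""
    score: float
    splitters: List["Splitter"] = field(default_factory=list)


@dataclass
class Splitter:
    """A feature node: compares one feature against a threshold."""
    feature: int
    threshold: float
    if_below: Prediction
    if_not_below: Prediction


def adtree_score(node, x):
    """Sum the scores of all prediction nodes reached by the sample x."""
    total = node.score
    # Every splitter under a reached node is evaluated, so several
    # root-to-leaf paths can contribute to the final score.
    for s in node.splitters:
        branch = s.if_below if x[s.feature] < s.threshold else s.if_not_below
        total += adtree_score(branch, x)
    return total


def detect(root, x):
    """Negative final score means normal; otherwise abnormal."""
    return "normal" if adtree_score(root, x) < 0 else "abnormal"
```

Because the decision is a sum of a few per-feature contributions, each contribution can be read off directly, which is the interpretability advantage mentioned in Section 5.2.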

                                                                     Table 2: Fault detection confusion matrix on simulated data
                                                                                               Normal Abnormal
                                                                                   Normal        94.6        5.4
                                                                                 Abnormal        9.6        90.4


                                                                        Next, we test the classifiers on real data. A key prepro-
                                                                     cessing step is to compute a linear transformation that trans-
                                                                     forms the mean and standard deviation of the features of the
                                                                     nominal (normal) real data to make them equal to the mean
                                                                     and standard deviation of the features of the nominal simu-
                                                                     lated data. The same transformation is then applied on the
                                                                     real faulty data before testing with the ML classifier. We
                                                                     emphasize here that to compute the transformation we only
                                                                     require examples of real data showing normal behavior. We
Figure 11: Simulated torque measurements with added                  do not use any real fault data for training the ML classifier.
noise.                                                               Table 3 shows the fault detection results on real data. As
   The data generated is recorded and used to train and test         can be seen, we achieve an accuracy of at least 80
the machine learning classifier. We use leave-one-out cross-         percent. We also tested the multi-class random forest classi-
validation for training and testing the classifiers. In this ap-     fier to diagnose the various faults. We were able to diagnose
proach, one data sample is used for testing whereas all the          correctly all missing bearing faults but were unable to cor-
rest of the data is used for training. This is repeated un-          rectly diagnose the other faults.
til each data sample has been tested once. Table 1 shows
the confusion matrix for the simulated data described ear-              Table 3: Fault detection confusion matrix on real data
lier. The (i, j)th entry of the confusion matrix refers to the
percentage of cases where the true class was i but was clas-                                   Normal Abnormal
sified as j by the classifier. A matrix with 100 along all                         Normal        85.5       14.5
the diagonal entries would correspond to a perfect classifier.                    Abnormal        20         80
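
The linear transformation used before testing on real data, which matches the mean and standard deviation of the nominal real features to those of the nominal simulated features, can be sketched as below. The function names `fit_alignment` and `apply_alignment` are hypothetical, the per-feature (independent) scaling and the population standard deviation are assumptions, and the sketch assumes no real feature is constant across the nominal samples.

```python
from statistics import mean, pstdev


def fit_alignment(real_nominal, sim_nominal):
    """Return per-feature (scale, offset) fitted on nominal data only."""
    params = []
    for real_col, sim_col in zip(zip(*real_nominal), zip(*sim_nominal)):
        # Scale so the real standard deviation matches the simulated one.
        scale = pstdev(sim_col) / pstdev(real_col)
        # Shift so the real mean matches the simulated mean.
        offset = mean(sim_col) - scale * mean(real_col)
        params.append((scale, offset))
    return params


def apply_alignment(rows, params):
    """Apply the fitted transformation to any real data, nominal or faulty."""
    return [[a * x + b for x, (a, b) in zip(row, params)] for row in rows]
```

Only nominal real samples enter `fit_alignment`; the same parameters are then passed to `apply_alignment` for the real faulty samples, consistent with the statement that no real fault data is used for training.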
In the results shown in Table 1, we observe some misclas-
sification between classes N and F4 . Recall that N is the
normal class and F4 is the missing bearing class. On fur-
ther investigation, we determined that the misclassification         6 Related Work
occurs between the normal data and data corresponding to             A malfunctioning railway switch assembly can have a high
low magnitudes of the missing bearing fault.                         impact on the railway transportation safety, and therefore
                                                                     the problem of diagnosing such systems has been addressed
                                                                     in other works. [Zattoni, 2006] proposes a detection sys-
Table 1: Fault diagnosis confusion matrix on simulated data          tem based on off-line processing of the armature current
                 N     F1      F2    F3    F4     F5                 and voltage. The system implements an algorithm that real-
         N      97.2    0       0     2    0.8     0                 izes a finite impulse response system designed on the basis
         F1       0    100      0     0     0      0                 of an H2 -norm criterion, and allows for detection of incre-
         F2       0     0      99     1     0      0                 mental faults (e.g., loss of lubrication, increasing obstruc-
         F3       9     0       4    87     0      0                 tions, etc.). The approach hinges on the availability of a
         F4      11     0       0     0    89      0                 validated model of the point machine, which was not the
                                                                     case in our setup. [Zhou et al., 2001; 2002] propose a re-
         F5       0     0       0     0     0     100
                                                                     mote monitoring system for railway point machines. The
                                                                     system includes a variety of sensors for acquiring trackside
  The binary classification or fault detection result using          data related to parameters such as, distance, driving force,
AD Tree is shown in Table 2. As in the multi-class classifi-         voltage, electrical noise, or temperature. The monitoring
cation case, the false positives (normal classified as abnor-        system logs data for offline analysis that offers detailed in-
mal), and false negatives (abnormal classified as normal) are        formation on the condition of the system in the form of event






[Figure 12 image: AD Tree fragment with feature nodes labeled Feature 4, Feature 5, Feature 6, Max torque in zone 2, and Total energy dissipated]

Figure 12: Part of the fault detection AD Tree


analysis and data trends. Hence, unlike in our setup, the focus is on
detection rather than isolation. In addition, due to scalability
constraints, our solution is based on the embedded sensors, no other
sensor being added. In [Asada et al., 2013] a classification-based
fault detection and diagnosis algorithm is developed using
measurements such as drive force, electrical current and voltage. In
particular, a classifier based on support vector machines is used. Our
work also uses classification for diagnosis, but considers a wider
variety of classifiers, such as Multiclass Random Forest or
Logitboosted Random Forest, that have been shown to be more robust
[Opitz and Maclin, 1999]. The classification step in [Asada et al.,
2013] depends on a set of features extracted by applying the discrete
wavelet transform to the active power. This step is oblivious to the
operating modes of the point machine, which we showed to be relevant
in our case. Hence, the diagnosis approach in [Asada et al., 2013] is
purely data driven. Since we had no access to current and voltage
measurements, this avenue for feature construction was not available
to us. Depending on the type of electrical motor, the current and the
voltage could be computed from the torque and the angular velocity,
respectively. However, knowledge of motor parameters is needed.
[Asada et al., 2013] consider two types of faults: underdriving and
overdriving of the drive rod. Overdriving refers to the case where the
switch blades are pushed against the stock rails due to misalignment,
and a higher force than normal appears between the stock rails and the
switch blades. Overdriving maps to misaligned bolts, missing bearings
and obstacles in our setup. All these fault modes exhibit higher
forces than normal. Underdriving maps to a particular instance of the
misaligned bolts fault (for example, the left bolt moves to the left).
Therefore, our solution differentiates between more possible causes of
higher forces, since we take advantage of the particular signature
these forces have in each fault corresponding to overdriving. Another
purely data-driven approach for railway point machine monitoring was
proposed in [Oyebande and Renfrew, 2002], where a net energy analysis
technique was used to discriminate between normal and abnormal
behavior. This approach relies on a set of sensor measurements such as
motor voltage, current or switch blade positions, not all of which are
available in our case. In addition, the computation of the net energy
requires parameters of the electrical motor (armature resistance and
motor shaft inertia) that again are not available in our setup.
Finally, unlike our diagnosis objective, the focus is on detecting
abnormalities within the point machine.

7 Conclusions

The three main general approaches to developing diagnostic software
(FDI, MBR, and ML) all have severe limitations in many real-world
applications. We believe we will see many more hybrid approaches to
diagnosis that include the best of these three approaches to build
accurate diagnosers. The railway switch is a critical and complex
piece of equipment requiring extremely high diagnostic accuracy (the
main reason this project was initiated), and the approach outlined in
this paper was ultimately successful. Ultimate deployment of this
approach will depend on expanding the set of faults detected and on
the installation of more sensor-rich switches in railroad
infrastructures.

References

[Asada et al., 2013] T. Asada, C. Roberts, and T. Koseki.
   An algorithm for improved performance of railway condition
   monitoring equipment: Alternating-current point machine case
   study. Transportation Research Part C: Emerging Technologies,
   30:81–92, 2013.
[Breiman, 2001] Leo Breiman. Random forests. Machine
   Learning, 45(1):5–32, 2001.
[de Kleer et al., 1992] J. de Kleer, A. Mackworth, and
   R. Reiter. Characterizing diagnoses and systems. Artificial
   Intelligence, 56(2-3):197–222, 1992.
[Gertler, 1998] J. Gertler. Fault-Detection and Diagnosis in
   Engineering Systems. New York: Marcel Dekker, 1998.






[Isermann, 1997] R. Isermann. Supervision, fault-detection
   and fault-diagnosis methods - An introduction. Control
   Engineering Practice, 5(5):639–652, 1997.
[Isermann, 2005] Rolf Isermann. Model-based fault-
   detection and diagnosis - status and applications. Annual
   Reviews in Control, 29(1):71–85, 2005.
[Minhas et al., 2014] R. Minhas, J. de Kleer, I. Matei,
   B. Saha, B. Janssen, D.G. Bobrow, and T. Kortuglu. Using
   fault augmented Modelica model for diagnostics. In
   Proceedings of the 10th International Modelica Conference,
   Dec 2014.
[Opitz and Maclin, 1999] David Opitz and Richard Maclin.
   Popular ensemble methods: an empirical study. Journal
   of Artificial Intelligence Research, 11:169–198, 1999.
[Oyebande and Renfrew, 2002] B.O. Oyebande and A.C.
   Renfrew. Condition monitoring of railway electric point
   machines. IEE Proceedings - Electric Power Applications,
   149(6):465–473, Nov 2002.
[Patton et al., 2000] Ron J. Patton, Paul M. Frank, and
   Robert N. Clark. Issues of Fault Diagnosis for Dynamic
   Systems. Springer-Verlag London, 2000.
[Tiller, 2001] Michael Tiller. Introduction to Physical Mod-
   eling with Modelica. Kluwer Academic Publishers, Nor-
   well, MA, USA, 2001.
[Zattoni, 2006] Elena Zattoni. Detection of incipient
   failures by using an H2-norm criterion: Application to
   railway switching points. Control Engineering Practice,
   14(8):885–895, 2006.
[Zhou et al., 2001] F. Zhou, M. Duta, M. Henry, S. Baker,
   and C. Burton. Condition monitoring and validation
   of railway point machines. In Intelligent and Self-
   Validating Instruments – Sensors and Actuators (Ref. No.
   2001/179), IEE Seminar on, pages 6/1–6/7, Dec 2001.
[Zhou et al., 2002] F.B. Zhou, M.D. Duta, M.P. Henry,
   S. Baker, and C. Burton. Remote condition monitoring
   for railway point machine. In Railroad Conference, 2002
   ASME/IEEE Joint, pages 103–108, April 2002.



