=Paper= {{Paper |id=Vol-1507/dx15paper29 |storemode=property |title=The Case for a Hybrid Approach to Diagnosis: A Railway Switch |pdfUrl=https://ceur-ws.org/Vol-1507/dx15paper29.pdf |volume=Vol-1507 |dblpUrl=https://dblp.org/rec/conf/safeprocess/MateiGHK15 }} ==The Case for a Hybrid Approach to Diagnosis: A Railway Switch== https://ceur-ws.org/Vol-1507/dx15paper29.pdf

Proceedings of the 26th International Workshop on Principles of Diagnosis

The Case for a Hybrid Approach to Diagnosis: A Railway Switch

Ion Matei and Anurag Ganguli and Tomonori Honda and Johan de Kleer
Palo Alto Research Center, Palo Alto, California, USA
e-mail: {imatei,aganguli,thonda,dekleer}@parc.com

Abstract
Behavioral models are at the core of Fault Detection and Isolation (FDI) and Model-Based Diagnosis (MBD) methods. In some practical applications, however, building and validating such models may not always be possible, or only partially validated models can be obtained. In this paper we present a diagnosis solution for the case when only a partially validated model is available. The solution uses a fault-augmented physics-based model to extract meaningful behavioral features corresponding to the normal and abnormal behavior. These features, together with experimental training data, are used to build a data-driven statistical model that classifies the behavior of the system based on observations. We apply this approach to a railway switch diagnosis problem.

1 Introduction
Consider the case of developing diagnostic software for a complex system (for this paper our example is a railway switch). The task is to determine from operational data whether the switch is operating correctly or in one of a fixed number of fault modes. We are given the following very limiting (but all too common) conditions: (a) very limited resources to complete the project (a few man-months); (b) a limited number of sensors; (c) unavailability of the model of the system; (d) unavailability of the system itself (this would require an instrumented private rail system); (e) unavailability of the parameters of the system components; (f) limited nominal data; (g) extremely limited fault data (supplied as time series); (h) a highly non-linear multi-physics system having multiple operating modes. Broadly speaking, there are three approaches to this type of problem: Model-Based Diagnosis (MBD), Fault Detection and Isolation (FDI) and Machine Learning (ML). None of these approaches is adequate for this task. MBD and FDI require models and parameters which are unavailable. ML approaches require a large amount of training data, and most would require extensive feature engineering. In this paper we demonstrate a hybrid approach to this task which was ultimately fully satisfactory for the train company. Many real-world diagnostic tasks have similar limitations, and we believe our approach yields good diagnostic algorithms for many such cases.

At a high level our approach is as follows. First, we build by hand an approximate model in Modelica (our switch model ultimately has 56 continuous-time states and more than 2000 time-varying variables). We require this model to contain the key mechanisms that make up a switch mechanism. Under the limiting conditions, building an accurate model of the system proved to be impractical, and therefore we used simplified models for the system's components. For example, we model the controller as a PID controller, while the actual mechanism surely has a more complex one. The Modelica model is fault augmented [Minhas et al., 2014], including parameters which represent the fault amounts for wear, etc. Second, we develop ML classifiers to detect and diagnose faults by running the Modelica model repeatedly with various fault amounts. We mix noise into the simulation to avoid over-fitting. For the ML classifier to work, a set of features must be developed for the signal: each time series is segmented at defined conditions and a set of features is designed (e.g., mean in segment, max in segment). Multiple ML techniques can produce a classifier; the best we found are based on random forests. Third, we throw away the model: it was only needed to develop the features and the classifier. We then apply the classifiers developed on the synthetic data to the real data. We were able to detect faults with a high level of accuracy, but were only partially successful in identifying the correct fault mode (or nominal) for the operating system. Independently, we showed that, given enough data for the various fault modes and using the same set of features, an ML classifier can be designed that also achieves a high diagnostic accuracy. The latter effort is not the subject of this paper. Overall, the customer was very satisfied with the results of the project. Throughout the rest of the paper we describe this procedure in detail.

1.1 FDI and MBD
In model-based approaches (FDI and MBD), the diagnosis engine is provided with a model of the system, values of the parameters of the model and values of some of its inputs and outputs. Its main goal is to determine, from this information alone, whether the system is malfunctioning, which components might be faulty and what additional information needs to be gathered (if any) to identify the faulty components with relative certainty. The distinguishing features of the MBD [de Kleer et al., 1992] approach are an emphasis on general diagnostic reasoning engines that perform a variety of diagnostic tasks via on-line reasoning, and inference of a system's global behavior from the automatic combination of physical components. Hence, MBD models are compositional: the model of a combination of two systems
is directly constructed from the models of the constituent systems. FDI methods can work with both physics-based and empirical models. The physics-based models are usually flattened, that is, the component and sub-component structure is lost in an overall behavioral model. Often, the faults are seen as separate inputs that need to be computed by the diagnosis engine. The disadvantage of this approach is that the physical semantics of the faults is ignored. In addition, treating the faults as exogenous inputs ignores the fact that the abnormal behavior may in fact depend on the variables of the system. However, many FDI techniques have been shown to be effective in diagnosing dynamical systems [Gertler, 1998; Isermann, 1997; 2005; Patton et al., 2000].

The above discussion emphasizes the need for a model when using either an FDI or an MBD approach. As we will see later in the paper, there are cases when such a model is very difficult to obtain and (more importantly) to validate, or only a partial model is available. Naturally, neither FDI nor MBD approaches would fare well in such a scenario. When no model is available, data-driven methods can be used to learn the behavior of the system and use this knowledge to predict the system behavior. Such methods require experimental data corresponding to the normal and abnormal behavior for classification purposes; these data are used to extract features representative of the system's behavior. The set of features together with observations of the system (output measurements) are used to learn a data-driven statistical model that is then used to classify the currently observed behavior. Namely, when new data is available it is fed into the data-driven model, which in turn provides a "best guess" as to which class of behavior (normal or abnormal) the data corresponds to. It is well recognized that in data-driven approaches the effectiveness of the classification is highly dependent on the quality of the features used for learning.

In this paper, we begin to bridge the gap between pure model-based and data-driven methods with a hybrid approach. We propose the use of a partially validated model to help us determine a set of features that are representative of the normal and abnormal behavior. In this approach we build a physics-based model of the system, emphasizing its components and sub-components. Due to the lack of sufficient technical specifications and measurement data, only partial validation is achieved. By this we mean that only a subset of the variables of interest match their counterparts in the experimental data. The rest of the variables, although not completely matching the real data, do exhibit similar characteristics, e.g., the same number of maxima and minima, or common regions of increasing/decreasing values. In other words, they are qualitatively equivalent. The physics-based model is further extended to include behaviors under different fault operating modes. In particular, physics-based models of the faults are included in the nominal model. The fault-augmented model is then used to generate synthetic normal and abnormal (including multiple-fault) behavior and to extract representative features that are used in a data-driven approach. Note that although ideally we would like to execute the feature extraction step automatically, in this paper it is performed manually, as automatic feature extraction is a challenging problem in its own right. The diagnosis procedure described above is presented pictorially in Figure 1.

The rest of the paper is organized as follows: in Section 2 we motivate and describe the railway switch diagnosis problem. Sections 3 and 4 present the physics-based model, its fault-augmented version and the partial validation of the system. Section 5 describes the diagnosis solution under a partially validated physics-based model, while Section 6 puts our solution in the context of existing work on railway switch diagnostics.

2 Problem Description
Railway signaling equipment (including switches) generates approximately 60% of the failure statistics related to traffic disruptions due to signaling problems. As a consequence, more and more attention is paid to railway safety and optimal railway maintenance. As a result of the rapid technological advances in microelectronics and communication technologies in the past decades, it has become possible to add sensing and communication capabilities to railway equipment such as switches, to detect equipment failure and therefore to enhance the quality of the railway service. Although these sensing capabilities allow for easy detection of faults in the electrical components of the equipment, a significant number of faults related to the mechanical components affect parameters whose monitoring would be difficult, either due to cost or to the impracticality of sensor placement.

The rail switch assembly considered in this paper is shown in Figure 2. The component responsible for moving the switch blades is the point machine. The point machine has two sub-components: a servo-motor (which generates rotational motion) and a gear-cam mechanism (which amplifies the torque generated by the motor and transforms the rotational motion into a translational motion).

The adjuster transfers the motion from the point machine to the load (switch blades) through a drive rod. In particular, by adjusting two bolts, the adjuster controls the time at which the switch blades start moving, relative to the time at which the drive rod starts moving. The switch blades are supported by a set of rolling bearings to minimize motion friction. The manufacturer of the point machine equipped it with a series of sensors that measure the motor's angular velocity and torque, and the cam's angle and stroke (linear position). These sensors log data in real time, which is then sent to a central station for analysis. These sensors were installed by design on the point machine to monitor its safety. Although the operator of the railway switch is also interested in the diagnosis of the point machine, other possible faults are of interest as well. The faults considered in this paper are as follows: loose lock-pin fault (at the connection between the drive rod and the point machine), adjuster bolts misalignment (the bolts move away from their nominal position), missing bearings, and the presence of an obstacle preventing the completion of the switch blades' motion. Adding new sensors measuring the forces applied to the switch blades or the position of the switch blades may facilitate immediate detection of such faults. However, due to the sheer number and possible configurations of switches in the railway transportation network, this is not a scalable solution. Therefore, the challenge is to diagnose the aforementioned faults using only the available measurements.

3 System Modeling
This section presents the fault-augmented physics-based model of the railway switch assembly, together with some
model validation results. Such models provide deeper insight into the behavior of the physical system. Simulated behavior helps with learning normal and abnormal behavior patterns. The abnormal patterns are especially useful when not enough experimental data describing the abnormal behavior is available. The modeling process consists of decomposing the system into its main components, building physical models and combining them into an overall model of the system. We used the Modelica language to construct the model; Modelica is a non-proprietary, object-oriented, equation-based language for modeling complex physical systems [Tiller, 2001]. Models for the three main components of the railway switch (the point machine, the adjuster and the switch blades) are presented in what follows.

Figure 1: Diagnosis procedure with partially validated model

Figure 2: Diagnosis procedure with partially validated model

3.1 Point machine
The point machine is the component of the railway switch system that is responsible for moving the switch blades and locking them in the final position until a new motion action is initiated. It is composed of two sub-components: the servo-motor and the gear-cam mechanism. The electrical motor transforms electrical energy into mechanical energy and generates a rotational motion. The gear-cam mechanism scales down the angular velocity of the motor and amplifies the torque generated by the motor. In addition, it transforms the rotational motion into a translational motion.

Servomotor
No technical details were provided on this component, such as the type of motor or the type of controller. Values for technical parameters (e.g., armature resistance, motor shaft inertia) were not available either, and this information was not available to the switch operator. Therefore, based on a literature review of the types of motors used in railway switches, a permanent-magnet DC motor was chosen as the most likely candidate. The dynamical model for this component is given by

$$L_a \frac{di(t)}{dt} = -R_a\, i(t) - K_e\, \omega(t) + v(t),$$
$$J \frac{d\omega(t)}{dt} = K_t\, i(t) - B\, \omega(t) - \tau(t),$$

where $v(t)$ acts as the input signal, $\omega(t)$ is the angular velocity at the motor flange that acts as the output, $\tau(t)$ is the torque load of the motor and $i(t)$ is the current through the armature. $L_a$ and $R_a$ are the armature inductance and resistance, $K_e$ and $K_t$ the back-EMF and torque constants, $J$ the rotor inertia and $B$ the viscous friction coefficient. Generic motor parameters from the literature were also chosen [Zattoni, 2006]. One question that may arise is whether an empirical model can be estimated. Unfortunately, since only the output $\omega(t)$ is available and no voltage measurements are available, an empirical model based on system identification cannot be estimated. No information on the type of controller was available to us either. As a consequence, we used a PID controller for the feedback loop. Based on the observed profile of the motor output, we determined that the controlled variable is the angular velocity $\omega(t)$. Indeed, Figure 3 shows the motor's angular velocity¹, which is maintained at a constant value by the controller. To compute the parameters of the PID controller, we estimated metrics corresponding to the transient component of the output (angular velocity), such as rise time and overshoot; these metrics are formulated in .

¹ The angular velocity profile shown in the graph is similar to, but not exactly, the observed one, due to proprietary information restrictions.
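The following Python sketch shows how the servomotor equations above and the assumed PID speed loop fit together. It integrates the two motor equations with a forward-Euler scheme. Every numerical value (motor parameters, PID gains, step size) is a hypothetical placeholder, chosen only so that the loop is stable. None of them are the generic literature values the authors used. The derivative term acts on the measured speed rather than on the error, a common choice that avoids a derivative kick at the setpoint step. The paper does not say which PID variant it used.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class DCMotor:
    """Permanent-magnet DC motor with hypothetical parameter values.

    L_a di/dt = -R_a i - K_e w + v
    J   dw/dt =  K_t i - B w - tau
    """
    La: float = 0.5    # armature inductance [H]
    Ra: float = 1.0    # armature resistance [Ohm]
    Ke: float = 0.01   # back-EMF constant [V s/rad]
    Kt: float = 0.01   # torque constant [N m/A]
    J: float = 0.01    # rotor inertia [kg m^2]
    B: float = 0.1     # viscous friction coefficient [N m s/rad]

    def derivatives(self, i: float, w: float, v: float, tau: float) -> Tuple[float, float]:
        """Right-hand sides (di/dt, dw/dt) of the two motor equations."""
        di = (-self.Ra * i - self.Ke * w + v) / self.La
        dw = (self.Kt * i - self.B * w - tau) / self.J
        return di, dw


def simulate_speed_loop(setpoint: float, t_end: float = 3.0, dt: float = 1e-3,
                        kp: float = 100.0, ki: float = 200.0, kd: float = 10.0,
                        load_torque: float = 0.0) -> List[float]:
    """Forward-Euler simulation of a PID loop regulating the motor speed w.

    The controller output is the armature voltage v; the derivative term acts
    on the measured speed (not on the error). Returns w at every time step.
    """
    motor = DCMotor()
    i = w = w_prev = 0.0
    integral = 0.0
    speeds: List[float] = []
    for _ in range(int(round(t_end / dt))):
        error = setpoint - w
        integral += error * dt
        v = kp * error + ki * integral - kd * (w - w_prev) / dt
        w_prev = w
        di, dw = motor.derivatives(i, w, v, load_torque)
        i += di * dt
        w += dw * dt
        speeds.append(w)
    return speeds
```

With these placeholder values, `simulate_speed_loop(1.0)` settles close to 1 rad/s within the 3 s horizon. The integral action keeps it there under a constant `load_torque` as well.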

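The PID tuning metrics mentioned for the servomotor, rise time and overshoot, can be computed from a sampled step response roughly as follows. Two choices here are assumptions, since the paper does not give its exact formulation: the common 10%-90% rise-time definition and the reference helper `second_order_step`. The function names are hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple


def step_metrics(t: Sequence[float], y: Sequence[float],
                 final_value: Optional[float] = None) -> Tuple[float, float]:
    """Return (rise_time, percent_overshoot) of a sampled step response.

    Rise time: time from the first 10% crossing to the first 90% crossing
    of the final value. Overshoot: peak excess over the final value in
    percent (0 if the response never exceeds it). Assumes a positive final
    value; by default the last sample is taken as the final value.
    """
    if final_value is None:
        final_value = y[-1]
    if final_value <= 0:
        raise ValueError("final value must be positive")
    t10 = next(tk for tk, yk in zip(t, y) if yk >= 0.1 * final_value)
    t90 = next(tk for tk, yk in zip(t, y) if yk >= 0.9 * final_value)
    overshoot = max(0.0, (max(y) - final_value) / final_value * 100.0)
    return t90 - t10, overshoot


def second_order_step(t: Sequence[float], zeta: float, wn: float) -> List[float]:
    """Unit-step response of wn^2 / (s^2 + 2 zeta wn s + wn^2), 0 < zeta < 1 (reference data)."""
    wd = wn * math.sqrt(1.0 - zeta ** 2)
    k = zeta / math.sqrt(1.0 - zeta ** 2)
    return [1.0 - math.exp(-zeta * wn * tk) * (math.cos(wd * tk) + k * math.sin(wd * tk))
            for tk in t]
```

For an underdamped reference response, the measured overshoot should approach the textbook value $100\, e^{-\zeta\pi/\sqrt{1-\zeta^2}}$ as the sampling gets finer.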

Figure 3: Motor angular velocity

The Gear-Cam mechanism
As mentioned earlier, the gear-cam mechanism amplifies the torque generated by the motor and transforms the rotational motion into a translational motion. The technical details provided to us confirmed only the presence of the cam, but not of the gear. We inferred the presence of the latter by comparing the angular velocity of the motor with the cam's angular velocity, estimated from the measured cam angle. This allowed us to estimate the ratio between the two velocities, and therefore the gear ratio. The cam diagram is shown in Figure 4, where a wheel rotates as a result of the torque transmitted through the gear and acts on a lever that pushes the drive rod. Using the geometry of the cam, the relation between the rotational motion and the linear motion (that is, the relation between the angle and the stroke) is given by

$$\text{stroke} = R \sin(\text{angle}),$$

where $R$ denotes the radius of the cam. In addition, the map between the applied torque and the generated force is

$$\text{force} = \frac{1}{R}\,\text{torque}\,\cos(\text{angle}).$$

As both the cam angle and the stroke were included in the available measurements, we used a least-squares method to estimate the radius of the cam.

Figure 4: Cam schematics

3.2 Adjuster
The adjuster links the drive rod connected to the point machine to the switch blades, and hence it is responsible for transferring the translational motion. There is a delay between the time instants at which the drive rod and the switch blades start moving. This delay is controlled by setting the positions of two bolts on the drive rod. A tighter bolt setting means a smaller delay, while a looser bolt setting produces a larger delay. The high-level diagram of the adjuster is depicted in Figure 5. The most challenging part in constructing the adjuster was modeling the non-sticking contact between the drive rod and the adjuster extremes. Stiff contact between two bodies is usually modeled using a spring-damper component with very large values for the elasticity and damping constants. However, under this approach, once contact takes place it is permanent. To solve this challenge, we built a custom component that models the non-sticking contact.

Figure 5: Adjuster diagram

3.3 Switch blades
The adjuster is connected to two switch blades that are moved from left to right or from right to left, depending on the traffic needs. We treat a switch blade as a flexible body and use an approximation method for modeling beams, namely the lumped parameter approximation. This method assumes that beam deflection is small and in the linear regime. The lumped parameter approach approximates a flexible body as a set of rigid bodies coupled with springs and dampers. It can be implemented by a chain of alternating bodies and joints. The springs and dampers act on the bodies or the joints. The spring stiffness and damping coefficients are functions of the material properties and the geometry of the flexible elements. Parameters such as rail length, mass and mass moment of inertia were provided to us through technical documentation. To model the effect of the rail moving on rolling bearings, we included a friction component that accounts for energy loss due to friction. Although the component can represent different friction models, the default model is Coulomb friction.

3.4 Fault augmentation
In this section we describe the modeling artifacts that were used to include the four fault operating modes in the behavior of the system: loose lock-pin, misaligned adjuster bolts, obstacle and missing bearings.

Loose lock-pin
The lock-pin referred to in this fault mode connects the point machine with the drive rod that transfers the motion to the switch blades. More precisely, it locks the drive rod to the point machine. When this lock-pin becomes loose due to wear, it introduces slackness in the way the motion is transferred to the switch blades. The lock-pin fault affects the stability of the connection point between the drive rod and the point machine. In time, if not fixed, this can lead to a complete failure of the pin, after which the point machine can no longer act upon the blades. To model the effects of this fault, we built a custom component whose main characteristic is that it implements non-sticking pushing and pulling between two rods. The impact between the two rods is assumed to be elastic, that is, we use a spring-damper assembly with large parameter values to model the contact. There are two types of contact: contact of the rods with the boundaries of the locking mechanism, and contact between the rods. Both types of contact must exhibit non-sticking pushing and pulling properties.
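The least-squares fits mentioned in the Gear-Cam mechanism paragraph can be sketched as one-parameter regressions through the origin. The cam radius $R$ comes from stroke $= R \sin(\text{angle})$. The gear ratio is the slope between the motor angular velocity and the cam angular velocity. The function names and the forward-difference estimate of the cam velocity are assumptions for illustration. `cam_force` only transcribes the torque-to-force map as stated in the text.

```python
import math
from typing import Sequence


def fit_through_origin(x: Sequence[float], y: Sequence[float]) -> float:
    """Least-squares slope c minimizing sum((y_k - c * x_k)^2), i.e. c = <x, y> / <x, x>."""
    sxx = sum(xk * xk for xk in x)
    if sxx == 0.0:
        raise ValueError("regressor is identically zero")
    return sum(xk * yk for xk, yk in zip(x, y)) / sxx


def estimate_cam_radius(angles: Sequence[float], strokes: Sequence[float]) -> float:
    """Fit R in stroke = R * sin(angle) from measured cam angle [rad] and stroke."""
    return fit_through_origin([math.sin(a) for a in angles], strokes)


def estimate_gear_ratio(motor_speed: Sequence[float], cam_angle: Sequence[float],
                        dt: float) -> float:
    """Fit n in motor_speed = n * cam_speed; both series share the same sampling period dt.

    The cam speed is obtained by forward differences of the measured cam angle,
    so it is paired with the first len(cam_angle) - 1 motor speed samples.
    """
    cam_speed = [(cam_angle[k + 1] - cam_angle[k]) / dt for k in range(len(cam_angle) - 1)]
    return fit_through_origin(cam_speed, motor_speed[:len(cam_speed)])


def cam_force(torque: float, angle: float, radius: float) -> float:
    """Torque-to-force map as stated in the text: force = torque * cos(angle) / R."""
    return torque * math.cos(angle) / radius
```

A regression through the origin fits here because both stated relations are proportional in a single unknown. Each estimate therefore reduces to a ratio of two inner products.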

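The non-sticking contact used at the adjuster extremes and for the loose lock-pin can be illustrated with a unilateral, push-only spring-damper law combined with a slack region. This is a minimal sketch of one common way to get non-sticking behavior. It is not the authors' custom Modelica component. The names, the sign convention and the clipping of the contact force at zero are all assumptions.

```python
def one_sided_contact(penetration: float, penetration_rate: float,
                      k: float, c: float) -> float:
    """Unilateral spring-damper: acts only while the bodies overlap and never pulls.

    Clipping at zero stops the damper from producing an attractive (sticking)
    force while the bodies separate, which a permanently attached
    spring-damper would do.
    """
    if penetration <= 0.0:
        return 0.0
    return max(0.0, k * penetration + c * penetration_rate)


def slack_coupling_force(x_rel: float, v_rel: float, half_gap: float,
                         k: float, c: float) -> float:
    """Force on rod B from rod A through a loose pin with slack of +/- half_gap.

    x_rel = x_A - x_B and v_rel = v_A - v_B. A positive result pushes B forward;
    rod A receives the opposite force. Inside the slack no force is transmitted.
    """
    if x_rel > half_gap:    # A has closed the gap on one side: it pushes B forward
        return one_sided_contact(x_rel - half_gap, v_rel, k, c)
    if x_rel < -half_gap:   # gap closed on the other side: A pulls B backward
        return -one_sided_contact(-half_gap - x_rel, -v_rel, k, c)
    return 0.0
```

Setting `half_gap` to zero gives a push/pull coupling without slack. A larger `half_gap` would mimic a looser pin.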

Misaligned adjuster bolts
In this fault mode the bolts of the adjuster deviate from their nominal position. As a result, the instant at which the drive rod meets the adjuster (and therefore the instant at which the switch rail starts moving) happens either earlier or later. For example, in a left-to-right motion, if the left bolt moves to the right, the contact happens earlier: since the distance between the two bolts decreases, the left bolt reaches the adjuster sooner. As a result, when the drive rod reaches its final position, there may be a gap between the right switch blade and the right stock rail. In contrast, if the left bolt moves to the left, the contact happens later. The model of the adjuster includes parameters that set the positions of the bolts, and therefore the effects of this fault mode can be modeled without difficulty.

Obstacle
In this fault mode, an obstacle prevents the switch blades from reaching their final nominal position, and therefore a gap appears between the switch blades and the stock rail. The effect on the motor torque is a sudden increase in value, as the motor tries to overcome the obstacle. To model this fault we included a component that implements a hard stop for the position of the switch blades. This component has two parameters that set the left and right limits within which the switch blades are allowed to move. By changing the values of these parameters, the presence of an obstacle can be simulated.

Missing bearings
To minimize friction, the rails are supported by a set of rolling bearings. When they become stuck or lost, the energy losses due to friction increase. As mentioned in the section on the switch blades model, a component was included to account for friction. This component has a parameter that sets the value of the friction coefficient. By increasing the value of this parameter, the effect of the missing bearings fault can be simulated.

4 Model Validation
Motor angular velocity, cam angle and stroke, together with the motor torque, were used in the validation process. To these measurements we added the rail position, which was estimated by applying image processing techniques to a set of movies depicting the rail motion. We achieved partial validation of the model. The simulated motor angular velocity, cam angle and stroke closely match the measured data. The simulated motor torque, however, matches its measured counterpart only in a qualitative sense. The main reason is that we had to make assumptions on the type of motor and controller, with no way to validate these assumptions. In addition, the available measurements did not allow for estimating the parameters of the assumed models, as this problem is ill-posed. Figure 6 depicts the simulated torque, emphasizing the five operating zones. In Zone 1, the motor rotates the cam and the drive rod moves freely. No contact with the switch blades takes place in this zone, and the (small) energy loss is due to friction in the mechanical components. Zone 2 corresponds to the case where the drive rod pushes the two switch blades. The elasticity of the switch blades can be noticed in the torque profile in this zone. In Zone 3, the switch blades accelerate (as they drop off the rolling bearings) and again the drive rod moves freely (note the drop in torque). Zone 4 depicts the case where the drive rod catches up again with the switch blades and pushes them to their final position. Finally, in Zone 5 the switch blades are pushed against the stock rails for a short period of time, hence the increase in torque. In support of the validation of these five operating zones, a set of movies depicting the motion of the switch blades was used.

Figure 6: Motor torque with its five operating zones

With respect to the fault operating modes, we managed to generate effects in the simulated data similar to the ones observed in the measured data. Figure 7 shows the effect of the misaligned bolts fault, in particular the case where the left bolt moves to the left. The effect is a delay in the time instant at which the drive rod reaches the switch blades. In addition, Zone 5 is also affected since, due to the decreased distance, the switch blades are no longer pushed against the stock rails. In the case of an obstacle, the switch blades (and hence the drive rod) push against an obstacle that does not allow the completion of the motion. Therefore, the electric motor develops the maximum allowable torque, as seen in Figure 8. In the case of the missing bearing fault mode, the motion friction of the switch blades increases, and hence the torque generated by the motor must accommodate this increase. We obtained this effect in simulation, as shown in Figure 9. Finally, Figure 10 shows the effects of the lock-pin fault. The slackness introduced by the looseness of the pin induces a delay in the rail motion, which also affects the behavior in Zone 5. In terms of the changes in the five operating zones, the simulated behavior showed characteristics similar to those of the real data. Our understanding of these behaviors came from building the model, augmenting it with fault modes, and analyzing their effects in simulation. The choice of features described in the next section was supported by this understanding.

Figure 7: Motor torque in the normal and misaligned bolts fault modes

Figure 8: Motor torque in the normal and obstacle fault modes

Figure 9: Motor torque in the normal and missing bearings fault modes

Figure 10: Motor torque in the normal and lock-pin fault modes

5 Fault Detection and Diagnosis
In the case of a railway switch, our measurements include the motor torque and motor angular velocity. As the switch moves from one extreme position to the other, these quantities are measured at a fixed sampling rate. Thus, we obtain a time series for each of the measurements. Let $\{\tau(t_1), \ldots, \tau(t_N)\}$ denote the torque measured at time instants $\{t_1, \ldots, t_N\}$. Likewise, let $\{\omega(t_1), \ldots, \omega(t_N)\}$ denote the angular velocity. For simplicity's sake, we denote the two time series of measurements by $X$. The diagnosis objective is to determine the underlying condition of the system from these time series. In other words, the objective is to determine a classifier $f : X \to \{N, F_1, F_2, F_3, F_4, F_5\}$, where $N$ refers to the class label corresponding to the normal condition and $F_1$, $F_2$, $F_3$, $F_4$ and $F_5$ denote the class labels loose bolt, tight bolt, loose lock-pin, missing bearings and obstacle, respectively.

We adopt a machine learning approach to constructing this classifier. The two main steps in building a machine learning classifier are feature selection and classifier type selection. These two steps are discussed next.

5.1 Feature selection
As seen in Figure 6, the motor torque profile shows five distinct operating zones. Moreover, we notice from Figures 7, 8, 9 and 10 that a given fault's impact on the torque profile seems limited to only some of the five zones. With this observation, our feature selection strategy is as follows.
1. Identify the approximate time instants that define the boundaries of the five zones. For example, Zone 1 is defined to be between 0.8 seconds and 2 seconds, Zone 2 between 2 seconds and 4.1 seconds, and so on.
2. Within each zone, compute a set of measures. An example of a measure is the total energy dissipated within the zone. This is computed as the instantaneous power integrated over the duration of the zone, where the instantaneous power is the product of the instantaneous torque and angular velocity. Other examples of features include the maximum and minimum torque values within the zone. The disclosure of the full set of measures used is not possible at this time for proprietary reasons. The features are normalized to have zero mean and unit standard deviation.
Note that it might be possible to combine one or more zones into one for feature selection.

5.2 Classifier selection
To map the features to the classes $\{N, F_1, F_2, F_3, F_4, F_5\}$, we use machine learning. Commonly used types of classifiers include k-nearest neighbors, support vector machines, neural networks and decision trees. We chose Random Forest, an ensemble classifier, because of its robustness to overfitting. For a more detailed discussion of the advantages of Random Forest, we refer the reader to [Breiman, 2001]. In addition, we also developed a binary classifier for fault detection based on an Alternating Decision Tree (AD Tree). The advantage of the AD Tree is that its results are human-interpretable.
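The zone-based feature construction of Section 5.1 can be sketched as follows. For each zone it computes the energy (instantaneous power, torque times angular velocity, integrated over the zone) and the maximum and minimum torque. It then applies per-feature z-score normalization. The full feature set is proprietary, so these three measures only mirror the examples in the text. The trapezoidal rule, the half-open zone intervals and the function names are assumptions. The caller supplies the zone boundaries, e.g. starting with the 0.8 s, 2 s and 4.1 s instants quoted above.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def zone_features(t: Sequence[float], torque: Sequence[float], speed: Sequence[float],
                  boundaries: Sequence[float]) -> List[float]:
    """Per-zone [energy, max torque, min torque] for zones [b_k, b_{k+1}).

    Energy is the integral of the instantaneous power torque * speed over the
    zone, approximated with the trapezoidal rule between the first and last
    samples that fall inside the zone.
    """
    features: List[float] = []
    for t_start, t_stop in zip(boundaries[:-1], boundaries[1:]):
        idx = [k for k, tk in enumerate(t) if t_start <= tk < t_stop]
        if len(idx) < 2:
            raise ValueError(f"zone [{t_start}, {t_stop}) needs at least two samples")
        power = [torque[k] * speed[k] for k in idx]
        energy = sum(0.5 * (power[j] + power[j + 1]) * (t[idx[j + 1]] - t[idx[j]])
                     for j in range(len(idx) - 1))
        zone_torque = [torque[k] for k in idx]
        features.extend([energy, max(zone_torque), min(zone_torque)])
    return features


def zscore_columns(rows: Sequence[Sequence[float]]) -> List[List[float]]:
    """Normalize every feature column to zero mean and unit (population) standard deviation.

    Constant columns are only centered, to avoid dividing by zero.
    """
    columns = list(zip(*rows))
    means = [mean(col) for col in columns]
    stds = [pstdev(col) or 1.0 for col in columns]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]
```

Each simulated or measured run yields one feature row. Stacking the rows and calling `zscore_columns` gives the normalized training matrix for the classifier.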

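The AD Tree detector mentioned above decides by summing scores; Section 5.3 describes this for the tree of Figure 12. The sketch below writes that rule as a recursive sum over prediction nodes. A sample adds the score of every prediction node it reaches. It follows exactly one branch of every splitter below a reached prediction node, so several root-to-leaf paths may be active. It is labeled normal if the total is negative and abnormal otherwise. The example tree, its feature names and its thresholds are invented for illustration and are not the tree learned in the paper.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class PredictionNode:
    """Holds a score; every splitter hanging below it is evaluated."""
    score: float
    splitters: List["Splitter"] = field(default_factory=list)


@dataclass
class Splitter:
    """Tests one feature and continues into exactly one of its two prediction nodes."""
    feature: str
    threshold: float
    below: PredictionNode   # followed when x[feature] < threshold
    above: PredictionNode   # followed otherwise


def adtree_score(node: PredictionNode, x: Dict[str, float]) -> float:
    """Sum of the scores of all prediction nodes reached by sample x."""
    total = node.score
    for split in node.splitters:
        branch = split.below if x[split.feature] < split.threshold else split.above
        total += adtree_score(branch, x)
    return total


def detect(root: PredictionNode, x: Dict[str, float]) -> str:
    """Negative total score means normal; otherwise abnormal."""
    return "normal" if adtree_score(root, x) < 0 else "abnormal"


# Hypothetical tree with two splitters under the root (two active paths per sample).
example_tree = PredictionNode(-0.5, [
    Splitter("zone5_max_torque", 1.0,
             below=PredictionNode(-0.3),
             above=PredictionNode(0.9, [Splitter("zone2_energy", 3.0,
                                                 below=PredictionNode(0.2),
                                                 above=PredictionNode(-0.1))])),
    Splitter("zone1_energy", 0.5, below=PredictionNode(-0.2), above=PredictionNode(0.4)),
])
```

Because the score is a plain sum of node contributions, each contribution can be read off the tree directly. This is the interpretability the text credits to AD Trees.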

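Section 5.3 applies a preprocessing step to the real data before testing. It is a per-feature affine map that gives the real nominal features the mean and standard deviation of the simulated nominal features. The same map is then applied unchanged to the real faulty data. It can be sketched as follows. Three details are assumptions: the use of the population standard deviation, leaving zero-variance features unscaled, and the function names.

```python
from statistics import mean, pstdev
from typing import List, Sequence, Tuple


def fit_nominal_alignment(real_nominal: Sequence[Sequence[float]],
                          sim_nominal: Sequence[Sequence[float]]) -> List[Tuple[float, float]]:
    """Per-feature (scale, shift) such that scale * x + shift maps the mean and
    standard deviation of the real nominal features onto those of the
    simulated nominal features. Only nominal (normal) data is used.
    """
    params: List[Tuple[float, float]] = []
    for real_col, sim_col in zip(zip(*real_nominal), zip(*sim_nominal)):
        real_sd = pstdev(real_col)
        scale = pstdev(sim_col) / real_sd if real_sd > 0 else 1.0
        shift = mean(sim_col) - scale * mean(real_col)
        params.append((scale, shift))
    return params


def apply_alignment(rows: Sequence[Sequence[float]],
                    params: Sequence[Tuple[float, float]]) -> List[List[float]]:
    """Apply the same affine map to any real feature vectors (nominal or faulty)."""
    return [[a * v + b for v, (a, b) in zip(row, params)] for row in rows]
```

Only nominal examples enter `fit_nominal_alignment`. This matches the text's point that no real fault data is needed before the classifier trained on simulated data is applied.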
5.3 Results
For each fault type, we introduce varying fault magnitudes and simulate the switch model described earlier. The fault magnitude is parameterized by a factor $k$ which is varied over a pre-specified range. A value of $k$ equal to zero corresponds to the normal case; higher values of $k$ correspond to faulty cases. In addition, we add representative noise to the measurements. Figure 11 shows some example torque profiles generated by the simulation.

Figure 11: Simulated torque measurements with added noise.

The generated data is recorded and used to train and test the machine learning classifier. We use leave-one-out cross-validation for training and testing the classifiers. In this approach, one data sample is used for testing whereas all the rest of the data is used for training. This is repeated until each data sample has been tested once. Table 1 shows the confusion matrix for the simulated data described earlier. The $(i, j)$th entry of the confusion matrix is the percentage of cases where the true class was $i$ but the sample was classified as $j$ by the classifier. A matrix with 100 in all the diagonal entries would correspond to a perfect classifier. In the results shown in Table 1, we observe some misclassification between classes $N$ and $F_4$. Recall that $N$ is the normal class and $F_4$ is the missing bearing class. On further investigation, we determined that the misclassification occurs between the normal data and data corresponding to low magnitudes of the missing bearing fault.

primarily due to confusion between missing bearings and normal. Figure 12 shows part of the fault detection AD Tree. A pink oval represents a feature node. Depending on the value of the feature, one of two branches is followed until a leaf node is reached. Each edge that is traversed results in a score, shown within the blue rectangles. For every root-to-leaf traversal, the total score is the sum of the scores accumulated on each edge. For a given data sample, multiple root-to-leaf paths may be traversed. In that case, the final score is the sum of the scores accumulated over all the paths. If the final score is negative, the decision is normal; otherwise the decision is abnormal.

Table 2: Fault detection confusion matrix on simulated data (rows: true class; columns: predicted class; entries in %)
            Normal   Abnormal
Normal      94.6     5.4
Abnormal    9.6      90.4

Next, we test the classifiers on real data. A key preprocessing step is to compute a linear transformation that maps the mean and standard deviation of the features of the nominal (normal) real data to the mean and standard deviation of the features of the nominal simulated data. The same transformation is then applied to the real faulty data before testing with the ML classifier. We emphasize that computing this transformation requires only examples of real data showing normal behavior. We do not use any real fault data for training the ML classifier. Table 3 shows the fault detection results on real data. As can be seen, we achieve a high accuracy of greater than 80 percent. We also tested the multi-class random forest classifier for diagnosing the various faults. We were able to correctly diagnose all missing bearing faults, but were unable to correctly diagnose the other faults.

Table 3: Fault detection confusion matrix on real data (rows: true class; columns: predicted class; entries in %)
            Normal   Abnormal
Normal      85.5     14.5
Abnormal    20       80

6 Related Work
A malfunctioning railway switch assembly can have a high impact on railway transportation safety, and therefore
the problem of diagnosing such systems has been addressed
in other works. [Zattoni, 2006] proposes a detection sys-
Table 1: Fault diagnosis confusion matrix on simulated data tem based on off-line processing of the armature current
N F1 F2 F3 F4 F5 and voltage. The system implements an algorithm that real-
N 97.2 0 0 2 0.8 0 izes a finite impulse response system designed on the basis
F1 0 100 0 0 0 0 of an H2 -norm criterion, and allows for detection of incre-
F2 0 0 99 1 0 0 mental faults (e.g., loss of lubrication, increasing obstruc-
F3 9 0 4 87 0 0 tions, etc.). The approach hinges on the availability of a
F4 11 0 0 0 89 0 validated model of the point machine, which was not the
case in our setup. [Zhou et al., 2001; 2002] propose a re-
F5 0 0 0 0 0 100
mote monitoring system for railway point machines. The
system includes a variety of sensors for acquiring trackside
The binary classification or fault detection result using data related to parameters such as, distance, driving force,
AD Tree is shown in Table 2. As in the multi-class classifi- voltage, electrical noise, or temperature. The monitoring
cation case, the false positives (normal classified as abnor- system logs data for offline analysis that offers detailed in-
mal), and false negatives (abnormal classified as normal) are formation on the condition of the system in the form of event

231
Proceedings of the 26th International Workshop on Principles of Diagnosis

0&(=&3.718&

0.135& <1.645&

Max&torque&in& Total&energy&
Feature&4& Feature&5&
zone&2& dissipated&

Feature&5& Feature&6&

Figure 12: Part of the fault detection AD Tree

analysis and data trends. Hence, unlike in our setup, the focus is on detection rather than isolation. In addition, due to scalability constraints, our solution relies only on the embedded sensors; no additional sensors are installed. In [Asada et al., 2013], a classification-based fault detection and diagnosis algorithm is developed using measurements such as drive force, electrical current, and voltage. In particular, a classifier based on support vector machines is used. Our work also uses classification for diagnosis, but considers a wider variety of classifiers, such as Multiclass Random Forest or LogitBoosted Random Forest, since ensemble methods have been shown empirically to be more robust [Opitz and Maclin, 1999]. The classification step in [Asada et al., 2013] depends on a set of features extracted by applying the discrete wavelet transform to the active power. This step is oblivious to the operating modes of the point machine, which we showed to be relevant in our case. Hence, the diagnosis approach in [Asada et al., 2013] is purely data driven. Since we had no access to current and voltage measurements, this avenue for feature construction was not available to us. Depending on the type of electrical motor, the current and the voltage could be computed from the torque and the angular velocity, respectively. However, knowledge of the motor parameters is needed. [Asada et al., 2013] consider two types of faults: underdriving and overdriving of the drive rod. Overdriving refers to the case where the switch blades are pushed against the stock rails due to misalignment, and a higher-than-normal force appears between the stock rails and the switch blades. Overdriving maps to the misaligned bolts, missing bearings, and obstacle faults in our setup; all these fault modes exhibit higher forces than normal. Underdriving maps to a particular instance of the misaligned bolts fault (e.g., the left bolt moving to the left). Therefore, our solution differentiates between more possible causes of higher forces, since we take advantage of the particular signature these forces have in each fault corresponding to overdriving. Another purely data-driven approach to railway point machine monitoring was proposed in [Oyebande and Renfrew, 2002], where a net energy analysis technique was used to discriminate between normal and abnormal behavior. This approach relies on a set of sensor measurements, such as motor voltage, current, or switch blade positions, not all of which are available in our case. In addition, computing the net energy requires parameters of the electrical motor (armature resistance and motor shaft inertia) that, again, are not available in our setup. Moreover, unlike our diagnosis objective, the focus is on detecting abnormalities within the point machine.

7 Conclusions
The three main general approaches to developing diagnostic software (FDI, MBR, and ML) all have severe limitations in many real-world applications. We believe we will see many more hybrid approaches to diagnosis that combine the best of these three approaches to build accurate diagnosers. The railway switch is a critical and complex piece of equipment requiring extremely high diagnostic accuracy (the main reason this project was initiated), and the approach outlined in this paper was ultimately successful. Field deployment of this approach will depend on expanding the set of detected faults and on the installation of more sensor-rich switches in railroad infrastructures.

References
[Asada et al., 2013] T. Asada, C. Roberts, and T. Koseki. An algorithm for improved performance of railway condition monitoring equipment: Alternating-current point machine case study. Transportation Research Part C: Emerging Technologies, 30:81–92, 2013.
[Breiman, 2001] Leo Breiman. Random forests. Machine Learning, 45(1):5–32, 2001.
[de Kleer et al., 1992] J. de Kleer, A. Mackworth, and R. Reiter. Characterizing diagnoses and systems. Artificial Intelligence, 56(2–3):197–222, 1992.
[Gertler, 1998] J. Gertler. Fault-Detection and Diagnosis in Engineering Systems. New York: Marcel Dekker, 1998.

[Isermann, 1997] R. Isermann. Supervision, fault-detection and fault-diagnosis methods - An introduction. Control Engineering Practice, 5(5):639–652, 1997.
[Isermann, 2005] Rolf Isermann. Model-based fault-detection and diagnosis - status and applications. Annual Reviews in Control, 29(1):71–85, 2005.
[Minhas et al., 2014] R. Minhas, J. de Kleer, I. Matei, B. Saha, B. Janssen, D.G. Bobrow, and T. Kurtoglu. Using fault augmented Modelica model for diagnostics. In Proceedings of the 10th International Modelica Conference, Dec 2014.
[Opitz and Maclin, 1999] David Opitz and Richard Maclin. Popular ensemble methods: an empirical study. Journal of Artificial Intelligence Research, 11:169–198, 1999.
[Oyebande and Renfrew, 2002] B.O. Oyebande and A.C. Renfrew. Condition monitoring of railway electric point machines. IEE Proceedings - Electric Power Applications, 149(6):465–473, Nov 2002.
[Patton et al., 2000] Ron J. Patton, Paul M. Frank, and Robert N. Clark. Issues of Fault Diagnosis for Dynamic Systems. Springer-Verlag London, 2000.
[Tiller, 2001] Michael Tiller. Introduction to Physical Modeling with Modelica. Kluwer Academic Publishers, Norwell, MA, USA, 2001.
[Zattoni, 2006] Elena Zattoni. Detection of incipient failures by using an H2-norm criterion: Application to railway switching points. Control Engineering Practice, 14(8):885–895, 2006.
[Zhou et al., 2001] F. Zhou, M. Duta, M. Henry, S. Baker, and C. Burton. Condition monitoring and validation of railway point machines. In IEE Seminar on Intelligent and Self-Validating Instruments – Sensors and Actuators (Ref. No. 2001/179), pages 6/1–6/7, Dec 2001.
[Zhou et al., 2002] F.B. Zhou, M.D. Duta, M.P. Henry, S. Baker, and C. Burton. Remote condition monitoring for railway point machine. In Proceedings of the 2002 ASME/IEEE Joint Railroad Conference, pages 103–108, April 2002.
