=Paper=
{{Paper
|id=Vol-1507/dx15paper24
|storemode=property
|title=Data-Driven Monitoring of Cyber-Physical Systems Leveraging on Big Data and the Internet-of-Things for Diagnosis and Control
|pdfUrl=https://ceur-ws.org/Vol-1507/dx15paper24.pdf
|volume=Vol-1507
|dblpUrl=https://dblp.org/rec/conf/safeprocess/NiggemannBKKVB15
}}
==Data-Driven Monitoring of Cyber-Physical Systems Leveraging on Big Data and the Internet-of-Things for Diagnosis and Control==
Proceedings of the 26th International Workshop on Principles of Diagnosis
Oliver Niggemann1,3 , Gautam Biswas2 , John S. Kinnebrew2 , Hamed Khorasgani2 ,
Sören Volgmann1 and Andreas Bunte3
1 Fraunhofer Application Center Industrial Automation, Lemgo, Germany
e-mail: {oliver.niggemann, soeren.volgmann}@iosb-ina.fraunhofer.de
2 Vanderbilt University and Institute for Software Integrated Systems, Nashville, TN, USA
e-mail: {john.s.kinnebrew, hamed.g.khorasgani, gautam.biswas}@vanderbilt.edu
3 Institute Industrial IT, Lemgo, Germany
e-mail: {andreas.bunte}@hs-owl.de
Abstract

The majority of projects dealing with monitoring and diagnosis of Cyber Physical Systems (CPSs) relies on models created by human experts. But these models are rarely available, are hard to verify and maintain, and are often incomplete. Data-driven approaches are a promising alternative: they leverage the large amount of data which is collected nowadays in CPSs; this data is then used to learn the necessary models automatically. For this, several challenges have to be tackled, such as real-time data acquisition and storage solutions, data analysis and machine learning algorithms, task-specific human-machine interfaces (HMIs) and feedback/control mechanisms. In this paper, we propose a cognitive reference architecture which addresses these challenges. This reference architecture should both ease the reuse of algorithms and support scientific discussions by providing a comparison schema. Use cases from different industries are outlined and support the correctness of the architecture.

1 Motivation

The increasing complexity and the distributed nature of technical systems (e.g. power generation plants, manufacturing processes, aircraft and automobiles) have provided traction for important research agendas, such as Cyber Physical Systems (CPSs) [1; 2], the US initiative on the "Industrial Internet" [3] and its German counterpart "Industrie 4.0" [4]. In these agendas, a major focus is on self-monitoring, self-diagnosis and adaptivity to maintain both operability and safety, while also taking into account humans-in-the-loop for system operation and decision making. Typical goals of such self-diagnosis approaches are the detection and isolation of faults and anomalies, identifying and analyzing the effects of degradation and wear, providing fault-adaptive control, and optimizing energy consumption [5; 6].

So far, the majority of projects and papers on analysis and diagnosis has relied on manually-created diagnosis models of the system's physics and operations [6; 7; 8]: if a drive is used, this drive is modeled; if a reactor is installed, the associated chemical and physical processes are modeled. However, the last 20 years have clearly shown that such models are rarely available for complex CPSs; when they do exist, they are often incomplete and sometimes inaccurate, and it is hard to maintain the effectiveness of these models during a system's life-cycle.

A promising alternative is the use of data-driven approaches, where monitoring and diagnosis knowledge can be learned by observing and analyzing system behavior. Such approaches have only recently become possible: CPSs now collect and communicate large amounts of data (see Big Data [9]) via standardized interfaces, giving rise to what is now called the Internet of Things [10]. This large amount of data can be exploited for the purpose of detecting and analyzing anomalous situations and faults in these large systems: the vision is developing CPSs that can observe their own behavior, recognize unusual situations during operations, inform experts, who can then update operations procedures, and also inform operators, who use this information to modify operations or plan for repair and maintenance.

In this paper, we take on the challenges of proposing a common data-driven framework to support monitoring, anomaly detection, prognosis (degradation modeling), diagnosis, and control. We discuss the challenges for developing such a framework, and then discuss case studies that demonstrate some initial steps toward data-driven CPSs.

2 Challenges

In order to implement data-driven solutions for the monitoring, diagnosis, and control of CPSs, a variety of challenges must be overcome to enable the learning pathways illustrated in Figure 1:

Data Acquisition: All data collected from distributed CPSs, e.g. sensors, actuators, software logs, and business data, must meet real-time requirements, as well as including time synchronization and spatial labeling when relevant. Often sensors and actuators operate at different rates, so data alignment, especially for high-velocity data, becomes an issue. Furthermore, data must be annotated semantically to allow for a later data analysis.

Data Storage, Curation, and Preprocessing: Data will be stored and preprocessed in a distributed way. Environmental factors and the actual system configuration (e.g., for the current product in a production system) must also be stored.
[Figure: a Cyber Physical System (controllers, network) feeding data acquisition and distributed data storage, learning of knowledge, task-specific human-machine interfaces (condition monitoring, diagnosis, energy analysis), and feedback/control mechanisms.]
Figure 1: Challenges for the analysis of CPSs.
Depending on the application, a relational database format or, increasingly, distributed NoSQL technologies [11] may need to be adopted, so that the right subsets of data may be retrieved for different analyses. Real-world data can also be noisy, partially corrupted, and have missing values. All of these need to be accommodated in the curation, storage, and pre-processing applications.

Data Analysis and Machine Learning: Data must be analyzed to derive patterns and abstract the data into condensed usable knowledge. For example, machine learning algorithms can generate models of normal system behavior in order to detect anomalous patterns in the data [12]. Other algorithms can be employed to identify root causes of observed problems or anomalies. The choice and design of appropriate analyses and algorithms must consider factors like the ability to handle large volumes and sometimes high velocities of heterogeneous data. At a minimum, this generally requires machine learning, data mining, and other analysis algorithms that can be executed in parallel, e.g., using the Spark [13], Hadoop [14], and MapReduce [15] architectures. In some cases, this may be essential to meet real-time analysis requirements.

Task-specific Human-Machine-Interfaces: Tasks such as condition monitoring, energy management, predictive maintenance or diagnosis require specific user interfaces [16]. One set of interfaces may be more tailored for offline analysis to allow experts to interact with the system. For example, experts may employ information from data mining and analytics to derive new knowledge that is beneficial to the future operations of the system. Another set of interfaces would be appropriate for system operators and maintenance personnel. For example, appropriate operator interfaces would be tailored to provide analysis results in interpretable and actionable forms, so that the operators can use them to drive decisions when managing a current mission or task, as well as to determine future maintenance and repair.

Feedback Mechanisms and Control: As a reaction to recognized patterns in the data or to identified problems, the user may initiate actions such as a reconfiguration of the plant or an interruption of the production for the purpose of maintenance. In some cases, the system may react without user interaction; in this case, the user is only informed.

3 Solutions

As Section 4 will show, the challenges from Section 2 reappear in the majority of CPS examples. While details, such as the machine learning algorithms employed or the nature of data and data storage formats, can vary, the primary steps are about the same. Most CPS solutions re-implement all of these steps and even employ different solution strategies, raising the overall effort, preventing any reuse of hardware/software and impeding a comparison between solutions.

To achieve better standardization, efficiency, and repeatability, we suggest a generic cognitive reference architecture for the analysis of CPSs. Please note that this architecture is a pure reference architecture which does not constrain later implementations and the introduction of application-specific methods. Figure 2 shows its main components:

[Figure: a layered architecture. A Cyber Physical System (controllers, network) connects via I/F 1 to a real-time Big Data Platform; I/F 2 and I/F 3 link it to a learning layer (data abstraction and ML); I/F 3 and I/F 6 connect a conceptual layer (system analysis, system synthesis/repair); I/F 4 and I/F 5 connect task-specific HMIs to the user; I/F 6 and I/F 7 connect an adaptation layer back to the CPS.]
Figure 2: A cognitive architecture as a solution for the analysis of CPSs.

Big Data Platform (I/F 1 & 2): This layer receives all relevant system data, e.g., configuration information as well as raw data from sensors and actuators. This is done by means of domain-dependent, often proprietary interfaces, here called interface 1 (I/F 1). This layer then integrates, often in real-time, all of the data, time-synchronizes it and annotates it with meta-data that will support later analysis and interpretation. For example, sensor meta-data may consist of the sensor type, its position in the system and its precision. This data is provided via I/F 2, which, therefore, must comprise the data itself and also the meta-data (i.e., the semantics). A possible implementation approach for I/F 2 may be the mapping into and use of existing Big Data platforms, such as Spark or Hadoop, for storing the data, and the Data Distribution Service (DDS) for acquiring the data (and meta-data).
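As an illustration of the meta-data annotation described for I/F 2, the following sketch wraps raw readings with sensor type, position, and precision. The class and field names are hypothetical; the paper prescribes no concrete data model.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SensorMeta:
    sensor_type: str   # e.g. "temperature"
    position: str      # location of the sensor within the system
    precision: float   # measurement precision in the sensor's unit

@dataclass(frozen=True)
class AnnotatedReading:
    timestamp: float   # synchronized time base, e.g. seconds since epoch
    value: float
    meta: SensorMeta

def annotate(raw, meta):
    """Attach meta-data to raw (timestamp, value) pairs so that later
    layers can interpret the data without domain-specific code."""
    return [AnnotatedReading(t, v, meta) for t, v in raw]

temp_meta = SensorMeta("temperature", "reactor inlet", 0.1)
readings = annotate([(0.0, 21.5), (1.0, 21.7)], temp_meta)
```

In a real deployment such records would be serialized onto the I/F 2 transport (e.g. DDS topics) rather than kept as in-memory objects.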
Learning Algorithms (I/F 2 & 3): This layer receives all data via I/F 2. Since I/F 2 also comprises meta-data, the machine learning and diagnosis algorithms need not be implemented specifically for a domain but may adapt themselves to the data provided. In this layer, unusual patterns in the data (used for anomaly detection), degradation effects (used for condition monitoring) and system predictions (used for predictive maintenance) are computed and provided via I/F 3. Given the rapid changes in data analysis needs and capabilities, this layer may be a toolbox of algorithms where new algorithms can be added by means of plug-and-play mechanisms. I/F 3 might again be implemented using DDS.

Conceptual Layer (I/F 3 & 4): The information provided by I/F 3 must be interpreted according to the current task at hand, e.g. computing the health state of the system. Therefore, the provided information about unusual patterns, degradation effects and predictions is combined with domain knowledge to identify faults, determine their causes and rate them according to the urgency of repair. A semantic notation is added to the information, e.g. the time of the next maintenance or a repair instruction, which is provided at I/F 4 in a human-understandable manner. From a computer science perspective, this layer provides reasoning capabilities on a symbolic or conceptual level and adds a semantic context to the results.

Task-Specific HMI (I/F 4 & 5): The user is at the center of the architecture presented here and therefore requires task-, context- and role-specific Human-Machine-Interfaces (HMIs). This HMI uses I/F 4 to get all needed analysis results and presents them to the user. Adaptive interfaces, rather than always showing the results of the same set of analyses, could allow a wider range of information to be provided, while maintaining efficiency and preventing information overload. Beyond obvious dynamic capabilities like alerts for detected problems or anomalies, the interfaces could further adapt the information displayed to be more relevant to the current user context (e.g. the user's location within a production plant, recognition of tasks the user may be engaged in, observed patterns of the user's previous information-seeking behavior, and knowledge of the user's technical background). If the user decides to influence the system (e.g. shutdown of a subsystem or adaptation of the system behavior), I/F 5 is used to communicate this decision to the conceptual layer. Again, I/F 4 and I/F 5 might be implemented using DDS.

Conceptual Layer (I/F 5 & 6): The user decisions are received via I/F 5. The conceptual layer uses its knowledge to identify the actions needed to carry out the user's decisions. For example, a decision to decrease the machine's cycle time by 10 % could lead to actions such as decreasing the robot speed by 10 % and the conveyor speed by 5 %, or to the shutdown of a subsystem. These actions are communicated via I/F 6 to the adaption layer.

Adaption (I/F 6 & 7): This layer receives system adaption commands on the conceptual level via I/F 6, which again might be based on DDS. Examples are the decrease of the robot speed by 10 % or the shutdown of a subsystem. The adaption layer takes these commands on the conceptual level and computes, in real-time, the corresponding changes to the control system. For example, a subsystem shutdown might require a specific network signal, or a machine's timing is changed by adapting parameters of the control algorithms, again by means of network signals. I/F 7 therefore uses domain-dependent interfaces.

4 Case Studies
We present a set of case studies that cover the manufacturing and process industries, as well as complex CPS systems, such as aircraft.

4.1 Manufacturing Industry
The modeling and learning of discrete timing behavior for the manufacturing industry (e.g., the automotive industry) is a new field of research. Due to their intuitive interpretation, Timed Automata are well-suited to model the timing behavior of these systems. Several algorithms have been introduced to learn such Timed Automata, e.g. RTI+ [17] and BUTLA [18]. Please note that the expert still has to provide structural information about the system (e.g. asynchronous subsystems) and that only the temporal behavior is learned.

[Figure: two learned timed automata. The first switches an aspirator and a pneumatic muscle, with transitions such as "Aspirator on [25…2500]", "Muscle on [8…34]", "Silo empty", "Muscle off [8…34]" and "Aspirator off [2200…2500]". The second models a silo and a conveyor, with transitions such as "Silo empty [8…3400]", "Conveyor off [8…25]" and "Silo full [1000…34000]".]
Figure 3: Learned Timed Automata for a manufacturing plant.

The data acquisition for this solution (I/F 1 in Figure 2) has been implemented using a direct capturing of Profinet signals, including an IEEE 1588 time-synchronization. The data is offered via OPC UA (I/F 2). On the learning layer, timed automata are learned from historical data and compared to the observed behavior. Both the sequential behavior of the observed events and the timing behavior are checked; anomalies are signaled via I/F 3. On the conceptual layer it is decided whether an anomaly is relevant. Finally, a graphical user interface is connected to the conceptual layer via OPC UA (I/F 4).

Figure 3 shows learned automata for a manufacturing plant: the models correspond to modules of the plant, and transitions are triggered by control signals and annotated with a learned timing interval.

4.2 Energy Analysis in Process Industry
Analyzing the energy consumption in production plants poses some special challenges: unlike in the discrete systems described in Section 4.1, continuous signals such as the energy consumption must also be learned and analyzed. But the discrete signals must be taken into consideration as well, because continuous signals can only be interpreted with respect to the current system status, e.g. it is crucial to know whether a valve is open or whether a robot is turned on. And the system status is usually defined by the history of discrete control signals.
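The timing check described in Section 4.1 can be sketched as follows: each transition of a learned timed automaton carries an interval [t_min, t_max], and an observed event is anomalous if it is not enabled in the current state or if its dwell time falls outside the learned interval. This is a minimal toy model (hypothetical class and event names, loosely inspired by Figure 3), not the RTI+ or BUTLA implementation.

```python
class TimedAutomaton:
    """Toy timed automaton: states are integers, transitions carry
    a learned timing interval [t_min, t_max]."""

    def __init__(self):
        # (state, event) -> (next_state, t_min, t_max)
        self.transitions = {}

    def add_transition(self, state, event, next_state, t_min, t_max):
        self.transitions[(state, event)] = (next_state, t_min, t_max)

    def check(self, events):
        """events: list of (event, dwell_time) pairs observed in order.
        Returns a list of detected anomalies."""
        state, anomalies = 0, []
        for event, dwell in events:
            key = (state, event)
            if key not in self.transitions:
                # sequential anomaly: event not enabled in this state
                anomalies.append((state, event, "unexpected event"))
                break
            next_state, t_min, t_max = self.transitions[key]
            if not (t_min <= dwell <= t_max):
                # timing anomaly: dwell time outside the learned interval
                anomalies.append((state, event, "timing violation"))
            state = next_state
        return anomalies

# Toy model resembling the aspirator/muscle module of Figure 3.
ta = TimedAutomaton()
ta.add_transition(0, "aspirator_on", 1, 25, 2500)
ta.add_transition(1, "muscle_on", 2, 8, 34)
ta.add_transition(2, "muscle_off", 3, 8, 34)

anomalies = ta.check([("aspirator_on", 100), ("muscle_on", 50)])
```

Here the second event violates the learned interval [8, 34] and is reported as a timing anomaly, which would then be signaled via I/F 3.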
In [19], an energy anomaly detection system is described which analyzes three production plants. EtherCAT and Profinet are used for I/F 1 and OPC UA for I/F 2. The collected data is then condensed on the learning layer into hybrid timed automata. Also on this layer, the current energy consumption is compared to the energy prediction. Anomalies in the continuous variables are signaled to the user via mobile platforms using web services (I/F 3 and 4).

In Figure 4, a pump is modeled by means of such automata using the flow rate and switching signals. The three states S0 to S2 separate the continuous function into three linear pieces which can then be learned automatically.

[Figure: a learned hybrid automaton with states S0 to S2, each covering one linear piece of the pump's flow-rate behavior.]
Figure 4: A learned hybrid automaton modeling a pump.

Figure 5 shows a typical learned energy consumption (here for bulk good production).

Figure 5: A measured (black line) and a learned power consumption (red line).

4.3 Big Data Analysis in Manufacturing Systems
Analyzing historical process data over the whole production cycle requires new architectures and platforms for handling the enormous volume, variety and velocity of the data. Data analysis pushes classical data acquisition and storage to their limits, i.e. big data platforms are needed.

In the assembly line of the SmartFactoryOWL, a small factory used for production and research, a big data platform has been established to acquire, store and visualize the data from the production cycles. In Figure 6, the architecture of the big data platform is depicted.

[Figure: the Cyber-Physical System (controllers, network) feeds a Hadoop ecosystem consisting of the Hadoop Distributed Filesystem (HDFS), HBase and OpenTSDB, with a Grafana web visualization on top.]
Figure 6: Data Analysis Platform in Manufacturing.

The CPS is connected through OPC UA (I/F 1 in Figure 2) with a Hadoop ecosystem. Hadoop itself is a software framework for scalable distributed computing. The process data is stored in a non-relational database (HBase) which is based on a distributed file system (HDFS). On top of HBase, a time-series database, OpenTSDB, is used as an interface to explore and analyze the data (I/F 2 in Figure 2). Through this database it is possible to compute simple statistics such as mean values, sums or differences, which is usually not possible within non-relational data stores.

Using the interfaces of OpenTSDB or Hadoop, it becomes possible to analyze the data directly on the storage system. Hence, the volume of a historical dataset need not be loaded into a single computer system; instead, the algorithms can work distributively on the data. A web interface can be used to visualize the data as well as the computed results. In Figure 6, Grafana is used for data visualization. In the SmartFactoryOWL, this big data platform is currently being connected to the application scenarios from Sections 4.1 and 4.2.

4.4 Anomaly Detection in Aircraft Flight Data
Fault detection and isolation schemes are designed to detect the onset of adverse events during operations of complex systems, such as aircraft and industrial processes. In other work, we have discussed approaches using machine learning classifier techniques to improve the diagnostic accuracy of the online reasoner on board the aircraft [20]. In this paper, we discuss an anomaly detection method to find previously undetected faults in aircraft systems [21].

The flight data used for improving detection of existing faults and discovering new faults was provided by Honeywell Aerospace and recorded from a former regional airline that operated a fleet of 4-engine aircraft, primarily in the Midwest region of the United States. Each plane in the fleet flew approximately 5 flights a day, and data from about 37 aircraft was collected over a five-year period. This produced over 60,000 flights. Since the airline was a regional carrier, most flight durations were between 30 and 90 minutes. For each flight, 182 features were recorded at sample rates that varied from 1 Hz to 16 Hz. Overall, this produced about 0.7 TB of data.

Situations may occur during flight operations where the aircraft operates in previously unknown modes that could be attributed to the equipment, the human operators, or environmental conditions (e.g., the weather). In such situations, data-driven anomaly detection methods [12], i.e., finding patterns in the operations data of the system that were not expected before, can be applied.
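The core of such an analysis can be sketched with a complete-link agglomerative clustering over pairwise Euclidean distances, where unusually small clusters are flagged as candidate anomalies. This is a toy illustration in pure Python on 2-D points; the actual flight-data pipeline additionally compresses each time series with wavelet transforms before computing distances.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def complete_link(points, max_dist):
    """Agglomerative clustering with complete linkage: two clusters are
    merged only while the *largest* pairwise distance between their
    members (the complete-link distance) stays below max_dist."""
    clusters = [[i] for i in range(len(points))]

    def link(c1, c2):
        return max(euclidean(points[i], points[j]) for i in c1 for j in c2)

    while True:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = link(clusters[i], clusters[j])
                if d < max_dist and (best is None or d < best[0]):
                    best = (d, i, j)
        if best is None:
            return clusters
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]

# Two dense groups of "nominal" feature vectors and one outlier.
flights = [(0, 0), (0.1, 0.2), (0.2, 0.1), (5, 5), (5.1, 5.2), (20, 20)]
clusters = complete_link(flights, max_dist=2.0)
anomalous = [c for c in clusters if len(c) == 1]
```

On this toy data the outlier at (20, 20) remains a singleton cluster and would be handed to domain experts for interpretation, mirroring the conceptual-layer step described below.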
Sometimes, anomalies may represent truly aberrant, undesirable and faulty behavior; however, in other situations they may represent behaviors that are just unexpected. We have developed unsupervised learning or clustering methods for off-line detection of anomalous situations. Once detected and analyzed, relevant information is presented to human experts and mission controllers to interpret and classify the anomalies.

Figure 7 illustrates our approach. We started with curated raw flight data (layer "Big Data Platform" in Figure 2), transforming the time series data associated with the different flight parameters into a compressed vector form using wavelet transforms. The next step included building a dissimilarity matrix of pairwise flight segments using the Euclidean distance measure, followed by a subsequent step where the pairwise between-flight distances were used to run a 'complete link' hierarchical clustering algorithm [22] (layer "Learning" in Figure 2). Run on the flight data, the algorithm produced a number of large clusters that we considered to represent nominal flights, and a number of smaller clusters and outlier flights that we initially labeled as anomalous. By studying the feature value differences between the larger nominal and smaller anomalous clusters with the help of domain experts, we were able to interpret and explain their anomalous nature ("Conceptual Layer" in Figure 2).

These anomalies or faults represented situations that the experts had not considered before; therefore, this unsupervised or semi-supervised data-driven approach provided a mechanism for learning new knowledge about unanticipated system behaviors. For example, when analyzing the aircraft data, we found a number of anomalous clusters. One of them turned out to be situations where one of the four engines of the aircraft was inoperative. On further study of additional features, the experts concluded that these were test flights conducted to test aspects of the aircraft, and that they therefore represented known situations and not an interesting anomaly. A second group of flights was interpreted to be take-offs where the engine power was set much higher than in most flights in the same take-off condition. Further analysis of environmental features related to this set of take-offs revealed that these were take-offs from a high-altitude airport at 7900 feet above sea level.

A third cluster provided a more interesting situation. The experts, when checking on the features that had significantly different values from the nominal flights, realized that the auto throttle disengaged in the middle of the aircraft's climb trajectory. The automatic throttle is designed to maintain either constant speed during takeoff or constant thrust for other modes of flight. This was an unusual situation where the auto throttle switched from maintaining speed for a takeoff to a setting that applied constant thrust, implying that the aircraft was on the verge of a stall. This situation was verified by the flight path acceleration sensor shown in Figure 7. By further analysis, the experts determined that in such situations the automatic throttle would switch to a possibly lower thrust setting to level the aircraft and compensate for the loss in velocity. By examining the engine parameters, the experts verified that all the engines responded in an appropriate fashion to this throttle command. Whereas this analysis did not lead to a definitive conclusion other than the fact that the auto throttle, and therefore the aircraft equipment, responded correctly, the experts determined that further analysis was required to answer the question "why did the aircraft accelerate in such a fashion and come so close to a stall condition?". One initial hypothesis to explain these situations was pilot error.

4.5 Reliability and Fault Tolerant Control
Most complex CPSs are safety-critical systems that operate with humans-in-the-loop. In addition to equipment degradation and faults, humans can also introduce erroneous decisions, which become a new source of failure in the system. Figure 8 represents possible faults and cyber-attacks that can occur in a CPS.

There are several model-based fault-tolerant control strategies for dynamic systems in the literature (see for example [23] and [24]). Research has also been conducted to address network security and robust network control problems (see for example [25] and [26]). However, these methods need mathematical models of the system, which may not exist for large-scale complex systems. Therefore, data-driven control [27] and data-driven fault-tolerant control [28] have become an important research topic in recent years. For CPSs, there are more aspects of the problem that need to be considered. As shown in Figure 8, there are many sources of failure in these systems.

We propose a hybrid approach that uses an abstract model of the complex system and utilizes the data to ensure the compatibility between the model and the complex system. Data abstraction and machine learning techniques are employed to extract patterns between different control configurations and system outputs by computing the correlation between control signals and the outputs of the physical subsystems. The highly correlated subsystems (layer "Learning" in Figure 2) become candidates for further study of the effects of failure and degradation at the boundary of these interacting subsystems. For complex systems, all possible interactions and their consequences are hard to pre-determine, and data-driven approaches help fill this gap in knowledge to support more informed decision-making and control. A case-based reasoning module can be designed to provide input on past successes and failed opportunities, which can then be translated by human experts into operational monitoring, fault diagnosis, and control situations ("Conceptual Layer" in Figure 2). Some of the control paradigms that govern appropriate control configurations, such as modifying the sequence of mission tasks, switching between different objectives, or changing the controller parameters (layer "Adaptation" in Figure 2), are being studied in a number of labs including ours [29].

Example: Fault Tolerant Control of a Fuel Transfer System. The fuel system supplies fuel to the aircraft engines. Each individual mission will have its own set of requirements. However, common requirements such as maintaining the aircraft Center of Gravity (CG), safety, and system reliability are always critical. A set of sensors is included in the system to measure different system variables, such as the fuel quantity contained in each tank, the engines' fuel flow rates, boost pump pressures, the positions of the valves, etc.

There are several failure modes, such as the total loss of or degradation in the electrical pumps, or a leakage in the tanks or connecting pipes of the system. Using the data and the abstract model, we can detect and isolate the fault and estimate its parameters. Then, based on the type of fault and its severity, the system reconfiguration unit chooses the proper control scenario from the control library. For example, in a normal situation the transfer pumps and valves are controlled to maintain a transfer sequence that keeps the aircraft center of gravity within limits. This control includes maintaining a balance between the left and right sides of the aircraft.
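The estimate-then-reconfigure logic described above can be sketched as follows. This is a toy model with hypothetical function names, quantities and thresholds, not the authors' implementation: the leakage rate is estimated from a conservation-of-mass residual, and a control scenario is chosen based on the projected fuel level.

```python
def estimate_leak_rate(expected_qty, measured_qty, dt):
    """Leakage rate (quantity per second), estimated from the mismatch
    between the expected and the measured fuel quantity over dt seconds."""
    return max(0.0, (expected_qty - measured_qty) / dt)

def reconfigure(tank, leak_rate, qty, min_safe_qty, horizon):
    """Pick a control scenario based on the estimated leak severity:
    if the tank would drop below its safe level within the planning
    horizon, transfer fuel out of the leaking tank and isolate it."""
    if leak_rate == 0.0:
        return "maintain normal transfer sequence"
    projected = qty - leak_rate * horizon
    if projected > min_safe_qty:
        return f"monitor {tank}: small leak tolerated"
    return f"transfer fuel out of {tank} and close its pipe"

# 6 units of fuel unaccounted for over one minute -> 0.1 units/s leak.
rate = estimate_leak_rate(expected_qty=1000.0, measured_qty=994.0, dt=60.0)
action = reconfigure("left wing tank", rate, qty=994.0,
                     min_safe_qty=900.0, horizon=1800.0)
```

With these toy numbers the projected level falls below the safe threshold within the horizon, so the reconfiguration unit would isolate the leaking tank, matching the scenario discussed next.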
Raw Flight Data
Hierarchical
Wavelet Transform
Clustering
Dii Dij … Din
Flight
Dissimilarity
Matrix
Anomalous
max(dAN)
Current
Flight
Figure 7: Data Driven Anomaly Detection Approach for Aircraft Flights
System Reconfiguration
adapted frequently.
In Sections 4.1 and 4.2, structural information about the
plant is imported from the engineering chain and the tempo-
Cyber-attack
System and
ral behavior is learned in form of timed automata. In Section
Human error
actuator faults 4.5, an abstract system model describing the input/output
Actuator faults
Operator
Controller
Parameters
Sensor fault structure and the main failure types is provided and again the
Physical
behavior is learned. These approaches are typical because in
Controller Actuators
System
Sensors
most applications structural information can be gained from
Controller earlier engineer phases while behavior models hardly exist
Library
and are almost never validated with the real system.
Communication network Looking at the learning phase, the first thing to notice
is that all described approaches work and deliver good re-
Figure 8: Possible faults in a CPS.

If there is a small leak, the system can normally tolerate it, depending on where the leak is, but the leak usually grows over time. Therefore, we need to estimate the leakage rate and reconfigure the system to move the fuel from the tank, or to close the pipe, before a critical situation arises.

5 Conclusions

Data-driven approaches to the analysis and diagnosis of Cyber-Physical Systems (CPSs) are always inferior to classical model-based approaches, where models are created manually by experts: experts have background knowledge which cannot be learned from data, and experts automatically think about a larger set of system scenarios than can be observed during a system's normal lifetime.

So the question is not whether data-driven or expert-driven approaches are superior. The question is rather which kinds of models we can realistically expect to exist in real-world applications—and which kinds of models must therefore be learned automatically. This becomes especially important in the context of CPSs, since these systems adapt themselves to their environment and therefore show a changing behavior, i.e. the models would also have to be adapted continuously.

The application examples confirm these results: for CPSs, data-driven approaches have moved into the focus of research and industry. And they are well suited for CPSs: they adjust automatically to new system configurations, they do not need manual engineering effort, and they make use of the large number of data signals now available—connectivity being a typical feature of CPSs.

Another common denominator of the described application examples is that the focus is on anomaly detection rather than on root cause analysis: for data-driven approaches, it is easier to learn a model of the normal behavior than to learn erroneous behavior. It is also typical that the only root cause analysis uses a case-based approach (Section 4.5), case-based approaches being well suited for data-driven solutions to diagnosis.

Finally, the examples show that the proposed cognitive architecture (Figure 2) matches the given examples:

Big Data Platform: Only a few examples (e.g. Section 4.3) make use of explicit big data platforms; so far, solutions are often proprietary. But with the growing size of the data involved, new platforms for storing and processing the data are needed.

Learning: All examples employ machine learning technologies—with a clear focus on unsupervised learning techniques which require no a-priori knowledge, such as clustering (Section 4.4) or automata identification (Sections 4.1 and 4.2).

Conceptual Layer: In all examples, the learned models are evaluated on a conceptual or symbolic level: in Section 4.4, clusters are compared to new observations and data-cluster
distances are used for decision making. In Sections 4.1 and 4.2, model predictions are compared to observations. And again, deviations are decided on by a conceptual layer.

Task-Specific HMI: None of the given examples works completely automatically; in all cases, the user is involved in the decision making.

Adaption: In most cases, reactions to detected problems are left to the expert. The use case from Section 4.5 is an example of an automatic reaction and of the use of analysis results for the control mechanism.

Using such a cognitive architecture would bring several benefits to the community: First of all, algorithms and technologies in the different layers can be changed quickly and can be re-used; e.g. learning algorithms from one application field can be put on top of different big data platforms. Furthermore, most existing approaches currently mix the different layers, making the comparison of approaches to the analysis of CPSs difficult. Finally, such an architecture helps to clearly identify open issues for the development of smart monitoring systems.

Acknowledgments

The work was partly supported by the German Federal Ministry of Education and Research (BMBF) under the project "Semantics4Automation" (funding code: 03FH020I3) and the project "Analyse großer Datenmengen in Verarbeitungsprozessen (AGATA)" (funding code: 01IS14008 A-F), and by NASA NRA NNL09AA08B from the Aviation Safety program. We also acknowledge the contributions of Daniel Mack, Dinkar Mylaraswamy, and Raj Bharadwaj to the aircraft fault diagnosis work.

References

[1] E. A. Lee. Cyber physical systems: Design challenges. In 11th IEEE International Symposium on Object Oriented Real-Time Distributed Computing (ISORC), pages 363–369, 2008.

[2] Ragunathan (Raj) Rajkumar, Insup Lee, Lui Sha, and John Stankovic. Cyber-physical systems: The next computing revolution. In Proceedings of the 47th Design Automation Conference, DAC '10, pages 731–736, New York, NY, USA, 2010. ACM.

[3] Peter C. Evans and Marco Annunziata. Industrial internet: Pushing the boundaries of minds and machines. Technical report, GE, 2012.

[4] Promotorengruppe Kommunikation. Im Fokus: Das Industrieprojekt Industrie 4.0, Handlungsempfehlungen zur Umsetzung. Forschungsunion Wirtschaft-Wissenschaft, March 2013.

[5] L. Christiansen, A. Fay, B. Opgenoorth, and J. Neidig. Improved diagnosis by combining structural and process knowledge. In 16th IEEE Conference on Emerging Technologies and Factory Automation (ETFA), Sept 2011.

[6] Rolf Isermann. Model-based fault detection and diagnosis - status and applications. In 16th IFAC Symposium on Automatic Control in Aerospace, St. Petersburg, Russia, 2004.

[7] Johan de Kleer, Bill Janssen, Daniel G. Bobrow, Tolga Kurtoglu, Bhaskar Saha, Nicholas R. Moore, and Saravan Sutharshana. Fault augmented modelica models. In The 24th International Workshop on Principles of Diagnosis, pages 71–78, 2013.

[8] D. Klar, M. Huhn, and J. Gruhser. Symptom propagation and transformation analysis: A pragmatic model for system-level diagnosis of large automation systems. In 16th IEEE Conference on Emerging Technologies and Factory Automation (ETFA), pages 1–9, Sept 2011.

[9] GE. The rise of big data - leveraging large time-series data sets to drive innovation, competitiveness and growth - capitalizing on the big data opportunity. Technical report, General Electric Intelligent Platforms, 2012.

[10] A. Katasonov, O. Kaykova, O. Khriyenko, S. Nikitin, and V. Terziyan. Smart semantic middleware for the internet of things. In 5th International Conference on Informatics in Control, Automation and Robotics (ICINCO), 2008.

[11] Michael Stonebraker. SQL databases v. NoSQL databases. Communications of the ACM, 53(4):10–11, 2010.

[12] Varun Chandola, Arindam Banerjee, and Vipin Kumar. Anomaly detection: A survey. ACM Computing Surveys (CSUR), 41(3):1–72, Sept 2009.

[13] M. Zaharia, M. Chowdhury, M. J. Franklin, S. Shenker, and I. Stoica. Spark: Cluster computing with working sets. In Proceedings of the 2nd USENIX Conference on Hot Topics in Cloud Computing, page 10, June 2010.

[14] K. Shvachko, H. Kuang, S. Radia, and R. Chansler. The Hadoop distributed file system. In Proceedings of the 26th IEEE Symposium on Mass Storage Systems and Technologies (MSST), pages 1–10, May 2010.

[15] M. Jayasree. Data mining: Exploring big data using Hadoop and MapReduce. International Journal of Engineering Sciences Research (IJESR), 4(1), 2013.

[16] Friedhelm Nachreiner, Peter Nickel, and Inga Meyer. Human factors in process control systems: The design of human–machine interfaces. Safety Science, 44(1):5–26, 2006.

[17] Sicco Verwer. Efficient Identification of Timed Automata: Theory and Practice. PhD thesis, Delft University of Technology, 2010.

[18] Oliver Niggemann, Benno Stein, Asmir Vodenčarević, Alexander Maier, and Hans Kleine Büning. Learning behavior models for hybrid timed systems. In Twenty-Sixth Conference on Artificial Intelligence (AAAI-12), pages 1083–1090, Toronto, Ontario, Canada, 2012.

[19] Bjoern Kroll, David Schaffranek, Sebastian Schriegel, and Oliver Niggemann. System modeling based on machine learning for anomaly detection and predictive maintenance in industrial plants. In 19th IEEE International Conference on Emerging Technologies and Factory Automation (ETFA), Sep 2014.

[20] D. L. C. Mack, G. Biswas, X. Koutsoukos, and D. Mylaraswamy. Learning Bayesian network structures to augment aircraft diagnostic reference models (to appear). IEEE Transactions on Automation Science and Engineering, 17:447–474, 2015.
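The data-cluster-distance decision rule mentioned above for Section 4.4 can be sketched as follows. This is an illustrative sketch only, not the implementation used in the paper: representing each learned cluster by its centroid and using a fixed distance threshold are simplifying assumptions.

```python
import math

def fit_clusters(groups):
    """Learn one centroid per group of normal-behavior observations."""
    return [tuple(sum(dim) / len(group) for dim in zip(*group))
            for group in groups]

def is_anomalous(observation, centroids, threshold):
    """Flag an observation whose distance to every learned cluster
    of normal behavior exceeds the threshold."""
    return min(math.dist(observation, c) for c in centroids) > threshold

# Two clusters of normal observations (2-D signals):
normal = [[(1.0, 1.0), (1.2, 0.8)], [(5.0, 5.0), (4.8, 5.2)]]
centroids = fit_clusters(normal)
print(is_anomalous((1.1, 0.9), centroids, threshold=1.0))  # False
print(is_anomalous((3.0, 3.0), centroids, threshold=1.0))  # True
```

Note that this only detects that behavior is abnormal; as discussed above, identifying the root cause would require a further step such as the case-based approach of Section 4.5.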
[21] Daniel L. C. Mack. Anomaly Detection from Complex Temporal Spaces in Large Data. PhD thesis, Vanderbilt University, Nashville, TN, USA, 2013.

[22] Stephen C. Johnson. Hierarchical clustering schemes. Psychometrika, 32(3):241–254, 1967.

[23] Jin Jiang. Fault tolerant control systems - an introductory overview. Acta Automatica Sinica, 31(1):161–174, 2005.

[24] M. Blanke, M. Kinnaert, J. Lunze, and M. Staroswiecki. Diagnosis and Fault-Tolerant Control. Springer-Verlag, Sep 2003.

[25] L. Schenato, B. Sinopoli, M. Franceschetti, K. Poolla, and S. S. Sastry. Foundations of control and estimation over lossy networks. Proceedings of the IEEE, 95(1):163–187, Jan 2007.

[26] B. Schneier. Security monitoring: Network security for the 21st century. Computers & Security, 2001.

[27] Zhong-Sheng Hou and Zhuo Wang. From model-based control to data-driven control: Survey, classification and perspective. Information Sciences, 235:3–35, 2013.

[28] Hong Wang, Tian-You Chai, Jin-Liang Ding, and Martin Brown. Data driven fault diagnosis and fault tolerant control: some advances and possible new directions. Acta Automatica Sinica, 35(6):739–747, 2009.

[29] Z. S. Hou and J. X. Xu. On data-driven control theory: the state of the art and perspective. Acta Automatica Sinica, 35:650–667, 2009.