=Paper= {{Paper |id=Vol-1507/dx15paper24 |storemode=property |title=Data-Driven Monitoring of Cyber-Physical Systems Leveraging on Big Data and the Internet-of-Things for Diagnosis and Control |pdfUrl=https://ceur-ws.org/Vol-1507/dx15paper24.pdf |volume=Vol-1507 |dblpUrl=https://dblp.org/rec/conf/safeprocess/NiggemannBKKVB15 }} ==Data-Driven Monitoring of Cyber-Physical Systems Leveraging on Big Data and the Internet-of-Things for Diagnosis and Control== https://ceur-ws.org/Vol-1507/dx15paper24.pdf
                          Proceedings of the 26th International Workshop on Principles of Diagnosis




Data-Driven Monitoring of Cyber-Physical Systems
Leveraging on Big Data and the Internet-of-Things for Diagnosis and Control

Oliver Niggemann (1,3), Gautam Biswas (2), John S. Kinnebrew (2), Hamed Khorasgani (2),
Sören Volgmann (1) and Andreas Bunte (3)

(1) Fraunhofer Application Center Industrial Automation, Lemgo, Germany
    e-mail: {oliver.niggemann, soeren.volgmann}@iosb-ina.fraunhofer.de
(2) Vanderbilt University and Institute for Software Integrated Systems, Nashville, TN, USA
    e-mail: {john.s.kinnebrew, hamed.g.khorasgani, gautam.biswas}@vanderbilt.edu
(3) Institute Industrial IT, Lemgo, Germany
    e-mail: {andreas.bunte}@hs-owl.de
Abstract

The majority of projects dealing with monitoring and diagnosis of Cyber Physical Systems (CPSs) rely on models created by human experts. But these models are rarely available, are hard to verify and to maintain, and are often incomplete. Data-driven approaches are a promising alternative: They leverage the large amount of data that is now collected in CPSs; this data is then used to learn the necessary models automatically. For this, several challenges have to be tackled, such as real-time data acquisition and storage solutions, data analysis and machine learning algorithms, task-specific human-machine interfaces (HMI) and feedback/control mechanisms. In this paper, we propose a cognitive reference architecture which addresses these challenges. This reference architecture should both ease the reuse of algorithms and support scientific discussions by providing a comparison schema. Use cases from different industries are outlined and support the correctness of the architecture.

1 Motivation

The increasing complexity and the distributed nature of technical systems (e.g. power generation plants, manufacturing processes, aircraft and automobiles) have provided traction for important research agendas, such as Cyber Physical Systems (CPSs) [1; 2], the US initiative on the “Industrial Internet” [3] and its German counterpart “Industrie 4.0” [4]. In these agendas, a major focus is on self-monitoring, self-diagnosis and adaptivity to maintain both operability and safety, while also taking into account humans-in-the-loop for system operation and decision making. Typical goals of such self-diagnosis approaches are the detection and isolation of faults and anomalies, identifying and analyzing the effects of degradation and wear, providing fault-adaptive control, and optimizing energy consumption [5; 6].

So far, the majority of projects and papers for analysis and diagnosis have relied on manually created diagnosis models of the system’s physics and operations [6; 7; 8]: If a drive is used, this drive is modeled; if a reactor is installed, the associated chemical and physical processes are modeled. However, the last 20 years have clearly shown that such models are rarely available for complex CPSs; when they do exist, they are often incomplete and sometimes inaccurate, and it is hard to maintain the effectiveness of these models during a system’s life-cycle.

A promising alternative is the use of data-driven approaches, where monitoring and diagnosis knowledge can be learned by observing and analyzing system behavior. Such approaches have only recently become possible: CPSs now collect and communicate large amounts of data (see Big Data [9]) via standardized interfaces, giving rise to what is now called the Internet of Things [10]. This large amount of data can be exploited for the purpose of detecting and analyzing anomalous situations and faults in these large systems: The vision is to develop CPSs that can observe their own behavior, recognize unusual situations during operations, inform experts, who can then update operations procedures, and also inform operators, who use this information to modify operations or plan for repair and maintenance.

In this paper, we take on the challenge of proposing a common data-driven framework to support monitoring, anomaly detection, prognosis (degradation modeling), diagnosis, and control. We discuss the challenges for developing such a framework, and then discuss case studies that demonstrate some initial steps toward data-driven CPSs.

2 Challenges

In order to implement data-driven solutions for the monitoring, diagnosis, and control of CPSs, a variety of challenges must be overcome to enable the learning pathways illustrated in Figure 1:

Data Acquisition: All data collected from distributed CPSs, e.g. sensors, actuators, software logs, and business data, must meet real-time requirements and must include time synchronization and spatial labeling when relevant. Often sensors and actuators operate at different rates, so data alignment, especially for high-velocity data, becomes an issue. Furthermore, data must be annotated semantically to allow for later data analysis.

Data Storage, Curation, and Preprocessing: Data will be stored and preprocessed in a distributed way. Environmental factors and the actual system configuration (e.g., for the current product in a production system) must also be stored. Depending on the applications, a relational database format, or increasingly distributed NoSQL technologies [11], may






[Figure 1 diagram; recoverable labels: Cyber Physical System (Controller, Controller, Network), Data Acquisition, Distributed Data Storage, Machine Learning, Abstracted Diagnosis Knowledge, Usage and Editing of Knowledge, Task-specific Human-Machine-Interface (Condition Monitoring, Diagnosis, Energy Analysis), Feedback mechanisms and control.]




                                                        Figure 1: Challenges for the analysis of CPSs.

need to be adopted, so that the right subsets of data may be retrieved for different analyses. Real-world data can also be noisy, partially corrupted, and have missing values. All of these need to be accommodated in the curation, storage, and pre-processing applications.

Data Analysis and Machine Learning: Data must be analyzed to derive patterns and abstract the data into condensed usable knowledge. For example, machine learning algorithms can generate models of normal system behavior in order to detect anomalous patterns in the data [12]. Other algorithms can be employed to identify root causes of observed problems or anomalies. The choice and design of appropriate analyses and algorithms must consider factors like the ability to handle large volumes and sometimes high velocities of heterogeneous data. At a minimum, this generally requires machine learning, data mining, and other analysis algorithms that can be executed in parallel, e.g., using the Spark [13], Hadoop [14], and MapReduce [15] architectures. In some cases, this may be essential to meet real-time analysis requirements.

Task-specific Human-Machine-Interfaces: Tasks such as condition monitoring, energy management, predictive maintenance or diagnosis require specific user interfaces [16]. One set of interfaces may be more tailored for offline analysis to allow experts to interact with the system. For example, experts may employ information from data mining and analytics to derive new knowledge that is beneficial to the future operations of the system. Another set of interfaces would be appropriate for system operators and maintenance personnel. For example, appropriate operator interfaces would be tailored to provide analysis results in interpretable and actionable forms, so that the operators can use them to drive decisions when managing a current mission or task, as well as to determine future maintenance and repair.

Feedback Mechanisms and Control: As a reaction to recognized patterns in the data or to identified problems, the user may initiate actions such as a reconfiguration of the plant or an interruption of the production for the purpose of maintenance. In some cases, the system may react without user interactions; in this case, the user is only informed.

3 Solutions

As Section 4 will show, the challenges from Section 2 reappear in the majority of CPS examples. While details, such as the machine learning algorithms employed or the nature of data and data storage formats, can vary, the primary steps are about the same. Most CPS solutions re-implement all of these steps and even employ different solution strategies, raising the overall effort, preventing any reuse of hardware/software and impeding a comparison between solutions.

To achieve better standardization, efficiency, and repeatability, we suggest a generic cognitive reference architecture for the analysis of CPSs. Please note that this architecture is a pure reference architecture which does not constrain later implementations and the introduction of application-specific methods.

Figure 2 shows its main components:

[Figure 2 diagram; recoverable labels: User, Task-Specific HMI, Conceptual Layer, Learning, Adaptation, Big Data Platform, Cyber Physical System (Controller, Controller, Network), interfaces I/F 1 to I/F 7, System Analysis, System Synthesis.]

Figure 2: A cognitive architecture as a solution for the analysis of CPSs.

Big Data Platform (I/F 1 & 2): This layer receives all relevant system data, e.g., configuration information as well as raw data from sensors and actuators. This is done by means of domain-dependent, often proprietary interfaces, here called interface 1 (I/F 1). This layer then integrates, often in real-time, all of the data, time-synchronizes them and annotates them with meta-data that will support later analysis and interpretation. For example, sensor meta-data may consist of the sensor type, its position in the system and its precision. This data is provided via I/F 2, which, therefore, must comprise the data itself and also the meta-data (i.e., the semantics). A possible implementation approach for I/F 2 may be the mapping into and use of existing Big Data platforms, such as Spark or Hadoop, for storing the data and the Data Distribution Service (DDS) for acquiring the data (and meta-data).

Learning Algorithms (I/F 2 & 3): This layer receives all






data via I/F 2. Since I/F 2 also comprises meta-data, the machine learning and diagnosis algorithms need not be implemented specifically for a domain but may adapt themselves to the data provided. In this layer, unusual patterns in the data (used for anomaly detection), degradation effects (used for condition monitoring) and system predictions (used for predictive maintenance) are computed and provided via I/F 3. Given the rapid changes in data analysis needs and capabilities, this layer may be a toolbox of algorithms where new algorithms can be added by means of plug-and-play mechanisms. I/F 3 might again be implemented using DDS.

Conceptual Layer (I/F 3 & 4): The information provided by I/F 3 must be interpreted according to the current task at hand, e.g. computing the health state of the system. Therefore, the provided information about unusual patterns, degradation effects and predictions is combined with domain knowledge to identify faults and their causes and to rate them according to the urgency of repair. A semantic notation will be added to the information, e.g. the time for next maintenance or a repair instruction, which will be provided at I/F 4 in a human-understandable manner. From a computer science perspective, this layer provides reasoning capabilities on a symbolic or conceptual level and adds a semantic context to the results.

Task-Specific HMI (I/F 4 & 5): The user is in the center of the architecture presented here, and, therefore, requires task-, context- and role-specific Human-Machine-Interfaces (HMIs). This HMI uses I/F 4 to get all needed analysis results and presents them to the user. Adaptive interfaces, rather than always showing the results of the same set of analyses, could allow a wider range of information to be provided, while maintaining efficiency and preventing information overload. Beyond obvious dynamic capabilities like alerts for detected problems or anomalies, the interfaces could further adapt the information displayed to be more relevant to the current user context (e.g. the user’s location within a production plant, recognition of tasks the user may be engaged in, observed patterns of the user’s previous information-seeking behavior, and knowledge of the user’s technical background). If the user decides to influence the system (e.g. shutdown of a subsystem or adaptation of the system behavior), I/F 5 is used to communicate this decision to the conceptual layer. Again, I/F 4 and I/F 5 might be implemented using DDS.

Conceptual Layer (I/F 5 & 6): The user decisions will be received via I/F 5. The conceptual layer will use the knowledge to identify actions which are needed to carry out the users’ decisions. For example, a decision to decrease the machine’s cycle time by 10 % could lead to actions such as decreasing the robot speed by 10 % and the conveyor speed by 5 % or the decision to shut down a subsystem. These actions are communicated via I/F 6 to the adaptation layer.

Adaptation (I/F 6 & 7): This layer receives system adaptation commands on the conceptual level via I/F 6, which again might be based on DDS. Examples are the decrease of robot speed by 10 % or a shutdown of a subsystem. The adaptation layer takes these commands on the conceptual level and computes, in real-time, the corresponding changes to the control system. For example, a subsystem shutdown might require a specific network signal, or a machine’s timing is changed by adapting parameters of the control algorithms, again by means of network signals. I/F 7 therefore uses domain-dependent interfaces.

4 Case Studies

We present a set of case studies that cover the manufacturing and process industries, as well as complex CPSs, such as aircraft.

4.1 Manufacturing Industry

The modeling and learning of discrete timing behavior for the manufacturing industry (e.g., the automotive industry) is a new field of research. Due to their intuitive interpretation, Timed Automata are well-suited to model the timing behavior of these systems. Several algorithms have been introduced to learn such Timed Automata, e.g. RTI+ [17] and BUTLA [18]. Please note that the expert still has to provide structural information about the system (e.g. asynchronous subsystems) and that only the temporal behavior is learned.

[Figure 3 diagram: two automata, each with states 0 to 4. Transition labels of the first automaton: Aspirator on [25…2500], Muscle on [8…34], Silo empty [8…34], Muscle off [8…34], Muscle off [7…35], Aspirator off [2200…2500]. Transition labels of the second automaton: Silo empty [8…3400], Conveyor off [8…25], Silo full [1000…34000].]

Figure 3: Learned Timed Automata for a manufacturing plant.

The data acquisition for this solution (I/F 1 in Figure 2) has been implemented using a direct capturing of Profinet signals including an IEEE 1588 time-synchronization. The data is offered via OPC UA (I/F 2). On the learning layer, timed automata are learned from historical data and compared to the observed behavior. Also, the sequential behavior of the observed events as well as the timing behavior are checked, and anomalies are signaled via I/F 3. On the conceptual layer it is decided whether an anomaly is relevant. Finally, a graphical user interface is connected to the conceptual layer via OPC UA (I/F 4).

Figure 3 shows learned automata for a manufacturing plant: The models correspond to modules of the plant; transitions are triggered by control signals and are annotated with a learned timing interval.

4.2 Energy Analysis In Process Industry

Analyzing the energy consumption in production plants has some special challenges: Unlike in the discrete systems described in Section 4.1, continuous signals such as the energy consumption must also be learned and analyzed. The discrete signals must be taken into consideration as well, because continuous signals can only be interpreted with respect to the current system’s status, e.g. it is crucial to know whether a valve is open or whether a robot is turned on. And the system’s status is usually defined by the history of discrete control signals.
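The dependence of a continuous signal on the discrete system status can be illustrated with a minimal sketch. It is not the method of the cited work: the signal names (`pump`, `valve`), the event history and the expected power ranges are hypothetical values chosen for illustration. The status is derived by replaying the history of discrete control signals, and a power reading is then judged only against the range expected for that status.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    t: float          # timestamp in seconds
    power_kw: float   # continuous signal: measured power in kW


# Hypothetical history of discrete control signals: (timestamp, signal, value).
EVENTS = [(0.0, "pump", True), (5.0, "valve", True), (12.0, "pump", False)]

# Hypothetical expected power range in kW for each status (pump_on, valve_open).
EXPECTED_KW = {
    (False, False): (0.0, 0.5),
    (False, True): (0.0, 0.8),
    (True, False): (2.0, 3.0),
    (True, True): (3.5, 5.0),
}


def status_at(t, events):
    """Derive the system status at time t by replaying the event history."""
    state = {"pump": False, "valve": False}
    for ts, signal, value in sorted(events):
        if ts > t:
            break
        state[signal] = value
    return state["pump"], state["valve"]


def is_anomalous(sample, events=EVENTS, expected=EXPECTED_KW):
    """Flag a sample whose power lies outside the range of the current status."""
    low, high = expected[status_at(sample.t, events)]
    return not (low <= sample.power_kw <= high)


if __name__ == "__main__":
    # The same reading of 4.0 kW is judged against two different statuses.
    for s in (Sample(6.0, 4.0), Sample(13.0, 4.0)):
        print(s, status_at(s.t, EVENTS), is_anomalous(s))
```

In this sketch the same power value is acceptable while the pump is running and flagged once the pump has been switched off, which is why the discrete signals cannot be ignored when analyzing the continuous ones.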






                                                                   the production cycles. In Figure 6 the architecture of the big
                                                                   data platform is depicted.

                                                                      Cyber-Physical System      Hadoop Ecosystem     Grafana Webvisualisation

                                                                                                 Hadoop Distributed
                                                                                                 Filesystem (HDFS)
                                                                                                    OpenTSDB


                                                                      Controller    Controller         HBase

                                                                              Network




                                                                       Figure 6: Data Analysis Plattform in Manufacturing

                                                                      The CPS is connected through OPC UA (I/F 1 in Figure 2)
                                                                   with an Hadoop ecosystem. Hadoop itself is an software
                                                                   framework for scalable distributed computing. The process
                                               llll
                                               2                   data is stored in an non-relational database (HBase) which is
                                                                   based on a distributed file-system (HDFS). On top of HBase,
                                                                   a time-series database OpenT SDB is used as an interface
                                                                   to explore and analyze the data (I/F 2 in Figure 2). Through
   Figure 4: A learned hybrid automaton modeling a pump.           this database it is possible to do simple statistics such as
                                                                   mean-values, sums or differences, which is usually not pos-
                                                                   sible within the non relational data stores.
   In [19], an energy anomaly detection system is described which analyzes three production
plants. EtherCAT and Profinet are used for I/F 1 and OPC UA for I/F 2. The collected data is
then condensed on the learning layer into hybrid timed automata. Also on this layer, the
current energy consumption is compared to the energy prediction. Anomalies in the continuous
variables are signaled to the user via mobile platforms using web services (I/F 3 and 4).
   In Figure 4, a pump is modeled by means of such automata using the flow rate and switching
signals. The three states S0 to S2 separate the continuous function into three linear pieces
which can then be learned automatically.
   Figure 5 shows a typical learned energy consumption (here for bulk good production).

Figure 5: A measured (black line) and a learned power consumption (red line).

4.3 Big Data Analysis in Manufacturing Systems
Analyzing historical process data during the whole production cycle requires new
architectures and platforms for handling the enormous volume, variety and velocity of the
data. Data analysis pushes the classical data acquisition and storage up to its limits, i.e.
big data platforms are needed.
   In the assembly line of the SmartFactoryOWL, a small factory used for production and
research, a big data platform is established to acquire, store and visualize the data from
   Using the interfaces of OpenTSDB or Hadoop, it becomes possible to analyze the data
directly on the storage system. Hence, a historical dataset need not be loaded in full into
a single computer system; instead, the algorithms can work on the data in a distributed
fashion. A web interface can be used to visualize the data as well as the computed results.
In Figure 6, Grafana is used for data visualization. In the SmartFactoryOWL this big data
platform is currently being connected to the application scenarios from Sections 4.1 and 4.2.

4.4 Anomaly Detection in Aircraft Flight Data
Fault detection and isolation schemes are designed to detect the onset of adverse events
during operations of complex systems, such as aircraft and industrial processes. In other
work, we have discussed approaches using machine learning classifier techniques to improve
the diagnostic accuracy of the online reasoner on board the aircraft [20]. In this paper, we
discuss an anomaly detection method to find previously undetected faults in aircraft
systems [21].
   The flight data used for improving detection of existing faults and discovering new faults
was provided by Honeywell Aerospace and recorded from a former regional airline that operated
a fleet of 4-engine aircraft, primarily in the Midwest region of the United States. Each
plane in the fleet flew approximately 5 flights a day, and data from about 37 aircraft was
collected over a five-year period. This produced over 60,000 flights. Since the airline was a
regional carrier, most flight durations were between 30 and 90 minutes. For each flight, 182
features were recorded at sample rates that varied from 1 Hz to 16 Hz. Overall this produced
about 0.7 TB of data.
   Situations may occur during flight operations where the aircraft operates in previously
unknown modes that could be attributed to the equipment, the human operators, or
environmental conditions (e.g., the weather). In such situations, data-driven anomaly
detection methods [12], i.e., methods that find patterns in the operations data of the system
that were not expected before, can be applied. Sometimes, anomalies may represent truly
aberrant, undesirable and faulty behavior; however, in other situations they may represent
behaviors that are just unexpected. We have developed unsupervised learning or clustering
methods for off-line detection of anomalous situations. Once detected and analyzed, relevant
information is presented to human experts and mission controllers to interpret and classify
the anomalies.
   Figure 7 illustrates our approach. We started with curated raw flight data (layer “Big
Data Platform” in Figure 2), transforming the time series data associated with the different
flight parameters to a compressed vector form using wavelet transforms. The next step
included building a dissimilarity matrix of pairwise flight segments using the Euclidean
distance measure, followed by a subsequent step where the pairwise between-flight distances
were used to run a ‘complete link’ hierarchical clustering algorithm [22] (layer “Learning”
in Figure 2). Run on the flight data, the algorithm produced a number of large clusters that
we considered to represent nominal flights, and a number of smaller clusters and outlier
flights that we initially labeled as anomalous. By studying the feature value differences
between the larger nominal and smaller anomalous clusters with the help of domain experts, we
were able to interpret and explain the anomalous nature (“Conceptual Layer” in Figure 2).
   These anomalies or faults represented situations that the experts had not considered
before; therefore, this unsupervised or semi-supervised data-driven approach provided a
mechanism for learning new knowledge about unanticipated system behaviors. For example, when
analyzing the aircraft data, we found a number of anomalous clusters. One of them turned out
to be situations where one of the four engines of the aircraft was inoperative. On further
study of additional features, the experts concluded that these were test flights conducted to
test aspects of the aircraft; therefore, they represented known situations and not an
interesting anomaly. A second group of flights was interpreted to be take-offs where the
engine power was set much higher than in most flights in the same take-off condition. Further
analysis of environmental features related to this set of take-offs revealed that these were
take-offs from a high-altitude airport at 7900 feet above sea level.
   A third cluster provided a more interesting situation. The experts, when checking the
features that had significantly different values from the nominal flights, realized that the
auto throttle disengaged in the middle of the aircraft’s climb trajectory. The automatic
throttle is designed to maintain either constant speed during takeoff or constant thrust for
other modes of flight. This was an unusual situation where the auto throttle switched from
maintaining speed for a takeoff to a setting that applied constant thrust, implying that the
aircraft was on the verge of a stall. This situation was verified by the flight path
acceleration sensor shown in Figure 7. By further analysis, the experts determined that in
such situations the automatic throttle would switch to a possibly lower thrust setting to
level the aircraft and compensate for the loss in velocity. By examining the engine
parameters, the expert verified that all the engines responded in an appropriate fashion to
this throttle command. Whereas this analysis did not lead to a definitive conclusion other
than the fact that the auto throttle, and therefore the aircraft equipment, responded
correctly, the expert determined that further analysis was required to answer the question
“why did the aircraft accelerate in such a fashion and come so close to a stall condition?”.
One initial hypothesis to explain these situations was pilot error.

4.5 Reliability and Fault Tolerant Control
Most complex CPSs are safety-critical systems that operate with humans-in-the-loop. In
addition to equipment degradation and faults, humans can also introduce erroneous decisions,
which become a new source of failure in the system. Figure 8 represents possible faults and
cyber-attacks that can occur in a CPS.
   There are several model-based fault tolerant control strategies for dynamic systems in the
literature (see for example [23] and [24]). Research has also been conducted to address
network security and robust network control problems (see for example [25] and [26]).
However, these methods need mathematical models of the system, which may not exist for
large-scale complex systems. Therefore, data-driven control [27] and data-driven fault
tolerant control [28] have become important research topics in recent years. For CPSs, there
are more aspects of the problem that need to be considered. As shown in Figure 8, there are
many sources of failure in these systems.
   We propose a hybrid approach that uses an abstract model of the complex system and
utilizes the data to ensure the compatibility between the model and the complex system. Data
abstraction and machine learning techniques are employed to extract patterns between
different control configurations and system outputs by computing the correlation between
control signals and the outputs of the physical subsystems. The highly correlated subsystems
(layer “Learning” in Figure 2) become candidates for further study of the effects of failure
and degradation at the boundary of these interacting subsystems. For complex systems, all
possible interactions and their consequences are hard to pre-determine, and data-driven
approaches help fill this gap in knowledge to support more informed decision-making and
control. A case-based reasoning module can be designed to provide input on past successes and
failed opportunities, which can then be translated by human experts into operational
monitoring, fault diagnosis, and control situations (“Conceptual Layer” in Figure 2). Some of
the control paradigms that govern appropriate control configurations, such as modifying the
sequence of mission tasks, switching between different objectives, or changing the controller
parameters (layer “Adaptation” in Figure 2), are being studied in a number of labs including
ours [29].
   Example: Fault Tolerant Control of a Fuel Transfer System. The fuel system supplies fuel
to the aircraft engines. Each individual mission will have its own set of requirements.
However, common requirements such as maintaining the aircraft Center of Gravity (CG), safety,
and system reliability are always critical. A set of sensors is included in the system to
measure different system variables such as the fuel quantity contained in each tank, engine
fuel flow rates, boost pump pressures, positions of the valves, etc.
   There are several failure modes, such as the total loss or degradation of the electrical
pumps or a leakage in the tanks or connecting pipes in the system. Using the data and the
abstract model, we can detect and isolate the fault and estimate its parameters. Then, based
on the type of fault and its severity, the system reconfiguration unit chooses the proper
control scenario from the control library. For example, in a normal situation the transfer
pumps and valves are controlled to maintain a transfer sequence that keeps the aircraft
center of gravity within limits. This control includes maintaining a balance between the left
and right sides of the aircraft. When there
is a small leak, the system can normally tolerate it, depending on where the leak is, but the
leak usually grows over time. Therefore, we need to estimate the leakage rate and reconfigure
the system to move the fuel from the tank or close the pipe before a critical situation
arises.

Figure 7: Data Driven Anomaly Detection Approach for Aircraft Flights
[Diagram labels: Raw Flight Data; Wavelet Transform; Flight Dissimilarity Matrix (Dii, Dij,
…, Din); Hierarchical Clustering; Anomalous; max(dAN); Current Flight]

Figure 8: Possible faults in a CPS.
[Diagram labels: System Reconfiguration; Cyber-attack; Human error; Operator; Controller
Parameters; Controller; Controller Library; Actuators; Actuator faults; System and actuator
faults; Physical System; Sensors; Sensor fault; Communication network; Communication error
and noise; Cyber Physical System]

5 Conclusions
Data-driven approaches to the analysis and diagnosis of Cyber-Physical Systems (CPSs) are
always inferior to classical model-based approaches, where models are created manually by
experts: Experts have background knowledge which cannot be learned from data, and experts
automatically think about a larger set of system scenarios than can be observed during a
system’s normal lifetime.
   So the question is not whether data-driven or expert-driven approaches are superior. The
question is rather which kind of models we can realistically expect to exist in real-world
applications, and which kind of models must therefore be learned automatically. This becomes
especially important in the context of CPSs since these systems adapt themselves to their
environment and therefore show a changing behavior, i.e. models would also have to be adapted
frequently.
   In Sections 4.1 and 4.2, structural information about the plant is imported from the
engineering chain and the temporal behavior is learned in the form of timed automata. In
Section 4.5, an abstract system model describing the input/output structure and the main
failure types is provided and again the behavior is learned. These approaches are typical
because in most applications structural information can be gained from earlier engineering
phases while behavior models hardly exist and are almost never validated with the real
system.
   Looking at the learning phase, the first thing to notice is that all described approaches
work and deliver good results: For CPSs, data-driven approaches have moved into the focus of
research and industry. And they are well suited for CPSs: They adjust automatically to new
system configurations, they do not need manual engineering efforts and they make use of the
large number of data signals now available, connectivity being a typical feature of CPSs.
   Another common denominator of the described application examples is that the focus is on
anomaly detection rather than on root cause analysis: for data-driven approaches it is easier
to learn a model of the normal behavior than to learn erroneous behavior. And it is also
typical that the only root cause analysis uses a case-based approach (Section 4.5),
case-based approaches being suitable for data-driven solutions to diagnosis.
   Finally, the examples show that the proposed cognitive architecture (Figure 2) matches the
given examples:
Big Data Platform: Only a few examples (e.g. Section 4.3) make use of explicit big data
platforms; so far, applications often use proprietary solutions. But with the growing size of
the data involved, new platforms for storing and processing the data are needed.
Learning: All examples employ machine learning technologies, with a clear focus on
unsupervised learning techniques which require no a-priori knowledge, such as clustering
(Section 4.4) or automata identification (Sections 4.1, 4.2).
Conceptual Layer: In all examples, the learned models are evaluated on a conceptual or
symbolic level: In Section 4.4, clusters are compared to new observations and data-cluster
distances are used for decision making. In Sections 4.1 and 4.2, model predictions are
compared to observations. And again, deviations are decided on by a conceptual layer.
Task-Specific HMI: None of the given examples works completely automatically; in all cases
the user is involved in the decision making.
Adaption: In most cases, reactions to detected problems are up to the expert. The use case
from Section 4.5 is an example of an automatic reaction and of the use of analysis results
for the control mechanism.

   Using such a cognitive architecture would bring several benefits to the community: First
of all, algorithms and technologies in the different layers can be changed quickly and can be
re-used. E.g. learning algorithms from one application field can be put on top of different
big data platforms. Furthermore, most existing approaches currently mix the different layers,
making the comparison of approaches to the analysis of CPSs difficult. Finally, such an
architecture helps to clearly identify open issues for the development of smart monitoring
systems.

   Acknowledgments. The work was partly supported by the German Federal Ministry of Education
and Research (BMBF) under the project "Semantics4Automation" (funding code: 03FH020I3), under
the project "Analyse großer Datenmengen in Verarbeitungsprozessen (AGATA)" (funding code:
01IS14008 A-F) and by NASA NRA NNL09AA08B from the Aviation Safety program. We also
acknowledge the contributions of Daniel Mack, Dinkar Mylaraswamy, and Raj Bharadwaj on the
aircraft fault diagnosis work.

References
[1] E.A. Lee. Cyber physical systems: Design challenges. In Object Oriented Real-Time
    Distributed Computing (ISORC), 2008 11th IEEE International Symposium on, pages 363–369,
    2008.
[2] Ragunathan (Raj) Rajkumar, Insup Lee, Lui Sha, and John Stankovic. Cyber-physical
    systems: The next computing revolution. In Proceedings of the 47th Design Automation
    Conference, DAC ’10, pages 731–736, New York, NY, USA, 2010. ACM.
[3] Peter C. Evans and Marco Annunziata. Industrial internet: Pushing the boundaries of minds
    and machines. Technical report, GE, 2012.
[4] Promotorengruppe Kommunikation. Im fokus: Das industrieprojekt industrie 4.0,
    handlungsempfehlungen zur umsetzung. Forschungsunion Wirtschaft-Wissenschaft, March 2013.
[5] L. Christiansen, A. Fay, B. Opgenoorth, and J. Neidig. Improved diagnosis by combining
    structural and process knowledge. In Emerging Technologies Factory Automation (ETFA),
    2011 IEEE 16th Conference on, Sept 2011.
[6] Rolf Isermann. Model-based fault detection and diagnosis - status and applications. In
    16th IFAC Symposium on Automatic Control in Aerospace, St. Petersburg, Russia, 2004.
[7] Johan de Kleer, Bill Janssen, Daniel G. Bobrow, Tolga Kurtoglu, Bhaskar Saha, Nicholas R.
    Moore, and Saravan Sutharshana. Fault augmented modelica models. The 24th International
    Workshop on Principles of Diagnosis, pages 71–78, 2013.
[8] D. Klar, M. Huhn, and J. Gruhser. Symptom propagation and transformation analysis: A
    pragmatic model for system-level diagnosis of large automation systems. In Emerging
    Technologies Factory Automation (ETFA), 2011 IEEE 16th Conference on, pages 1–9, Sept
    2011.
[9] GE. The rise of big data - leveraging large time-series data sets to drive innovation,
    competitiveness and growth - capitalizing on the big data opportunity. Technical report,
    General Electric Intelligent Platforms, 2012.
[10] A. Katasonov, O. Kaykova, O. Khriyenko, S. Nikitin, and V. Terziyan. Smart semantic
     middleware for the internet of things. In 5th International Conference on Informatics in
     Control, Automation and Robotics (ICINCO), 2008.
[11] Michael Stonebraker. Sql databases v. nosql databases. Communications of the ACM,
     53(4):10–11, 2010.
[12] Varun Chandola, Arindam Banerjee, and Vipin Kumar. Anomaly detection: A survey. ACM
     Computing Surveys (CSUR), 41(3):1–72, Sept 2009.
[13] M. Zaharia, M. Chowdhury, M. J. Franklin, S. Shenker, and I. Stoica. Spark: cluster
     computing with working sets. In Proceedings of the 2nd USENIX conference on Hot topics
     in cloud computing, page 10, June 2010.
[14] K. Shvachko, H. Kuang, S. Radia, and R. Chansler. The hadoop distributed file system. In
     Proceedings 26th IEEE Symposium on Mass Storage Systems and Technologies (MSST), pages
     1–10, May 2010.
[15] M. Jayasree. Data mining: Exploring big data using hadoop and mapreduce. International
     Journal of Engineering Sciences Research-IJESR, 4(1), 2013.
[16] Friedhelm Nachreiner, Peter Nickel, and Inga Meyer. Human factors in process control
     systems: The design of human–machine interfaces. Safety Science, 44(1):5–26, 2006.
[17] Sicco Verwer. Efficient Identification of Timed Automata: Theory and Practice. PhD
     thesis, Delft University of Technology, 2010.
[18] Oliver Niggemann, Benno Stein, Asmir Vodenčarević, Alexander Maier, and Hans Kleine
     Büning. Learning behavior models for hybrid timed systems. In Twenty-Sixth Conference on
     Artificial Intelligence (AAAI-12), pages 1083–1090, Toronto, Ontario, Canada, 2012.
[19] Bjoern Kroll, David Schaffranek, Sebastian Schriegel, and Oliver Niggemann. System
     modeling based on machine learning for anomaly detection and predictive maintenance in
     industrial plants. In 19th IEEE International Conference on Emerging Technologies and
     Factory Automation (ETFA), Sep 2014.
[20] D.L.C. Mack, G. Biswas, X. Koutsoukos, and D. Mylaraswamy. Learning bayesian network
     structures to augment aircraft diagnostic reference model, “to appear”. IEEE
     Transactions on Automation Science and Engineering, 17:447–474, 2015.
[21] Daniel LC Mack. Anomaly Detection from Complex Temporal Spaces in Large Data. PhD
     thesis, Vanderbilt University, Nashville, TN, USA, 2013.
[22] Stephen C Johnson. Hierarchical clustering schemes. Psychometrika, 32(3):241–254, 1967.
[23] Jiang Jin. Fault tolerant control systems - an introductory overview. Acta Automatica
     Sinica, 31(1):161–174, 2005.
[24] M. Blanke, M. Kinnaert, J. Lunze, and M. Staroswiecki. Diagnosis and fault-tolerant
     control. Springer-Verlag, Sep 2003.
[25] L. Schenato, B. Sinopoli, M. Franceschetti, K. Poolla, and S. S. Sastry. Foundations of
     control and estimation over lossy networks. In Proceedings of the IEEE, volume 95, pages
     163–187, Jan 2007.
[26] B. Schneier. Security monitoring: Network security for the 21st century. In Computers
     Security, 2001.
[27] Zhong-Sheng Hou and Zhuo Wang. From model-based control to data-driven control: Survey,
     classification and perspective. Information Sciences, 235:3–35, 2013.
[28] Hong Wang, Tian-You Chai, Jin-Liang Ding, and Martin Brown. Data driven fault diagnosis
     and fault tolerant control: some advances and possible new directions. Acta Automatica
     Sinica, 25(6):739–747, 2009.
[29] Z. S. Hou and J. X. Xu. On data-driven control theory: the state of the art and
     perspective. Acta Automatica Sinica, 35:650–667, 2009.
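
Illustrative sketch for Section 4.2 (learned pump model and anomaly signaling). The following minimal Python sketch shows how a model with three linear pieces (states S0 to S2) could be compared against measurements. It is not the learning algorithm of [18] or [19]: the time-triggered state switching, the slopes, the tolerance and the sample values are all assumptions made for illustration, and the names `LinearState`, `active_state` and `find_anomalies` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class LinearState:
    """One state of a (hypothetical) learned automaton with linear behavior."""
    name: str
    t_start: float    # time at which the state becomes active (assumed time-triggered)
    slope: float      # learned slope of the continuous variable
    intercept: float  # value of the variable at t_start

    def predict(self, t: float) -> float:
        return self.intercept + self.slope * (t - self.t_start)


def active_state(states, t):
    """Return the last state whose start time is not after t (states sorted by t_start)."""
    current = states[0]
    for state in states:
        if t >= state.t_start:
            current = state
    return current


def find_anomalies(states, samples, tolerance):
    """Flag samples whose deviation from the model prediction exceeds the tolerance."""
    anomalies = []
    for t, measured in samples:
        predicted = active_state(states, t).predict(t)
        if abs(measured - predicted) > tolerance:
            anomalies.append((t, measured, predicted))
    return anomalies


# Hypothetical model: ramp up (S0), hold (S1), ramp down (S2).
states = [
    LinearState("S0", 0.0, 2.0, 0.0),
    LinearState("S1", 5.0, 0.0, 10.0),
    LinearState("S2", 15.0, -1.0, 10.0),
]
# Hypothetical (time, measured value) pairs; the sample at t = 12 deviates from the model.
samples = [(1.0, 2.1), (4.0, 8.0), (10.0, 10.2), (12.0, 13.5), (18.0, 7.0)]
anomalies = find_anomalies(states, samples, tolerance=0.5)
```

In a real hybrid timed automaton the state changes would be triggered by discrete switching signals rather than by fixed times; the fixed `t_start` values only keep the sketch short.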
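
Illustrative sketch for Section 4.3 (statistics computed where the data is stored). Section 4.3 notes that a historical dataset need not be loaded into a single computer system because algorithms can work on the data in a distributed fashion. The sketch below shows the underlying idea for a mean value: each partition contributes only a small partial result (count and sum), and the partial results are merged. It does not use the OpenTSDB or Hadoop interfaces; the partitions are plain in-memory lists and all names are hypothetical.

```python
from functools import reduce


def partial_stats(values):
    """Summarize one data partition as (count, sum); this would run next to the data."""
    return (len(values), sum(values))


def merge(a, b):
    """Combine two partial results; associative, so the merge order does not matter."""
    return (a[0] + b[0], a[1] + b[1])


# Hypothetical partitions of one sensor signal, e.g. stored on three different nodes.
partitions = [[2.0, 4.0], [6.0], [8.0, 10.0, 12.0]]

count, total = reduce(merge, map(partial_stats, partitions))
mean = total / count
```

Only the `(count, sum)` pairs have to be transferred between nodes, which is why such aggregates scale to datasets that do not fit on one machine.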
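
Illustrative sketch for Section 4.4 (wavelet compression, dissimilarity matrix, complete-link clustering). The sketch below walks through the three steps named in Section 4.4 on toy data. It is not the implementation used for the flight data in [21]: the choice of the Haar wavelet, the number of decomposition levels, the distance threshold, the minimum cluster size and the five short "flights" are assumptions for illustration, and all function names are hypothetical.

```python
import math


def haar_compress(signal, levels):
    """Keep only the Haar approximation coefficients after `levels` halving steps."""
    approx = list(signal)
    for _ in range(levels):
        if len(approx) < 2:
            break
        # Scaled pairwise sums; the detail coefficients are discarded.
        approx = [(approx[i] + approx[i + 1]) / math.sqrt(2)
                  for i in range(0, len(approx) - 1, 2)]
    return approx


def dissimilarity_matrix(vectors):
    """Symmetric matrix of pairwise Euclidean distances."""
    n = len(vectors)
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d[i][j] = d[j][i] = math.dist(vectors[i], vectors[j])
    return d


def complete_link(d, threshold):
    """Agglomerative clustering; cluster distance = largest pairwise member distance."""
    clusters = [[i] for i in range(len(d))]
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                link = max(d[i][j] for i in clusters[a] for j in clusters[b])
                if best is None or link < best[0]:
                    best = (link, a, b)
        if best[0] > threshold:
            break  # the closest remaining pair is too far apart to merge
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


# Hypothetical single-feature profiles; the last one drops sharply halfway through.
flights = [
    [1.0, 1.0, 2.0, 2.0, 3.0, 3.0, 2.0, 2.0],
    [1.1, 0.9, 2.0, 2.1, 3.0, 2.9, 2.0, 2.1],
    [0.9, 1.0, 2.1, 2.0, 3.1, 3.0, 1.9, 2.0],
    [1.0, 1.1, 1.9, 2.0, 2.9, 3.0, 2.0, 1.9],
    [1.0, 1.0, 2.0, 2.0, 0.5, 0.5, 0.4, 0.5],
]
vectors = [haar_compress(f, levels=1) for f in flights]
clusters = complete_link(dissimilarity_matrix(vectors), threshold=1.0)
# Small clusters are only candidates; Section 4.4 leaves the interpretation to experts.
anomalous = sorted(i for c in clusters if len(c) < 2 for i in c)
```

The quadratic pair search is acceptable for a handful of vectors; for tens of thousands of flights a library implementation of hierarchical clustering would be the more realistic choice.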
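
Illustrative sketch for Section 4.5 (correlating control signals with subsystem outputs). Section 4.5 proposes computing the correlation between control signals and the outputs of physical subsystems and treating highly correlated subsystems as candidates for further study. The sketch below uses the Pearson correlation coefficient for this purpose, which is an assumption since the paper does not name a specific measure. The signal names, the values and the 0.8 cut-off are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if spread_x == 0 or spread_y == 0:
        return 0.0  # a constant signal carries no correlation information
    return cov / (spread_x * spread_y)


# Hypothetical control signal (pump on/off) and two hypothetical subsystem outputs.
pump_cmd = [0, 1, 1, 0, 1, 0]
outputs = {
    "tank_flow": [0.1, 2.0, 2.1, 0.0, 1.9, 0.1],
    "cabin_temp": [21.0, 21.2, 20.9, 21.1, 21.0, 21.1],
}

correlations = {name: pearson(pump_cmd, series) for name, series in outputs.items()}
# Subsystems whose output follows the control signal closely become study candidates.
candidates = [name for name, r in correlations.items() if abs(r) >= 0.8]
```

Correlation only captures linear, instantaneous dependence; delayed or nonlinear interactions would need lagged or rank-based measures, which the sketch does not cover.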
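
Illustrative sketch for the fuel transfer example in Section 4.5 (estimating a leakage rate). The example states that the leakage rate has to be estimated so that the system can be reconfigured before a critical situation arises. The sketch below fits a straight line to tank level measurements by least squares and extrapolates the time until a critical level is reached. The constant leak rate is a simplifying assumption (the text notes that leaks usually grow over time), fuel consumption and transfers are ignored, and the units, values and function names are hypothetical.

```python
import math


def estimate_leak_rate(times, levels):
    """Least-squares slope of level over time, returned as a positive loss rate."""
    n = len(times)
    mean_t = sum(times) / n
    mean_l = sum(levels) / n
    num = sum((t - mean_t) * (l - mean_l) for t, l in zip(times, levels))
    den = sum((t - mean_t) ** 2 for t in times)
    return -num / den


def time_to_critical(level_now, critical_level, leak_rate):
    """Remaining time until the critical level is reached at a constant leak rate."""
    if leak_rate <= 0:
        return math.inf  # the level is not falling
    return (level_now - critical_level) / leak_rate


# Hypothetical measurements: time in seconds, fuel quantity in kg, tank otherwise idle.
times = [0.0, 10.0, 20.0, 30.0]
levels = [100.0, 99.0, 98.0, 97.0]

leak_rate = estimate_leak_rate(times, levels)            # kg per second
remaining = time_to_critical(levels[-1], 90.0, leak_rate)  # seconds from last sample
```

A reconfiguration unit could compare `remaining` with the time needed to transfer the fuel or close the pipe; how that decision is made is outside the scope of this sketch.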



