
Optimizing Configuration Data using Prescriptive Analytics

Alexander Wurl

alexander.wurl@siemens.com, Siemens AG Österreich


Suboptimal or erroneous configuration data in rail automation systems may cause serious safety and cost issues. This research work aims to address continuous optimization of such data by (i) connecting configuration and operation data through the integration of heterogeneous data sources, and by (ii) applying prescriptive analytics methods to propose decision options on how to correct and optimize configuration data.

Configuration Process Data Integration Asset Management Prescriptive Analytics

In industrial and infrastructural systems, like rail automation, product configuration is the activity of engineering and customizing a product to meet the needs of a particular customer. The product in question may consist of mechanical parts, services, and software - each with various parameters and properties that reflect the variability. The result of a configuration process is the configuration data, i.e., a digital model of the system specifying all details for installation, operation, and maintenance of the facility. In the case of a rail automation system, configuration data contains, e.g., the bill of material, detailed configuration plans of all hardware parts, the station and track topology, the screen layout of the operator terminals, the parametrization of the control software, etc.

While configuration data is specified at engineering time, operation data is continuously generated by trains and the interlocking and control systems at operation time. The amount of operation data generated each day is huge, because it contains all positions and speeds of all trains, as well as logs of all telegrams exchanged by the different subsystems.

The combination of these two data models - configuration data and operation data - allows for a feedback loop which has the potential to detect hardware defects or errors in the configuration data. Anomalies and unexpected behavior in operation data can be detected by statistical methods like principal component analysis or discriminant analysis. The causes of errors, however, can only be found by locating the corresponding configuration objects in the configuration data. This promising setting makes it possible to build a prescriptive analytics framework [1] that includes various statistical analysis methods to explain the observed behavior and to propose decision options on how to modify the configuration data.
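As a minimal sketch of the kind of statistical anomaly detection mentioned above, the following example scores each observation by its distance from the first principal component. The data values, the function names, and the choice to rank observations by residual instead of applying a threshold are assumptions made for illustration; the source does not specify them. The code is kept dependency-free, whereas a real analysis would more likely rely on a numerical library.

```python
import math

# Hypothetical operation data: each row holds two correlated measurements.
# All values are invented for illustration.
observations = [
    (1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 8.0), (5.0, 9.9),
    (3.0, 9.0),  # deliberately off the common trend
]

def center(rows):
    """Subtract the per-column mean from every row."""
    n, dims = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(dims)]
    return [[r[j] - means[j] for j in range(dims)] for r in rows]

def covariance_matrix(centered):
    """Sample covariance matrix of mean-centered rows."""
    n, dims = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1)
             for j in range(dims)] for i in range(dims)]

def first_principal_component(cov, iterations=100):
    """Approximate the dominant eigenvector by power iteration."""
    v = [1.0] * len(cov)
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(len(v)))
             for i in range(len(v))]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

def residual_scores(rows):
    """Distance of each row from the line spanned by the first component."""
    centered = center(rows)
    pc = first_principal_component(covariance_matrix(centered))
    scores = []
    for r in centered:
        projection = sum(a * b for a, b in zip(r, pc))
        residual = [a - projection * b for a, b in zip(r, pc)]
        scores.append(math.sqrt(sum(x * x for x in residual)))
    return scores

scores = residual_scores(observations)
# Index of the observation that deviates most from the dominant trend.
most_anomalous = max(range(len(scores)), key=lambda i: scores[i])
```

An observation flagged this way would then have to be traced back to its configuration object, which is the step the feedback loop relies on.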

Another interesting application enabled by the availability of configuration and operation models is predictive asset management, where prediction models for the obsolescence of the various hardware parts are computed from different heterogeneous data sources. Configuration and operation data are complemented here by sales and order forecast models and contextual data, like weather data.

Based on these challenges, the following research questions will be tackled in the proposed dissertation thesis:

- Which data integration processes are suitable for preparing configuration and operation data for prescriptive analytics applications?

- Which sequences of statistical methods are appropriate for predictive asset management and anomaly detection in configuration and operation data?

- How can new or modified configuration rules/constraints be derived by prescriptive analytics methods in order to optimize configuration data?

2 Background

Beyond the application of mere statistical analysis methods, data analytics requires a federated architecture of descriptive, predictive, and prescriptive analytics in combination with data models and a data warehouse [2]. To achieve reasonable results in analytics, ensuring data quality during data integration is an indispensable prerequisite [3]. To bridge the gap between heterogeneous data sets, we aim at defining a data schema for both operation and configuration data, realized as a data model that accepts all properties of the heterogeneous data sets. Since similar data may come in various representations, the interchange of data between operation and configuration models is an important task, i.e., the data schema strongly affects the resulting data quality. Despite considerable efforts, model interoperability is still a challenging task, most often leading to hand-crafted bilateral integration solutions [4] that suffer from high maintenance overheads, technology dependence, and scalability problems. Therefore, previous results on data integration with schema-based approaches [5] shall be extended in the course of the proposed dissertation.

Data analytics in rail automation is attracting increasing interest [6]. To apply data analytics to configuration data, we intend to use techniques from data mining, machine learning, and anomaly detection. These techniques make it possible to examine large data sets to uncover hidden patterns, unknown correlations, and other useful information that can be used to make better decisions. In more detail, for data from various integrated data sets, methods from multivariate analysis [10] serve as the basis for our data analysis, i.e., statistical models capture relationships among many factors and allow us to assess which information has a significant impact on trend predictions.

Following the statistical results of descriptive and predictive analytics, we propose to apply prescriptive analytics methods to derive optimal decision options for optimizing product configurations. Basically, configuration can be defined as a "special case of design activity, where the artefact being configured is assembled from instances of a fixed set of well-defined component types which can be composed conforming to a set of constraints" [11]. As prescriptive analytics has evolved into a new research field in industry that focuses on describing courses of action and showing the influence of each action [12-14], there is great potential to apply this concept to constraints that represent technical restrictions, restrictions related to economic aspects, and conditions related to production processes. Analytics methods are able to capture all related configuration data, which contributes to a wider analysis of restrictions and can therefore reveal potential optimization options.

3 Significance

Data analytics is a highly active research field which has been driven mainly by business applications in the last decade. Recently, more and more industrial applications have been adopting these methods, e.g., sensor analysis for predictive maintenance. We explore the links between data analytics technologies and product configuration. Integrating configuration data into the analysis of operation data makes it easy to localize an anomaly and therefore to better understand its source.

In this work we contribute to solutions for two crucial problems in the domain of rail automation at the company Siemens AG Österreich: (i) predictive asset management and (ii) prescriptive analytics for correcting/optimizing railway engineering data. In the first scenario, forecasts of the form "How many assets of type A will be needed within the next N years?" are computed. Reliable forecasts are very important for guaranteeing the availability of all necessary modules in the future and allow for a solid version and lifetime management of module variants. The second scenario - the continuous correction and optimization of configuration data - is an important building block of guaranteeing safe and high-performance train operation.
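The forecast question of the first scenario can be illustrated with a deliberately simple stand-in: a least-squares linear trend fitted to yearly demand and summed over the next N years. This is only a sketch under stated assumptions. The source does not name a forecasting model, and the demand figures, years, and function names below are hypothetical.

```python
def fit_linear_trend(years, demand):
    """Ordinary least-squares fit of demand = intercept + slope * year."""
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(demand) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, demand))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def forecast_total(years, demand, horizon):
    """Sum the trend prediction over the next `horizon` years.

    Negative predictions are clipped to zero, since demand cannot be negative.
    """
    slope, intercept = fit_linear_trend(years, demand)
    last_year = years[-1]
    return sum(max(0.0, intercept + slope * (last_year + k))
               for k in range(1, horizon + 1))

# Hypothetical yearly demand for assets of one type (invented numbers).
history_years = [2012, 2013, 2014, 2015, 2016]
history_demand = [100, 110, 120, 130, 140]

# Estimated number of assets needed within the next 3 years.
needed_next_3_years = forecast_total(history_years, history_demand, horizon=3)
```

A model of the kind described in the text would additionally draw on configuration, operation, sales, and contextual data, not on a single demand series.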

The framework developed within the proposed dissertation goes beyond the rail automation use case. The methods developed are of general nature and may be adapted and applied to other industrial fields such as industry automation, power plants, or energy management. This is novel and highly promising, especially in the context of Smart Production and Industry 4.0.

4 Research design and methods

According to the research questions, we design a framework of methods with the following contributions.

1. Heterogeneous Data Model Integration. We need to integrate various data sources of different formats, like Excel and XML. As different business units use different tools and formats to maintain data, integration of data is challenging and prone to errors. Existing data quality methods fall short of a generalized approach that covers such a variety of data types in the domain of rail automation. Our contribution extends the notion of signifiers for a robust and at the same time typo-tolerant identification of objects from different sources [15].

2. Multivariate Data Analysis. Multivariate statistics are eminently suitable for anomaly detection and trend prediction [9]. Therefore, we develop techniques to extract statistical information and anomalies from the operation data. For example, are there hardware models or interfaces that reboot frequently? Do trains from different vendors show different driving behaviour? These analyses will also integrate configuration and contextual data.

3. Feedback from Operation to Configuration Data. Configuration models represent all the different HW and SW element types along with their structure and constraints to build a system (e.g., a rail automation system). Following the statistical results of prediction, we believe that new rules or constraints can be learned by using, e.g., classification and regression tree methods to improve the configuration models. For example, certain types of modules may cause overheating if located next to each other in the hardware rack. External, contextual data, usually available as linked open data, may also be integrated to derive additional rules (e.g., the heat sensitivity of a module derived from module shutdowns in combination with meteorological data).
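The idea of learning a configuration rule from operation data, as in the third contribution, can be sketched with a single split of a classification tree chosen by Gini impurity. This is an illustrative fragment, not the method of the dissertation: a full classification and regression tree would split recursively and prune, and the feature names, data values, and labels here are hypothetical.

```python
def gini(labels):
    """Gini impurity of a list of binary labels (0 or 1)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)

def best_split(rows, labels, feature_names):
    """Return (feature, threshold, weighted_gini) of the best binary split.

    Candidate thresholds are the midpoints between consecutive distinct
    values of each feature; rows with value <= threshold go to the left.
    """
    best = None
    n = len(rows)
    for j, name in enumerate(feature_names):
        values = sorted({r[j] for r in rows})
        for lo, hi in zip(values, values[1:]):
            threshold = (lo + hi) / 2.0
            left = [y for r, y in zip(rows, labels) if r[j] <= threshold]
            right = [y for r, y in zip(rows, labels) if r[j] > threshold]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[2]:
                best = (name, threshold, score)
    return best

# Hypothetical observations per rack slot (invented for illustration):
# (number of adjacent high-power modules, ambient temperature in deg C)
features = ["adjacent_high_power_modules", "ambient_temp_c"]
rows = [(0, 22), (0, 35), (1, 24), (1, 34),
        (2, 23), (2, 30), (3, 25), (3, 21)]
overheated = [0, 0, 0, 0, 1, 1, 1, 1]  # 1 = overheating shutdown observed

rule = best_split(rows, overheated, features)
# rule can be read as a candidate constraint:
# "at most one adjacent high-power module per slot".
```

A split found this way is only a candidate constraint; whether it enters the configuration model would still be an engineering decision.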

5 Research stage

The project related to this research started in April 2016. The work follows the contributions described in Section 4, assuming that they build on each other.

The first stage, Heterogeneous Data Model Integration, is finished. The result of this contribution is the approach "Using Signifiers for Data Integration in Rail Automation", which was presented at the 6th International Conference on Data Science, Technology and Applications [15]. This approach enables a semi-automatic process for data import, where the user resolves ambiguous data classifications. We introduced a technique using a signifier, which is a natural extension of composite primary keys, to find the correct data warehouse classification of source values in a proprietary, often semi-structured format. This approach is already in use and results show a significant improvement of data quality.
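To make the idea of a typo-tolerant, semi-automatic classification more concrete, the following sketch treats a signifier as a tuple of key components and compares it against known signifiers by string similarity. It is a loose illustration, not the algorithm published in [15]: the similarity measure (Python's `difflib.SequenceMatcher`), the thresholds `accept` and `margin`, the catalogue entries, and the class names are all assumptions.

```python
from difflib import SequenceMatcher

# Hypothetical catalogue: signifier components -> data warehouse class.
CATALOGUE = {
    ("signal", "main", "LED"): "SIG-MAIN-LED",
    ("signal", "main", "bulb"): "SIG-MAIN-BULB",
    ("signal", "shunting", "LED"): "SIG-SHUNT-LED",
}

def component_similarity(a, b):
    """Case-insensitive similarity of two strings in the range [0, 1]."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()

def signifier_similarity(source, target):
    """Mean similarity over the aligned components of two signifiers."""
    total = sum(component_similarity(s, t) for s, t in zip(source, target))
    return total / len(target)

def classify(source, catalogue, accept=0.7, margin=0.05):
    """Classify a source signifier or hand the decision to the user.

    Returns ("match", [cls]) for a clear winner, ("ambiguous", [cls, ...])
    when close rivals exist, and ("unknown", []) when nothing is similar
    enough.
    """
    scored = sorted(((signifier_similarity(source, sig), cls)
                     for sig, cls in catalogue.items()), reverse=True)
    best_score, best_cls = scored[0]
    if best_score < accept:
        return ("unknown", [])
    rivals = [cls for score, cls in scored[1:] if best_score - score < margin]
    if rivals:
        return ("ambiguous", [best_cls] + rivals)
    return ("match", [best_cls])

# A source value with a typo is still resolved automatically ...
typo_result = classify(("Signal", "mian", "LED"), CATALOGUE)
# ... while an unclear third component is left for the user to resolve.
unclear_result = classify(("signal", "main", "lamp"), CATALOGUE)
```

Separating the "ambiguous" outcome from a plain match mirrors the semi-automatic import described above, where only unclear cases are passed to the user.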

The different data analytics tasks for predictive asset management and anomaly detection in operation data are defined and documented in a user requirements specification. Next, we will study the applicability of different multivariate methods to our analytics tasks. The selection and application of statistical methods is a highly sensitive task, since the results serve as the basis for further prescriptive analytics methods to optimize rules and constraints in product configuration.

1. Haas, P.J., Maglio, P.P., Selinger, P.G., Tan, W.C.: Data is dead without what-if models. Proceedings of the VLDB Endowment 4(12) (2011)

2. Soltanpoor, R., Sellis, T.: Prescriptive analytics for big data. In: Australasian Database Conference, Springer (2016) 245-256

3. Bleiholder, J., Naumann, F.: Data fusion. ACM Computing Surveys (CSUR) 41(1) (2009) 1

4. Schürr, A., Dörr, H.: Introduction to the special SoSyM section on model-based tool integration. Software and Systems Modeling 4(2) (2005) 109-111

5. Papadakis, G., Alexiou, G., Papastefanatos, G., Koutrika, G.: Schema-agnostic vs schema-based configurations for blocking methods on homogeneous data. Proceedings of the VLDB Endowment 9(4) (2015) 312-323

6. Rapolu, B.: Focus: How big data is making tracks in the rail industry. Building the Digital Transport Network of the Future (2015)

7. Han, J., Pei, J., Kamber, M.: Data mining: concepts and techniques. Elsevier (2011)

8. Bishop, C.M.: Pattern recognition. Machine Learning 128 (2006) 1-58

9. Chandola, V., Banerjee, A., Kumar, V.: Anomaly detection: A survey. ACM Computing Surveys (CSUR) 41(3) (2009) 15

10. Esbensen, K.H., Guyot, D., Westad, F., Houmoller, L.P.: Multivariate data analysis: in practice: an introduction to multivariate data analysis and experimental design. Multivariate Data Analysis (2002)

11. Sabin, D., Weigel, R.: Product configuration frameworks - a survey. IEEE Intelligent Systems and their Applications 13(4) (1998) 42-49

12. Souza, G.C.: Supply chain analytics. Business Horizons 57(5) (2014) 595-605

13. Porter, M.E., Heppelmann, J.E.: How smart, connected products are transforming companies. Harvard Business Review 93(10) (2015) 96-114

14. Siksnys, L.: Towards prescriptive analytics in cyber-physical systems. Dissertation (2014)

15. Wurl, A., Falkner, A., Haselböck, A., Mazak, A.: Using signifiers for data integration in rail automation. In: Proceedings of the 6th International Conference on Data Science, Technology and Applications - Volume 1: DATA, INSTICC, SciTePress (2017) 172-179