<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Optimizing Configuration Data using Prescriptive Analytics</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Alexander Wurl</string-name>
          <email>alexander.wurl@siemens.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Siemens AG Österreich</institution>
        </aff>
      </contrib-group>
      <fpage>158</fpage>
      <lpage>162</lpage>
      <abstract>
        <p>Suboptimal or erroneous configuration data in rail automation systems may cause serious safety and cost issues. This research aims at the continuous optimization of such data by (i) connecting configuration and operation data through the integration of heterogeneous data sources, and (ii) applying prescriptive analytics methods to propose decision options for correcting and optimizing configuration data.</p>
      </abstract>
      <kwd-group>
        <kwd>Configuration Process</kwd>
        <kwd>Data Integration</kwd>
        <kwd>Asset Management</kwd>
        <kwd>Prescriptive Analytics</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>In industrial and infrastructural systems such as rail automation, product configuration
is the activity of engineering and customizing a product to meet the needs
of a particular customer. The product in question may consist of mechanical
parts, services, and software, each with various parameters and properties that
reflect its variability. The result of a configuration process is the configuration
data, i.e., a digital model of the system specifying all details for the installation,
operation, and maintenance of the facility. In the case of a rail automation system,
configuration data contain, e.g., the bill of materials, detailed configuration plans
of all hardware parts, the station and track topology, the screen layout of the
operator terminals, and the parametrization of the control software.</p>
      <p>While configuration data are specified at engineering time, operation data are
continuously generated at operation time by the trains and by the interlocking and control systems.
The amount of operation data generated each day is very large, because
it comprises the positions and speeds of all trains as well as logs of all telegrams
exchanged by the different subsystems.</p>
      <p>
        The combination of these two data models, configuration data and operation
data, enables a feedback loop that has the potential to detect hardware
defects or errors in the configuration data. Anomalies and unexpected behavior in
operation data can be detected by statistical methods such as principal component
analysis or discriminant analysis. The causes of errors, however, can only be identified by locating
the corresponding configuration objects in the configuration data. This promising
setting makes it possible to build a prescriptive analytics framework [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] that includes various
statistical analysis methods to explain the observed behavior and to propose
decision options for modifying the configuration data.
      </p>
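      <p>As an illustration of the statistical anomaly detection mentioned above, the following minimal Python sketch flags operation-data records whose residual after a principal component analysis (PCA) projection is unusually large, and then maps the flagged records back to configuration objects. It assumes NumPy is available; the two-feature layout, the three-sigma threshold, and identifiers such as <monospace>signal_S12</monospace> are hypothetical choices, not part of the proposed framework.</p>
      <preformat>
```python
import numpy as np


def fit_pca(X, k):
    """Fit a k-component PCA model to X (rows = samples, columns = features)."""
    mean = X.mean(axis=0)
    # The right singular vectors of the centred data are the principal axes.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:k]


def residual_error(X, mean, components):
    """Squared residual (Q statistic) of each row after projection onto the PCA subspace."""
    centred = X - mean
    reconstructed = centred @ components.T @ components
    return ((centred - reconstructed) ** 2).sum(axis=1)


def flag_anomalies(X_train, X_new, k=1, n_sigma=3.0):
    """Flag rows of X_new whose residual exceeds mean + n_sigma * std of the training residuals."""
    mean, components = fit_pca(X_train, k)
    train_err = residual_error(X_train, mean, components)
    threshold = train_err.mean() + n_sigma * train_err.std()
    return residual_error(X_new, mean, components) > threshold


# Hypothetical operation data: two strongly correlated measurements per record.
rng = np.random.default_rng(0)
t = rng.normal(size=200)
X_train = np.column_stack([t, 2 * t + 0.05 * rng.normal(size=200)])

# New records, each tied to a hypothetical configuration object ID.
X_new = np.array([[1.0, 2.0], [1.0, -2.0]])
element_ids = ["signal_S12", "signal_S13"]

flags = flag_anomalies(X_train, X_new, k=1)
suspects = [eid for eid, flagged in zip(element_ids, flags) if flagged]
print(suspects)  # ['signal_S13']: the second record breaks the correlation
```
      </preformat>
      <p>The flagged IDs would only be the starting point for inspecting the corresponding configuration objects; depending on the kind of anomaly, a Hotelling T² statistic or a discriminant analysis could replace the residual used here.</p>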
      <p>Another interesting application enabled by the availability of configuration
and operation models is predictive asset management, where prediction models
for the obsolescence of the various hardware parts are computed from different
heterogeneous data sources. Configuration and operation data are complemented
here by sales and order forecast models and contextual data, like weather data.</p>
      <p>Based on these challenges, the following research questions will be tackled in
the proposed dissertation thesis:</p>
      <list list-type="bullet">
        <list-item>
          <p>Which data integration processes are suitable for preparing configuration
and operation data for prescriptive analytics applications?</p>
        </list-item>
        <list-item>
          <p>Which sequences of statistical methods are appropriate for predictive asset
management and anomaly detection in configuration and operation data?</p>
        </list-item>
        <list-item>
          <p>How can new or modified configuration rules/constraints be derived by
prescriptive analytics methods in order to optimize configuration data?</p>
        </list-item>
      </list>
    </sec>
    <sec id="sec-2">
      <title>Background</title>
      <p>
        Beyond the application of mere statistical analysis methods, data analytics
requires a federated architecture of descriptive, predictive, and prescriptive
analytics in combination with data models and a data warehouse [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. To achieve
reasonable analytics results, ensuring data quality during data
integration is an essential prerequisite [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. To bridge the gap between heterogeneous data
sets, we aim to define a data schema for both operation and configuration data,
realized by a data model that can accommodate all properties of the
heterogeneous data sets. Since the same data may have various representations,
exchanging data between operation and configuration models is an important
task, and the data schema strongly influences the resulting data quality.
Despite considerable efforts, model interoperability is still a challenging
task, most often leading to hand-crafted bilateral integration solutions [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]
that suffer from high maintenance overheads, technology dependence, and scalability
problems. Therefore, previous results on data integration in schema-based
approaches [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] shall be extended in the course of the proposed dissertation.
      </p>
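      <p>One possible way to realize a data model that accepts all properties of heterogeneous data sets is a common record type with a few fixed identifying fields and an open attribute map. The sketch below is an assumption for illustration, not the schema developed in the dissertation: it maps a CSV export (standing in for an Excel sheet) and already-parsed operation events into such records and joins them by object identifier. All field names and sample values are hypothetical.</p>
      <preformat>
```python
import csv
import io
from dataclasses import dataclass, field


@dataclass
class Record:
    """Common record type: fixed identifying fields plus an open attribute map."""
    object_id: str
    object_type: str
    source: str
    attributes: dict = field(default_factory=dict)


def from_csv(text, source, id_col, type_col):
    """Map the rows of a CSV export (e.g. saved from a spreadsheet) to Records."""
    records = []
    for row in csv.DictReader(io.StringIO(text)):
        object_id = row.pop(id_col)
        object_type = row.pop(type_col)
        # Every remaining column is kept as a free-form attribute.
        records.append(Record(object_id, object_type, source, dict(row)))
    return records


def from_events(events, source, id_key="element"):
    """Map already-parsed operation events (dicts) to Records."""
    return [
        Record(e[id_key], "event", source, {k: v for k, v in e.items() if k != id_key})
        for e in events
    ]


def join_by_object(config_records, operation_records):
    """Attach each operation record to the configuration object it refers to."""
    joined = {r.object_id: {"config": r, "operation": []} for r in config_records}
    for r in operation_records:
        if r.object_id in joined:
            joined[r.object_id]["operation"].append(r)
    return joined


# Hypothetical sample data from two sources.
config_csv = "id,type,rack_slot\nS12,signal,3\nP7,point,5\n"
events = [{"element": "S12", "telegram": "REBOOT", "time": "2017-03-01T10:00"}]

joined = join_by_object(
    from_csv(config_csv, "engineering_export", "id", "type"),
    from_events(events, "operation_log"),
)
print(joined["S12"]["operation"][0].attributes)
# {'telegram': 'REBOOT', 'time': '2017-03-01T10:00'}
```
      </preformat>
      <p>Keeping source-specific properties in an open map avoids schema changes whenever a new tool or format appears, at the cost of pushing type checking and data quality rules into the integration step.</p>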
      <p>
        Data analytics in rail automation is attracting increasing interest [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ].
To apply data analytics to configuration data, we intend to use techniques from data
mining, machine learning, and anomaly detection. These techniques make it possible to
examine large data sets in order to uncover hidden patterns, unknown correlations, and
other useful information that can support better decisions. More
specifically, for data from the various integrated data sets, methods from
multivariate analysis [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] serve as the basis for our data analysis: statistical models
capture relationships among many factors, which allows the assessment of information
that has a significant impact on trend predictions.
      </p>
      <p>
        Building on the statistical results of descriptive and predictive analytics, we
intend to apply prescriptive analytics methods to propose optimal decision
options for product configuration. Configuration can
be defined as a "special case of design activity, where the artefact being
configured is assembled from instances of a fixed set of well-defined component types
which can be composed conforming to a set of constraints" [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. Prescriptive
analytics has evolved as a new research field in industry that focuses on
describing courses of action and showing the influence of each action [
        <xref ref-type="bibr" rid="ref12">12</xref>–<xref ref-type="bibr" rid="ref14">14</xref>
        ]. There is therefore great potential to apply this concept to constraints that represent
technical restrictions, restrictions related to economic aspects, and conditions
related to production processes. Analytics methods can capture all
related configuration data, which contributes to a broader analysis of restrictions and
can therefore reveal potential optimization options.
      </p>
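      <p>To make the cited definition concrete: a configuration is a set of instances of a fixed catalogue of component types that must satisfy a set of constraints. The following sketch encodes this with a hypothetical catalogue of rack modules and two example technical constraints. It is an illustration under our own assumptions, not a product configurator, and all type and constraint names are invented for the example.</p>
      <preformat>
```python
from dataclasses import dataclass

# Hypothetical, fixed catalogue of well-defined component types.
COMPONENT_TYPES = {"power_supply", "cpu_module", "io_module"}


@dataclass(frozen=True)
class Instance:
    name: str
    ctype: str
    slot: int


def exactly_one_power_supply(config):
    return sum(1 for i in config if i.ctype == "power_supply") == 1


def unique_slots(config):
    slots = [i.slot for i in config]
    return len(slots) == len(set(slots))


# Constraint table: name -> predicate over the whole configuration.
CONSTRAINTS = {
    "exactly one power supply": exactly_one_power_supply,
    "unique rack slots": unique_slots,
}


def violations(config):
    """Return the names of all violated constraints for a list of Instances."""
    unknown = [i.name for i in config if i.ctype not in COMPONENT_TYPES]
    if unknown:
        raise ValueError(f"unknown component types for: {unknown}")
    return [name for name, check in CONSTRAINTS.items() if not check(config)]


rack = [
    Instance("ps1", "power_supply", 0),
    Instance("cpu1", "cpu_module", 1),
    Instance("io1", "io_module", 1),  # clashes with cpu1 in slot 1
]
print(violations(rack))  # ['unique rack slots']
```
      </preformat>
      <p>In this representation, a rule derived by prescriptive analytics, such as a ban on placing two particular module types in adjacent slots, would simply become one more entry in the constraint table.</p>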
    </sec>
    <sec id="sec-3">
      <title>Significance</title>
      <p>Data analytics is a highly active research field that has been driven mainly
by business applications over the last decade. Recently, more and more industrial
applications have adopted these methods, e.g., sensor analysis for
predictive maintenance. We explore the links between data analytics technologies and
product configuration. Integrating configuration data into the analysis of
operation data makes it easy to localize the source of an anomaly and therefore
leads to a better understanding of it.</p>
      <p>In this work, we contribute solutions to two crucial problems in the
domain of rail automation at Siemens AG Österreich: (i) predictive
asset management and (ii) prescriptive analytics for correcting and optimizing
railway engineering data. In the first scenario, forecasts of the form "How many
assets of type A will be needed within the next N years?" are computed.
Reliable forecasts are very important for guaranteeing the availability of all necessary
modules in the future and allow for solid version and lifetime management of
module variants. The second scenario, the continuous correction and
optimization of configuration data, is an important building block for guaranteeing safe
and high-performance train operation.</p>
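      <p>In its very simplest form, a forecast of the form "How many assets of type A will be needed within the next N years?" can be illustrated by extrapolating a least-squares linear trend of past yearly demand. The sketch below makes exactly that simplifying assumption; the demand figures are hypothetical, and the prediction models envisaged in this work would additionally draw on configuration, operation, sales, and contextual data.</p>
      <preformat>
```python
def linear_trend(years, counts):
    """Ordinary least-squares line, returned as (x_mean, y_mean, slope).

    The fitted line is y_mean + slope * (x - x_mean).
    """
    n = len(years)
    x_mean = sum(years) / n
    y_mean = sum(counts) / n
    sxx = sum((x - x_mean) ** 2 for x in years)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(years, counts))
    return x_mean, y_mean, sxy / sxx


def assets_needed(years, counts, n_years):
    """Sum of the extrapolated yearly demand over the n_years after the last observed year."""
    x_mean, y_mean, slope = linear_trend(years, counts)
    last = max(years)
    future = range(last + 1, last + n_years + 1)
    # Negative extrapolations are clipped to zero demand.
    return sum(max(0.0, y_mean + slope * (x - x_mean)) for x in future)


# Hypothetical yearly demand for modules of type A.
years = [2012, 2013, 2014, 2015, 2016]
counts = [100, 110, 120, 130, 140]
print(round(assets_needed(years, counts, 3)))  # 480 (150 + 160 + 170)
```
      </preformat>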
      <p>The framework developed within the proposed dissertation goes beyond the
rail automation use case. The methods developed are of a general nature and may
be adapted and applied to other industrial fields such as industrial automation,
power plants, or energy management. This is novel and highly promising,
especially in the context of Smart Production and Industry 4.0.</p>
    </sec>
    <sec id="sec-4">
      <title>Research design and methods</title>
      <p>Following the research questions, we design a framework of methods with the
following contributions.</p>
      <list list-type="order">
        <list-item>
          <p>Heterogeneous Data Model Integration. We need to integrate various
data sources of different formats, such as Excel and XML. As different business
units use different tools and formats to maintain data, integrating the data
is challenging and error-prone. Existing data quality methods lack
a generalized approach that covers such a variety of data types in the
domain of rail automation. Our contribution extends the notion of signifiers
for a robust and, at the same time, typo-tolerant identification of objects from
different sources [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ].</p>
        </list-item>
        <list-item>
          <p>Multivariate Data Analysis. Multivariate statistics are eminently
suitable for anomaly detection and trend prediction [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. Therefore, we develop
techniques to extract statistical information and anomalies from the
operation data. For example: are there hardware models or interfaces that frequently
reboot? Do trains from different vendors exhibit different driving behaviour? These
analyses will also integrate configuration and contextual data.</p>
        </list-item>
        <list-item>
          <p>Feedback from Operation to Configuration Data. Configuration
models represent all the different HW and SW element types along with their
structure and constraints to build a system (e.g., a rail automation system).
Building on the statistical results of prediction, we expect that new rules
or constraints can be learned, e.g., with classification and regression tree
methods, to improve the configuration models. For example, certain types of
modules may cause overheating if located next to each other in the hardware
rack. External contextual data, usually available as linked open data, may
also be integrated to derive additional rules (e.g., the heat sensitivity of a module
derived from module shutdowns in combination with meteorological data).</p>
        </list-item>
      </list>
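      <p>The idea of learning a new configuration rule with tree methods can be sketched with a single CART-style split: among candidate boolean features of past installations, choose the one that minimizes the weighted Gini impurity of an overheating label, and turn the resulting split into a candidate constraint. This is a depth-one simplification of CART written in plain Python; the feature names and observations are hypothetical, and a real analysis would grow and prune a full tree on far more data.</p>
      <preformat>
```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 1.0 - p * p - (1.0 - p) * (1.0 - p)


def best_split(rows, labels, features):
    """Return (feature, weighted_gini, labels_if_true, labels_if_false) for the best boolean split."""
    best = None
    n = len(rows)
    for f in features:
        yes = [y for r, y in zip(rows, labels) if r[f]]
        no = [y for r, y in zip(rows, labels) if not r[f]]
        score = (len(yes) * gini(yes) + len(no) * gini(no)) / n
        if best is None or best[1] > score:  # keep the lowest impurity
            best = (f, score, yes, no)
    return best


def candidate_rule(rows, labels, features):
    """Turn the best split into a textual candidate constraint if it points to a risk."""
    f, _, yes, no = best_split(rows, labels, features)
    rate_yes = sum(yes) / len(yes) if yes else 0.0
    rate_no = sum(no) / len(no) if no else 0.0
    if rate_yes > 0.5 and rate_yes > rate_no:
        return f"avoid {f} (overheating rate {rate_yes:.2f} vs {rate_no:.2f})"
    return None


# Hypothetical history: did a rack overheat (1) or not (0)?
features = ["adjacent_io_modules", "hot_day"]
rows = [
    {"adjacent_io_modules": True, "hot_day": False},
    {"adjacent_io_modules": True, "hot_day": True},
    {"adjacent_io_modules": True, "hot_day": False},
    {"adjacent_io_modules": False, "hot_day": True},
    {"adjacent_io_modules": False, "hot_day": False},
    {"adjacent_io_modules": False, "hot_day": True},
]
labels = [1, 1, 1, 0, 0, 0]
print(candidate_rule(rows, labels, features))
# avoid adjacent_io_modules (overheating rate 1.00 vs 0.00)
```
      </preformat>
      <p>The <monospace>hot_day</monospace> feature stands in for contextual data such as meteorological records; a proposed rule would still need review by an engineer before it enters the configuration model.</p>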
    </sec>
    <sec id="sec-5">
      <title>Research stage</title>
      <p>The project related to this research started in April 2016. The work
follows the contributions described in Section 4 in sequence, since they build on each
other.</p>
      <p>
        The first stage, Heterogeneous Data Model Integration, is finished. The
result of this contribution is the approach "Using Signifiers for Data Integration
in Rail Automation", which was presented at the 6th International Conference
on Data Science, Technology and Applications [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ]. This approach enables a
semi-automatic data import process in which the user resolves ambiguous data
classifications. We introduced a technique based on a signifier, a natural
extension of composite primary keys, to find the correct data warehouse
classification of source values given in a proprietary, often semi-structured format. This
approach is already in use, and results show a significant improvement in data
quality.
      </p>
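      <p>The published signifier technique is not reproduced here; the following is only a rough sketch of the general idea under our own assumptions. A signifier is modeled as a tuple of normalized key-field values (akin to a composite key), matching is made typo-tolerant with the string-similarity ratio of Python's standard <monospace>difflib</monospace> module, and when zero or several warehouse classes exceed a similarity threshold, the candidates are handed to the user instead of being assigned automatically. The field names, the 0.85 threshold, and the module names are hypothetical.</p>
      <preformat>
```python
from difflib import SequenceMatcher


def normalise(value):
    """Lower-case a value and drop all whitespace."""
    return "".join(value.lower().split())


def signifier(record, key_fields):
    """Hypothetical signifier: normalised values of several key fields (like a composite key)."""
    return tuple(normalise(record[f]) for f in key_fields)


def similarity(a, b):
    """Mean per-field similarity ratio (between 0 and 1) of two signifiers."""
    return sum(SequenceMatcher(None, x, y).ratio() for x, y in zip(a, b)) / len(a)


def classify(source_record, targets, key_fields, threshold=0.85):
    """Return (class, []) for a unique match, else (None, candidates) for the user to resolve."""
    sig = signifier(source_record, key_fields)
    scored = [(similarity(sig, signifier(t, key_fields)), t["class"]) for t in targets]
    candidates = [cls for score, cls in scored if score >= threshold]
    if len(candidates) == 1:
        return candidates[0], []
    return None, candidates


key_fields = ["vendor", "module"]
targets = [
    {"vendor": "VendorA", "module": "IO-Module 8", "class": "io_module"},
    {"vendor": "VendorA", "module": "CPU-Module 2", "class": "cpu_module"},
]
# Source value with a typo: the digit zero instead of the letter O.
source = {"vendor": "vendora", "module": "I0-Module 8"}
print(classify(source, targets, key_fields))  # ('io_module', [])
```
      </preformat>
      <p>Returning the candidate list rather than guessing mirrors the semi-automatic import described above, in which ambiguous classifications are resolved by the user.</p>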
      <p>The different data analytics tasks for predictive asset management and anomaly
detection in operation data are defined and documented in a user requirements
specification. Next, we will study the applicability of different multivariate
methods to our analytics tasks. The selection and application of statistical methods
is a highly sensitive task, since the results serve as the basis for further prescriptive
analytics methods to optimize rules and constraints in product configuration.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Haas</surname>
            ,
            <given-names>P.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Maglio</surname>
            ,
            <given-names>P.P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Selinger</surname>
            ,
            <given-names>P.G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tan</surname>
            ,
            <given-names>W.C.</given-names>
          </string-name>
          :
          <article-title>Data is dead without what-if models</article-title>
          .
          <source>Proceedings of the VLDB Endowment</source>
          <volume>4</volume>
          (
          <issue>12</issue>
          ) (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Soltanpoor</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sellis</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>Prescriptive analytics for big data</article-title>
          .
          <source>In: Australasian Database Conference</source>
          , Springer (
          <year>2016</year>
          )
          <fpage>245</fpage>
          –
          <lpage>256</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Bleiholder</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Naumann</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          :
          <article-title>Data fusion</article-title>
          .
          <source>ACM Computing Surveys (CSUR)</source>
          <volume>41</volume>
          (
          <issue>1</issue>
          )
          (
          <year>2009</year>
          )
          <fpage>1</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4. Schurr,
          <string-name>
            <surname>A.</surname>
          </string-name>
          , Dorr, H.:
          <article-title>Introduction to the special sosym section on model-based tool integration</article-title>
          .
          <source>Software and Systems Modeling</source>
          <volume>4</volume>
          (
          <issue>2</issue>
          ) (
          <year>2005</year>
          )
          <fpage>109</fpage>
          –
          <lpage>111</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Papadakis</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Alexiou</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Papastefanatos</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Koutrika</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          :
          <article-title>Schema-agnostic vs schema-based configurations for blocking methods on homogeneous data</article-title>
          .
          <source>Proceedings of the VLDB Endowment</source>
          <volume>9</volume>
          (
          <issue>4</issue>
          ) (
          <year>2015</year>
          )
          <fpage>312</fpage>
          –
          <lpage>323</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Rapolu</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>Focus: How big data is making tracks in the rail industry</article-title>
          .
          <source>Building the Digital Transport Network of the Future</source>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Han</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pei</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kamber</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Data mining: concepts and techniques</article-title>
          .
          <source>Elsevier</source>
          (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Bishop</surname>
            ,
            <given-names>C.M.</given-names>
          </string-name>
          :
          <article-title>Pattern recognition</article-title>
          .
          <source>Machine Learning</source>
          <volume>128</volume>
          (
          <year>2006</year>
          )
          <fpage>1</fpage>
          –
          <lpage>58</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Chandola</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Banerjee</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kumar</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          :
          <article-title>Anomaly detection: A survey</article-title>
          .
          <source>ACM Computing Surveys (CSUR)</source>
          <volume>41</volume>
          (
          <issue>3</issue>
          ) (
          <year>2009</year>
          )
          <fpage>15</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Esbensen</surname>
            ,
            <given-names>K.H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Guyot</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Westad</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Houmoller</surname>
            ,
            <given-names>L.P.</given-names>
          </string-name>
          :
          <article-title>Multivariate data analysis: in practice: an introduction to multivariate data analysis and experimental design</article-title>
          .
          <source>Multivariate Data Analysis</source>
          (
          <year>2002</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Sabin</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Weigel</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          :
          <article-title>Product configuration frameworks-a survey</article-title>
          .
          <source>IEEE Intelligent Systems and their Applications</source>
          <volume>13</volume>
          (
          <issue>4</issue>
          ) (
          <year>1998</year>
          )
          <fpage>42</fpage>
          –
          <lpage>49</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Souza</surname>
            ,
            <given-names>G.C.</given-names>
          </string-name>
          :
          <article-title>Supply chain analytics</article-title>
          .
          <source>Business Horizons</source>
          <volume>57</volume>
          (
          <issue>5</issue>
          ) (
          <year>2014</year>
          )
          <fpage>595</fpage>
          –
          <lpage>605</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Porter</surname>
            ,
            <given-names>M.E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Heppelmann</surname>
            ,
            <given-names>J.E.</given-names>
          </string-name>
          :
          <article-title>How smart, connected products are transforming companies</article-title>
          .
          <source>Harvard Business Review</source>
          <volume>93</volume>
          (
          <issue>10</issue>
          ) (
          <year>2015</year>
          )
          <fpage>96</fpage>
          –
          <lpage>114</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Siksnys</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>Towards prescriptive analytics in cyber-physical systems</article-title>
          .
          <source>Dissertation</source>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Wurl</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Falkner</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Haselböck</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mazak</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Using signifiers for data integration in rail automation</article-title>
          .
          <source>In: Proceedings of the 6th International Conference on Data Science, Technology and Applications - Volume 1: DATA</source>
          , INSTICC, SciTePress (
          <year>2017</year>
          )
          <fpage>172</fpage>
          –
          <lpage>179</lpage>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>