<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Scalable Semantic Access to Siemens Static and Streaming Distributed Data</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>E. Kharlamov</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>S. Brandt</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>M. Giese</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>E. Jiménez-Ruiz</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Y. Kotidis</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>S. Lamparter</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>T. Mailis</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>C. Neuenstadt</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ö. Özçep</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>C. Pinkel</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>A. Soylu</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>C. Svingos</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>D. Zheleznyakov</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>I. Horrocks</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Y. Ioannidis</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>R. Möller</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>A. Waaler</string-name>
        </contrib>
      </contrib-group>
      <abstract>
        <p>Numerous analytical tasks in industry rely on data integration solutions since they require data from multiple static and streaming data sources. In the context of the Optique project we have investigated how Semantic Technologies can enhance data integration and thus facilitate further data analysis. We introduced the notion of Ontology-Based Stream-Static Data Integration and developed the system Optique to put our ideas into practice. In this demo we will show how Optique can help in diagnostics of power-generating turbines at Siemens Energy. For this purpose we prepared anonymised streaming and static data from 950 Siemens power-generating turbines with more than 100,000 sensors and deployed Optique on distributed environments with 128 nodes. The demo attendees will be able to do diagnostics of turbines by registering and monitoring continuous queries that combine streaming and static data; to test the scalability of our dedicated stream management system, which is able to process up to 1024 concurrent complex diagnostic queries with a 10 TB/day throughput; and to deploy Optique over Siemens demo data using our dedicated interactive system to create semantic abstraction layers over data sources.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1 Introduction</title>
      <p>Motivation. Siemens runs service centres dedicated to diagnostics of thousands of
power-generation appliances across the globe. One typical task for these centres is to
detect in real-time potential failure events caused by, e.g., an abnormal temperature
and pressure increase. Such tasks require simultaneous processing of (i) sequences of
digitally encoded coherent signals produced and transmitted from thousands of gas and
steam turbines, generators, and compressors installed in power plants, and (ii) static data
that include the structure of relevant equipment, history of its exploitation and repairs, and
even weather conditions. These data are scattered across a large number of heterogeneous
data streams in addition to static DBs with hundreds of TBs of data.</p>
      <p>
        Even for a single diagnostic task, such as checking if a given turbine might develop
a fault, Siemens engineers have to analyse streams with temperature and other
measurements from up to 2,000 sensors installed in different parts of the turbine, analyse
historical temperature data, compute temperature patterns, compare them to patterns in
other turbines, compare weather conditions, etc. This requires posing a collection of
hundreds of queries, the majority of which are semantically the same (they ask about
temperature), but syntactically different (they are over different schemata). Formulating
and executing so many queries, and then assembling the computed answers, takes up to
80% of the overall diagnostic time [10].
      </p>
      <p>
        This demo accompanies our ISWC’16 paper [14] and extends [
        <xref ref-type="bibr" rid="ref8">8, 11</xref>
        ]. This research was funded
by the EU project Optique (FP7-IP-318338) and the EPSRC grants DBonto, MaSI3, and ED3.
      </p>
      <p>
        Our Proposal. In order to streamline the diagnostic process at Siemens, we propose a
data integration approach based on Semantic Technologies that extends the well-known
Ontology-Based Data Access (OBDA) approach and that we call Ontology-Based Stream-Static Data
Integration (OBSSDI). It follows the classical data integration paradigm that requires the
creation of a common ‘global’ schema that consolidates ‘local’ schemata of the integrated
data sources, and mappings that define how the local and global schemata are related. In
OBSSDI the global schema is an ontology: a formal conceptualisation of the domain of
interest that consists of a vocabulary, i.e., names of classes, attributes and binary relations,
and axioms over the terms from the vocabulary that, e.g., assign attributes to classes,
define relationships between classes, compose classes, define class hierarchies, etc. OBSSDI
mappings relate each ontological term to a set of queries over the underlying data. For
example, the generic ontology attribute temperature-of-sensor is mapped to all specific
data and procedures that return temperature readings from sensors in dozens of different
turbines and DBs storing historical data; thus, all particularities and varieties of how the
temperature of a sensor can be measured, represented and stored are captured in these
mappings. In OBSSDI the integrated data can be accessed by posing queries over the
ontology, i.e., ontological queries. These queries are hybrid: they refer to both streaming
and static data. Evaluation of an ontological query in OBSSDI has three stages: (i) in the
enrichment stage ontology axioms are used to expand the ontological query in order to
access as much relevant data as possible; (ii) in the unfolding stage the mappings are
used to translate the enriched ontological query into (possibly many) queries over the
data; and (iii) in the execution stage the unfolded data queries are executed over the data.
OBSSDI differs from traditional OBDA since the latter assumes that data is in (static)
relational DBs, e.g., [
        <xref ref-type="bibr" rid="ref3">3, 18</xref>
        ], or streaming, e.g., [
        <xref ref-type="bibr" rid="ref2 ref4">2, 4</xref>
        ], but not both. Moreover,
our approach differs from existing solutions for unified processing of streaming and static
semantic data, e.g., [17], since they assume that data is natively in RDF, while we assume
that the data is relational and mapped to RDF.
      </p>
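      <p>The three evaluation stages described above can be illustrated with a minimal sketch. It is not the OPTIQUE implementation: the term names, the SUBTERMS and MAPPINGS dictionaries, and the table names are hypothetical, the ontology is reduced to a simple subterm hierarchy, and an in-memory SQLite database stands in for the data sources.</p>
      <preformat>
```python
import sqlite3

# Hypothetical toy ontology: each term lists the more specific terms below it.
SUBTERMS = {
    "temperature_of_sensor": ["inlet_temperature", "exhaust_temperature"],
}

# Hypothetical mappings: each ontological term maps to SQL queries over the sources.
MAPPINGS = {
    "inlet_temperature": ["SELECT sensor_id, val FROM plant_a_inlet"],
    "exhaust_temperature": ["SELECT sid, reading FROM plant_b_exhaust"],
}


def enrich(term):
    """Enrichment: expand a term with every term the axioms place below it."""
    expanded = [term]
    for sub in SUBTERMS.get(term, []):
        expanded.extend(enrich(sub))
    return expanded


def unfold(terms):
    """Unfolding: translate the enriched terms into queries over the data."""
    queries = []
    for term in terms:
        queries.extend(MAPPINGS.get(term, []))
    return queries


def execute(conn, queries):
    """Execution: run the unfolded queries and union their answers."""
    answers = []
    for sql in queries:
        answers.extend(conn.execute(sql).fetchall())
    return sorted(answers)


def demo():
    """Build two toy sources with different schemata and query them via one term."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE plant_a_inlet (sensor_id TEXT, val REAL)")
    conn.execute("CREATE TABLE plant_b_exhaust (sid TEXT, reading REAL)")
    conn.execute("INSERT INTO plant_a_inlet VALUES ('a1', 410.5)")
    conn.execute("INSERT INTO plant_b_exhaust VALUES ('b7', 598.0)")
    return execute(conn, unfold(enrich("temperature_of_sensor")))
```
      </preformat>
      <p>In this toy setting, demo() returns the readings from both hypothetical sources even though the query mentions only the generic term temperature_of_sensor, which is the effect the enrichment and unfolding stages are meant to achieve.</p>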
      <p>Contributions. We developed an OBSSDI system OPTIQUE with several novel parts:
– BOOTOX: semi-automatic support to construct high quality ontologies and mappings
over relational and streaming data.
– STARQL: query language over ontologies that combines streaming and static data,
and allows for efficient enrichment and unfolding that preserves the semantics of
ontological queries.
– STREAMVQS: end-user oriented query formulation support to construct continuous
ontological queries.
– EXASTREAM: backend for optimising large numbers of queries automatically
generated via enrichment and unfolding, and for efficiently executing them over distributed
streaming and static data.</p>
      <p>
        BOOTOX [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] is practically important since it can dramatically speed up deployment
and maintenance of OBSSDI systems. STARQL [16] is crucial since, to the best
of our knowledge, no dedicated query language for hybrid semantic queries has the
required properties. STREAMVQS (which extends OptiqueVQS [19]) is essential since
it gives non-experts fast and easy data access through state-of-the-art technologies.
EXASTREAM [15] is vital because, even when the data is only static and not
distributed, query execution without dedicated optimisation techniques performs poorly:
the queries that are automatically computed after enrichment and unfolding can be
very inefficient, e.g., they may contain many redundant joins and unions.
      </p>
    </sec>
    <sec id="sec-2">
      <title>2 System Overview</title>
      <p>
        OPTIQUE is an integrated system that consists of multiple components to support OBSSDI
end-to-end [
        <xref ref-type="bibr" rid="ref6">6, 9, 12, 13</xref>
        ]. For IT specialists OPTIQUE offers support for the whole lifecycle
of ontologies and mappings: semi-automatic bootstrapping from relational data sources,
importing of existing ontologies, semi-automatic quality verification and optimisation,
cataloging, manual definition and editing of mappings. For end-users OPTIQUE offers
tools for query formulation support, query cataloging, answer monitoring, as well as
integration with GIS systems. Query evaluation is done via OPTIQUE’s query enrichment,
unfolding, and execution backends, the last of which is EXASTREAM; together they can execute
up to thousands of complex ontological queries in highly distributed environments.
      </p>
      <p>We now give an overview of EXASTREAM, our component for scalable streaming
and static relational data processing that is the focus of this demonstration. Relational
queries produced by an unfolding component of Optique are handled by EXASTREAM, our
high-throughput distributed Data Stream Management System (DSMS). The EXASTREAM
DSMS is embedded in EXAREME, a system for elastic large-scale dataflow processing in
the cloud [15, 20]. In the following, we present some key aspects of EXASTREAM.</p>
      <p>
        EXASTREAM is built as a streaming extension of the SQLite DBMS, taking advantage
of existing Database Management technologies and optimisations such as query planners.
It provides a declarative language, namely SQL⊕, for querying data streams and relations
that conforms to the CQL semantics [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. EXASTREAM natively supports User Defined
Functions (UDFs) with arbitrary user code. The engine blends the execution of UDFs
together with relational operators using JIT tracing compilation techniques, which speeds
up execution. UDFs make it possible to express very complex dataflows using simple
primitives. For OPTIQUE we used UDFs to implement communication with external
sources, window partitioning on data streams, and data mining algorithms such as the
Locality-Sensitive Hashing technique [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] for computing the correlation between values of multiple
streams. More importantly, the main operators that incorporate the algorithmic logic for
transforming SQLite into a DSMS are implemented as UDFs.
      </p>
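      <p>As a rough illustration of how a UDF can add stream-style operators to SQLite, the following sketch registers a Python function through the standard sqlite3 module and uses it to partition readings into tumbling windows. The function window_id, the table readings, and its columns are assumptions made for this example; they are not part of EXASTREAM, whose actual operators are not shown here.</p>
      <preformat>
```python
import sqlite3


def window_id(timestamp, width):
    """Assign a timestamp to a tumbling window of the given width (in seconds)."""
    return int(timestamp) // int(width)


def windowed_averages(rows, width):
    """Average sensor readings per tumbling window with a SQL query that calls a UDF."""
    conn = sqlite3.connect(":memory:")
    # Register the Python function so that SQL can call it as window_id(ts, width).
    conn.create_function("window_id", 2, window_id)
    conn.execute("CREATE TABLE readings (ts INTEGER, sensor TEXT, val REAL)")
    conn.executemany("INSERT INTO readings VALUES (?, ?, ?)", rows)
    cursor = conn.execute(
        "SELECT window_id(ts, ?) AS w, sensor, AVG(val) "
        "FROM readings GROUP BY w, sensor ORDER BY w, sensor",
        (width,),
    )
    return cursor.fetchall()
```
      </preformat>
      <p>For instance, windowed_averages([(0, "s1", 10.0), (5, "s1", 20.0), (12, "s1", 30.0)], 10) groups the first two readings into window 0 and the third into window 1. A real DSMS would evaluate such a query continuously rather than over a stored table.</p>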
      <p>In order to enable efficient processing of data streams of very high velocity we have
implemented a number of optimisations in the stream processing engine, such as adaptive
indexing. With this technique EXASTREAM collects statistics during query execution and,
adaptively, decides to build main-memory indexes on batches of cached stream tuples in
order to expedite query processing.
</p>
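      <p>The idea behind adaptive indexing can be sketched as follows. This is a simplified illustration under our own assumptions, not the EXASTREAM code: the class AdaptiveBatch, the probe counter used as the collected statistic, and the threshold parameter are all hypothetical.</p>
      <preformat>
```python
from collections import defaultdict


class AdaptiveBatch:
    """A cached batch of stream tuples that indexes itself once it is probed often enough."""

    def __init__(self, tuples, key_position, threshold=3):
        self.tuples = list(tuples)
        self.key_position = key_position
        self.threshold = threshold  # hypothetical tuning knob
        self.probes = 0             # statistic collected during query execution
        self.index = None           # main-memory index, built lazily

    def lookup(self, key):
        """Return all tuples with the given key, building the index when it pays off."""
        self.probes += 1
        if self.index is None and self.probes >= self.threshold:
            self._build_index()
        if self.index is not None:
            return self.index.get(key, [])
        # Fall back to a linear scan while the batch is still rarely probed.
        return [t for t in self.tuples if t[self.key_position] == key]

    def _build_index(self):
        """Group the cached tuples by key in a hash index."""
        index = defaultdict(list)
        for t in self.tuples:
            index[t[self.key_position]].append(t)
        self.index = dict(index)
```
      </preformat>
      <p>The design choice illustrated here is that the cost of building an index is only paid for batches that are probed repeatedly; batches that are touched once or twice are answered by a scan.</p>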
    </sec>
    <sec id="sec-3">
      <title>3 Demonstration Scenarios</title>
      <p>For the purposes of the demonstration we selected 20 diagnostic tasks typical for Siemens service
centres and expressed these tasks in STARQL and STREAMVQS. Then, we prepared a
demo data set of streaming and static data from 950 gas and steam turbines covering the period
from 2002 to 2011. This data is anonymised in a way that preserves the patterns needed
for the demo diagnostic tasks. During the demo we will ‘play’ the streaming data and thus
emulate real-time streams. Then, we distributed the demo data across several installations
with different numbers of nodes (VMs) ranging from 1 to 128, where each node has 2
processors and 4 GB of main memory. To demonstrate diagnostic results we prepared a
dedicated monitoring dashboard for each diagnostic task in the catalog. Dashboards show
diagnostic results in real time, as well as statistics on streaming answers, relevant turbines,
and other information that is typically required by the service engineers at Siemens.
Finally, we deployed OPTIQUE over the Siemens data by bootstrapping ontologies and
mappings with BOOTOX and then manually post-processing and extending them so that
they reach the required quality and contain the terms and mappings necessary to cover the 20
Siemens diagnostic tasks.</p>
      <p>During the demo OPTIQUE will be available in three scenarios:
[S1] Diagnostics with user’s deployment: the attendees will be able to deploy OPTIQUE
over the Siemens data by bootstrapping ontologies and mappings, saving them, and
observing and possibly improving them in dedicated editors. Then, they will query
their deployed instance with diagnostic tasks either from the Siemens catalog or their
own, i.e., they will be able to formulate such tasks in STREAMVQS as parametrised
continuous queries and register concrete instances of these tasks over data streams.
[S2] Diagnostics with our deployment: the attendees will be able to query our
preconfigured (high quality) Siemens deployment using diagnostic tasks either from the
Siemens catalog or their own.
[S3] Performance showcase of our deployment: the attendees will be able to run various
tests over our deployment using one of 128 preconfigured Siemens distributed
environments and one of 10 test sets of queries. While running the tests they will
monitor the throughput and progress of parallel query execution processes.
</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>A.</given-names>
            <surname>Arasu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Babu</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>Widom</surname>
          </string-name>
          .
          <article-title>The CQL Continuous Query Language: Semantic Foundations and Query Execution</article-title>
          . In: VLDBJ (
          <year>2006</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>J.</given-names>
            <surname>Calbimonte</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Ó.</given-names>
            <surname>Corcho</surname>
          </string-name>
          , and
          <string-name>
            <given-names>A. J. G.</given-names>
            <surname>Gray</surname>
          </string-name>
          .
          <article-title>Enabling Ontology-Based Access to Streaming Data Sources</article-title>
          . In: ISWC.
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>C.</given-names>
            <surname>Civili</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Console</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>De Giacomo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Lembo</surname>
          </string-name>
          , et al.
          <article-title>MASTRO STUDIO: Managing Ontology-Based Data Access applications</article-title>
          . In:
          <source>PVLDB</source>
          6.12 (
          <year>2013</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>L.</given-names>
            <surname>Fischer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Scharrenbach</surname>
          </string-name>
          , and
          <string-name>
            <given-names>A.</given-names>
            <surname>Bernstein</surname>
          </string-name>
          .
          <article-title>Scalable Linked Data Stream Processing via Network-Aware Workload Scheduling</article-title>
          .
          <source>In: SSWKBS@ISWC</source>
          .
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>N.</given-names>
            <surname>Giatrakos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Kotidis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Deligiannakis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Vassalos</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Y.</given-names>
            <surname>Theodoridis</surname>
          </string-name>
          .
          <article-title>In-network approximate computation of outliers with quality guarantees</article-title>
          .
          <source>In: Inf. Systems 38.8</source>
          (
          <year>2013</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>I.</given-names>
            <surname>Horrocks</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Hubauer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Jiménez-Ruiz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Kharlamov</surname>
          </string-name>
          , et al.
          <article-title>Addressing Streaming and Historical Data in OBDA Systems: Optique's Approach</article-title>
          . In: KNOW@LOD.
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>E.</given-names>
            <surname>Jiménez-Ruiz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Kharlamov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Zheleznyakov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Horrocks</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Pinkel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. G.</given-names>
            <surname>Skjaeveland</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Thorstensen</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>Mora</surname>
          </string-name>
          .
          <article-title>BootOX: Practical Mapping of RDBs to OWL 2</article-title>
          . In: ISWC.
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>E.</given-names>
            <surname>Kharlamov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Brandt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Jimenez-Ruiz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Kotidis</surname>
          </string-name>
          , et al.
          <article-title>Ontology-Based Integration of Streaming and Static Relational Data with Optique</article-title>
          .
          <source>In: SIGMOD</source>
          (
          <year>2016</year>
          ).
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>