<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Concept for metadata and time series data integration based on a material science application ontology</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Paul Zierep</string-name>
          <email>paul.zierep@iwm.fraunhofer.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Dirk Helm</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Fraunhofer IWM</institution>
          ,
          <addr-line>Wöhlerstraße 11, 79108 Freiburg</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>The matching between ontologies and classical datasets allows for an improved interpretability and understanding of the associated information and its interoperability. However, the integration of large tabular datasets poses various difficulties concerning the storage and accessibility, since the data cannot be accessed using only the ontology as an interface. Whereas many categorical tabular datasets can be used to automatically populate ontologies, this approach is technically not feasible for large continuous datasets such as time series data. Although hybrid query systems such as R2RML and RML [1] allow for the mapping of relational databases and heterogeneous data formats to Resource Description Framework (RDF) graphs, these techniques are only suitable for trivial data schemas [2], which is not the case for our time series datasets. In the presented poster, we propose a protocol to include time series data into a material science ontology using a two-step approach that combines ontology-based metadata querying with an additional functionality that allows for the performant retrieval of the associated time series data. The protocol demonstrates the first conceptual prototype of the envisioned system. We are looking forward to fruitful discussions with the community to further develop the proposed system.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
    </sec>
    <sec id="sec-2">
      <title>Methods</title>
      <p>Ontology development</p>
      <p>To design the mapping concept, we developed a prototype ontology based on the
tensile test experiment. The ontology concepts were derived from experimental
datasets, the test standard ISO 6892-1, and interviews with domain experts. All concepts
were classified into two superclasses (MetaData and TimeSeriesData), which provide the
attributes required for the designed data population strategy. The structure of the
superclasses is described in detail in the poster.</p>
      <p>
        To demonstrate data integration and data retrieval in our model system, we
implemented a basic pipeline. The pipeline was programmed using the ontology-based
open-source Python framework SimPhoNy [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] and its major component OSP core (OSP:
Open Simulation Platform). SimPhoNy allows for the manipulation of ABox
individuals using CUDS (Common Universal Data Structure) objects.
      </p>
      <p>Data parsing. To parse the raw datasets, we implemented a mapping routine that
reads each dataset and assigns the metadata to corresponding MetaData CUDS objects
and each time series column to a TimeSeriesData CUDS object. The
TimeSeriesData CUDS objects do not hold the measured values; they store only the
information required to extract the specific column (e.g., column index, number of
header rows, and file path). The CUDS objects can be serialized to ontology individuals
in an RDF graph.</p>
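      <p>The parsing step described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the SimPhoNy-based implementation: the MetaData and TimeSeriesData classes are hypothetical dataclass stand-ins for the CUDS objects, and the file layout (key/value lines, one line of column names, then numeric rows, separated by semicolons) as well as all values in the demo are made up.</p>
      <preformat>
```python
import csv
import tempfile
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class MetaData:
    """Hypothetical stand-in for a MetaData CUDS object."""
    name: str
    value: str


@dataclass(frozen=True)
class TimeSeriesData:
    """Hypothetical stand-in for a TimeSeriesData CUDS object.

    Holds only a pointer to one column, never the measured values.
    """
    name: str
    file_path: str
    column_index: int
    header_rows: int  # number of rows to skip before the numeric data starts


def parse_dataset(file_path, metadata_rows, delimiter=";"):
    """Split a file into metadata records and column references."""
    with open(file_path, newline="") as handle:
        rows = list(csv.reader(handle, delimiter=delimiter))
    # Assumed layout: `metadata_rows` key/value lines, then one line of column names.
    metadata = [MetaData(key, value) for key, value in rows[:metadata_rows]]
    columns = [
        TimeSeriesData(name, str(file_path), index, metadata_rows + 1)
        for index, name in enumerate(rows[metadata_rows])
    ]
    return metadata, columns


def load_column(series, delimiter=";"):
    """Read the referenced column from disk only when it is requested."""
    with open(series.file_path, newline="") as handle:
        rows = list(csv.reader(handle, delimiter=delimiter))
    return [float(row[series.column_index]) for row in rows[series.header_rows:]]


def demo():
    """Write a tiny made-up dataset, parse it, and load one column."""
    text = (
        "material;S355\n"
        "specimen_width_mm;12.5\n"
        "time_s;force_kN\n"
        "0.0;0.0\n"
        "0.5;1.2\n"
        "1.0;2.4\n"
    )
    path = Path(tempfile.mkdtemp()) / "tensile_test.csv"
    path.write_text(text)
    metadata, columns = parse_dataset(path, metadata_rows=2)
    return metadata, columns, load_column(columns[1])
```
      </preformat>
      <p>Because each reference carries only the file path, column index, and header row count, the graph stays small regardless of how many samples the time series contains.</p>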
      <p>Data retrieval. The data can be queried using the SimPhoNy application
programming interface (API) as well as SPARQL queries. The MetaData individuals can be
used to perform complex queries, such as filtering for tensile test experiments with
specific properties (e.g., experiments that used a specific material X with a specimen
width larger than Y). The corresponding TimeSeriesData individuals can then be used to
extract the column data using additional data analysis tools.</p>
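      <p>The two-step retrieval can be illustrated with a small, self-contained Python sketch. It is an assumption-laden illustration rather than the SimPhoNy API or a SPARQL query: the Experiment class, the function names, the metadata keys, and all records are hypothetical. Step 1 filters on metadata only; step 2 returns the column references that a separate analysis tool would then read from disk.</p>
      <preformat>
```python
from dataclasses import dataclass, field


@dataclass
class Experiment:
    """Hypothetical record linking metadata to time series references."""
    identifier: str
    metadata: dict
    # Maps a column name to a (file path, column index) reference.
    series: dict = field(default_factory=dict)


def find_experiments(experiments, material, min_width_mm):
    """Step 1: select experiments using the metadata only."""
    return [
        exp
        for exp in experiments
        if exp.metadata["material"] == material
        and exp.metadata["specimen_width_mm"] > min_width_mm
    ]


def series_references(experiments, column_name):
    """Step 2: collect the column references of the selected experiments."""
    return {
        exp.identifier: exp.series[column_name]
        for exp in experiments
        if column_name in exp.series
    }


# Made-up example records; file paths and values are placeholders.
EXPERIMENTS = [
    Experiment("test-001", {"material": "S355", "specimen_width_mm": 12.5},
               {"force_kN": ("data/test-001.csv", 1)}),
    Experiment("test-002", {"material": "S355", "specimen_width_mm": 8.0},
               {"force_kN": ("data/test-002.csv", 1)}),
    Experiment("test-003", {"material": "DP800", "specimen_width_mm": 12.5},
               {"force_kN": ("data/test-003.csv", 1)}),
]


def demo():
    """Find S355 experiments wider than 10 mm and return their force columns."""
    selected = find_experiments(EXPERIMENTS, material="S355", min_width_mm=10.0)
    return series_references(selected, "force_kN")
```
      </preformat>
      <p>Keeping the filter and the column lookup separate mirrors the division between MetaData and TimeSeriesData individuals: the query never touches the raw files until the matching experiments are known.</p>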
    </sec>
    <sec id="sec-3">
      <title>Results and Discussions</title>
      <p>The implemented workflow enables the storage and retrieval of time series data in a
semantically enriched ontology. The designed data parser allows for the parsing and
mapping of semi-structured tabular data. The ontologically stored metadata can be used
for complex queries of the time series data that would be difficult to perform using only
the raw data files.</p>
    </sec>
    <sec id="sec-4">
      <title>Funding References</title>
      <p>This research was funded by the Federal Ministry of Education and Research
Germany (BMBF) within the project StahlDigital (funding code: 13XP5116C).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Zhao</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Han</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kim</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>R2LD: Schema-based Graph Mapping of relational databases to Linked Open Data for multimedia resources data</article-title>
          .
          <source>Multimed Tools Appl</source>
          .
          <volume>78</volume>
          ,
          <fpage>28835</fpage>
          -
          <lpage>28851</lpage>
          (
          <year>2019</year>
          ). https://doi.org/10.1007/s11042-019-7281-5
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Heyvaert</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dimou</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Verborgh</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mannens</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          :
          <article-title>Ontology-Based Data Access Mapping Generation Using Data, Schema, Query, and Mapping Knowledge</article-title>
          .
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3. OSP core.
          <source>SimPhoNy</source>
          . (
          <year>2021</year>
          ). https://github.com/simphony/osp-core
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>