Concept for metadata and time series data integration based on a material science application ontology*

Paul Zierep, Dirk Helm
Fraunhofer IWM, Wöhlerstraße 11, 79108 Freiburg, Germany
paul.zierep@iwm.fraunhofer.de

1 Introduction

Matching ontologies with classical datasets improves the interpretability and understanding of the associated information, as well as its interoperability. However, integrating large tabular datasets poses difficulties for storage and accessibility, since the data cannot be accessed using only the ontology as an interface. Whereas many categorical tabular datasets can be used to populate ontologies automatically, this approach is technically not feasible for large continuous datasets such as time series data. Although hybrid query systems such as R2RML and RML [1] allow relational databases and heterogeneous data formats to be mapped to Resource Description Framework (RDF) graphs, these techniques are only suitable for trivial data schemas [2], which is not the case for our time series datasets.

In the presented poster, we propose a protocol for including time series data in a material science ontology using a two-step approach: ontology-based metadata querying is combined with an additional functionality that allows performant retrieval of the associated time series data. The protocol is a first conceptual prototype of the envisioned system. We look forward to fruitful discussions with the community to further develop the proposed system.

2 Methods

2.1 Ontology development

To design the mapping concept, we developed a prototype ontology based on the tensile test experiment. The ontology concepts were derived from experimental datasets, the test standard ISO 6892-1 and interviews with domain experts. All concepts were classified into two superclasses (MetaData and TimeSeriesData), which provide the attributes required for the designed data population strategy.
The structure of the superclasses is described in detail in the poster.

* Copyright © 2021 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

2.2 Data Integration Pipeline

To demonstrate data integration and data retrieval in our model system, we implemented a basic pipeline. The pipeline was programmed using the ontology-based open-source Python framework SimPhoNy [3] and its major component OSP core (OSP: Open Simulation Platform). SimPhoNy allows ABox individuals to be manipulated using CUDS (Common Universal Data Structure) objects.

Data parsing. To parse the raw datasets, we implemented a mapping routine that reads the dataset, assigns the metadata to corresponding MetaData CUDS objects and assigns each time series column to a TimeSeriesData CUDS object. The TimeSeriesData CUDS objects store only the information required to extract the specific column (e.g., column index, number of header rows and file path). The CUDS objects can be serialized to ontology individuals in an RDF graph.

Data retrieval. The data can be queried using the SimPhoNy application programming interface (API) as well as SPARQL queries. The MetaData individuals can be used to perform complex queries, such as filtering for tensile test experiments with specific properties (e.g., experiments that used a specific material X with a specimen width larger than Y). The corresponding TimeSeriesData individuals can then be used to extract the column data using additional data analysis tools.

3 Results and Discussions

The implemented workflow enables the storage and retrieval of time series data in a semantically enriched ontology. The designed data parser allows semi-structured tabular data to be parsed and mapped. The ontologically stored metadata can be used for complex queries of the time series data that would be difficult to perform using only the raw data files.
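The two-step access pattern described in Section 2.2 (query the metadata first, extract columns from the raw files second) can be illustrated with a minimal sketch. It uses only the Python standard library and deliberately does not use the SimPhoNy API or an RDF store. The names `TimeSeriesRef`, `parse_file`, `select` and `load_column`, the semicolon-separated file layout, and the metadata keys `material` and `specimen_width_mm` are assumptions made for illustration and are not taken from the authors' implementation.

```python
import csv
import tempfile
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class TimeSeriesRef:
    """Hypothetical stand-in for a TimeSeriesData individual.

    It stores only what is needed to find the column later,
    never the column values themselves.
    """
    name: str
    file_path: str
    column_index: int
    header_rows: int


def parse_file(path, header_rows):
    """Split one raw file into a metadata dict and column pointers.

    Assumed layout: (header_rows - 1) lines of `key;value` metadata,
    one line of column names, then numeric data rows.
    """
    with open(path, newline="") as fh:
        rows = list(csv.reader(fh, delimiter=";"))
    metadata = dict(rows[:header_rows - 1])
    column_names = rows[header_rows - 1]
    refs = [TimeSeriesRef(name, str(path), i, header_rows)
            for i, name in enumerate(column_names)]
    return metadata, refs


def select(experiments, material, min_width):
    """Step 1: filter experiments using metadata only (no file access)."""
    return [(meta, refs) for meta, refs in experiments
            if meta["material"] == material
            and float(meta["specimen_width_mm"]) > min_width]


def load_column(ref):
    """Step 2: read one column from the raw file a pointer refers to."""
    with open(ref.file_path, newline="") as fh:
        rows = list(csv.reader(fh, delimiter=";"))
    return [float(row[ref.column_index]) for row in rows[ref.header_rows:]]


def demo():
    """Write two tiny example files, then run both retrieval steps."""
    with tempfile.TemporaryDirectory() as tmp:
        experiments = []
        for name, material, width in [("a.csv", "X", "12.5"),
                                      ("b.csv", "X", "8.0")]:
            path = Path(tmp) / name
            path.write_text(
                f"material;{material}\nspecimen_width_mm;{width}\n"
                "time_s;force_N\n0.0;0.0\n0.5;110.0\n1.0;205.0\n"
            )
            experiments.append(parse_file(path, header_rows=3))
        hits = select(experiments, material="X", min_width=10.0)
        # Raw files are opened only for experiments that passed step 1.
        return [load_column(ref) for _, refs in hits
                for ref in refs if ref.name == "force_N"]


if __name__ == "__main__":
    print(demo())
```

In this sketch the metadata dictionary plays the role that the MetaData individuals and a SPARQL query would play in the described pipeline. The design point it illustrates is that the graph stays small because only pointers are stored, while the bulk numeric data remain in the original files until they are actually requested.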
4 Funding

This research was funded by the Federal Ministry of Education and Research, Germany (BMBF) within the project StahlDigital (funding code: 13XP5116C).

References

1. Zhao, Z., Han, S., Kim, J.: R2LD: Schema-based Graph Mapping of relational databases to Linked Open Data for multimedia resources data. Multimed Tools Appl. 78, 28835–28851 (2019). https://doi.org/10.1007/s11042-019-7281-5
2. Heyvaert, P., Dimou, A., Verborgh, R., Mannens, E.: Ontology-Based Data Access Mapping Generation Using Data, Schema, Query, and Mapping Knowledge. (2017)
3. OSP core. SimPhoNy. (2021). https://github.com/simphony/osp-core