<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Data Analysis Environment for Materials Science and Engineering Integrating Heterogeneous Data Resources</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Toshihiro Ashino</string-name>
          <email>ashino@acm.org</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Nobutaka Nishikawa</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Takuya Kadohira</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Mizuho Information &amp; Research Institute, Inc.</institution>
          <addr-line>2-3 Kanda-Nishikicho, Chiyoda-ku, Tokyo 101-8443</addr-line>
          ,
          <country country="JP">Japan</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>National Institute for Materials Science</institution>
          ,
          <addr-line>1-2-1 Sengen, Tsukuba, Ibaraki, 305-0047</addr-line>
          ,
          <country country="JP">Japan</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Toyo University</institution>
          ,
          <addr-line>5-28-20 Hakusan, Bunkyo-ku, Tokyo 112-8606</addr-line>
          ,
          <country country="JP">Japan</country>
        </aff>
      </contrib-group>
      <fpage>336</fpage>
      <lpage>341</lpage>
      <abstract>
        <p>Materials performance analysis requires integrating many heterogeneous data and information resources: experimental data, empirical and theoretical models, and computational simulations. A data analysis platform for materials science and engineering should therefore provide many functions, e.g., data retrieval, processing, statistical analysis, symbolic mathematics, visualization, and scripting capabilities for storing typical data analysis processes, and it should give unified access to these heterogeneous data resources. The scripting language Python provides many of these capabilities through additional software modules and is widely applied in interactive and non-interactive data processing environments. In this paper, a prototype design and implementation of a data analysis environment for materials science and engineering is presented.</p>
      </abstract>
      <kwd-group>
        <kwd>virtual research environment</kwd>
        <kwd>materials integration</kwd>
        <kwd>materials ontology</kwd>
        <kwd>semantic web</kwd>
        <kwd>heterogeneous data integration</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        In many research areas, data-intensive research, the so-called Fourth Paradigm [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ],
has been growing in importance. Materials science and engineering has a
long tradition of developing computerized materials property databases [
        <xref ref-type="bibr" rid="ref2 ref3">2, 3</xref>
        ]. However,
materials experiments are costly and demand high skill, materials exhibit a wide variation
of properties, and measurement methods and substances are diverse, so the data-intensive
approach has been slow to enter the materials design process.
      </p>
      <p>
        Advances in computer simulation technology and new measurement
methods have now made it possible to obtain huge amounts of data in this field. Materials
properties, such as physical properties and long-term performance, can be evaluated
with minimal experiments, at relatively low cost and in a short period; furthermore,
materials performance can be predicted without real experiments [
        <xref ref-type="bibr" rid="ref4 ref5 ref6">4–6</xref>
        ].
      </p>
      <p>
        One important application area is the development of software platforms for
high-throughput computational materials design, focused on functional
materials whose performance directly reflects micro-scale physical properties [
        <xref ref-type="bibr" rid="ref7 ref8">7, 8</xref>
        ].
However, in predicting the performance of structural materials, e.g. creep rupture
properties, phenomena at different scales and complex interactions among them affect
the total performance, so heterogeneous data and models must be integrated.
      </p>
      <p>
        This approach is called ICME (Integrated Computational Materials Engineering)
[
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. In Japan, SIP-MI (Strategic Innovation Promotion Program: Materials
Integration) is a project to implement the ICME concept. An information platform for MI must
handle and integrate many kinds of information resources, such as experimental
data, simulation modules and mathematical equations. Semantic descriptions of the data,
of the relationships among data and of data attributes are essential for integrating this
heterogeneous information.
      </p>
      <p>
        We applied the Semantic Web framework to this application. It provides several
machine-readable semantic description standards: XML Schema [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ], RDF (Resource
Description Framework)/SPARQL (SPARQL Protocol and RDF Query Language)
[
        <xref ref-type="bibr" rid="ref11 ref12">11, 12</xref>
        ], OWL (Web Ontology Language) [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] and OpenMath [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ]. We have developed a prototype MI
data platform that handles these data formats and allows workflows
of materials data processing to be described.
      </p>
    </sec>
    <sec id="sec-design">
      <title>Design and Implementation of the Prototype</title>
      <p>
        The prototype system is based on the mathematical software system SageMath [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ], an
open-source project that integrates many open-source mathematical systems, such as SciPy and R.
Because SageMath is built on the Python programming environment, the many
software modules developed for Python can be used in the system, and it is easy to
develop original data processing modules for this environment.
      </p>
      <p>Fig. 1 shows the design of the prototype system. The system must manage
continuously evolving materials measurements and new materials data. To keep data
management flexible, the metadata that describes the structure of the database is stored
as RDF files in an Apache Jena Fuseki SPARQL endpoint. RDF provides a conceptual
description of the data resources, which is retrieved with the SPARQL query
language.</p>
      <p>
        Metadata describing experimental data and mathematical equations (target
materials, equation names, target property, application conditions, and links to the data and
equation bodies) is written in RDF for retrieval by SPARQL. Sample experimental
databases are stored as XML (Extensible Markup Language) documents, which can be
accessed through their URIs listed in the RDF files. Equation bodies are also stored as XML
documents, written in OpenMath, a semantic representation of mathematics
that provides rich vocabularies containing many operators and mathematical functions
[
        <xref ref-type="bibr" rid="ref16">16</xref>
        ].
      </p>
      <p>The Python XML processing module, RDFLib, SPARQLWrapper and py-openmath are
incorporated into the SageMath symbolic-math environment, and an original OpenMath parser has
been developed for this prototype. The Materials Ontology, written in OWL, is managed by the
same SPARQL endpoint.</p>
      <p>[Fig. 1: design of the prototype system. Components shown: Jupyter Notebook for interactive data processing; SageMath (Python) with Python modules (XML processing module, RDFLib, SPARQLWrapper, py-openmath); XML documents; Apache Jena Fuseki SPARQL endpoint holding ontologies (OWL), rules (RuleML), equations (OpenMath), parameters (RDF) and experimental data (RDF, XML Schema); Jena Ontology API; external data processing applications (R, Maxima, Mathematica, etc.).]</p>
      <p>Fig. 2. Metadata description in RDF for (a) experimental data and (b) constitutive
equation. Data and equations are stored in XML files pointed to by URIs.</p>
    </sec>
    <sec id="sec-workflow">
      <title>An Example Materials Performance Analysis Workflow in Python</title>
      <p>Creep data analysis, a typical materials data processing workflow, is
shown in Fig. 3. In the prototype, workflows are written in the Python scripting language,
which gives a flexible and extensible description. First, the relevant creep
experimental data is retrieved from the database with SPARQL. The results are obtained as XML
documents and transformed into a format suitable for further processing by
the XPath functions of the Python XML processing module. The XML data format stored in the
database is currently defined locally in this project, but it should be standardized per test
method or property using XML Schema.</p>
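      <p>The transformation step might look like the following sketch, which uses the XPath subset supported by the standard-library ElementTree module. The document layout (creepTest, point, stress, rate) and the numeric values are invented for illustration, since the project-local XML format is not given in the text.</p>
      <preformat><![CDATA[
```python
import xml.etree.ElementTree as ET

# Hypothetical creep-test document: element names, attributes and
# values are illustrative assumptions, not the project's real format.
SAMPLE_XML = """
<creepTest material="Example steel" temperature="873">
  <point><stress unit="MPa">100</stress><rate unit="1/h">1.0e-10</rate></point>
  <point><stress unit="MPa">150</stress><rate unit="1/h">7.6e-10</rate></point>
  <point><stress unit="MPa">200</stress><rate unit="1/h">3.2e-9</rate></point>
</creepTest>
"""


def extract_points(xml_text):
    """Return (stress, strain rate) pairs selected with XPath expressions."""
    root = ET.fromstring(xml_text.strip())
    points = []
    # "./point" selects every measurement element directly below the root.
    for point in root.findall("./point"):
        stress = float(point.findtext("./stress"))
        rate = float(point.findtext("./rate"))
        points.append((stress, rate))
    return points


points = extract_points(SAMPLE_XML)
for stress, rate in points:
    print(stress, rate)
```
]]></preformat>
      <p>The resulting list of numeric pairs is the kind of plain structure that a fitting routine can consume directly.</p>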
      <p>Second, an appropriate equation, in this case the Norton equation, a constitutive equation for
creep behavior, is selected through its metadata written in RDF. The metadata contains a
URI that points to the semantic representation of the equation in OpenMath. This representation can be
parsed and converted into the input format required by the chosen data
processing package, e.g. R, SciPy or other packages integrated into
SageMath.</p>
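      <p>To illustrate the parse-and-convert step, the sketch below encodes the Norton power law in its commonly used form, rate = A * sigma^n, as OpenMath XML and converts it into a Python-style infix string. It is a toy converter written for this example, not the parser developed for the prototype; the variable names rate, A, sigma and n are assumptions, and only a handful of symbols from the arith1 and relation1 content dictionaries are handled.</p>
      <preformat><![CDATA[
```python
import xml.etree.ElementTree as ET

OM_NS = "{http://www.openmath.org/OpenMath}"

# Norton power law, rate = A * sigma ** n, in OpenMath XML encoding.
# Variable names are illustrative assumptions.
NORTON_OM = """
<OMOBJ xmlns="http://www.openmath.org/OpenMath">
  <OMA>
    <OMS cd="relation1" name="eq"/>
    <OMV name="rate"/>
    <OMA>
      <OMS cd="arith1" name="times"/>
      <OMV name="A"/>
      <OMA>
        <OMS cd="arith1" name="power"/>
        <OMV name="sigma"/>
        <OMV name="n"/>
      </OMA>
    </OMA>
  </OMA>
</OMOBJ>
"""

# Small symbol table mapping (content dictionary, name) to an infix operator.
INFIX = {
    ("arith1", "plus"): "+",
    ("arith1", "times"): "*",
    ("arith1", "power"): "**",
    ("relation1", "eq"): "=",
}


def to_infix(node):
    """Convert a small subset of OpenMath XML into an infix string."""
    tag = node.tag.replace(OM_NS, "")
    if tag == "OMOBJ":          # wrapper element: descend into its child
        return to_infix(node[0])
    if tag == "OMV":            # variable
        return node.get("name")
    if tag == "OMI":            # integer literal stored as element text
        return node.text.strip()
    if tag == "OMF":            # float literal stored in the dec attribute
        return node.get("dec")
    if tag == "OMA":            # application: first child is the operator
        head, *args = list(node)
        operator = INFIX[(head.get("cd"), head.get("name"))]
        text = (" " + operator + " ").join(to_infix(arg) for arg in args)
        return text if operator == "=" else "(" + text + ")"
    raise ValueError("unsupported OpenMath element: " + tag)


expression = to_infix(ET.fromstring(NORTON_OM.strip()))
print(expression)
```
]]></preformat>
      <p>A converter for R or another package would follow the same tree walk with a different symbol table.</p>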
      <p>In the package, the non-linear least squares method is applied to the equation with the
retrieved experimental data set. The obtained parameter values, in this case A and n, are
written in RDF, annotated with appropriate metadata, e.g. links to the corresponding
experimental data and equation and the version of the software package, and stored in the
database for further use by MI software modules.</p>
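      <p>The fitting step can be sketched with the standard library alone. Note that this example swaps in a simpler technique than the one named above: taking logarithms of rate = A * sigma^n gives ln(rate) = ln(A) + n ln(sigma), which is fitted by ordinary linear least squares. Such a log-linear fit is often used as a starting estimate for a non-linear least squares routine. The data are synthetic, generated from assumed values A = 1.0e-20 and n = 5, and are not results from the paper.</p>
      <preformat><![CDATA[
```python
import math


def fit_norton(stresses, rates):
    """Estimate A and n in rate = A * stress**n.

    Uses ordinary least squares on the log-linearised form
    ln(rate) = ln(A) + n * ln(stress); this is a simplification,
    not a general non-linear least squares solver.
    """
    xs = [math.log(s) for s in stresses]
    ys = [math.log(r) for r in rates]
    count = len(xs)
    mean_x = sum(xs) / count
    mean_y = sum(ys) / count
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    n = sxy / sxx                       # slope is the stress exponent
    A = math.exp(mean_y - n * mean_x)   # intercept gives ln(A)
    return A, n


# Synthetic, noise-free data from assumed parameters (illustration only).
TRUE_A, TRUE_N = 1.0e-20, 5.0
stresses = [100.0, 150.0, 200.0, 250.0]
rates = [TRUE_A * s ** TRUE_N for s in stresses]

A_fit, n_fit = fit_norton(stresses, rates)
print(A_fit, n_fit)
```
]]></preformat>
      <p>With measured data, the estimates from this step could be passed as initial values to a non-linear solver in one of the packages mentioned above.</p>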
      <p>This workflow can be stored as a Python script, and all of its functions can also be used in
the interactive programming environment Jupyter Notebook. The script ran
correctly, demonstrating the extensibility and flexibility of this system.
      </p>
    </sec>
    <sec id="sec-discussion">
      <title>Discussions</title>
      <p>
        There have been many attempts to develop ontologies and to integrate data with them [
        <xref ref-type="bibr" rid="ref19 ref20 ref21 ref22">19–22</xref>
        ].
An ontology can serve as a fundamental dictionary for data integration. However, to
integrate heterogeneous information resources, all descriptions of these resources
must be based on a common ontology or be mapped to corresponding ontology
terms. This work is done manually; it requires continuous effort to standardize and
disseminate the ontology, as well as a support system that helps select vocabulary using an ontology
reasoner.
      </p>
      <p>The materials ontology has been extended with some concepts related to
creep performance evaluation. In this prototype, the ontology written in OWL can be
accessed via the Apache Jena API, and we are now testing the use of a reasoner in data
retrieval and rule-based data analysis with this functionality.</p>
    </sec>
    <sec id="sec-2">
      <p>[Fig. 3: creep data analysis workflow. Steps shown: search creep test experimental data (SPARQL); XML data transformation by XPath; search creep constitutive equation (SPARQL); parse OpenMath equation; parameter fitting by data analysis system. Fitted value shown: A = 3.80542e-27.]</p>
    </sec>
    <sec id="sec-7">
      <p>[...] programming language and the design have been verified with a sample database and script. An RDF
metadata representation for materials experimental data and mathematical equations has been
defined and tested for further development of the MI system.</p>
      <p>This work was supported by the Council for Science, Technology and Innovation (CSTI),
Cross-ministerial Strategic Innovation Promotion Program (SIP), “Structural Materials
for Innovation” (funding agency: JST).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Hey</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tansley</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Tolle</surname>
            ,
            <given-names>K.M.</given-names>
          </string-name>
          :
          <source>The fourth paradigm: data-intensive scientific discovery</source>
          (Microsoft Research, Redmond,
          <year>2009</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Rumble Jr.</surname>
            ,
            <given-names>J.R.</given-names>
          </string-name>
          :
          <source>Integr. Mater. Manuf. Innov.</source>
          <volume>6</volume>
          ,
          <fpage>172</fpage>
          -
          <lpage>186</lpage>
          (
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Austin</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <source>Mater. Discov.</source>
          <volume>3</volume>
          ,
          <fpage>1</fpage>
          -
          <lpage>12</lpage>
          (
          <year>2016</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Curtarolo</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hart</surname>
            ,
            <given-names>G.L.W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nardelli</surname>
            ,
            <given-names>M.B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mingo</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sanvito</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Levy</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          :
          <source>Nature Mater.</source>
          <volume>12</volume>
          ,
          <fpage>191</fpage>
          -
          <lpage>201</lpage>
          (
          <year>2013</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Broderick</surname>
            ,
            <given-names>S.R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Santhanam</surname>
            ,
            <given-names>G.R.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Rajan</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <source>JOM</source>
          <volume>68</volume>
          ,
          <fpage>2109</fpage>
          -
          <lpage>2115</lpage>
          (
          <year>2016</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6. Editorial:
          <source>Scripta Mater.</source>
          <volume>70</volume>
          ,
          <fpage>1</fpage>
          -
          <lpage>2</lpage>
          (
          <year>2014</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Ong</surname>
            ,
            <given-names>S.P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Richards</surname>
            ,
            <given-names>W.D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jain</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hautier</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kocher</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cholia</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gunter</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chevrier</surname>
            ,
            <given-names>V.L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Persson</surname>
            ,
            <given-names>K.A.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Ceder</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          :
          <source>Comp. Mater. Sci.</source>
          <volume>68</volume>
          ,
          <fpage>314</fpage>
          -
          <lpage>319</lpage>
          (
          <year>2013</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Kalidindi</surname>
            ,
            <given-names>S.R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Niezgoda</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Landi</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Fast</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <source>Comp., Mater. and Cont.</source>
          <volume>17</volume>
          ,
          <fpage>103</fpage>
          -
          <lpage>125</lpage>
          (
          <year>2010</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9. National Research Council:
          <source>Integrated Computational Materials Engineering: A Transformational Discipline for Improved Competitiveness and National Security</source>
          (The National Academies Press, Washington, DC,
          <year>2008</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10. W3C: https://www.w3.org/standards/xml/schema, last accessed
          <year>2019</year>
          /5/5
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11. W3C: https://www.w3.org/TR/2014/NOTE-rdf11-primer-20140624/, last accessed
          <year>2019</year>
          /5/5
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12. W3C: https://www.w3.org/TR/2013/REC-sparql11-overview-20130321/, last accessed
          <year>2019</year>
          /5/5
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13. W3C: https://www.w3.org/TR/2012/REC-owl2-overview-20121211/, last accessed
          <year>2019</year>
          /5/5
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14. OpenMath Society: https://www.openmath.org/standard/om20-2017-07-22/, last accessed
          <year>2019</year>
          /5/5
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <source>SageMath, the Sage Mathematics Software System (Version 8.0)</source>
          , The Sage Developers,
          <year>2017</year>
          , https://www.sagemath.org, last accessed 2019/5/5
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Ashino</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Yamashita</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          :
          <source>Data Sci. J</source>
          .
          <volume>11</volume>
          ,
          <fpage>ASMD17</fpage>
          -
          <lpage>ASMD21</lpage>
          (
          <year>2012</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17. Dublin Core Initiative: http://dublincore.org/, last accessed 2019/5/5
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <surname>Ashino</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <source>Data Sci. J</source>
          .
          <volume>9</volume>
          ,
          <fpage>54</fpage>
          -
          <lpage>61</lpage>
          (
          <year>2010</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <surname>Zhao</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Qian</surname>
            ,
            <given-names>Q.</given-names>
          </string-name>
          :
          <source>AIP Advances</source>
          <volume>7</volume>
          ,
          <elocation-id>105325</elocation-id>
          (
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20.
          <string-name>
            <surname>LeBlanc</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Balduccini</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Regli</surname>
            ,
            <given-names>W.C.</given-names>
          </string-name>
          :
          <source>AAAI-14 Workshop</source>
          (AAAI, Quebec,
          <year>2014</year>
          )
          <fpage>39</fpage>
          -
          <lpage>42</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          21.
          <string-name>
            <surname>Madalli</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sulochana</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Singh</surname>
            ,
            <given-names>A.K.</given-names>
          </string-name>
          :
          <source>Data Technol and Appl</source>
          .
          <volume>50</volume>
          ,
          <fpage>103</fpage>
          -
          <lpage>117</lpage>
          (
          <year>2016</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          22.
          <string-name>
            <surname>Remolona</surname>
            ,
            <given-names>M.F.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Conway</surname>
            ,
            <given-names>M.F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Balasubramanian</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fan</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Feng</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gu</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kim</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nirantar</surname>
            ,
            <given-names>P.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Panda</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ranabothu</surname>
            ,
            <given-names>N.R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rastogi</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Venkatasubramanian</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          :
          <source>Comp. and Chem. Eng.</source>
          <volume>107</volume>
          ,
          <fpage>49</fpage>
          -
          <lpage>60</lpage>
          (
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>