=Paper= {{Paper |id=Vol-2523/paper32 |storemode=property |title= Data Analysis Environment for Materials Science and Engineering Integrating Heterogeneous Data Resources (short paper) |pdfUrl=https://ceur-ws.org/Vol-2523/paper32.pdf |volume=Vol-2523 |authors=Toshihiro Ashino,Nobutaka Nishikawa,Takuya Kadohira |dblpUrl=https://dblp.org/rec/conf/rcdl/AshinoNK19 }} == Data Analysis Environment for Materials Science and Engineering Integrating Heterogeneous Data Resources (short paper) == https://ceur-ws.org/Vol-2523/paper32.pdf
        Data Analysis Environment for Materials Science
        and Engineering Integrating Heterogeneous Data
                           Resources

             Toshihiro Ashino 1, Nobutaka Nishikawa 2, and Takuya Kadohira3
            1 Toyo University, 5-28-20 Hakusan, Bunkyo-ku, Tokyo 112-8606, Japan

                                       ashino@acm.org
    2 Mizuho Information & Research Institute, Inc. 2-3 Kanda-Nishikicho, Chiyoda-ku, Tokyo

                                         101-8443, Japan
    3 National Institute for Materials Science, 1-2-1 Sengen, Tsukuba, Ibaraki, 305-0047, Japan




         Abstract. Materials performance analysis requires to integrate many heteroge-
         neous data and information resources, experimental data, empirical/theoretical
         models and computational simulations. It means data analysis platform for ma-
         terials science and engineering should provide many functionalities, e.g., data
         retrieval, processing, statistical analysis, symbolic mathematics, visualization
         and scripting capabilities to store the typical data analysis process and also,
         these heterogeneous data resources should be accessed unified way. Scripting
         language Python provides many of these capabilities with additional software
         modules and widely applied to interactive/non-interactive data processing envi-
         ronment. In this paper, a prototype design and implementation of data analysis
         environment for materials science and engineering is presented.

         Keywords: virtual research environment, materials integration, materials on-
         tology, semantic web, heterogeneous data integration


1        Introduction

In many research area, data intensive research, so called the Fourth Paradigm [1],
have been increasing its importance. In materials science and engineering, there is a
long tradition developing computerized materials property databases [2, 3]. But mate-
rials experiment requires huge cost and high skill, materials represent wide variation
of properties, there are various measurement methods and substances, data intensive
approach is delayed to be introduced into materials design process.
   But advancement of computer simulation technology and new measurement meth-
od presented a possibility to obtain huge amount of data in this field. It enables to
evaluate materials properties such as physical properties and long term performance
with minimum experiment, relatively low cost and short period, furthermore, enables
to predict materials performance without real experiment [4–6].
   One of the important application area is to develop software platform for high
throughput computational approach for materials design focused on functional mate-



 Copyright © 2019 for this paper by its authors. Use permitted under Creative
 Commons License Attribution 4.0 International (CC BY 4.0).




                                               336
rials which performances are directly reflect micro-scale physical properties [7, 8].
However, in case of structural materials performances prediction, e.g. creep rupture
property, different scales and complexed interactions of physical phenomenon affect
the total performance, it requires to integrate heterogeneous data and models.
   This approach is called ICME (Integrated Computational Materials Engineering)
[9]. In Japan, SIP-MI (Strategic Innovation Promotion Program: Materials Integra-
tion) is a project to implement ICME concept. Information platform for MI is required
to handle and integrate many kind of information resources, such as experimental
data, simulation modules and mathematical equations. Semantic description of data,
relationships among data and attributes of data are essential in order to integrate these
heterogeneous information.
   We applied the Semantic Web framework to this application. It provides several
machine readable semantic description standards, XML Schema [10], RDF (Resource
Description Framework)/SPARQL (SPARQL Protocol and RDF Query Language)
[11, 12], OWL (Web Ontology Language) [13] and OpenMath [14]. MI prototype
data platform which can handle these data formats and enables to describe workflows
of materials data processing has been developed.


2      Design and Implementation of the Prototype

The prototype system is based on a mathematical system, SageMath [15], which is an
open source project integrates many open source mathematical systems, SciPy, R and
others. It is based on Python programming environment and this means, various soft-
ware modules developed for Python can be used in this system and it is easy to devel-
op original data processing modules for this data processing environment.
   Fig. 1 shows the design of the prototype system. In order to achieve flexible data
management, since it should manage continuously evolving materials measurement
and new materials data, metadata, which describes the structure of database is stored
in Apache/Jena Fuseki SPARQL endpoint as RDF files. RDF provides conceptual
description on the data resources and it is retrieved by using SPARAL query lan-
guage.
   Metadata which describes experimental data and mathematical equations, target
materials, equation names, target property, application conditions and link to data and
equation body, are written in RDF for retrieval by SPARQL. Sample experimental
databases is stored as XML (Extensible Markup Language) documents, they can be
accessed by their URI’s listed in RDF files. Equation bodies are also stored as XML
documents which written in OpenMath semantic representation of mathematics,
which provides rich vocabularies contain many operators and mathematical functions
[16].
   Python modules XML, RDFlib, SPARQLWrapper and py-openmath are incorpo-
rated into SageMath symbolic-math environment and original OpenMath parser have
been developed for this prototype. Metadata which describes experimental data and
mathematical equations, target materials, equation names, target property, application
conditions and link to data and equation body, are written in RDF for retrieval by




                                          337
SPARQL. Materials Ontology written in OWL is managed by the same SPARQL
endpoint.

           Jupyter Notebook
           for interactive data processing



                                                                                                                            Apache Jena/Fuseki
          SageMath (Python)                                              SPARQL                                              SPARQL Endpoint
                             Python modules
                                                                               XML Documents                   Equations           Parameters            Experimetal Data
                 XML processing module                                                                        (OpenMath)              (RDF)             (RDF, XMLScehma)
                 RDFlib
                 SPARQLWrapper
                                                                                                                        Ontologies             Rules
                 Py-openmath                                                                                             (OWL)               (RuleML)
                                                                                           Jena Ontology API

                                                                                            External data processing applications
                                                                                            (R, Maxima, Mathematica, etc.)


                         Fig. 1. Concept of the prototype system for materials data processing

   Fig. 2 shows examples of metadata description for experimental data (a) and equa-
tion (b) in RDF. In current sample database, tags are selected from Dublin Core tag
set defined in order to describe metadata [17], but there are many tag sets which de-
fined to represent data meanings and any of them can be added into these RDF data
anytime.
   Experimental datasets and equation bodies are divided from RDF metadata file.
RDF file contains URI’s (Uniform Resource Identifier) indicate datasets and equa-
tions, since such files may have written in different data formats like XML Schema
and OpenMath. Data retrieval requires two-steps, at first, find a RDF description by
SPARQL and second, traverse the URI which is indicated by  tags.
   Vocabularies used in database, property names, material names, units for measured
values and other keywords are selected from extended Materials Ontology [18]. It
intended to realize uniform data retrieval on heterogeneous data resources, in this
case, experimental data and equation library stored in different RDF documents. In
current prototype, words are selected manually from the ontology as a common vo-
cabulary.
                                                                            
 
   xmlns:mi="http://www.codata.jp:8080/mi/mat-ontology.owl#">                                     
  
                                                                                                   Creep Equation Norton
   Mod.9Cr-1Mo MgT Creep Test                                                 mi:Norton
   creep                                                          Creep Equation Norton
   mi:Creep_Test                                                          mi:Norton
   Mod.9Cr-1Mo MgA HAZ Creep Test
   mi:MgA_HAZ                                                                   http://www.codata.jp:8080/mi/creep.norton.openmath.xml

   http://www.codata.jp:8080/mi/9cr.mga.haz.creep.550.200.xml           http://dx.doi.org/*****
   http://www.codata.jp:8080/mi/9cr.mga.haz.creep.550.190.xml
   http://www.codata.jp:8080/mi/9cr.mga.haz.creep.550.170.xml          
   http://www.codata.jp:8080/mi/9cr.mga.haz.creep.600.140.xml         
   http://www.codata.jp:8080/mi/9cr.mga.haz.creep.600.120.xml
   http://www.codata.jp:8080/mi/9cr.mga.haz.creep.600.100.xml



   Fig. 2. Metadata description in RDF for (a) experimental data and (b) constitution
         equation. Data and equations are stored in XML files pointed by URI’s




                                                                                           338
3      An Example Materials Performance Analysis Workflow
       in Python

One of the typical materials data processing workflow, creep data analysis is dis-
played in Fig. 3. Workflows can be written in Python scripting language in the proto-
type, it provides quite flexible and extensible description. 1st, relevant creep experi-
mental data is retrieved from database with SPARQL. Results are obtained in XML
documents and they are transformed into appropriate format for further processing by
the XPath functions of Python XML processing module. XML data format stored in
database is defined in this project locally, but it should be standardized for test meth-
od or property in XML Schema.
   2nd, appropriate equation, in this case Norton equation, constitution equation for
creep behavior is selected by its metadata written in RDF. The metadata contains a
URI which points semantic representation of the equation in OpenMath. It can be
parsed and converted into the corresponding input format required by specified data
processing package, e.g. R, SciPy and other packages which is integrated to Sage-
Math.
   In the package, non-linear least square method is applied to the equation with the
retrieved experimental data set. Obtained parameter values, in this case A and n, are
written into RDF format, added appropriate metadata, e.g. link to corresponding ex-
perimental data, equation and version of software package, and stored into the data-
base for further utilization in MI software modules.
   This workflow can be stored as Python script and also, all functions can be used in
interactive programming environment Jupyter notebook. This script has properly
worked and proved the extensibility and flexibility of this system.


4      Discussions

There are many trials to develop ontology and integrate data with ontology [19–22].
Ontology can be used a fundamental dictionary for data integration. But in order to
integrate heterogeneous information resources, all description of these resources
should be based on common ontology or be mapped to the correspondence of ontolo-
gy. This work is done manually, it requires continuous efforts to standardize and dis-
seminate ontology, and also support system to select vocabulary with ontology rea-
soner.
    Materials ontology has been extended to contain some concepts which relate to
creep performance evaluation. In this prototype, ontology written in OWL can be
accessed via Apache/Jena API, we are now testing utilization of reasoner in data re-
trieval and rule based data analysis with this functionality.




                                          339
                                              
                                              
                                               
                                                Mod.9Cr-1Mo MgT
                                                
                                               
                                               
                                               
                                                Creep Test
                                                600
                                                140
                                               
                                               
                                                2550.2
                                                32.1
                                                89                               temp(C) stress(Mpa)
                                                1.48e-5                       ruputure_time(h)
                                                                                                   creep_rate(1/h)
                                                                                                 550 200 10432.9 3.58e-6
                                                                                                550 190 18514.7 1.44e-6
                                                                                                                       550 170 47104.5 3.46e-7
                                                   0 0.001328527                                                   600 160 501.4 9.65e-5

               Search creep test
                                                   0.000555555 0.001328527                                         600 140 2550.2 1.48e-5
                                                   0.001111111 0.001345451                                         600 130 6036.8 5.33e-6
                                                   0.001388888 0.001345451                                         600 110 19907.0 9.60e-7

               experimental data (SPARQL)
                                                   0.001666666 0.001353913                                         600 100 40307.4 5.48e-7
                                                   0.002222222 0.001362375                                         600 90 73960.9 2.83e-7
                                                                                                                          650 90 928.0 3.50e-5
                                                                                                                          650 80 2726.1 1.18e-5
                                                                                                                          650 70 8385.6 4.75e-6

               XML data transformation by XPATH                                                                           650 50 60181.2 6.19e-7



                                                                      
                                                                      
                                                                       
                                                                        
                                                                        
                                                                        


               Search creep constitution
                                                                        
                                                                        

                                               =A                      
                                                                       

               equation (SPARQL)                                        
                                                                         
                                                                          Mod.9Cr-1Mo MgT BM Creep Eq Norton
                                                                         
                                                                         
                                                                          mi:Norton
                                                                         
                                                                         
                                                                          mi:MgT_BM
                                                                         
                                                                         
                                                                          3.80542e-27
                                                                         
                                                                         
                                                                          600

               Parse OpenMath equation                                   
                                                                        




               Parameter fitting by     A = 3.80542e-27
                                                                                                     0.35

                                                                                                      0.3


               data analysis system     n = 10.0781
                                                                                                                  MgT BM 600C 140MPa
                                                                                                     0.25

                                                                                                      0.2
                                                                                            Strain




                                                                                                     0.15

                                                                                                      0.1

                                                                                                     0.05

                                                                                                       0
                                                                                                            0   500   1000     1500      2000    2500      3000

                Visualization/Validation                                                                                      Time (h)




                Store as a new parameter set (SPARQL)


         Input parameter for MI Creep analysis module

    Fig. 3. A creep data processing workflow and corresponding operation on the system


5     Conclusion

Prototype of data analysis environment which has capability integrating heterogene-
ous materials information resources have been developed based on Python program-
ming language and the design have been verified by sample database and script. RDF
metadata representation for materials experimental data and mathematical equations is
defined and tested for further development of MI system.




                                           340
Acknowledgments

This work was supported by Council for Science, Technology and Innovation (CSTI),
Cross-ministerial Strategic Innovation Promotion Program (SIP), Structural Materials
for Innovation” (Funding by JST).


References
  1. Hey, T., Tansley, S., and Tolle, K.M.: The fourth paradigm: data-intensive scientific dis-
    covery (Microsoft Research, Redmond, 1969).
  2. Rumble Jr., J.R.: Integr. Mater. Manuf. Innov. (6), 172–186 (2017).
  3. Austin, T.: Mater. Discov. 3, 1–12 (2016).
  4. Curtarolo, S., Hart, G.L.W., Nardelli, M.B., Mingo, N., Sanvito, S., and Levy, O.: Nature
   Mater. 20, 191–201 (2013).
  5. Broderick, S.R., Santhanam, G.R., and Rajan, K.: JOM 68, 2109–2115 (2016).
  6. Editorial: Scripta Mater. 70, 1–2 (2014).
  7. Ong, S.P., Richards, W.D., Jain, A., Hautier, G., Kocher, M., Cholia, S., Gunter, D.,
   Chevrier, V.L., Persson, K.A., and Ceder, G.: Comp. Mater. Sci. 68, 314–319 (2013).
  8. Kalidindi, S.R., Niezgoda, S., Landi, G., and Fast, T.: Comp., Mater. and Cont. 17, 103–
   125 (2010).
  9. National Research Council: Integrated Computational Materials Engineering: A Trans-
   formational Discipline for Improved Competitiveness and National Security (The National
   Academies Press, Washington, DC. 2008).
  10. W3C: https://www.w3.org/standards/xml/schema, last accessed 2019/5/5
  11. W3C: https://www.w3.org/TR/2014/NOTE-rdf11-primer-20140624/, last accessed
   2019/5/5
  12. W3C: https://www.w3.org/TR/2013/REC-sparql11-overview-20130321/, last accessed
   2019/5/5
  13. W3C: https://www.w3.org/TR/2012/REC-owl2-overview-20121211/, last accessed
   2019/5/5
  14. OpenMath Society: https://www.openmath.org/standard/om20-2017-07-22/, last ac-
   cessed 2019/5/5
  15. SageMath, the Sage Mathematics Software System (Version 8.0), The Sage Developers,
   2017, https://www.sagemath.org. last accessed 2019/5/5
  16. Ashino, T. and Yamashita, Y.: Data Sci. J. 11, ASMD17-ASMD21 (2012).
  17. Dublin Core Initiative: http://dublincore.org/, last accessed 2019/5/5
  18. Ashino, T.: Data Sci. J. 9, 54–61 (2010).
  19. Zhao, S. and Qian, Q.: AIP Advances 7, 105325 (2017).
  20. LeBlanc, E., Balduccini, M., and Regli, W.C.: AAAI-14 Workshop (AAAI, Quebec,
   2014) 39–42.
  21. Madalli, D., Sulochana, A., and Singh, A.K.: Data Technol and Appl. 50, 103–117
   (2016).
  22. Remolona, M.F.M., Conway, M.F., Balasubramanian, S., Fan, L., Feng, Z., Gu, T., Kim,
   H., Nirantar, P.M., Panda, S., Ranabothu, N.R., Rastogi, N., and Venkatasubramanian, V.:
   Comp. and Chem. Eng. 107, 49–60 (2017).




                                             341