Hitachi Materials Informatics Analytics Platform Assisting Rapid Development

Yoshihiro Osakabe, Akinori Asahara, Hidekazu Morita
Hitachi Ltd.
1-6-1, Marunouchi, Chiyoda-ku, Tokyo 100-8220, Japan
{yoshihiro.osakabe.fj@hitachi.com}

Abstract

We demonstrate a data science platform for materials development. Recent advances in artificial intelligence have made the industrial application of materials informatics (MI), a data-driven approach to discovering and investigating material characteristics, increasingly realistic. However, it is not easy for materials manufacturers to set up MI analytics environments without help. We therefore provide a user-friendly, cloud-based IT platform for non-experts in IT that enables materials scientists in R&D departments to analyze their experimental data effectively for rapid development.

Motivation

Product development requires significant time and cost to find the optimal combination of ingredients and parameters. Materials Informatics (MI) is an emerging field of study based on both informatics and materials science, with the goal of greatly reducing the resources and risks required to discover, develop, and deploy new materials (Curtarolo et al. 2013). Recently, artificial intelligence (AI) has improved the performance of MI, so that experimental candidates can be narrowed down before the actual experiments, avoiding unnecessary trial and error in discovering or creating new materials with yet-to-be-realized properties. In fact, the US government has invested over $250 million to assist MI projects (Materials Genome Initiative 2011). The Novel Materials Discovery Laboratory in the EU also opens new opportunities for investigating MI by delivering analytics tools and an open-access repository of materials data (NOMAD Laboratory 2015). Following such outreach activities, materials manufacturers have shown strong demand for introducing MI-powered methodology into their R&D processes to increase their industrial competitiveness, and the number of startups offering MI analytics services is increasing.

Figure 1 illustrates the concept of this demonstration. In many cases, it is difficult for materials scientists to select a suitable preprocessing method and an effective algorithm to solve their problems because they do not have enough informatics knowledge, which means that they need the support of informatics experts (data scientists). Their relationship can be understood as that between a runner and an escort, so this phase can be regarded as an “accompanying phase.” Although this service style is common, the possibility remains that the informatics experts cannot fully understand the characteristics of the target materials or acquire the know-how that materials scientists have. This problem will be solved if materials scientists can reach analysis results by themselves without excessive IT and analytics knowledge. That phase can be understood as a “self-managing phase,” and MI analytics services should shift from the accompanying phase to the self-managing phase for scaling up and rapid prototyping. This suggests the need for an alternative to the informatics expert and for a one-stop platform for storing and analyzing data and visualizing analysis results.

Figure 1: The concept of MIAP (MI Analytics Platform)

Copyright © 2020 held by the author(s). In A. Martin, K. Hinkelmann, H.-G. Fill, A. Gerber, D. Lenat, R. Stolle, F. van Harmelen (Eds.), Proceedings of the AAAI 2020 Spring Symposium on Combining Machine Learning and Knowledge Engineering in Practice (AAAI-MAKE 2020). Stanford University, Palo Alto, California, USA, March 23-25, 2020. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

Materials Informatics Analytics Platform

We have developed an IT platform for MI, called the Materials Informatics Analytics Platform (MIAP), for R&D teams of various manufacturing companies. This platform brings all data together in one place to make it easier for researchers to access the data and customize machine learning algorithms by themselves, without additional help from informatics experts. It includes functions that support almost every step required for MI analytics.

Functions

MIAP is a cloud service, so its user interface is accessible via common web browsers. It mainly provides three functionalities: storing, analyzing, and visualizing. Their details are explained below.

1. Storing
In this platform, all input and output data are stored in PostgreSQL database servers. Various file types are accepted, including CSV, Microsoft Excel, and NetCDF. A graphical user interface (GUI) is used to upload and import data into the databases. The platform also accepts SQL queries through a built-in query editor, so that users can manage data tables directly for complicated operations. With the GUI, for example, users can define the data type of each column without typing any complicated SQL queries.

2. Analyzing
In general, MI problems are interpreted as regression and classification tasks. MIAP therefore supports various well-known machine learning algorithms such as Random Forest (Breiman 2001), Gaussian Process (Rasmussen and Williams 2006), Support Vector Machine (Burges 1998), and Gradient Boosting (Friedman 2001). It also makes predictions and optimizations possible. In addition, one of MIAP's unique features is the implementation of AI-based best practices of efficient methodologies for individual customers, which helps reduce their experiment iterations. In most cases, once users have developed their best practices, they can easily and repeatedly apply the same method to new data by themselves.

3. Visualizing
MIAP provides a basic visualization tool that plots data in the database by selecting the target column and the graph type (bar, line, or pie graph). To check the learning performance, users only have to click on automatically generated truth-prediction scatter plots. In addition, it provides an original UI tool, derived from a Geospatial Information System (GIS) tool, that draws animations over time in 2D and 3D graphs. This means that users can see the time evolution of materials properties.

Demonstration

The usage of MIAP is demonstrated with the example of searching for an optimal recipe that improves the material properties of a ready-made product.

First, the user collects the data accumulated in the process of making the target product. Second, the user uploads them to the MIAP database. MIAP automatically converts files of different formats into a predetermined format using KNIME, an open-source software (KNIME 2019). Next, the user specifies the target material property to be improved as the objective variable, and the other properties are set as explanatory variables. After the user selects an algorithm for modeling and enters the output table name for the current attempt, learning is started by pressing the execute button. These operations are very simple because almost all the user has to do is click on the corresponding tabs. As shown in Figure 2, the results are listed in the results view window when the calculation is finished. Because the automatically generated truth-prediction scatter plot is shown with common indicators of learning performance, such as Root Mean Squared Error (RMSE) and the correlation coefficient, users can judge whether the learning has succeeded. Once users have obtained a well-trained model through iterative attempts, they can predict the target material property for candidate recipes and narrow the candidates down before the actual experiments for new products. In this way, MIAP helps find the optimal recipes of ingredients or parameters, which contributes to reducing the resources needed for materials development.

Figure 2: Screen capture of checking learning results

References

Breiman, L. 2001. Random forests. Machine Learning 45(1):5–32.
Burges, C. J. 1998. A tutorial on support vector machines for pattern recognition. Data Mining and Knowledge Discovery 2(2):121–167.
Curtarolo, S.; Hart, G. L.; Nardelli, M. B.; Mingo, N.; Sanvito, S.; and Levy, O. 2013. The high-throughput highway to computational materials design. Nature Materials 12(3):191.
Friedman, J. H. 2001. Greedy function approximation: a gradient boosting machine. Annals of Statistics 1189–1232.
KNIME. 2019. KNIME, https://www.knime.com/.
Materials Genome Initiative. 2011. In Website of Materials Genome Initiative (MGI), https://obamawhitehouse.archives.gov/mgi.
NOMAD Laboratory. 2015. In Website of Novel Materials Discovery (NOMAD) Laboratory, https://nomad-coe.eu/industry/interaction-with-industry.
Rasmussen, C., and Williams, C. 2006. Gaussian processes for machine learning, model selection and adaptation of hyperparameters, chapter 5.
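
Supplementary sketch. The Demonstration judges a trained model by two indicators, Root Mean Squared Error (RMSE) and the correlation coefficient between measured and predicted values, and then uses the model to narrow down candidate recipes. The paper does not show how MIAP computes these, so the following is only a minimal illustration in plain Python. The function names (`rmse`, `pearson_r`, `rank_candidates`), the `predict` callable, the sample numbers, and the choice of the Pearson coefficient with "higher predicted value is better" are all assumptions made for this example, not MIAP's actual API.

```python
import math


def rmse(truth, pred):
    """Root Mean Squared Error between measured and predicted values."""
    if len(truth) != len(pred) or not truth:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(t - p) ** 2 for t, p in zip(truth, pred)]
    return math.sqrt(sum(squared_errors) / len(truth))


def pearson_r(truth, pred):
    """Pearson correlation coefficient between measured and predicted values."""
    if len(truth) != len(pred) or len(truth) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    n = len(truth)
    mean_t = sum(truth) / n
    mean_p = sum(pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(truth, pred))
    var_t = sum((t - mean_t) ** 2 for t in truth)
    var_p = sum((p - mean_p) ** 2 for p in pred)
    if var_t == 0 or var_p == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_t * var_p)


def rank_candidates(candidates, predict, top_k=3):
    """Return the top_k (predicted value, recipe) pairs, highest prediction first.

    `predict` is a hypothetical stand-in for any trained regression model.
    Assumes a larger value of the target property is better.
    """
    scored = [(predict(c), c) for c in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


# Hypothetical measured vs. predicted values of a target property.
measured = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
score_rmse = rmse(measured, predicted)
score_r = pearson_r(measured, predicted)

# Hypothetical candidate recipes and a toy model standing in for a trained one.
recipes = [{"x": 1}, {"x": 3}, {"x": 2}]
shortlist = rank_candidates(recipes, lambda recipe: 2 * recipe["x"], top_k=2)
```

A low RMSE together with a correlation coefficient close to 1 corresponds to points lying near the diagonal of the truth-prediction scatter plot. The shortlist step mirrors the idea of predicting the target property for candidate recipes before running actual experiments.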