Towards Executable Knowledge Graph Translation

Dongzhuoran Zhou1,2,∗, Baifan Zhou2, Zhuoxun Zheng1,4, Zhipeng Tan1, Egor V. Kostylev2 and Evgeny Kharlamov1,2

1 Bosch Center for Artificial Intelligence, Germany
2 Department of Informatics, University of Oslo, Norway
3 Department of Computer Science, Oslo Metropolitan University, Norway

Abstract
Data analytics is vital in manufacturing for extracting insights from production data and optimising production processes. Semantic technologies, including knowledge graphs (KGs), have proved beneficial for addressing the challenges of transparency and explainability of analytics by offering standardised means to describe manufacturing domains, data, analytical tasks and solutions. In this work we discuss executable KGs for industrial analytics; they can be "translated" (i.e. transformed) to executable data pipelines in a reusable and modularised fashion. In particular, we discuss how to capture analytical solutions in the form of data pipelines as KGs, and how to translate such KGs to executable data pipelines. The poster presents our framework, implementation and preliminary industrial evaluation.

Keywords
knowledge graph, welding monitoring, machine learning, industrial application analytics

1. Introduction

Data analytics is vital in manufacturing for extracting insights from production data and optimising production processes. Semantic technologies, including knowledge graphs (KGs), have proved beneficial for addressing the challenges of transparency and explainability [1] of analytics [2] by offering standardised means [3, 4] to describe manufacturing domains [5, 6], data analytical tasks and solutions, as well as robot positioning control [7, 8]. In particular, KGs make it possible to represent executable data analytical pipelines with a standardised and formal description of the steps in the pipeline [9].
This opens the door for KG-based verification, reasoning and optimisation, as well as data construction [10] and data mining [11, 12, 13] in manufacturing [14].

Consider an industrial scenario where a multi-disciplinary team of engineers, data scientists and managers works together on quality prediction with ML in the car industry [15]. Our project experience reveals that experts with distinct backgrounds spent excessive time on discussion, only to find out that they had misunderstood the problem and the ML solutions. After that, we tried to use KGs as a medium for communication; see the example ML pipeline KG in Fig. 1. It takes TimeSeries and SingleFeatures as the input data and performs LRRegression to predict the Q-Value. The users can change the input data, output data and method of the pipeline simply by changing the named individuals. For example, the users can delete TimeSeries if they do not have the sensor curves (time series) in their data, because the sensor curves are costly to collect. The users can also change the ML method from LRRegression to MLP (multilayer perceptron).

Hangzhou'22: The 21st International Semantic Web Conference, October 23–27, 2022, Hangzhou, China
∗ Corresponding author.
dongzhuoran.zhou@de.bosch.com (D. Zhou)
© 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org), http://ceur-ws.org, ISSN 1613-0073

Figure 1: MLPipeline, input: time series, single features; relies on: linear regression (LRMethod).

In this work we study how KGs can facilitate analytics by addressing two issues: how to encode data pipelines as KGs (we refer to such KGs as executable KGs) and how to automatically compute executable pipelines from executable KGs.
Here the latter should ensure the correctness of the analytical methods, the desired order of tasks within executable pipelines, and the correct number of inputs and outputs with their designed dimensions. Moreover, such computation should support typical executable pipelines for visual and statistical analytics [16, 17]. Finally, such executable pipelines should be suitable for large-scale deployment [18].

In this poster paper we exemplify our solution within the domain of quality monitoring in automatic manufacturing. In particular, we present our executable KG framework together with its verification, translation and execution methods; we discuss the implementation of our solution; and we present its preliminary evaluation with Bosch manufacturing data. The experimental results show that our proposed approach is promising in terms of coverage and scalability. This poster accompanies our In-Use track paper accepted at ISWC'22 [19] and significantly extends the technical details of KG translation.

2. Our Approach

Executable Knowledge Graph Framework. We propose a framework for executable KGs (ExeKG) that represent ML solutions [20] for solving ML questions. The framework supports translating ExeKGs to executable scripts in a reusable and modularised fashion. We first define data, methods and tasks in this framework. Data $\mathcal{D}$ is a set of facts, statistics or items of information; it can take forms such as numerals, diagrams or strings, organised in different structures, typically relational tables or RDF databases. A Method $\mathcal{F}$ is a function in the form of a language-dependent script (for example in C++ or Python). A method takes as input some data that fulfils certain Constraints $\mathcal{C}_{\mathcal{F}}$ and outputs specific data. Formally, $\mathcal{D}_{out} = \mathcal{F}(\mathcal{D}_{in})$ if $\mathcal{C}_{\mathcal{F}}(\mathcal{D}_{in}) = True$. A Task $\mathcal{T}$ is the process of invoking a method by feeding it some data that meets certain Constraints, thereby obtaining some other data.
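The definitions of Data, Method and Task above can be illustrated with a minimal Python sketch. It is not taken from the ExeKG implementation: the names `Method`, `run_task` and `mean` are hypothetical, and a constraint is assumed to be a plain Boolean function on the input data.

```python
from dataclasses import dataclass
from typing import Any, Callable

# Hypothetical names for illustration; not taken from the ExeKG implementation.
Data = Any  # D: any dataset, e.g. a list of numbers or a table


@dataclass
class Method:
    """A method F together with its constraint C_F on the input data."""
    func: Callable[[Data], Data]        # F
    constraint: Callable[[Data], bool]  # C_F


def run_task(data_in: Data, method: Method) -> Data:
    """A task T<D_in, F>: return F(D_in), but only if C_F(D_in) is True."""
    if not method.constraint(data_in):
        raise ValueError("input data violates the method's constraint")
    return method.func(data_in)


# Example method (assumed for illustration): mean of a non-empty list.
mean = Method(
    func=lambda xs: sum(xs) / len(xs),
    constraint=lambda xs: isinstance(xs, list) and len(xs) > 0,
)

result = run_task([1.0, 2.0, 3.0], mean)
```

In this sketch the constraint is checked before the method is invoked, so a task on data that violates the constraint fails early instead of producing an undefined output.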
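Formally, $\mathcal{T}\langle \mathcal{D}_{in}, \mathcal{F} \rangle = \mathcal{F}(\mathcal{D}_{in}) = \mathcal{D}_{out}$ if $\mathcal{C}_{\mathcal{F}}(\mathcal{D}_{in}) = True$. We call each such single Task an Atomic task. Some tasks can be solved by a single unified method, while other, more complex tasks cannot be solved by invoking a single integrated method but can be unfolded into a sequence of tasks, each of which is a part of the complex one. We refer to such complex tasks as pipelines $\mathcal{T}_p$. Formally, a pipeline $\mathcal{T}_p$ that takes input data $\mathcal{D}_{in}$ to obtain $\mathcal{D}_{out}$, expressed as $\mathcal{T}_p\langle \mathcal{D}_{in}, \mathcal{F} \rangle = \mathcal{D}_{out}$, can be unfolded into the sequence $\{\mathcal{T}_1, \mathcal{T}_2, \ldots, \mathcal{T}_n\}$, where:

\[
\begin{aligned}
&\mathcal{T}_1\langle \mathcal{D}_{in_1}, \mathcal{F}_1 \rangle = \mathcal{D}_{out_1}, \quad \mathcal{D}_{in_1} \subset \mathcal{D}_{in}, \quad \mathcal{C}_{\mathcal{F}_1}(\mathcal{D}_{in_1}) = True; \\
&\qquad\vdots \\
&\mathcal{T}_n\langle \mathcal{D}_{in_n}, \mathcal{F}_n \rangle = \mathcal{D}_{out_n}, \quad \mathcal{D}_{in_n} \subset \bigcup_{i \in \{1,\ldots,n-1\}} \mathcal{D}_{out_i} \cup \mathcal{D}_{in}, \quad \mathcal{C}_{\mathcal{F}_n}(\mathcal{D}_{in_n}) = True
\end{aligned}
\tag{1}
\]

\[
\longrightarrow \quad \mathcal{D}_{out} \in \bigcup_{i \in \{1,\ldots,n\}} \mathcal{D}_{out_i}, \quad \mathcal{C}_{\mathcal{F}} = \bigcap_{i \in \{1,\ldots,n\}} \mathcal{C}_{\mathcal{F}_i}(\mathcal{D}_{in_i}).
\tag{2}
\]

Table 2: Categories of the executable KGs (ExeKG). The structure refers to whether there is only one sequential data pipeline in the executable KG or there are multiple parallel data pipelines. Atomic task refers to tasks that cannot be decomposed into smaller tasks, while Pipeline tasks are composed of pipelines of multiple atomic tasks. Multiple input/output specifies whether the tasks in the data pipeline can take multiple inputs/outputs or not.

Complexity Type             Structure    Atomic Task/Pipeline Task   Multiple input/output
Linear ExeKG                Sequential   Only Atomic                 No
Multilinear ExeKG           Sequential   Only Atomic                 Yes
Integrated ExeKG            Sequential   Integrated Pipeline         Yes or No
Parallel ExeKG              Parallel     Only Atomic                 Yes
Parallel Integrated ExeKG   Parallel     Integrated Pipeline         Yes

Verification. We use the axioms in the KG together with Boolean queries such as (4) to ensure that the translated executable data analytics satisfy the correct Constraints. A DataEntity is the class for a concrete dataset or a feature. An example constraint is "every Task has at least one output, which is a DataEntity". It is captured by axiom (3) and checked by query (4), which returns the tasks that violate it:

\[
\forall x.\, \mathit{Task}(x) \rightarrow \exists y\, (\mathit{hasOutput}(x, y) \wedge \mathit{DataEntity}(y))
\tag{3}
\]

\[
\mathrm{QUERY}: \quad Q(x) \leftarrow \mathit{Task}(x) \wedge \neg \exists y.\, (\mathit{hasOutput}(x, y) \wedge \mathit{DataEntity}(y))
\tag{4}
\]

Translation and Execution.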
The translation can be discussed for two structures of executable KGs: 1) Sequential: here each executable KG is in the form of a Pipeline, which consists of a series of Tasks in a sequential structure connected with hasNextTask. Thus, the translation of an executable KG invokes the Python function scripts with the inputs/outputs and parameters given by DataEntity and the datatype properties of the KG, in the order defined by hasNextTask. 2) Parallel: when two parallel structures merge, the translator searches for preceding dependencies along hasNextTask until no preceding Task is found.

Table 1: Task complexity, categories and coverage.

KG Type    Structure             Avg. #Atomic Tasks   Coverage
VisualKG   Linear                5 to 10              100%
VisualKG   Multilinear           5 to 10              85%
StatsKG    Linear                1 to 5               100%
StatsKG    Multilinear           5 to 10              95%
StatsKG    Parallel              10 to 20             90%
StatsKG    Integrated            10 to 20             80%
MLKG       Integrated            More than 20         80%
MLKG       Parallel Integrated   More than 20         80%

Implementation. We implemented a system for executable KG translation [21] with three functional modules: 1) Databases, including the relational database with its APIs and the RDF database, 2) the Analytics module and 3) the KG processing module. Fig. 2a shows the structure of the system. The Databases are responsible for storing the welding data and the executable KGs. Their APIs handle the loading, formatting, filtering, padding and merging of different data subsets. The Analytics module stores all analytical methods of VisualKG, StatsKG and MLKG as Python scripts and provides an interface to them. The KG processing module performs verification of the executable KGs, translates the executable KGs to executable pipelines and executes these pipelines by connecting the analytical methods stored in the Analytics module.

3. Evaluation and Conclusion

Transparency and Coverage Evaluation. Theoretical discussion: We formulate visual, statistical and ML analytics in the form of Eq. 1-3.
These three forms provide a general and straightforward way to describe analytics tasks, which eases the understanding of the tasks: the users only need to understand the description once, and then they can understand similar data analytical pipelines described in this way. Thus, our approach is a step towards more transparent analytics, covering the cases of data analytical pipelines that can be described with Eq. 1-3.

Figure 2: (a) Implementation architecture (b) Scalability evaluation. [Panel (b) plots running time (s) against the number of pipelines (50, 100, 150) for Linear, Multilinear, Integrated, Parallel and Parallel Integrated ExeKG.]

Empirical evaluation: We organised extensive workshops with ML and non-ML experts. After discussion, we categorised most tasks of visual, statistical and ML analytics encountered in our project into groups (see Table 1) and give the coverage percentage according to our empirical cases. Observe that the coverage is at least 80% for all cases and above 90% for some of them. In addition, the users gave their subjective evaluation of transparency in questionnaires, where they rated statements such as "I found the Executable KGs make data analytics easier to understand" with scores from 1 (disagree) to 5 (agree). The average score was 4.28 ± 0.47 (mean ± standard deviation), which indicates good perceived transparency.

Scalability Evaluation. We evaluate the scalability of our approach by the running time of translation and execution of executable KGs of different complexity types (Fig. 2b).

Data Description. To keep the scope controllable, we tested these executable KGs on a sample welding production dataset collected from a German factory.
The dataset is in the form of relational tables after integration and contains 4585 welding operation records from 2 welding programs, performed by 1 welding machine on 2 types of car bodies.

Results and Discussion. Fig. 2b demonstrates that our system scales well, since translating executable KGs to scripts and executing the scripts takes limited time. On the far right-hand side, we see that the translation and execution of the most complex executable KGs, namely parallel integrated executable KGs, takes only about 6 minutes for 150 KGs on the given data, which shows good scalability.

Conclusion and Outlook. In this poster we present our ongoing research on representing data analytical pipelines as KGs and on the transformation (also called "translation") of such KGs into executable analytical pipelines. We discussed the framework, verification, translation and execution within our scope of welding monitoring in a Bosch case, and evaluated our approach with real industrial data and users from the Bosch case, with promising results. In the future, we plan to generalise our approach to more cases, to host the system regularly in the Bosch environment, and to continuously collect more user feedback. We will also study more technical details, such as expressivity and limitations in theory and practice, and compare with similar work such as Yahoo! Pipes, DAGs in Spark and Tez, and the PROV-O ontology.

Acknowledgments
The work was partially supported by H2020 projects Dome 4.0 (Grant Agreement No. 953163), OntoCommons (Grant Agreement No. 958371), DataCloud (Grant Agreement No. 101016835) and the SIRIUS Centre, Norwegian Research Council project number 237898.

References
[1] B. Mahesh, Machine learning algorithms-a review, IJSR 9 (2020) 381–386.
[2] Z. Zheng, et al., Executable knowledge graph for transparent machine learning in welding monitoring at Bosch, in: CIKM, 2022.
[3] B. Motik, et al., OWL 2 web ontology language profiles, 2012. URL: https://www.w3.org/TR/owl2-profiles/.
[4] A.
Salatino, et al., The computer science ontology: a large-scale taxonomy of research areas, in: International Semantic Web Conference, 2018, pp. 187–205.
[5] J. Davis, et al., Smart manufacturing, manufacturing intelligence and demand-dynamic performance, Computers & Chemical Engineering 47 (2012) 145–156.
[6] B. Zhou, et al., Exploiting the values of data: A holistic semantification approach at Bosch, in: ESWC (Demos/Industry), Springer, 2022.
[7] C. Naab, et al., Application of the unscented Kalman filter in position estimation a case study on a robot for precise positioning, RAS 147 (2022) 103904.
[8] O. Celik, et al., Specializing versatile skill libraries using local mixture of experts, in: CRL, PMLR, 2022, pp. 1423–1433.
[9] M. Yahya, et al., Towards generalized welding ontology in line with ISO and knowledge graph construction, in: ESWC (Posters & Demos), 2022.
[10] D. Zhou, et al., Towards Ontology Reshaping for KG Generation With User-In-The-Loop: Applied to Bosch Welding, in: IJCKG, 2021.
[11] B. Zhou, et al., Scaling Usability of ML Analytics With Knowledge Graphs: Exemplified with A Bosch Welding Case, in: IJCKG, 2021.
[12] D. Zhou, et al., Enhancing knowledge graph generation with ontology reshaping–Bosch case, in: ESWC (Demos/Industry), Springer, 2022.
[13] K. K. Breitman, et al., Ontology in computer science, Semantic Web: Concepts, Technologies and Applications (2007) 17–34.
[14] D. Zhou, et al., ScheRe: Schema Reshaping for Enhancing Knowledge Graph Construction, in: CIKM, 2022.
[15] B. Zhou, Y. Svetashova, A. Gusmao, A. Soylu, G. Cheng, R. Mikut, A. Waaler, E. Kharlamov, SemML: Facilitating development of ML models for condition monitoring with semantics, Journal of Web Semantics 71 (2021) 100664.
[16] B. Zhou, et al., Knowledge graph-based semantic system for visual analytics in automatic manufacturing, in: ISWC, 2022.
[17] Z. Zheng, et al., Towards a visualisation ontology for data analysis in industrial applications, in: ESWC, 2022.
[18] D. Zhou, et al., Ontology reshaping for knowledge graph construction: Applied on Bosch welding case, in: ISWC, 2022.
[19] Z. Zheng, et al., Executable Knowledge Graphs for Machine Learning: A Bosch Case of Welding Monitoring, in: ISWC, Springer, 2022.
[20] P. M. LaCasse, et al., A Survey of Feature Set Reduction Approaches for Predictive Analytics Models in the Connected Manufacturing Enterprise, Applied Sciences 9 (2019).
[21] Z. Zheng, et al., ExeKG: Executable Knowledge Graph System for User-friendly Data Analytics, in: CIKM, 2022.
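
As a supplementary illustration of the Verification and the Sequential translation described in Section 2, the following minimal Python sketch may help. It rests on several assumptions that are not part of the ExeKG system: the KG is held as a plain set of (subject, predicate, object) triples rather than in an RDF database, the individuals `load`, `scale` and `fit` and the `SCRIPTS` table are hypothetical stand-ins for the Analytics module, and each task simply passes its output to the next one, which corresponds to the Linear ExeKG case only.

```python
# Hypothetical toy KG as (subject, predicate, object) triples; the individual
# names below are illustrative and not taken from the ExeKG system.
TRIPLES = {
    ("load", "type", "Task"),
    ("scale", "type", "Task"),
    ("fit", "type", "Task"),
    ("load", "hasNextTask", "scale"),
    ("scale", "hasNextTask", "fit"),
    ("load", "hasOutput", "raw"),
    ("scale", "hasOutput", "scaled"),
    ("fit", "hasOutput", "model"),
    ("raw", "type", "DataEntity"),
    ("scaled", "type", "DataEntity"),
    ("model", "type", "DataEntity"),
}


def instances(triples, cls):
    """All individuals declared to be of class `cls`."""
    return {s for s, p, o in triples if p == "type" and o == cls}


def violating_tasks(triples):
    """Analogue of query (4): tasks with no hasOutput edge to a DataEntity."""
    entities = instances(triples, "DataEntity")
    ok = {s for s, p, o in triples if p == "hasOutput" and o in entities}
    return instances(triples, "Task") - ok


def sequential_order(triples):
    """Order the tasks of a sequential pipeline by following hasNextTask."""
    nxt = {s: o for s, p, o in triples if p == "hasNextTask"}
    starts = instances(triples, "Task") - set(nxt.values())
    if len(starts) != 1:
        raise ValueError("expected exactly one task without a predecessor")
    order, current = [], starts.pop()
    while current is not None:
        if current in order:
            raise ValueError("hasNextTask contains a cycle")
        order.append(current)
        current = nxt.get(current)
    return order


# Assumed stand-ins for the Python function scripts of the Analytics module.
SCRIPTS = {
    "load": lambda _: [1.0, 2.0, 3.0],
    "scale": lambda xs: [x / max(xs) for x in xs],
    "fit": lambda xs: sum(xs) / len(xs),
}


def execute(triples, scripts):
    """Verify the KG, then call each task's script in hasNextTask order."""
    bad = violating_tasks(triples)
    if bad:
        raise ValueError(f"tasks without DataEntity output: {sorted(bad)}")
    data = None
    for task in sequential_order(triples):
        data = scripts[task](data)
    return data


order = sequential_order(TRIPLES)
result = execute(TRIPLES, SCRIPTS)
```

Running verification before translation means that a KG violating axiom (3) is rejected before any script is invoked. Parallel and integrated ExeKGs would additionally need the backward search over hasNextTask described in Section 2, which this sketch does not attempt.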