=Paper=
{{Paper
|id=Vol-3632/ISWC2023_paper_465
|storemode=property
|title=Literal-Aware Knowledge Graph Embedding for Welding Quality Monitoring
|pdfUrl=https://ceur-ws.org/Vol-3632/ISWC2023_paper_465.pdf
|volume=Vol-3632
|authors=Zhipeng Tan,Zhuoxun Zheng,Antonis Klironomos,Mohamed H. Gad-Elrab,Guohui Xiao,Ahmet Soylu,Evgeny Kharlamov,Baifan Zhou
|dblpUrl=https://dblp.org/rec/conf/semweb/TanZKG0SKZ23
}}
==Literal-Aware Knowledge Graph Embedding for Welding Quality Monitoring==
Zhipeng Tan<sup>1,2,*</sup>, Zhuoxun Zheng<sup>1,3</sup>, Antonis Klironomos<sup>1,4</sup>, Muhammed Gad<sup>1</sup>, Guohui Xiao<sup>5</sup>, Ahmet Soylu<sup>6</sup>, Evgeny Kharlamov<sup>1,3,*</sup> and Baifan Zhou<sup>6,3,*</sup>

<sup>1</sup> Bosch Center for AI, Germany; <sup>2</sup> RWTH Aachen, Germany; <sup>3</sup> Department of Informatics, University of Oslo, Norway; <sup>4</sup> University of Mannheim, Germany; <sup>5</sup> University of Bergen, Norway; <sup>6</sup> Department of Computer Science, Oslo Metropolitan University, Norway

===Abstract===
Recently there has been a series of studies on knowledge graph embedding (KGE), which attempts to learn embeddings of entities and relations as numerical vectors and mathematical mappings via machine learning (ML). However, limited research has applied KGE to industrial problems in manufacturing. This paper investigates whether and to what extent KGE can be used for an important problem: quality monitoring for welding in the manufacturing industry. Welding is an important process that accounts for the production of millions of cars annually. The work is in line with our research on data-driven solutions that are intended to replace traditional, costly quality monitoring. The paper tackles two challenging questions simultaneously: how large the welding spot diameter is, and to which car body the welded spot belongs. The problem setting is difficult for traditional ML because a large number of car bodies must be assigned as class labels. We formulate the problem as link prediction and experiment with popular KGE methods on real industry data, with consideration of literals. This paper accompanies the full paper in the in-use track and provides additional discussion on problem formulation, literal handling strategies, and the information included in industrial KG construction.

===1. Introduction===
Background and Challenge. Research in knowledge graphs and their industrial applications has attracted increasing attention [1].
Recently there has been a series of studies on knowledge graph embedding (KGE), but limited research has applied KGE to industrial problems in manufacturing. This paper investigates whether and to what extent KGE can be used for an important problem: quality monitoring for welding in the manufacturing industry. We discuss automated welding, which plays a critical role in the automotive industry for manufacturing high-quality car bodies, with millions of cars produced annually. The welding process generates a vast amount of data, considering the number of welding machines in car production lines and the thousands of spots on each car body (Fig. 1a). This large amount of data increases the demand for data-driven solutions, which aim to reduce and eventually replace conventional destructive methods, which are extremely expensive and inefficient.

Figure 1: Two core questions in welding quality monitoring: Question 1 (Q1), how large is the spot diameter? Question 2 (Q2), which car body part does this spot belong to? Panels: (a) the welding process, with signals such as current, voltage, resistance, time and power, and the welding spot diameter; (b) Q1, reformulated from regression (predict a real number) via discretisation and entity creation to link prediction with diameters as classes; (c) Q2, reformulated from classification (predict the car body part label) via entity creation to link prediction with car body parts as entities. Example candidate triples with scores: (b) Spot103 rdf:type Diameter 0.1-0.2 (0.65), Diameter 0.2-0.3 (0.52), Diameter 0.3-0.4 (0.34); (c) Spot103 belongsTo Carbody part1 (0.76), Carbody part2 (0.55), Carbody part3 (0.43).

To address this challenge, two core questions need to be answered, as shown in Fig. 1b and Fig. 1c. First, the spot diameter (Q1) is the key indicator for evaluating welding quality, since the diameter must be neither too large nor too small. Second, the car body part of the spot (Q2) is important because it is essential to know the percentage of good spots for each car body, but the carbody-spot correspondence information does not exist in a large amount of historical data silos from past welding systems. There is existing research applying ML and digital twins in industry [2], but it addresses this challenge, and especially the two core questions, only to a limited extent.

Our Approach. We investigate KGE for answering the two questions in the automotive industry with production data, and compare it with representative classic ML. We first construct KGs from tabular data with special handling of literals; then we conduct experiments comparing mainstream KGE methods such as TransE, RotatE, and AttH with a multilayer perceptron (MLP) and with the KGE variant applied in [3]. This poster paper accompanies our full paper at ISWC 2023 [4] and provides additional discussions on problem formulation, literal handling, and industrial KG construction that were not present in the full paper.

ISWC2023: The 22nd International Semantic Web Conference, November 06–10, 2023, Athens, Greece. (* Corresponding author.) Email: zhipeng.tan@rwth-aachen.de (Z. Tan); evgeny.kharlamov@de.bosch.com (E. Kharlamov); baifanz@ifi.uio.no (B. Zhou). © 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org), ISSN 1613-0073.

===2. Approach===
Welding KG construction.
A welding KG is constructed from tabular data collected from a German factory. We use welding-related information, such as the time of welding processes, welding machines, welding programs, and welding parameters (e.g., voltage, current, resistance). The construction is centred on welding spots, car bodies, and diameters. We transform the values of the welding data table into entities, and the relationships between these entities into edges in the KG. Fig. 2b shows the construction of literal entities, which are entities generated from numeric values. The handling of literals is described later in this section.

Problem formulation. Fig. 1b and Fig. 1c show in more detail the two research questions of quality monitoring in the use case and our approach to reformulating them: given the information about a welding spot, we want to predict the car body of this welding spot and the diameter of the welding spot. As shown in Fig. 1, both problems are reformulated. Spot diameter prediction was originally a regression problem: predicting the real-valued diameter from the welding data. Due to the measurement resolution of the spot diameter, we discretise the diameters into diameter classes and treat the task as a classification problem. We then construct entities based on the diameter classes and the car body classes. The links between welding spots and the diameter classes or the car bodies are predicted.

Figure 2: (a) Procedure of literal embedding. (b) Partial illustration of the welding KG. Panel (a) shows the current (I) over time split into stage 1, stage 2 and stage 3, followed by aggregation, discretisation for each feature, and creation of literal entities. The aggregated example values (I_mean, I_1_mean, I_2_mean, I_3_mean) are 0.72, 0.63, 0.80, 0.60 for Spot103 and 0.57, 0.52, 0.70, 0.49 for Spot104; the bins for I_mean are <0.1, 0.1-0.2, 0.2-0.3, 0.3-0.4, 0.4-0.5, 0.5-0.6, and so on. Panel (b) shows, among others, Spot103 linked to literal entities via hasCurrentmean and hasVoltagemean and to Car body part2 via belongsTo, together with entities such as Machine1, Program1 and Component1 to Component3.

Literal handling. Inspired by [5], we handle literals in the following steps. The numeric literals of the knowledge graph are embedded through aggregation, discretisation of features, entity creation, and linking. As shown in Fig. 2, in the aggregation step the sensor measurements are aggregated into the mean values of the three stages and the overall mean value, all as real numbers. In the discretisation step, we discretise the real values into ranges. Finally, we create entities from the discretised ranges and link them.

===3. Evaluation and Discussion===
We discuss the experiments and provide additional discussion on the problem formulation, the literal handling and discretisation techniques, and the industrial KG construction.

Discussion on problem formulation. We discuss three promising ways: (1) classic ML with MLP; (2) classic KGE with link prediction; (3) binary triple classification inspired by [3].

{| class="wikitable"
|+ Table 1: Model performance comparison on answering Q1 and Q2. Bold: best results. Underlined: second best.
! Question !! Metric !! MLP !! TransE !! RotatE !! AttH
|-
| rowspan="3" | Q1
| Acc (Hits@1) || <u>0.39</u> || '''0.42''' || 0.25 || 0.31
|-
| MRR || - || '''0.65''' || 0.49 || <u>0.57</u>
|-
| nrmse || '''0.05''' || <u>0.06</u> || 0.08 || <u>0.06</u>
|-
| rowspan="3" | Q2
| Acc (Hits@1) || <u>0.61</u> || '''0.64''' || 0.52 || 0.53
|-
| MRR || - || '''0.77''' || 0.69 || <u>0.70</u>
|-
| Hits@Groupby3 || - || '''0.85''' || <u>0.81</u> || 0.79
|}

MLP. In manufacturing, quality monitoring aims at predicting diameters and car bodies. These questions are formulated as classification problems, where car bodies and diameters are the predicted classes. This formulation is verified by the MLP model.
This model proves to be the most efficient, but it achieves the best performance only on nrmse for Q1 and provides inadequate performance for Q2; see Tab. 1.

KGE. Since there can be hundreds or thousands of car bodies, Q2 is not well suited to a classic ML formulation. We reformulate the problem as link prediction (Fig. 2), where the correct link between a welding spot and its car body should have the highest score among all candidate car bodies. The same principle holds for diameter prediction. This formulation is verified by the KGE models. We use the metrics Acc, MRR, and Hits@Groupby3. We introduce the new metric Hits@Groupby3 because no KGE model delivers satisfactory Acc: it relaxes the metric by testing accuracy on groups of three car bodies. The results in Tab. 1 show that TransE delivers the best results for both Q1 and Q2 on Acc, MRR, and Hits@Groupby3.

KGE-MLP. The third possible problem formulation is inspired by [3], where link prediction is formulated as binary classification with an output score between 0 and 1; a value closer to 1 indicates that the link exists. The score is compared across all candidate diameters or car bodies, and the link with the highest score is selected as the prediction. This formulation is verified by the KGE-MLP models. In our welding quality monitoring use case, these models perform worse on Acc, MRR, and the other metrics than the MLP and KGE models (see Tab. 2 for the comparison with the best KGE model), but the approach is the state-of-the-art (SotA) model in another use case [3], so we still include it in our paper.

{| class="wikitable"
|+ Table 2: Best KGE model compared with KGE-MLP models. The KGE-MLP models are marked with *. Bold: best.
! Question !! Metric !! TransE !! TransE* !! DistMult* !! HolE*
|-
| rowspan="3" | Q1
| Acc (Hits@1) || '''0.42''' || 0.17 || 0.22 || 0.21
|-
| MRR || '''0.65''' || 0.45 || 0.48 || 0.48
|-
| nrmse || '''0.06''' || 0.11 || 0.09 || 0.10
|-
| rowspan="3" | Q2
| Hits@1 || '''0.64''' || 0.34 || 0.34 || 0.37
|-
| MRR || '''0.77''' || 0.48 || 0.52 || 0.41
|-
| Hits@Groupby3 || '''0.85''' || 0.45 || 0.46 || 0.52
|}

Discussion on literal handling. In SotA research, there are mainly two ways to handle literals: discretisation (KGA) [5] and literals as embedding vectors (LiteralE) [6]. According to the experiments in [5], discretisation with bins yields SotA results with large improvements over traditional KGE. We also consider [6] unsuitable because it requires a fixed embedding size for a fixed number of literals, while in real applications the number of literals can vary widely. This paper therefore uses discretisation with bins, as in [5], to encode the literal information. Other discretisation methods are compared and discussed in the next part.

Discussion on discretisation strategy. Different discretisation approaches are discussed in the paper, including the single setting without overlapping, the overlapping setting, and the hierarchical setting. There are also two bin creation methods, based on frequency or on fixed values. According to the experimental results, the performance differences between the discretisation strategies are insignificant and the single-bin setting with fixed values is simple, so we choose this method for welding quality monitoring.

Discussion on included information in industrial KG construction. In the KG construction step, we also note that it is important to consider the impact of the tabular columns on KGE performance. The number of columns in production data is very large (over 200), but based on domain knowledge only a few carry important information; most are meta-settings or overlapping information that do not contribute to KGE performance. This is a common problem for industrial KGs, especially in industries such as manufacturing, mining, and chemistry, where a large amount of information is collected but only a small part is essential for quality monitoring. Thus, we need to choose the smallest set of columns that represents the most important information for welding.
This selection process is done by iteratively updating the features and evaluating the performance. We observe that the most important information is (1) that which defines the graph structure of the KG, such as the welding machine, the welding program, and the materials and thickness of the car body; and (2) the sensor values (literals in the KG).

===4. Conclusion and Outlook===
This poster paper presents an extended abstract of our full paper [4] and provides additional discussions. The research is under the umbrella of Neuro-Symbolic AI for Industry 4.0 at Bosch. We aim at enhancing manufacturing technology with both symbolic AI [7] (such as semantic technologies) for improving transparency [1], and ML for predictive power. We will further improve the performance of the KG embedding method and develop other complementary technologies, such as ontologies [8, 9] and ontology-based data access.

Acknowledgements. The work was partially supported by EU projects Dome 4.0 (953163), OntoCommons (958371), DataCloud (101016835), Graph Massiviser (101093202), EnRichMyData (101093202), and SMARTEDGE (101092908) and the Norwegian Research Council funded projects (237898, 308817).

===References===
[1] Z. Zheng, B. Zhou, D. Zhou, A. Soylu, E. Kharlamov, Executable knowledge graph for transparent machine learning in welding monitoring at Bosch, in: CIKM, 2022, pp. 5102–5103.

[2] Z. Huang, M. Fey, C. Liu, E. Beysel, X. Xu, C. Brecher, Hybrid learning-based digital twin for manufacturing process: Modeling framework and implementation, Robotics and Computer-Integrated Manufacturing 82 (2023) 102545.

[3] E. B. Myklebust, E. Jimenez-Ruiz, J. Chen, R. Wolf, K. E. Tollefsen, Knowledge graph embedding for ecotoxicological effect prediction, in: ISWC, 2019.

[4] Z. Tan, B. Zhou, Z. Zheng, O. Savkovic, Z. Huang, I. G. Gonzalez, A. Soylu, E. Kharlamov, Literal-aware KGE for welding quality monitoring, in: ISWC, 2023.

[5] J. Wang, F. Ilievski, P. A. Szekely, K.-T. Yao, Augmenting knowledge graphs for better link prediction, in: IJCAI, 2022.

[6] A. Kristiadi, M. A. Khan, D. Lukovnikov, J. Lehmann, A. Fischer, Incorporating literals into knowledge graph embeddings, in: ISWC, 2019.

[7] D. Rincon-Yanez, M. H. Gad-Elrab, D. Stepanova, K. T. Tran, C. C. Xuan, B. Zhou, E. Kharlamov, Addressing the scalability bottleneck of semantic technologies at Bosch, ESWC Industry (2023).

[8] B. Zhou, Z. Zheng, D. Zhou, Z. Tan, O. Savković, H. Yang, Y. Zhang, E. Kharlamov, Knowledge graph-based semantic system for visual analytics in automatic manufacturing, in: ISWC, 2022.

[9] Z. Zheng, B. Zhou, D. Zhou, A. Q. Khan, A. Soylu, E. Kharlamov, Towards a statistic ontology for data analysis in smart manufacturing, in: ISWC Posters, volume 3254, 2022.
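
Illustrative sketch of literal handling. The four literal handling steps described in Section 2 (aggregation, discretisation, entity creation, linking) can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the implementation used in the paper: the function names, the relation naming scheme `has_<feature>`, the bin entity naming scheme, the bin width of 0.1, and the sample current values are all hypothetical. Only the stage-wise and overall means and the single, fixed-value, non-overlapping bins follow the description in the text.

```python
import math
from statistics import mean

BIN_WIDTH = 0.1  # assumed fixed bin width (ranges such as "0.1-0.2")


def aggregate(stages):
    """Aggregation: mean current per welding stage plus the overall mean."""
    features = {f"I_{i}_mean": mean(s) for i, s in enumerate(stages, start=1)}
    features["I_mean"] = mean(x for s in stages for x in s)
    return features


def bin_label(feature, value, width=BIN_WIDTH):
    """Discretisation and entity creation: name of the bin entity for a value.

    Assumes non-negative values; bins are fixed-width and non-overlapping.
    """
    # round() guards against float artefacts such as 0.3 / 0.1 == 2.999...
    index = math.floor(round(value / width, 9))
    if index == 0:
        return f"{feature}_<{width:.1f}"
    return f"{feature}_{index * width:.1f}-{(index + 1) * width:.1f}"


def literal_triples(spot, stages):
    """Linking: connect the spot to one bin entity per aggregated feature."""
    return [
        (spot, f"has_{feature}", bin_label(feature, value))
        for feature, value in aggregate(stages).items()
    ]


# Hypothetical current samples for the three stages of one welding spot.
stages = [[0.61, 0.65], [0.78, 0.84], [0.62, 0.66]]
triples = literal_triples("Spot103", stages)
for triple in triples:
    print(triple)
```

Each resulting triple links the spot to a bin entity, so spots whose aggregated values fall into the same range share a neighbour in the KG. The overlapping and hierarchical settings mentioned in Section 3 would need a different `bin_label`, for example one that returns several bin entities per value.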