Empowering Industry Professionals with Machine Learning through Knowledge Graphs

Antonis Klironomos1,2, Gad-Elrab Mohamed1 and Evgeny Kharlamov1,3

1 Bosch Center for Artificial Intelligence, Germany
2 University of Mannheim, Germany
3 University of Oslo, Norway

Abstract
The application of machine learning (ML) has become increasingly prevalent in various industries, offering valuable insights and predictive capabilities. However, the adoption of ML by domain experts, who possess deep industry-specific knowledge but may lack technical expertise, presents unique challenges. This paper explores strategies for scaling out the usage of ML to industry professionals, enabling them to leverage the power of ML in their respective domains. We discuss a comprehensive, user-friendly ML system with an interface for democratizing ML within industry domains. The system includes automatic feature engineering through ontologies and simplifies ML pipeline creation using knowledge graphs (KGs). We also present real-world use cases supported by user study results.

Keywords
machine learning, knowledge graphs, industry

1. Introduction

The adoption of machine learning (ML) in various industries has revolutionized the way businesses operate, offering valuable insights and predictive capabilities that were previously unattainable. However, the prerequisites and complexity of developing ML pipelines pose a barrier for domain experts who want to use ML. To overcome this barrier, this paper presents a comprehensive, user-friendly system with a graphical user interface (GUI) designed for industry professionals. This system includes automatic feature engineering through ontologies, which allows domain experts to leverage their industry-specific knowledge to create and extract relevant features for ML models.
Additionally, the system uses knowledge graphs (KGs) to simplify the creation of ML pipelines, making it easier for domain experts to build ML models without extensive technical expertise. To show the system’s impact, the paper presents two real-world Bosch use cases that demonstrate the successful integration of ML into industry domains, supported by user study results. By showcasing practical applications of ML in industry, the paper aims to highlight the potential benefits and opportunities that ML can offer to domain experts. Ultimately, the goal is to democratize ML within industry domains, empowering domain experts to leverage ML for improved decision-making, enhanced productivity, and innovative solutions to industry-specific challenges.

2. Automated Feature Engineering using Ontologies

A crucial step in applying ML is feature engineering, which typically requires domain knowledge that is applied repeatedly to similar data. Encoding this domain knowledge in ontologies can reduce repetition and support automatic ML algorithm selection. This section describes the phases and benefits of our Semantically-Enhanced Feature Engineering (SemFE) tool [1].

First International Workshop on Scaling Knowledge Graphs for Industry, co-located with 20th International Conference on Semantic Systems (SEMANTICS), Amsterdam, Sept. 17–19, 2024
antonis.klironomos@de.bosch.com (A. Klironomos); mohamed.gad-elrab@de.bosch.com (G. Mohamed); evgeny.kharlamov@de.bosch.com (E. Kharlamov)
© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (ceur-ws.org), ISSN 1613-0073

Semantic Data Preparation. The process of semantically enhancing feature engineering involves a data preparation phase.
This begins with integrating raw data sources, where the Domain Knowledge Annotator maps raw feature names to the terms of domain ontologies (DO) in a semi-automated fashion. The resulting Data-to-DO mapping serves as the foundation for subsequent automated processes. The Feature Group (FG) Annotator then uses reasoning to infer ML feature groups for each source from the Data-to-DO mapping, generating the DO-to-FG mapping. This automated step allows subsequent modules to abstract from the concrete features and work generically with feature groups.

Semantic Feature Processing. Following the data preparation phase, the process moves into semantic feature processing, which involves retrieving feature processing algorithms (FPAlg) for each feature group from the ML Ontology. The Feature Processing Algorithm Selector retrieves these algorithms, which vary in computational complexity, depending on the feature groups. Default algorithms are specified for each feature group, and users can manually override them for specific feature groups. The Processed Feature Groups Annotator then uses the FPAlg-to-FPG mapping to infer the processed feature groups (FPG) for the specified feature groups and chosen feature processing algorithms. This automated process generates names for new features and applies the feature processing algorithms to compute these new features.

ML Modeling and Implementation. The final step involves ML modeling, where the ML Algorithm Selector module selects ML algorithms with different feature settings, ML methods, and hyper-parameters based on the FPAlg-to-FPG mapping. After ML model training and testing, the formal representation of domain knowledge, ML feature processing strategies, and the algorithms with their application order allows these workflows to be executed on new datasets with minimal adjustments and adaptations.
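
The chain of mappings described above (raw column names to domain terms, domain terms to feature groups, feature groups to processing algorithms, and finally generated feature names) can be illustrated with a minimal Python sketch. This is not the SemFE implementation: plain dictionaries stand in for the ontologies and the reasoner, and every column name, domain term, feature group, and algorithm name below is a hypothetical placeholder chosen for illustration.

```python
from statistics import mean

# Hypothetical Data-to-DO mapping: raw column name -> domain ontology term.
DATA_TO_DO = {
    "I_ref_1": "Current",
    "U_meas": "Voltage",
    "prog_no": "ProgramNumber",
}

# Hypothetical DO-to-FG mapping: domain term -> ML feature group.
# SemFE infers this with a reasoner; a lookup table stands in for it here.
DO_TO_FG = {
    "Current": "TimeSeries",
    "Voltage": "TimeSeries",
    "ProgramNumber": "Categorical",
}

# Hypothetical default feature processing algorithms per feature group.
DEFAULT_FPALG = {
    "TimeSeries": ["mean", "max"],
    "Categorical": ["identity"],
}

# Executable counterparts of the algorithm names used above.
ALGORITHMS = {
    "mean": mean,
    "max": max,
    "identity": lambda values: values[0],
}


def process_features(record, overrides=None):
    """Compute processed features for one record (column -> list of values)."""
    overrides = overrides or {}
    processed = {}
    for column, values in record.items():
        term = DATA_TO_DO[column]
        group = DO_TO_FG[term]
        # A user-supplied override replaces the defaults for a feature group.
        for alg in overrides.get(group, DEFAULT_FPALG[group]):
            # New feature names are generated from the term and the algorithm.
            processed[f"{term}_{alg}"] = ALGORITHMS[alg](values)
    return processed


record = {"I_ref_1": [1.0, 3.0, 2.0], "U_meas": [0.5, 0.7], "prog_no": [7]}
features = process_features(record)
custom = process_features(record, overrides={"TimeSeries": ["max"]})
```

Because the processing logic only refers to feature groups, a new dataset would need only a new Data-to-DO table in this sketch; the remaining tables and the function stay unchanged.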
SemFE has been implemented as an extension of the ML pipeline, incorporating several semantic modules and communicating with a triple store and reasoner to store ontologies and retrieve inference results.

3. Convenient ML Pipeline Creation via Knowledge Graphs

The automated feature engineering tool described above (SemFE) is complemented by our Semantically-Enhanced Machine Learning (SemML) tool [2], which is used for conveniently creating ML pipelines. The structured knowledge representation provided by ML-related ontologies facilitates the efficient construction of executable KGs that represent ML pipelines [3, 4]. This section describes the relevant ontologies and the process of creating ML pipelines from the user’s perspective.

Semantic Artifacts for Executable Knowledge Graphs. The tool includes various ontologies, such as the upper domain ontology, manufacturing ontology, and domain ontologies for specific manufacturing domains at Bosch. The upper domain ontology contains axioms, classes, object properties, and datatype properties to model the general knowledge of discrete manufacturing processes. Domain experts create the domain ontologies, which consist of sub-classes of the upper domain ontology’s classes. The tool also includes the data science ontology, which formalizes the general knowledge of data science activities, and task ontologies for visualization, statistical analytics, and machine learning analytics. These task ontologies describe common methods, allowed data structures, and the organization of tasks in pipelines.

Executable Knowledge Graph Construction. Pipelines can be constructed as executable KGs through the tool’s GUI. Every user action is reflected as a change in the KG, and the available task options are derived from the KG structure. Users can create pipelines from scratch, or modify and integrate existing ones. Creating a pipeline involves selecting input data and tasks (i.e. steps) based on the respective ontologies.
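
The idea of a pipeline stored as a graph of linked tasks can be sketched in a few lines of Python. This is an illustrative toy under stated assumptions, not SemML's actual data model: the KG is a plain set of (subject, predicate, object) tuples, and all identifiers (`ex:Pipeline1`, `hasInputData`, `hasStartTask`, `hasNextTask`, `hasMethod`) as well as the two methods are hypothetical.

```python
def build_pipeline(input_data, methods):
    """Create triples for a linear pipeline from chosen data and task methods."""
    triples = {("ex:Pipeline1", "hasInputData", input_data)}
    previous = None
    for index, method in enumerate(methods, start=1):
        task = f"ex:Task{index}"
        triples.add((task, "hasMethod", method))
        if previous is None:
            # The first task is attached to the pipeline itself.
            triples.add(("ex:Pipeline1", "hasStartTask", task))
        else:
            # Later tasks are chained to their predecessor.
            triples.add((previous, "hasNextTask", task))
        previous = task
    return triples


def value(triples, subject, predicate):
    """Return the object of the first matching triple, or None if absent."""
    for s, p, o in triples:
        if s == subject and p == predicate:
            return o
    return None


# Hypothetical executable methods that tasks in the KG can refer to by name.
METHODS = {
    "Normalize": lambda xs: [x / max(xs) for x in xs],
    "Mean": lambda xs: sum(xs) / len(xs),
}


def execute(triples, datasets):
    """Walk the task chain in the KG and apply each task's method in order."""
    data = datasets[value(triples, "ex:Pipeline1", "hasInputData")]
    task = value(triples, "ex:Pipeline1", "hasStartTask")
    while task is not None:
        data = METHODS[value(triples, task, "hasMethod")](data)
        task = value(triples, task, "hasNextTask")
    return data


kg = build_pipeline("ex:WeldingCurrents", ["Normalize", "Mean"])
result = execute(kg, {"ex:WeldingCurrents": [2.0, 4.0, 6.0, 8.0]})
```

In this sketch, choosing data and tasks only adds triples; execution is a separate traversal of the graph, which is what keeps the pipeline description independent of the code that runs it.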
Pipelines are modified by adding, deleting, or changing tasks, while the integration of KGs involves combining the outputs of different pipelines. Executable KGs are translated into code in Python, the language used for discussion in this work.

4. Use Cases

We showcase our system’s benefits through two real-world use cases. We demonstrate SemFE’s capabilities with a welding use case comprising two experiments, and we test SemML on real welding data.

Use Case 1: Professionals extend SemFE’s domain ontologies. Users were tasked with using SemFE for domain ontology extension (Experiment 1) and data mapping between column names and ontology terms (Experiment 2) [5]. These tasks are important for domain experts to establish a common vocabulary. In Experiment 1, domain experts and data scientists created terms for resistance spot welding (RSW) and hot-staking (HS), with an average correctness of 93% for applying a template and 92% for choosing dependencies. The correlations between user performance and self-reported expertise indicated that domain expertise greatly increased correctness and efficiency. In Experiment 2, most users correctly mapped column names to newly introduced terms, achieving 100% correctness, with an average of about 50 seconds spent per term. As in Experiment 1, the correlations between user performance and self-reported expertise led to the same conclusion, while experience with mapping tools showed minimal effect on correctness and efficiency.

Use Case 2: Welding experts develop ML pipelines using SemML. SemML was deployed on welding data from Bosch to predict the quality of resistance spot welding [4]. A user study involving 28 experts from various fields, including ML, welding, and sensor engineering, was conducted to evaluate the tool’s effectiveness. The study included a series of tasks for visualization, statistics, and ML.
The users were asked to complete the tasks with and without using SemML. ML experts explained the tasks to non-ML experts, who then completed the tasks by using technical language or by creating, modifying, and merging knowledge graphs through a GUI. The study measured the percentage of tasks completed, the completion time, and the correctness of answers; correctness was determined by comparing the actions taken during the tasks with a ground truth. The results showed that most participants achieved a high completion percentage and correctness when using our tool and needed less time to complete tasks. The tool also improved transparency, usability, and the coverage of tasks, making previously impossible tasks achievable for non-ML experts.

5. Conclusion

This paper describes SemFE and SemML, two tools that complement each other to form a semantic-based ML system. The automation of feature engineering by SemFE is followed by SemML’s facilitation of ML pipeline creation. In this system, we harness the power of ontologies and KGs to allow experts in various domains to use ML in their work without needing ML expertise. We demonstrate the system’s benefits, such as user-friendliness and efficiency, by presenting two real-world use cases. The system enables non-ML professionals to define a common vocabulary through ontologies and to conveniently create ML pipelines through knowledge graphs.

Acknowledgments

This work was partially supported by the EU project Graph Massiviser (GA 101093202) as well as Dome 4.0 (GA 953163) and enRichMyData (GA 101093202).

References

[1] B. Zhou, Y. Svetashova, T. Pychynski, I. Baimuratov, A. Soylu, E. Kharlamov, SemFE: Facilitating ML Pipeline Development with Semantics, in: Proceedings of the 29th ACM International Conference on Information & Knowledge Management, ACM, Virtual Event Ireland, 2020, pp. 3489–3492. doi:10.1145/3340531.3417436.
[2] B. Zhou, Y. Svetashova, A. Gusmao, A. Soylu, G. Cheng, R. Mikut, A. Waaler, E.
Kharlamov, SemML: Facilitating development of ML models for condition monitoring with semantics, Journal of Web Semantics 71 (2021) 100664. doi:10.1016/j.websem.2021.100664.
[3] A. Klironomos, B. Zhou, Z. Tan, Z. Zheng, G.-E. Mohamed, H. Paulheim, E. Kharlamov, ExeKGLib: Knowledge Graphs-Empowered Machine Learning Analytics, in: C. Pesquita, H. Skaf-Molli, V. Efthymiou, S. Kirrane, A. Ngonga, D. Collarana, R. Cerqueira, M. Alam, C. Trojahn, S. Hertling (Eds.), The Semantic Web: ESWC 2023 Satellite Events, volume 13998, Springer Nature Switzerland, Cham, 2023, pp. 123–127. doi:10.1007/978-3-031-43458-7_23.
[4] Z. Zheng, B. Zhou, D. Zhou, X. Zheng, G. Cheng, A. Soylu, E. Kharlamov, Executable Knowledge Graphs for Machine Learning: A Bosch Case of Welding Monitoring, in: U. Sattler, A. Hogan, M. Keet, V. Presutti, J. P. A. Almeida, H. Takeda, P. Monnin, G. Pirrò, C. d’Amato (Eds.), The Semantic Web – ISWC 2022, volume 13489, Springer International Publishing, Cham, 2022, pp. 791–809. doi:10.1007/978-3-031-19433-7_45.
[5] Y. Svetashova, B. Zhou, T. Pychynski, S. Schmid, Y. Sure-Vetter, R. Mikut, E. Kharlamov, Ontology-Enhanced Machine Learning: A Bosch Use Case of Welding Quality Monitoring, 2020. doi:10.1007/978-3-030-62466-8_33.