=Paper= {{Paper |id=Vol-3632/ISWC2023_paper_510 |storemode=property |title=Semantic Cloud System for Scaling Data Science Solutions for Welding at Bosch |pdfUrl=https://ceur-ws.org/Vol-3632/ISWC2023_paper_510.pdf |volume=Vol-3632 |authors=Zhuoxun Zheng,Baifan Zhou,Zhipeng Tan,Ognjen Savkovic,Diego Rincon-Yanez,Nikolay Nikolov,Dumitru Roman,Ahmet Soylu,Evgeny Kharlamov |dblpUrl=https://dblp.org/rec/conf/semweb/ZhengZTSRNRSK23 }} ==Semantic Cloud System for Scaling Data Science Solutions for Welding at Bosch== https://ceur-ws.org/Vol-3632/ISWC2023_paper_510.pdf
                                Semantic Cloud System for Scaling Data Science
                                Solutions for Welding at Bosch
                                Zhuoxun Zheng1,2 , Baifan Zhou3,2 , Zhipeng Tan1,4 , Ognjen Savkovic6 ,
                                Diego Rincon-Yanez1,7 , Nikolay Nikolov5,2 , Dumitru Roman5,3 , Ahmet Soylu3,2 and
                                Evgeny Kharlamov1,2
                                1
                                  Bosch Center for AI, Germany
                                2
                                  Department of Informatics, University of Oslo, Norway
                                3
                                  Department of Computer Science, Oslo Metropolitan University, Norway
                                4
                                  RWTH Aachen University, Germany
                                5
                                  SINTEF AS, Norway
                                6
                                  Free University of Bozen-Bolzano, Italy
                                7
                                  Universidad de Santander, Cucuta, Colombia

                                Background and Challenges. Industry 4.0 focuses on smart factories that rely on IoT tech-
                                nology for automation. This produces massive amounts of production data, increasing the
                                demand for data-driven solutions and cloud technology. Yet, users of these solutions and cloud
                                technology are often not cloud experts, such as domain experts and data scientists (Fig. 1.1).
                                In a standard setting of a data science project, the team requires extensive assistance from
                                cloud experts, whenever they want to deploy solutions or make small changes to their solutions
                                deployed on the cloud. To facilitate the adoption of cloud systems careful planing to balance
                                cost and benefits is required. Scaling data science solutions presents challenges of handling high
                                data volume and enabling a broader users which are non-cloud experts to use cloud systems.
                                SemCloud for Distributed ETL and Use Case. In industry, large amount of data collected
                                from different resources are integrated and analysed in parallel to optimise the following
                                production. Due to the large volume of data, cloud technology is used to enable distributed
                                ETL. Here, cloud configuration plays an important role in achieving optimal performance,
                                which is however non-trivial for non-cloud experts. To address the scalability challenges and
                                democratising cloud systems for more users, we propose SemCloud [1], a semantics-enhanced
                                cloud system, that scales semantic ETL pipeline on the cloud, and allows non-cloud experts to
                                deploy their solutions. We showcase SemCloud in our welding use case.
                                SemCloud for Automated Cloud Configuration. SemCloud achieves optimised cloud
                                adoption [2] for ETL automatically by breaking down the ETL into pipelines of four steps:
                                retrieve, slice, prepare, and store (Fig. 1.2), where data is first retrieved from databases or online
                                streams, and then split into subsets (e.g. each belong to one welding machine) by slice to achieve
                                parallel processing and storage in the following prepare and store. A rough description of the
                                application of SemCloud is as follows: (a) non-cloud experts create knowledge graphs (KG) that
                                represent ETL-Pipelines on a cloud system, where attributes of cloud resource configuration is
                                under-specified; (b) Datalog ML rules execute in three steps, where the rules contain external

                                ISWC2023: The 22nd International Semantic Web Conference, November 6–10, 2023, Athens, Greece
                                                                       © 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
                                    CEUR
                                    Workshop
                                    Proceedings
                                                  http://ceur-ws.org
                                                  ISSN 1613-0073
                                                                       CEUR Workshop Proceedings (CEUR-WS.org)




CEUR
                  ceur-ws.org
Workshop      ISSN 1613-0073
Proceedings
 1         Sensor                                                                                          3
                                    ML
           expert                               Welding          Moving      ETL KG Construction
                                   expert                                                             Rule parameter
                                                expert           direction
                                                                                                         learning
                                     !                                                 Datalog ML Rules
                                                             Car body
                                                                                 Graph Extraction rules
                                                             "     Welding
                                                                    Spot           Resource Estimation Rules
                      Data                     Knowledge
 2                   manager                    engineer                            Resource Configuration Rules
                               Prepare
 Retrieve                        Prepare             Store         Cloud
  Retrieve          Slice          Prepare                        storage
     Retrieve                        Prepare                                        Automated Cloud Configuration


Figure 1: (1) High volume of heterogeneous data collected from welding machines and factories; (2)
distributed semantic ETL; (3) ETL KG and Datalog ML rules for automated cloud configuration

functions obtained by (c) rule parameter learning with ML; (d) automated cloud configuration.
SemCloud Ontology and KG for ETL Pipelines. SemCloud provides the users an ontology
to construct semantic ETL pipelines and encode them into knowledge graphs. The ontology is
written in OWL 2, and consists of 20 classes and 165 axioms. For these data, the users construct
KG for ETL pipelines with four layers (via GUI), which will be used for rule-based reasoning.
Datalog ML Rules. Obtaining an optimised cloud configuration is not trivial. Cloud experts
typically try different configurations by testing the system with various settings and use
heuristics to manually decide on the configurations. To this end, SemCloud uses adaptive rules
in Datalog with aggregation and calls to external predicates learned by ML . In particular,
we consider non-recursive rules of the form 𝐵 ← 𝐵1 , . . . , 𝐵𝑛 , where 𝐵 is a head of rule
(the consequence of the rule application) and 𝐵1 , . . . , 𝐵𝑛 are either predicates that apply join,
aggregate function that filters out the results or the expression of the form Var = @FUNCT(Vars).
Rule Parameter Learning with ML. The functions in the adaptive rules are in the form
of ML models. The resource estimation rules are selected from the best model resulting from
training three ML methods and the pilot running statistics. We selected three representative
classic ML methods: Polynomial Regression (PolyR), Multilayer Percetron (MLP), and K-Nearest
Neighbours (KNN). The resource configuration rules are trained with the three ML methods and
with optimisation techniques, such as Bayesian optimisation or grid search.
User Feedback and Business Impact. SemCloud helps non-cloud experts who know little
about cloud and cannot use cloud system to find the optimal allocation of cloud resources in
various industrial tasks. To verify the time efficiency of SemCloud, we run SemCloud repeatedly
3562 times and gather pilot running statistics. With SemCloud, the Bosch semantic ETL is speed
up to at least twice faster, the optimisation time of cloud configuration is speeded up to 1.12s.
Additionally SemCloud helps more users to use cloud systems, which greatly reduce time and
cost for personnel training and data processing, benefiting data science solution at Bosch.
References
[1] B. Zhou, et al., Scaling data science solutions with semantics and ML, in: ISWC, 2023.
[2] Z. Zheng, O. Savkovic, et al., Datalog with external machine learning functions for auto-
    mated cloud resource configuration, in: ISWC, 2023.