=Paper=
{{Paper
|id=Vol-3254/paper390
|storemode=property
|title=Towards A Statistic Ontology for Data Analysis in Smart Manufacturing
|pdfUrl=https://ceur-ws.org/Vol-3254/paper390.pdf
|volume=Vol-3254
|authors=Zhuoxun Zheng,Baifan Zhou,Dongzhuoran Zhou,Akif Quddus Khan,Ahmet Soylu,Evgeny Kharlamov
|dblpUrl=https://dblp.org/rec/conf/semweb/ZhengZZKSK22
}}
==Towards A Statistic Ontology for Data Analysis in Smart Manufacturing==
<pdf width="1500px">https://ceur-ws.org/Vol-3254/paper390.pdf</pdf>
<pre>
Towards A Statistic Ontology for Data Analysis in
Smart Manufacturing
Zhuoxun Zheng1,2,∗, Baifan Zhou3, Dongzhuoran Zhou3, Akif Quddus Khan4,
Ahmet Soylu2 and Evgeny Kharlamov1,3
1 Bosch Center for Artificial Intelligence, Germany
2 Department of Computer Science, Oslo Metropolitan University, Norway
3 SIRIUS Centre, University of Oslo, Norway
4 Norwegian University of Science and Technology


Abstract
Statistical analytics plays an important role in uncovering patterns and trends from data
in smart manufacturing. However, how statistical analytics are performed is usually buried in the
underlying code, which hampers transparency and reusability, both vital for modern industry. To
this end, we propose a statistical ontology, StatsOnto, that not only models the concepts in statistics but
also allows encoding the procedure of statistical analytics, which past works address only to a limited extent.
In this poster paper, we present a preliminary evaluation of StatsOnto on a Bosch use case, comprising a
user study, competence questions and a coverage discussion.

Keywords
Ontology Engineering, Knowledge Graph, KG Generation, Data Science, Manufacturing


1. Introduction
Smart manufacturing generally refers to improving manufacturing operations through
system integration, the linking of physical and cyber capabilities, and the exploitation of
information, including big data analysis [1, 2]. In this process, statistical analyses have
always played a crucial role, as they not only identify patterns and trends in large amounts
of data [3], but can also feed into further methods such as machine learning [4, 5].
   However, how statistical analytics are performed is usually buried in the underlying code,
which hampers transparency and reusability, both vital for modern industry [6]. Semantic
technologies, including ontologies, help improve transparency because they
offer a standardised way of describing statistical analysis knowledge and procedures in a
machine-processable formalisation, which opens the door to many applications [7, 8], such as automated
reasoning and the optimisation of statistical pipelines [9, 10]. Currently, only a few studies
partially address the modelling of statistical analytics pipelines. For instance, the computer science
ontology [11, 12] contains general knowledge about statistics, but does not cover the concepts of specific
statistical calculations. The Statistics ontology [13, 14] enumerates various
statistical methods, but does not sufficiently address the procedures of statistical pipelines.

Hangzhou’22: The 21st International Semantic Web Conference, October 23–27, 2022, Hangzhou, China
∗ Corresponding author.
Email: zhuoxun.zheng@de.bosch.com (Z. Zheng)
© 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org), http://ceur-ws.org, ISSN 1613-0073
Figure 1: Schematic illustration of StatsOnto (partial). Three important classes StatisticalPipeline, Data and
StatisticalMethod are colored in blue. Red, yellow and green backgrounds denote concepts in Task, Data and Method.
   To this end, we propose a procedure-oriented statistical ontology, StatsOnto.
We discuss StatsOnto in an industrial scenario of smart manufacturing and use
real-world data provided by Bosch [15]. On the one hand, StatsOnto covers statistical methods
such as feature calculation (e.g., mean), sampling and filtering; on the other hand, StatsOnto
also offers a set of classes for constructing statistical analytical pipelines, including statistical
pipelines, various tasks, data entities, etc. This poster paper presents a preliminary evaluation
of StatsOnto in the industrial scenario with a user study, competence questions and a discussion
of knowledge and scenario coverage.

2. Our Approach
Industrial Scenario. In this poster paper, we limit our scope to an industrial scenario of
smart welding manufacturing at Bosch [16], to show the real-world impact and to provide examples.
StatsOnto aims to help engineers at Bosch gain insights from the data collected from
welding production and monitor the quality of the welding operations.
Requirements. We derive the requirements for StatsOnto from the scenario purposes and
discussions with users [17, 18, 19]: R1. Procedure-Orientation: StatsOnto should be able to reflect
the statistical analytics procedure, allowing the sequence of statistical tasks in a data
pipeline to be described. This opens the door to knowledge graph based verification, reasoning and optimisation of
statistical analytical pipelines. R2. Transparency: StatsOnto should make the representation of
statistical analytics in industry more transparent to engineers. R3. Knowledge Coverage:
StatsOnto should cover the knowledge and practice of statistical analytics, such as statistical
methods and data structures. R4. Purpose Coverage: StatsOnto should cover four types of task:
data inspection (e.g., finding the data with a certain property), statistical modelling (e.g., building the
distribution of the data), data denoising (e.g., detecting and removing outliers) and data analysis
(e.g., interpolation, subsampling).
Ontology Engineering Process. We broadly follow the routine of Ontology Development
101 [20, 21], a collaborative ontology engineering methodology. The whole
process can be divided into the following four steps. Step 1: Domain Analysis, where common
statistical analytics at Bosch are discussed, and common and important terms of statistical tasks are
enumerated and classified. Step 2: Concepts Formalisation, where the enumerated basic concepts are
formalised as classes and relationships between them. Step 3: Mechanism Investigation, where we
investigate how StatsOnto can serve as the basis for generating KGs that represent concrete
statistical analytic pipelines. This step reflects the Procedure-Orientation requirement of
StatsOnto. Step 4: System Deployment, where StatsOnto is deployed in manufacturing and
user feedback is collected constantly for iterative processing and further improvement.

Figure 2: An example of statistical analytics, which aims to detect outliers of the quality indicator (Q-Value). (a)
Schematic illustration of a KG generated based on StatsOnto, which represents a data pipeline for data denoising
in the case study (R1-R2). (b) The visualisation of the task results. (c) Example Competence Questions (CQ) for
knowledge query (R2-R3), and (d) for procedure query (R1-R2).

Statistical Ontology. StatsOnto is expressed in OWL 2 EL, chosen for its expressivity while
query answering remains polynomial [22]. It has 88 classes, 30 properties, and 484 axioms. In particular,
there are three most important classes in StatsOnto (Fig. 1): StatisticalTask, StatisticalMethod and
Data. StatisticalTask has two sub-classes, AtomicStatisticalTask and
StatisticalPipeline. The former models basic and common statistical tasks, such as the mean calculation
task, the counting task, etc., and connects to StatisticalMethod via the property hasMethod. A
StatisticalPipeline can be regarded as a serialisation of AtomicStatisticalTasks. Each individual of
StatisticalMethod is a script in a programming language such as Python. StatisticalMethod also has two
sub-classes, DataSelectionMethod and FeatureCalculationMethod,
which correspond to the two basic steps in statistical analytics: determining the
target data of interest and calculating the desired feature, respectively. Besides these concepts,
StatsOnto also specifies rules that constrain the inputs or outputs of each
AtomicStatisticalTask that invokes a certain StatisticalMethod. For example, the following axiom specifies that
any task that has Array as input and MeanCalculationMethod as method has SingleValue as
output:
∃hasOutput⁻.(Task ⊓ ∃hasInput.Array ⊓ ∃hasMethod.MeanCalculation) ⊑ SingleValue.
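
The class structure and the input/output rule described above can be mimicked in a few lines of Python. This is only an illustrative sketch, not the OWL encoding used by StatsOnto: the class and property names follow the text, while the dataclass layout, the OUTPUT_RULES table and the helper functions are assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class AtomicStatisticalTask:
    """A basic task that consumes one input and invokes one method."""
    name: str
    input_type: str                     # e.g. "Array"
    method: str                         # e.g. "MeanCalculationMethod"
    output_type: Optional[str] = None   # filled in by the rule below


@dataclass
class StatisticalPipeline:
    """A pipeline viewed as an ordered series of atomic tasks."""
    name: str
    tasks: List[AtomicStatisticalTask] = field(default_factory=list)


# Hypothetical rule table: (input type, method) -> output type.
# Only this single entry is stated in the text.
OUTPUT_RULES = {("Array", "MeanCalculationMethod"): "SingleValue"}


def infer_output_type(task: AtomicStatisticalTask) -> Optional[str]:
    """Return the output type implied by the rules, or None if no rule applies."""
    return OUTPUT_RULES.get((task.input_type, task.method))


def annotate(pipeline: StatisticalPipeline) -> StatisticalPipeline:
    """Fill in missing output types for every task in the pipeline."""
    for task in pipeline.tasks:
        if task.output_type is None:
            task.output_type = infer_output_type(task)
    return pipeline


mean_task = AtomicStatisticalTask("MeanTask", "Array", "MeanCalculationMethod")
pipeline = annotate(StatisticalPipeline("DemoPipeline", [mean_task]))
print(mean_task.output_type)
```

In an actual deployment the inference would be carried out by an OWL reasoner over the ontology rather than by a lookup table; the table merely makes the effect of the axiom visible.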

3. Evaluation, Conclusion and Outlook
User Study. We organised a workshop at Bosch and collected 28 reports from experts with
different backgrounds, such as welding engineers, data scientists and knowledge engineers. We
also collected some typical statistical analysis tasks in Bosch's welding manufacturing
(Tab. 1). The users were divided into two groups; each group first performs a statistical analytical
task without our method, and then answers several single-selection questions (SSQ) (Tab. 1); after
that, the two groups exchange their statistical tasks, do the tasks with our method, and then
answer the SSQs. Fig. 2 demonstrates StatsTask2, which aims to detect the welding operations
with an abnormal quality indicator (Q-Value).

Table 1
Example tasks with their descriptions, and examples of single-selection questions (SSQ)
 Tasks        Description
 StatsTask1   Extract four statistics from a sequence: mean, std., min. and max.
 StatsTask2   Compute the trend, scattering and outliers of a sequence with median filter, etc.

 Questions (Q) and Answers (A) for SSQ
 Q1: What's the structure of the input data we used for the statistical analytics?
 A1: (A) Single features (B) Array (C) Matrix
 Q2: What method have we used in the task? I: median filter, II: mean filter, III: Gaussian sampling.
 A2: (A) I (B) I + II (C) II (D) II + III

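
As an illustration of StatsTask1 from Table 1, the four statistics can be computed with the Python standard library. This is a minimal sketch: the function name extract_statistics and the choice of the sample (rather than population) standard deviation are assumptions, since the paper does not specify them.

```python
import statistics
from typing import Dict, Sequence


def extract_statistics(sequence: Sequence[float]) -> Dict[str, float]:
    """Return mean, std., min. and max. of a numeric sequence."""
    if len(sequence) < 2:
        # The sample standard deviation needs at least two values.
        raise ValueError("sequence must contain at least two values")
    return {
        "mean": statistics.mean(sequence),
        "std": statistics.stdev(sequence),   # sample standard deviation (assumption)
        "min": min(sequence),
        "max": max(sequence),
    }


stats = extract_statistics([1.0, 2.0, 3.0, 4.0])
print(stats)
```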
Transparency. In the evaluation, the proportion of correct SSQ answers given by participants working with
StatsOnto, regardless of group, reaches 96.7%, while the proportion without StatsOnto is
about 93.3%. The results indicate that the ontology helps users better understand the
statistical analysis pipeline, thus increasing transparency.
Case Study and Procedure Orientation. We use a data denoising example (Fig. 2) to
demonstrate the procedure orientation. Given a Q-Value Array as input, the pipeline first
extracts its trend by calculating the median value with a sliding window, then calculates the
scattering as the difference between the trend and the Q-Value. The points with large scattering
(large deviation from the trend) are detected as outliers. This shows that StatsOnto is capable of
representing the procedure of statistical analytics.
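
The denoising pipeline described above can be sketched in plain Python. The window size, the threshold and the sample Q-Values below are illustrative assumptions, not values from the Bosch use case, and the edge handling (shrinking the window at the boundaries) is one of several possible choices.

```python
import statistics
from typing import List, Sequence


def median_trend(values: Sequence[float], window: int = 3) -> List[float]:
    """Trend via a sliding-window median; the window shrinks at the edges."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd number")
    half = window // 2
    trend = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        trend.append(statistics.median(values[lo:hi]))
    return trend


def scattering(values: Sequence[float], trend: Sequence[float]) -> List[float]:
    """Point-wise difference between the raw values and the trend."""
    return [v - t for v, t in zip(values, trend)]


def detect_outliers(values: Sequence[float], window: int = 3,
                    threshold: float = 0.5) -> List[int]:
    """Indices whose absolute scattering exceeds the (assumed) threshold."""
    trend = median_trend(values, window)
    scatter = scattering(values, trend)
    return [i for i, s in enumerate(scatter) if abs(s) > threshold]


# Made-up Q-Values with one obvious spike at index 3.
q_values = [1.0, 1.1, 1.0, 3.0, 1.1, 1.0, 1.2]
outliers = detect_outliers(q_values)
print(outliers)
```

Each function corresponds to one step of the pipeline (trend extraction, scattering calculation, outlier detection), mirroring how a StatisticalPipeline chains atomic tasks.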
Knowledge Coverage. We select two example CQs, written in SPARQL, covering two aspects: a knowledge
query (Fig. 2c) and an analysis procedure query (Fig. 2d). Both CQs return the
desired answers on KGs generated based on StatsOnto with the welding manufacturing data,
demonstrating good Knowledge Coverage of StatsOnto.
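
To give a feel for the two kinds of query, the sketch below answers a knowledge query and a procedure query over a tiny in-memory set of triples in plain Python. It is a stand-in for SPARQL, not the queries of Fig. 2: the individuals and the properties hasStartTask and hasNextTask are hypothetical, and only hasMethod is named in the text.

```python
from typing import List, Tuple

# Hypothetical KG fragment as (subject, predicate, object) triples.
TRIPLES: List[Tuple[str, str, str]] = [
    ("DenoisingPipeline", "type", "StatisticalPipeline"),
    ("DenoisingPipeline", "hasStartTask", "TrendTask"),
    ("TrendTask", "hasMethod", "MedianFilterMethod"),
    ("TrendTask", "hasNextTask", "ScatteringTask"),
    ("ScatteringTask", "hasNextTask", "OutlierTask"),
]


def objects(subject: str, predicate: str) -> List[str]:
    """All objects o such that (subject, predicate, o) is in the KG."""
    return [o for s, p, o in TRIPLES if s == subject and p == predicate]


def methods_used(task: str) -> List[str]:
    """Knowledge query: which methods does a task invoke?"""
    return objects(task, "hasMethod")


def task_sequence(pipeline: str) -> List[str]:
    """Procedure query: follow hasNextTask links from the start task."""
    sequence: List[str] = []
    current = objects(pipeline, "hasStartTask")
    while current and current[0] not in sequence:  # guard against cycles
        sequence.append(current[0])
        current = objects(current[0], "hasNextTask")
    return sequence


print(methods_used("TrendTask"))
print(task_sequence("DenoisingPipeline"))
```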
Purpose Coverage. After extensive discussion in the workshop, we categorised most statistical
analytical tasks in our project into the four types of purposes (R4): data inspection, statistical
modelling, data denoising and data analysis, and found that most of the purposes can be covered
(above 80%, which we consider relatively sufficient).
Conclusion and Outlook. This poster presents our ongoing research on a statistical ontology
that is easy to understand and covers most statistical analytics in industrial applications.
In the future, we will improve the design practices, try to reuse classes and properties
(e.g., from STATO [13]) for better interoperability, investigate reasoning properties and mechanisms
of the ontology, and further improve the purpose coverage.
Acknowledgements. The work was partially supported by the H2020 projects Dome 4.0
(Grant Agreement No. 953163), OntoCommons (No. 958371), and DataCloud (No. 101016835)
and the SIRIUS Centre, Norwegian Research Council project number 237898.

References
 [1] J. Davis, et al., Smart manufacturing, manufacturing intelligence and demand-dynamic
     performance, Computers & Chemical Engineering 47 (2012) 145–156.
 [2] C. Naab, et al., Application of the unscented Kalman filter in position estimation: a case
     study on a robot for precise positioning, Robotics and Autonomous Systems 147 (2022) 103904.
 [3] E. C. Bryant, et al., Statistical analysis, 1966.
 [4] Z. Zheng, B. Zhou, D. Zhou, et al., Executable knowledge graph for machine learning: A
     Bosch case for welding monitoring, in: ISWC, 2022.
 [5] O. Celik, D. Zhou, et al., Specializing versatile skill libraries using local mixture of experts,
     in: CRL, PMLR, 2022, pp. 1423–1433.
 [6] B. Zhou, Z. Zheng, D. Zhou, et al., The data value quest: A holistic semantic approach at
     Bosch, ESWC, Springer (2022).
 [7] D. Zhou, B. Zhou, et al., Ontology reshaping for knowledge graph construction: Applied
     on Bosch welding case, in: ISWC, 2022.
 [8] Z. Zheng, B. Zhou, et al., Query-based industrial analytics over knowledge graphs with
     ontology reshaping, ESWC, Springer (2022).
 [9] D. Zhou, B. Zhou, Z. Zheng, et al., Enhancing knowledge graph generation with ontology
     reshaping–Bosch case, ESWC, Springer (2022).
[10] D. Zhou, et al., Schere: Schema reshaping for enhancing knowledge graph construction,
     in: CIKM, 2022.
[11] A. Salatino, et al., The computer science ontology: a large-scale taxonomy of research
     areas, in: ISWC, Springer, 2018, pp. 187–205.
[12] K. K. Breitman, et al., Ontology in computer science, Semantic Web: Concepts, Technolo-
     gies and Applications (2007) 17–34.
[13] K. Kotis, A. Papasalouros, Statistics ontology, http://stato-ontology.org/, 2018.
[14] P. Rocca-Serra, et al., Experiment design driven FAIRification of omics data matrices, an
     exemplar, Scientific Data 6 (2019) 1–4.
[15] Z. Zheng, et al., ExeKG: Executable knowledge graph system for user-friendly data analytics,
     in: CIKM, 2022.
[16] M. Yahya, et al., Towards generalized welding ontology in line with ISO and knowledge
     graph construction, ESWC, Springer (2022).
[17] Z. Zheng, et al., Executable knowledge graph for transparent machine learning in welding
     monitoring at Bosch, in: CIKM, 2022.
[18] B. Zhou, et al., Knowledge graph-based semantic system for visual analytics in automatic
     manufacturing, in: ISWC, 2022.
[19] B. Zhou, Z. Tan, et al., Towards a visualisation ontology for reusable visual analytics, in:
     IJCKG, 2022.
[20] N. F. Noy, et al., Ontology development 101: A guide to creating your first ontology, 2001.
[21] Z. Zheng, et al., Towards a visualisation ontology for data analysis in industrial applications,
     in: SemIIM@ESWC, 2022.
[22] B. Motik, et al., OWL 2 web ontology language profiles, 2012. URL: https://www.w3.org/
     TR/owl2-profiles/, accessed 5 July, 2022.

</pre>