<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>Journal of Web Semantics</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <article-id pub-id-type="doi">10.1016/j.websem.2021.100664</article-id>
      <title-group>
        <article-title>Empowering Industry Professionals with Machine Learning through Knowledge Graphs</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Antonis Klironomos</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Gad-Elrab Mohamed</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Evgeny Kharlamov</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Bosch Center for Artificial Intelligence</institution>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>University of Mannheim</institution>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>University of Oslo</institution>
          ,
          <country country="NO">Norway</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2023</year>
      </pub-date>
      <volume>13998</volume>
      <fpage>123</fpage>
      <lpage>127</lpage>
      <abstract>
        <p>The application of machine learning (ML) has become increasingly prevalent in various industries, offering valuable insights and predictive capabilities. However, the adoption of ML by domain experts, who possess deep industry-specific knowledge but may lack technical expertise, presents unique challenges. This paper explores strategies for scaling out the usage of ML to industry professionals, enabling them to leverage the power of ML in their respective domains. We discuss a comprehensive, user-friendly ML system with an interface for democratizing ML within industry domains. The system includes automatic feature engineering through ontologies and simplified ML pipeline creation using knowledge graphs (KGs). We also present real-world use cases supported by user study results.</p>
      </abstract>
      <kwd-group>
        <kwd>machine learning</kwd>
        <kwd>knowledge graphs</kwd>
        <kwd>industry</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>The adoption of machine learning (ML) in various industries has revolutionized the way businesses
operate, offering valuable insights and predictive capabilities that were previously unattainable. However,
the prerequisites and complexity of developing ML pipelines pose a barrier for domain experts who
want to use ML.</p>
      <p>To overcome this barrier, this paper presents a comprehensive, user-friendly system with a graphical
user interface (GUI) designed for industry professionals. The system includes automatic
feature engineering through ontologies, which allows domain experts to leverage their industry-specific
knowledge to create and extract relevant features for ML models. The system also uses
knowledge graphs (KGs) to simplify the creation of ML pipelines, making it
easier for domain experts to build ML models without extensive technical expertise.</p>
      <p>To show the system’s impact, the paper presents two real-world Bosch use cases that demonstrate
the successful integration of ML into industry domains, supported by user study results. By showcasing
the practical applications of ML in industry, the paper aims to highlight the potential benefits and
opportunities that ML can offer to domain experts. Ultimately, the goal is to democratize ML within
industry domains, empowering domain experts to leverage ML for improved decision-making, enhanced
productivity, and innovative solutions to industry-specific challenges.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Automated Feature Engineering using Ontologies</title>
      <p>
        A crucial step in applying ML involves feature engineering, which typically necessitates domain
knowledge repeatedly used for similar data. Integrating domain knowledge into ontologies can reduce
repetition and help perform automatic ML algorithm selection. This section describes the phases and
benefits of our Semantically-Enhanced Feature Engineering (SemFE) tool [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
      </p>
      <p>Semantic Data Preparation. The process of semantically enhancing feature engineering starts with a
data preparation phase. This begins with integrating raw data sources, where the Domain Knowledge
Annotator maps raw feature names to the terms of domain ontologies (DO) in a semi-automated fashion.
The resulting Data-to-DO mapping serves as the foundation for subsequent automated processes. The
Feature Group (FG) Annotator then uses reasoning to infer ML feature groups for each source from
the Data-to-DO mapping, generating the DO-to-FG mapping. This automated step allows subsequent
modules to abstract from the concrete features and work generically with feature groups.</p>
      <p>Semantic Feature Processing. Following the data preparation phase, the process moves into semantic
feature processing. The Feature Processing Algorithm Selector retrieves feature processing algorithms
(FPAlg) for each feature group from the ML Ontology; the retrieved algorithms vary in computational
complexity depending on the feature group. Default algorithms are specified for every feature group,
and users can manually change the defaults for specific feature groups. The
Processed Feature Groups Annotator then uses the FPAlg-to-FPG mapping to infer the processed feature
groups (FPG) for the specified feature groups and chosen feature processing algorithms. This automated
process generates names for new features and applies the feature processing algorithms to compute
these new features.</p>
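      <p>The two phases above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not SemFE's implementation: the column names, ontology terms, feature groups, and algorithm names are all hypothetical, and plain dictionary lookups stand in for the triple store and reasoner.</p>
      <preformat preformat-type="code" xml:space="preserve">
```python
# All names below are hypothetical stand-ins, not SemFE's actual vocabulary.

# Data-to-DO mapping: raw column name -> domain ontology term.
DATA_TO_DO = {
    "I_ts": "Current",
    "U_ts": "Voltage",
    "wear_cnt": "WearCount",
}

# Subclass axioms of a toy domain ontology: term -> parent class.
SUBCLASS_OF = {
    "Current": "SensorTimeSeries",
    "Voltage": "SensorTimeSeries",
    "WearCount": "SingleValue",
}

# Parent class -> ML feature group; a lookup stands in for the reasoner.
CLASS_TO_FG = {
    "SensorTimeSeries": "TimeSeriesFG",
    "SingleValue": "SingleFeatureFG",
}

# Default feature processing algorithms per feature group.
DEFAULT_FPALG = {
    "TimeSeriesFG": ["mean", "max"],
    "SingleFeatureFG": ["identity"],
}


def infer_feature_groups(data_to_do):
    """Derive the DO-to-FG mapping by following one subclass step per term."""
    return {term: CLASS_TO_FG[SUBCLASS_OF[term]] for term in data_to_do.values()}


def select_algorithms(do_to_fg, overrides=None):
    """Pick the default algorithms per feature group unless the user overrides them."""
    overrides = overrides or {}
    return {fg: overrides.get(fg, DEFAULT_FPALG[fg]) for fg in set(do_to_fg.values())}


def name_processed_features(data_to_do, do_to_fg, fg_to_alg):
    """Generate one new feature name per (ontology term, algorithm) pair."""
    names = []
    for term in data_to_do.values():
        for alg in fg_to_alg[do_to_fg[term]]:
            names.append(f"{term}_{alg}")
    return names


do_to_fg = infer_feature_groups(DATA_TO_DO)
# The user keeps only "mean" for time series instead of the default ["mean", "max"].
fg_to_alg = select_algorithms(do_to_fg, overrides={"TimeSeriesFG": ["mean"]})
new_features = name_processed_features(DATA_TO_DO, do_to_fg, fg_to_alg)
print(new_features)
```
      </preformat>
      <p>Because later steps only see feature groups, adding a new raw column in this sketch requires a single new entry in the Data-to-DO mapping; the group, algorithms, and generated feature names follow from the existing tables.</p>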
      <p>ML Modeling and Implementation. The final step involves ML modeling, where the ML Algorithm
Selector module selects ML algorithms with different feature settings, ML methods, and hyper-parameters
based on the FPAlg-to-FPG mapping. Once the ML models are trained and tested, the formal representation
of domain knowledge, feature processing strategies, and the algorithms with their application order
allows these workflows to be re-executed on new datasets with minimal adjustment.
SemFE has been implemented as an extension of the ML pipeline, incorporating several semantic
modules and communicating with a triple store and reasoner to store ontologies and retrieve inference
results.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Convenient ML Pipeline Creation via Knowledge Graphs</title>
      <p>The automated feature engineering tool described above (SemFE) is complemented by our
Semantically-Enhanced Machine Learning (SemML) tool [2], which is used for conveniently creating ML pipelines.
The structured knowledge representation provided by ML-related ontologies facilitates the efficient
construction of executable KGs that represent ML pipelines [3, 4]. This section describes the relevant
ontologies and the process of creating ML pipelines from the user’s perspective.</p>
      <p>Semantic Artifacts for Executable Knowledge Graphs. The tool includes various ontologies, such as
the upper domain ontology, manufacturing ontology, and domain ontologies for specific manufacturing
domains at Bosch. The upper domain ontology contains axioms, classes, object properties, and datatype
properties to model the general knowledge of discrete manufacturing processes. Domain experts
create the domain ontologies, which consist of sub-classes of the upper domain ontology’s classes. The
tool also includes the data science ontology, which formalizes the general knowledge of data science
activities, and task ontologies for visualization, statistical analytics, and machine learning analytics.
These task ontologies describe common methods, allowed data structures, and the organization of tasks
in pipelines.</p>
      <p>Executable Knowledge Graph Construction. Pipelines are constructed as executable KGs
through the tool’s GUI. Every user action is reflected as a change in the KG, and the task options offered
depend on the KG structure. Users can create pipelines from scratch, or modify and integrate existing ones.
Creating a pipeline involves selecting input data and tasks (i.e., steps) based on the respective
ontologies. Pipelines are modified by adding, deleting, or changing tasks, while
integrating KGs involves combining the outputs of different pipelines. Executable
KGs are translated into code, with Python as the target language considered here.</p>
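      <p>As a rough illustration of how a pipeline stored as a KG might be turned into executable Python, the sketch below keeps the pipeline as a set of triples, mimics a GUI action that appends a task, and runs the tasks in order. The property names hasNextTask and hasMethod, the task and method names, and the method registry are assumptions made for this example; SemML's ontologies and translator are not reproduced here, and the sketch only handles linear pipelines.</p>
      <preformat preformat-type="code" xml:space="preserve">
```python
import statistics

# Hypothetical vocabulary; SemML's ontologies define their own terms.
HAS_NEXT = "hasNextTask"
HAS_METHOD = "hasMethod"


def scale_by_max(values):
    """Divide every value by the largest one."""
    top = max(values)
    return [v / top for v in values]


# Registry linking method names used in the KG to Python callables.
METHODS = {"ScaleByMax": scale_by_max, "Mean": statistics.mean}


def add_task(kg, previous, task, method):
    """Mimic a GUI action: append a task after `previous` by adding two triples."""
    kg.add((previous, HAS_NEXT, task))
    kg.add((task, HAS_METHOD, method))


def ordered_tasks(kg, start):
    """Follow hasNextTask edges from the start node (assumes a linear, acyclic chain)."""
    successor = {s: o for s, p, o in kg if p == HAS_NEXT}
    order, node = [], start
    while node in successor:
        node = successor[node]
        order.append(node)
    return order


def execute(kg, start, data):
    """Run the pipeline by looking up each task's method and applying it in order."""
    method_of = {s: o for s, p, o in kg if p == HAS_METHOD}
    for task in ordered_tasks(kg, start):
        data = METHODS[method_of[task]](data)
    return data


kg = set()
add_task(kg, "InputData", "Task1", "ScaleByMax")
add_task(kg, "Task1", "Task2", "Mean")
result = execute(kg, "InputData", [2.0, 4.0, 8.0, 2.0])
print(result)
```
      </preformat>
      <p>Keeping the pipeline as data rather than code is what lets a GUI restrict the available next tasks: in this sketch, modifying a pipeline amounts to adding or removing triples, and the execution order is always recomputed from the graph.</p>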
    </sec>
    <sec id="sec-4">
      <title>4. Use cases</title>
      <p>We showcase our system’s benefits through two real-world use cases. We demonstrate SemFE’s
capabilities with a welding use case comprising two experiments, and we test SemML on real welding data.</p>
      <p>Use Case 1: Professionals extend SemFE’s domain ontologies. Users were tasked with using
SemFE for domain ontology extension (Experiment 1) and data mapping between column names and
ontology terms (Experiment 2) [5]. These tasks are important for domain experts to establish a common
vocabulary. In Experiment 1, domain experts and data scientists created terms for resistance spot
welding (RSW) and hot-staking (HS), with an average correctness of 93% for applying a template and
92% for choosing dependencies. The correlations between user performance and self-reported
expertise indicated that domain expertise greatly increased correctness and efficiency. In Experiment
2, most users correctly mapped column names to newly introduced terms, achieving 100% correctness,
with an average time of about 50 seconds per term. As in Experiment 1, domain expertise
correlated with higher correctness and efficiency, while experience
with mapping tools had minimal effect on either.</p>
      <p>Use Case 2: Welding experts develop ML pipelines using SemML. SemML was deployed on
welding data from Bosch to predict the quality of resistance spot welding [4]. A user study involving 28
experts from various fields, including ML, welding, and sensor engineering, was conducted to evaluate
the tool’s effectiveness. The study included a series of tasks for visualization, statistics, and ML. The
users were asked to complete the tasks with and without SemML. ML experts explained the tasks
to non-ML experts, who then completed them either by using technical language or by creating, modifying,
and merging knowledge graphs through a GUI. The study measured the percentage of tasks completed,
the completion time, and the correctness of answers; the actions taken during the tasks were also compared with
ground truth to measure correctness. The results showed that most participants achieved a high completion
percentage and correctness when using our tool and needed less time to complete tasks. The tool
also improved transparency, usability, and the coverage of tasks, making previously impossible tasks
achievable for non-ML experts.</p>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusion</title>
      <p>This paper describes SemFE and SemML, two tools that complement each other to form a semantics-based
ML system. The automation of feature engineering by SemFE is followed by SemML’s facilitation of
ML pipeline creation. In this system, we harness ontologies and KGs to allow experts
in various domains to use ML in their work without needing ML expertise. We demonstrate the system’s
benefits, such as user-friendliness and efficiency, by presenting two real-world use cases. The system
enables non-ML professionals to define a common vocabulary through ontologies and
conveniently create ML pipelines through knowledge graphs.</p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgments</title>
      <p>This work was partially supported by the EU projects Graph-Massivizer (GA 101093202), Dome 4.0 (GA 953163), and enRichMyData
(GA 101070284).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>B.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Svetashova</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Pychynski</surname>
          </string-name>
          ,
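          <string-name>
            <given-names>I.</given-names>
            <surname>Baimuratov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Soylu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Kharlamov</surname>
          </string-name>
          ,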
          <article-title>SemFE: Facilitating ML Pipeline Development with Semantics</article-title>
          ,
          <source>in: Proceedings of the 29th ACM International Conference on Information &amp; Knowledge Management</source>
          ,
          <publisher-name>ACM</publisher-name>
          , Virtual Event, Ireland,
          <year>2020</year>
          , pp.
          <fpage>3489</fpage>
          -
          <lpage>3492</lpage>
          . doi:
          <pub-id pub-id-type="doi">10.1145/3340531.3417436</pub-id>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>