<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Anomaly Detection and Diagnosis of Vehicle Steering Systems Using a Knowledge Graph-based Approach</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Qiushi Cao</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Patelis Alexandros</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Irlán Grangel-González</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Corporate Research, Bosch (China) Investment Ltd.</institution>
          ,
          <addr-line>Shanghai</addr-line>
          ,
          <country country="CN">China</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Corporate Research</institution>
          ,
          <addr-line>Robert Bosch GmbH, Renningen</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>In the automotive industry, data-driven techniques have been widely used for the early detection of vehicle field issues. During data analysis, data heterogeneity is a crucial pain point that causes huge manual effort and time delays. To mitigate this pain point, in this paper we demonstrate a knowledge graph-based approach for the early detection of technical issues in automotive steering systems. The proposed approach enables semantic data integration, which significantly improves the efficiency of the data Extract, Transform, Load (ETL) process. Based on the developed knowledge graph system, data visualization dashboards and Large Language Model (LLM) solutions can be easily developed to gain insights for failure-cause-effect analysis. The proposed approach achieves significant efficiency improvements: the time and manual effort for data ETL and integration have been reduced by up to 70%. The use of standardized domain ontologies makes the approach reusable for other use cases or products. Our work highlights the importance of industrial knowledge graphs for tackling data heterogeneity and data quality issues when developing data-driven applications.</p>
      </abstract>
      <kwd-group>
        <kwd>Knowledge Graph</kwd>
        <kwd>Semantic Data Integration</kwd>
        <kwd>Data Quality</kwd>
        <kwd>Automotive Industry</kwd>
        <kwd>Large Language Model</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>Every day, millions of vehicles are driven on the road. Although most of them bring drivers
and passengers to their destinations without trouble, some of them develop technical issues. When a
technical issue occurs, the vehicle is brought to a service station for repair. When Original Equipment
Manufacturers (OEMs) try to fix an unexpected technical field issue for the first time, all information
regarding the problem and its symptoms (from both the vehicle and the defective components) is collected and
stored for further use. As more similar technical problems are encountered, OEMs start to investigate
the source of the accumulating technical issues. Once a sufficient number of similar
issues has been detected and reported, OEMs may decide to escalate the problem to automotive suppliers such as
Bosch. Bosch may then conduct its own investigation of the problem and trigger countermeasures such as root
cause analysis. During this process, heterogeneous data is collected from various data sources, and analyzing
these data requires a huge amount of manual effort. This leads to significant time delays caused
by data Extract, Transform, Load (ETL).</p>
      <p>
        During this data ETL process, Semantic Interoperability Conflicts (SIC) [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] are a major challenge, as they
cause data quality issues. This highlights the need for data standardization methods. Among the
existing technologies for data standardization, ontologies appear to be a promising solution [
        <xref ref-type="bibr" rid="ref2 ref3">2, 3</xref>
        ].
Defined as “a specification of a conceptualization” [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], ontologies can promote the exchange and sharing
of data and domain knowledge, thereby supporting both data-driven and knowledge-driven
tasks [
        <xref ref-type="bibr" rid="ref5 ref6">5, 6</xref>
        ].
      </p>
      <p>To tackle the aforementioned challenges regarding data ETL and SIC, we propose a knowledge
graph-based approach for managing and harmonizing heterogeneous data related to field issue detection.
The proposed approach starts with collecting and pre-processing heterogeneous data sources containing
vehicle field, production, and auxiliary data, followed by a semantic data integration process enabled
by data-ontology mappings. The data is integrated and ingested into a knowledge graph system that
provides a unified view of field issues. Data traceability is also enabled by traversing the
knowledge graph that connects the heterogeneous data sources. The constructed graph database also allows
the development of data dashboards for answering complex failure-cause-effect questions. Exploratory
charts, filters, and tables have been created by running specialized, customized queries that traverse
the complex knowledge graph to retrieve important information.</p>
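      <p>The traceability idea can be illustrated with a minimal Python sketch. It assumes that the integrated data is available as (subject, predicate, object) triples and follows outgoing links breadth-first from a customer claim to everything connected to it. All identifiers and predicate names (e.g. <monospace>ex:concernsVehicle</monospace>, <monospace>ex:producedInBatch</monospace>) are hypothetical and are not taken from the authors' ontologies; in a triple store, such a traversal could instead be expressed as a SPARQL property-path query.</p>
      <preformat><![CDATA[
```python
from collections import deque

# Hypothetical integrated data as (subject, predicate, object) triples.
# Identifiers and predicate names are illustrative assumptions only.
TRIPLES = [
    ("claim:C1", "ex:concernsVehicle", "vehicle:V1"),
    ("vehicle:V1", "ex:hasDiagnosticSession", "session:S1"),
    ("session:S1", "ex:reportedDTC", "dtc:C1234"),
    ("vehicle:V1", "ex:hasComponent", "component:EPS-77"),
    ("component:EPS-77", "ex:producedInBatch", "batch:B2023-11"),
]


def build_adjacency(triples):
    """Index outgoing edges: subject -> list of (predicate, object)."""
    adjacency = {}
    for s, p, o in triples:
        adjacency.setdefault(s, []).append((p, o))
    return adjacency


def trace(start, triples):
    """Breadth-first traversal from `start`.

    Returns a dict mapping every reachable node to the list of predicates
    on the path used to reach it first (the start node maps to []).
    """
    adjacency = build_adjacency(triples)
    paths = {start: []}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for predicate, target in adjacency.get(node, []):
            if target not in paths:  # visit each node only once, so cycles are safe
                paths[target] = paths[node] + [predicate]
                queue.append(target)
    return paths


if __name__ == "__main__":
    for node, path in trace("claim:C1", TRIPLES).items():
        print(node, "via", " / ".join(path) or "(start)")
```
]]></preformat>
      <p>In this sketch, tracing from <monospace>claim:C1</monospace> reaches the production batch through the vehicle and its component, which is the kind of cross-source link that is hard to follow when the same data sits in separate silos.</p>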
    </sec>
    <sec id="sec-2">
      <title>2. Proposed Approach</title>
      <p>We propose a three-layered knowledge graph system for semantic data integration and data analytics.
Fig. 1 shows the overall architecture of the proposed system. The three layers are described in detail below:</p>
      <list list-type="order">
        <list-item>
          <p>Data Collection/Preparation Module: this module is responsible for data source management and data
pre-processing. In this work, data is collected from various data sources hosted in different
database systems, Vehicle Feedback DataBase Manuals, and CSV files. To prepare these data for
later exploitation, the module provides data cleaning and pre-processing capabilities for
effective data ETL and integration. In this work, there are three main data sources:</p>
        </list-item>
        <list-item>
          <p>Semantic Data Integration Module: this module enables data access and integration via SPARQL
queries. HTTP/REST endpoints are provided to access data sources from relational databases. The
accessed data is ingested into the knowledge graph system by creating mappings that link data
attributes to ontology classes. These mappings are the key to semantic data integration, as they allow
all data to be ingested into a single connected knowledge graph. This module consists of four sub-modules:</p>
          <list list-type="bullet">
            <list-item>
              <p>Data Ingestion &amp; Mapping: data-ontology mappings are created for data ingestion. The
ingested data, together with the domain ontologies, forms different knowledge graphs.</p>
            </list-item>
            <list-item>
              <p>Domain Ontology: to support semantic data integration, domain ontologies are developed
in this sub-module. These ontologies provide formal, high-level representations of the
domain. In this work, we develop ontologies describing the core concepts and relationships
regarding vehicles, diagnostic tasks, customer claims, vehicle production, and statistical
calculations. These semantic models are based on the semantics defined in
the Industry 4.0 Core knowledge graph developed at Bosch [<xref ref-type="bibr" rid="ref7">7</xref>]. They are connected
to form the conceptual layer of the knowledge graph system.</p>
            </list-item>
            <list-item>
              <p>Knowledge Graph: a unified knowledge graph is constructed by graph materialization.
RDF triples are generated in a graph structure under the associated classes of the semantic models.
The generation of RDF triples is enabled by mappings written in the SPARQL query language.
In this way, the semantic models, together with the materialized RDF data, form the whole
knowledge graph, which provides access points to higher-level applications.</p>
            </list-item>
            <list-item>
              <p>Access Points: the materialized knowledge graph provides access to the actual data for
end users and software agents. Users and developers can query the underlying RDF data via
SPARQL endpoints of the knowledge graph system, which allows them to answer key
questions related to the early detection of vehicle field issues. The access points of the
system also allow the deployment of Application Programming Interfaces (APIs), which can
simplify the development of software based on the knowledge graph system.</p>
            </list-item>
          </list>
        </list-item>
        <list-item>
          <p>Applications Module: this module is built on top of the Semantic Data Integration Module.
It provides data visualization features to explore, make sense of, and
show connections between data. Different data dashboards are developed for field, production,
and failure-related data. These dashboards not only display important statistics but also highlight
data linkage across multiple data sources. The Applications Module also enables the deployment
of Graph AI and LLM solutions. By leveraging the semantically integrated data, such AI algorithms can
further improve the efficiency and automation level of the whole system.</p>
        </list-item>
      </list>
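      <p>The cleaning step of the Data Collection/Preparation Module can be sketched as follows. This is an illustration under assumptions: the column aliases, the date formats, and the use of VIN and repair date as the fields to normalize are hypothetical, since the paper does not publish the source schemas. Records from different sources are renamed to one shared schema, values are normalized, and records that cannot be linked are dropped.</p>
      <preformat><![CDATA[
```python
from datetime import datetime

# Hypothetical per-source column names mapped to one shared schema.
COLUMN_ALIASES = {
    "vin": "vin", "VIN": "vin", "vehicle_id": "vin",
    "repair_date": "date", "RepairDate": "date", "Datum": "date",
}
# Date formats assumed to occur across the sources (illustrative only).
DATE_FORMATS = ("%Y-%m-%d", "%d.%m.%Y", "%m/%d/%Y")


def parse_date(raw):
    """Try each known format and return an ISO-8601 date string, or None."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None


def clean_record(record):
    """Rename columns to the shared schema and normalize VIN and date values."""
    out = {}
    for key, value in record.items():
        target = COLUMN_ALIASES.get(key)
        if target is None:
            continue  # drop columns the shared schema does not define
        if target == "vin":
            out["vin"] = str(value).strip().upper()
        elif target == "date":
            out["date"] = parse_date(str(value))
    return out


def clean(records):
    """Clean all records and drop those without a VIN or a parsable date."""
    cleaned = (clean_record(r) for r in records)
    return [r for r in cleaned if r.get("vin") and r.get("date")]
```
]]></preformat>
      <p>For example, a record <monospace>{"VIN": " wdb123 ", "RepairDate": "03.11.2023"}</monospace> is normalized to <monospace>{"vin": "WDB123", "date": "2023-11-03"}</monospace>, so that it can later be matched against records from other sources.</p>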
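      <p>To make the role of the conceptual layer more concrete, the sketch below encodes a tiny, hypothetical fragment of a domain ontology (a class hierarchy plus domain and range declarations for two properties) and checks instance triples against it. The class and property names are assumptions and do not reproduce the Bosch ontologies. Note that in RDFS, domain and range declarations license inferences rather than reject data; the closed-world check used here is closer in spirit to SHACL shape validation.</p>
      <preformat><![CDATA[
```python
# Hypothetical ontology fragment (class and property names are assumptions).
SUBCLASS_OF = {  # child class -> parent class (single inheritance assumed)
    "ex:PassengerCar": "ex:Vehicle",
    "ex:SteeringDiagnosis": "ex:DiagnosticTask",
}
PROPERTIES = {  # property -> (domain class, range class)
    "ex:concernsVehicle": ("ex:CustomerClaim", "ex:Vehicle"),
    "ex:hasDiagnosticTask": ("ex:Vehicle", "ex:DiagnosticTask"),
}


def superclasses(cls):
    """Return the class followed by all of its ancestors."""
    chain = [cls]
    while chain[-1] in SUBCLASS_OF:
        chain.append(SUBCLASS_OF[chain[-1]])
    return chain


def violations(types, triples):
    """Return (s, p, o, reason) for every triple that breaks the ontology.

    `types` maps each instance to its asserted class. Untyped instances and
    undeclared properties are reported as violations (closed-world check).
    """
    problems = []
    for s, p, o in triples:
        if p not in PROPERTIES:
            problems.append((s, p, o, "unknown property"))
            continue
        domain, range_ = PROPERTIES[p]
        if domain not in superclasses(types.get(s)):
            problems.append((s, p, o, "domain"))
        if range_ not in superclasses(types.get(o)):
            problems.append((s, p, o, "range"))
    return problems
```
]]></preformat>
      <p>Because <monospace>ex:PassengerCar</monospace> is declared a subclass of <monospace>ex:Vehicle</monospace>, a claim linked to a passenger car passes the range check without the data source having to know about the generic class.</p>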
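      <p>The paper states that the data-ontology mappings are written in SPARQL. As a hedged illustration of that idea, not the authors' mapping code, the following Python sketch turns relational rows into a SPARQL 1.1 <monospace>INSERT DATA</monospace> update through a declarative column-to-property mapping. The namespace <monospace>http://example.org/steering#</monospace>, the column names, and the properties are hypothetical; session IDs and VINs are assumed to be alphanumeric so they can be used directly in prefixed names.</p>
      <preformat><![CDATA[
```python
# Hypothetical namespace; the paper does not publish its IRIs or mappings.
PREFIXES = (
    "PREFIX ex: <http://example.org/steering#>\n"
    "PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>\n"
)

# Hypothetical mapping: column -> (ontology property, XSD datatype).
# A datatype of None means the value identifies another node (an IRI link).
MAPPING = {
    "dtc_code": ("ex:hasDTCCode", "xsd:string"),
    "session_date": ("ex:sessionDate", "xsd:date"),
    "vin": ("ex:forVehicle", None),
}


def literal(value, datatype):
    """Serialize a typed SPARQL literal, escaping backslashes and double quotes."""
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"^^{datatype}'


def row_to_triples(row):
    """Turn one relational row into triple statements using MAPPING.

    Assumes session IDs and VINs are alphanumeric, so they are valid in
    prefixed names such as ex:session_42 and ex:vehicle_WDB123.
    """
    subject = f"ex:session_{row['session_id']}"
    triples = [f"{subject} a ex:DiagnosticSession ."]
    for column, (prop, datatype) in MAPPING.items():
        value = row.get(column)
        if value is None:
            continue  # missing values produce no triple
        obj = f"ex:vehicle_{value}" if datatype is None else literal(value, datatype)
        triples.append(f"{subject} {prop} {obj} .")
    return triples


def insert_data(rows):
    """Build one SPARQL 1.1 INSERT DATA update for a batch of rows."""
    body = "\n".join("  " + t for row in rows for t in row_to_triples(row))
    return PREFIXES + "INSERT DATA {\n" + body + "\n}"
```
]]></preformat>
      <p>In this sketch, adding a new source only requires a new mapping table rather than new ingestion logic; the generated update string could then be posted to a triple store's SPARQL update endpoint.</p>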
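      <p>Access through a SPARQL endpoint can be sketched with the Python standard library, following the SPARQL 1.1 Protocol: the query is sent as a URL-encoded <monospace>query</monospace> parameter of an HTTP GET request, and results are requested in the SPARQL JSON format. The endpoint URL and the vocabulary in the example query, which counts DTC occurrences per code and month, are hypothetical.</p>
      <preformat><![CDATA[
```python
import json
import urllib.parse
import urllib.request

# Hypothetical endpoint; the paper does not name its triple store or URL.
ENDPOINT = "https://kg.example.org/sparql"

# Hypothetical vocabulary: count DTC occurrences per code and month.
QUERY = """PREFIX ex: <http://example.org/steering#>
SELECT ?dtc ?month (COUNT(?session) AS ?n)
WHERE {
  ?session a ex:DiagnosticSession ;
           ex:hasDTCCode ?dtc ;
           ex:sessionDate ?date .
  BIND(SUBSTR(STR(?date), 1, 7) AS ?month)
}
GROUP BY ?dtc ?month
ORDER BY ?dtc ?month"""


def build_request(endpoint, query):
    """SPARQL 1.1 Protocol: send the query as a URL-encoded GET parameter
    and ask for results in the SPARQL JSON format."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )


def parse_bindings(payload):
    """Flatten a SPARQL JSON result document into [{variable: value}, ...]."""
    data = json.loads(payload)
    return [
        {var: cell["value"] for var, cell in row.items()}
        for row in data["results"]["bindings"]
    ]


def run(endpoint=ENDPOINT, query=QUERY):
    """Execute the query against a live endpoint (needs network access)."""
    with urllib.request.urlopen(build_request(endpoint, query), timeout=30) as resp:
        return parse_bindings(resp.read())
```
]]></preformat>
      <p>Against a reachable endpoint, <monospace>run()</monospace> would return one dictionary per result row with the keys <monospace>dtc</monospace>, <monospace>month</monospace>, and <monospace>n</monospace>; for very long queries, the protocol also allows sending the query in the body of an HTTP POST request instead of the URL.</p>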
    </sec>
    <sec id="sec-3">
      <title>3. Results and Business Values</title>
      <p>Currently, the developed knowledge graph system maintains data on about 1,750,000 vehicles. Data
from more than 8,000,000 diagnostic sessions has been ingested and connected to vehicle data. As part of the
diagnostic data, more than 17,000,000 Diagnostic Trouble Code (DTC) occurrences have
been recorded, and DTC-related information has been encoded in the knowledge graph system. Based
on the knowledge graph system, three types of data visualization dashboards have been developed: i)
an Early Detection Dashboard, which gives an overview of DTC Fault Byte occurrences at both monthly
and weekly levels; ii) a DTC Overview Dashboard, which displays DTC codes, DTC code descriptions,
failure symptoms, failure criticality, and information about potential failure causes; and iii) an Early Warning
Dashboard, which shows the trend of DTC occurrences over time and the link between DTC
data and vehicle data.</p>
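      <p>The monthly and weekly aggregation behind an Early Detection Dashboard can be sketched as follows. The occurrence records are invented for illustration, and using the fault byte as a grouping key alongside the DTC code is an assumption about how such a dashboard groups its data. Weeks follow ISO 8601 numbering via <monospace>date.isocalendar()</monospace>, so weeks that span a year boundary are attributed to the correct ISO year.</p>
      <preformat><![CDATA[
```python
from collections import Counter
from datetime import date

# Invented occurrence records: (DTC code, fault byte, occurrence date).
OCCURRENCES = [
    ("C1234", 0x11, date(2023, 11, 2)),
    ("C1234", 0x11, date(2023, 11, 6)),
    ("C1234", 0x13, date(2023, 11, 7)),
    ("C5678", 0x11, date(2023, 12, 1)),
]


def monthly_counts(occurrences):
    """Count occurrences per (DTC, fault byte, 'YYYY-MM')."""
    return Counter((dtc, fb, d.strftime("%Y-%m")) for dtc, fb, d in occurrences)


def weekly_counts(occurrences):
    """Count occurrences per (DTC, fault byte, ISO year, ISO week)."""
    counts = Counter()
    for dtc, fb, d in occurrences:
        iso_year, iso_week, _ = d.isocalendar()
        counts[(dtc, fb, iso_year, iso_week)] += 1
    return counts


if __name__ == "__main__":
    for key, n in sorted(weekly_counts(OCCURRENCES).items()):
        print(key, n)
```
]]></preformat>
      <p>Here the two <monospace>C1234</monospace>/<monospace>0x11</monospace> occurrences fall into the same month but into different ISO weeks, which is why the two aggregation levels can tell different stories about an emerging issue.</p>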
      <p>The proposed knowledge graph-based approach has achieved significant efficiency improvements.
The time and manual effort for data ETL and integration have been reduced by up to 70%. The use of
standardized domain ontologies makes the proposed approach reusable for other use cases
or products. The connected knowledge graph serves as an enabler for other solutions that require
integrating data silos to generate business insights, such as AI solutions including Graph Neural
Networks and LLM applications (e.g. Graph Retrieval-Augmented Generation). This research highlights
the importance of industrial knowledge graphs for tackling data heterogeneity and data quality issues
when developing data-driven applications.</p>
    </sec>
    <sec id="sec-4">
      <title>Declaration on Generative AI</title>
      <p>During the preparation of this work, the authors used Grammarly to help rephrase sentences or
paragraphs to improve clarity and conciseness. After using this tool, the authors reviewed and edited
the content as needed and take full responsibility for the publication’s content.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>N.</given-names>
            <surname>Melluso</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Grangel-González</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Fantoni</surname>
          </string-name>
          ,
          <article-title>Enhancing Industry 4.0 standards interoperability via knowledge graphs with natural language processing</article-title>
          ,
          <source>Computers in Industry</source>
          <volume>140</volume>
          (
          <year>2022</year>
          )
          <fpage>103676</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Yu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Xi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Sun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Zhu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Shen</surname>
          </string-name>
          ,
          <article-title>Pclion: An ontology for data standardization and sharing of prostate cancer associated lifestyles</article-title>
          ,
          <source>International Journal of Medical Informatics</source>
          <volume>145</volume>
          (
          <year>2021</year>
          )
          <fpage>104332</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>E. M.</given-names>
            <surname>Sanfilippo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Borgo</surname>
          </string-name>
          ,
          <article-title>What are features? An ontology-based review of the literature</article-title>
          ,
          <source>Computer-Aided Design</source>
          <volume>80</volume>
          (
          <year>2016</year>
          )
          <fpage>9</fpage>
          -
          <lpage>18</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>N.</given-names>
            <surname>Guarino</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Giaretta</surname>
          </string-name>
          ,
          <article-title>Ontologies and knowledge bases</article-title>
          ,
          <source>Towards very large knowledge bases</source>
          (
          <year>1995</year>
          )
          <fpage>1</fpage>
          -
          <lpage>2</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>S.</given-names>
            <surname>Geisler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.-E.</given-names>
            <surname>Vidal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Cappiello</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B. F.</given-names>
            <surname>Lóscio</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Gal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Jarke</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Lenzerini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Missier</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Otto</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Paja</surname>
          </string-name>
          , et al.,
          <article-title>Knowledge-driven data ecosystems toward data transparency</article-title>
          ,
          <source>ACM Journal of Data and Information Quality (JDIQ)</source>
          <volume>14</volume>
          (
          <year>2021</year>
          )
          <fpage>1</fpage>
          -
          <lpage>12</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>Q.</given-names>
            <surname>Cao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Zanni-Merk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Samet</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Reich</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F. D. B.</given-names>
            <surname>De Beuvron</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Beckmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Giannetti</surname>
          </string-name>
          ,
          <article-title>Kspmi: a knowledge-based system for predictive maintenance in industry 4.0</article-title>
          ,
          <source>Robotics and Computer-Integrated Manufacturing</source>
          <volume>74</volume>
          (
          <year>2022</year>
          )
          <fpage>102281</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>I.</given-names>
            <surname>Grangel-González</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Lösch</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>ul Mehdi</surname>
          </string-name>
          ,
          <article-title>Knowledge graphs for efficient integration and access of manufacturing data</article-title>
          ,
          in:
          <source>2020 25th IEEE International Conference on Emerging Technologies and Factory Automation (ETFA)</source>
          , volume
          <volume>1</volume>
          , IEEE,
          <year>2020</year>
          , pp.
          <fpage>93</fpage>
          -
          <lpage>100</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>