<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <issn pub-type="ppub">1613-0073</issn>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Kuksa in Vehicular Data Collection and Digital Twin Creation Environment</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Olli Timonen</string-name>
          <email>olli.timonen@oulu.fi</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Toni Bomström</string-name>
          <email>toni.bomstrom@proton.me</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Nicklas Staford</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Samuli Määttä</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alireza Bakhshi Zadi Mahmoodi</string-name>
          <email>Alireza.BakhshiZadiMahmoodi@oulu.fi</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Tero Päivärinta</string-name>
          <email>tero.paivarinta@oulu.fi</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ella Peltonen</string-name>
          <email>ella.peltonen@oulu.fi</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Empirical Software Engineering in Software, Systems, and Services</institution>
          ,
          <institution>University of Oulu</institution>
          ,
          <country country="FI">Finland</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Vehicular Computing</institution>
          ,
          <addr-line>Data Transfer, Digital Twins</addr-line>
        </aff>
      </contrib-group>
      <abstract>
        <p>Increased sensing and computing capabilities in cars are crucial for advanced traffic and driving automation. However, novel data delivery, testing, and machine learning pipelines are still needed to harness the full capabilities of automotive sensing solutions. At the same time, vehicular digital twins are needed to enable versatile testing and simulation capabilities. This paper depicts the Vehicle-In-The-Loop (VIL) cloud interface and verifies data consistency regardless of the source. The study aims to determine how data collected from simulation corresponds to real test drive data. The data is collected from both simulation and actual test drives. Utilising the MQTT protocol, data is stored on a cloud server and further fed into Unreal Engine 5, where the test drive is replayed, and its correspondence to the real drive is ensured. This work offers a new perspective on verifying data consistency between simulated and real test drives and complements the vehicle abstraction opportunities provided by Eclipse KUKSA. Our results highlight digital twin creation as a part of automotive software development and set premises for testing and validating complex use cases, such as traffic accidents and extreme weather, that can rarely or only with severe expenses be tested in real-life situations.</p>
      </abstract>
      <kwd-group>
        <kwd>Environment</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <p>CEUR, ceur-ws.org</p>
      <p>TKTP 2024: Annual Doctoral Symposium of Computer Science, 10.-11.6.2024, Vaasa, Finland</p>
      <p>ORCID: 0009-0006-4132-9496 (O. Timonen); 0009-0004-3192-7711 (A. B. Z. Mahmoodi); 0000-0002-3374-671X (E. Peltonen)</p>
      <p>© 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License</p>
    </sec>
    <sec id="sec-2">
      <title>1. Introduction</title>
      <p>Today’s cars hold considerable computational and sensing capabilities that are crucial for advanced traffic and driving automation, with applications spanning from safety features to fully automated vehicles. However, the data delivery and management protocols and interfaces required for machine learning pipelines are still primarily closed in company-specific silos [1]. The first efforts to create open-source data transfer protocols and interfaces from the car to the cloud environment include Eclipse Kuksa [2], on which this work is also based, but considerable work is still required to validate data and to benchmark the efficiency of the proposed frameworks. Data sharing through open interfaces can boost innovation through more efficient and accurate machine learning models that cover more expansive geographic areas and use cases [3].</p>
      <sec id="sec-2-1">
        <p>At the same time, digital twins can enable versatile testing and simulation capabilities, as seen with applications in industry, energy, and transportation verticals [4]. Indeed, digital twins have attracted much research interest in recent years [5, 6]. The concept of the digital twin is broad, and definitions may vary. Still, the main idea is to model physical systems with digital means and update these digital models dynamically based on measurement data. In essence, digital twin methods provide digital spaces where reality can be modelled virtually [7] as it is or would be in unseen but possible situations. Indeed, the creation of the digital twin as part of automotive software development sets premises for testing and validating complex use cases, such as traffic accidents and extreme weather, that can rarely or only with severe expenses be tested in real-life situations. The utilisation of digital twins for automotive software development opens avenues for testing different sensors and components in actual use cases, such as studying the longevity of such components and proposing novel learning strategies that combine multiple data sources.</p>
      </sec>
      <sec id="sec-2-2">
        <p>This paper describes the Vehicle-In-The-Loop (VIL) cloud interface. It verifies data consistency regardless of the source: a real car on the road or a virtual object in the digital twin environment. The overview of the Kuura platform is provided in Figure 1. We use KUKSA.val [8], which provides a vehicle abstraction layer to enable the management and use of vehicle signals. As a digital twin modelling framework, we use Unreal Engine 5. Capabilities of utilising such a game engine in digital twin creation have been successfully demonstrated in wind power plants [9] and cultural tourism [10]. Using a game engine and a VIL cloud interface enables visual simulation that complements the capabilities provided by KUKSA.val; KUKSA.val is used to collect data from a test drive in a real car. For validation, we determine how data collected from the simulation corresponds to real test drive data in a real-life driving scenario. Utilising the MQTT protocol, data is stored on a cloud server and further fed into Unreal Engine 5, where the test drive is replayed, and its correspondence to the real drive is ensured. This work offers a new perspective on verifying data consistency between simulated and real test drives and complements the vehicle abstraction opportunities provided by Eclipse KUKSA.</p>
        <p>The main contributions of this work are the following: 1) We provide the Kuura platform for collecting vehicular sensor data in the same way from a real car and from corresponding test runs in a digital twin environment. With this, we extend the KUKSA.val environment to better fit digital testing and validation tasks. 2) We explore Unreal Engine 5 as a vehicular digital twin environment and provide a pipeline to deploy such digital twins with simulated and real test drives. 3) We experiment with the data consistency between simulated and real test drives and further demonstrate the power of game engine-based digital twins in vehicular computing and sensing scenarios.</p>
        <sec>
          <title>2. Background</title>
          <sec>
            <title>2.1. Vehicle as a Sensing Device</title>
            <p>Modern cars implement technologies for automatic braking, Cooperative Adaptive Cruise Control (CACC), prevention of unwanted lane crossing, distance keeping, and so on, to supplement drivers’ own cognition and prevent accidents. For further technological advancement, vehicles will require artificial intelligence (AI) and machine learning (ML) capabilities that depend on effective data transfer and management systems. With increased networking and computing capabilities, vehicles and their supporting edge and cloud back-ends can perform challenging inference and learning tasks to support drivers’ cognition and automate the driving scenario [11, 12]. Such intelligent systems demand training data, which in-vehicle sensors and external databases can provide. How to make the data available, processed, and utilised in a challenging real-time and mobile environment is a timely research question.</p>
            <p>Vehicular safety systems (and any other relevant applications) of any level of driving autonomy require data from in-vehicle sensors [13], such as cameras, LiDARs, radars, and speed meters [14, 15]. This information can be used to, for example, improve lane [16] and road pothole [17] recognition. Solutions for detecting drivers’ behaviour while using smartphones during driving [18] and drunk driving [19] have been explored. However, the results underline that human drivers’ perception and reasoning still maintain an advantage compared to fully automatic vehicles [20].</p>
            <p>However, most of the in-vehicular and driver’s personal sensors and interfaces are brand-specific or closed, limiting access to the data, computing, and networking capabilities and thus hindering vehicular application development. To enable connected vehicles to utilise all the available data sources, AI/ML computing resources, and networking capabilities, open-sourced general interfaces and software platforms need to be defined [1].</p>
            <p>The on-board diagnostics (OBD) protocol refers to a vehicle’s self-diagnostic and reporting capability. The more advanced OBD-II is a standardised protocol built into the vehicle itself, allowing software-defined onboard operations and, most importantly, the collection of a wide range of vehicular data for the software-defined vehicle’s case. This includes but is not limited to engine load, coolant temperature, fuel pressure, engine revolutions per minute (RPM), vehicle speed, intake air temperature, airflow rate, throttle position and many types of sensor data like oxygen sensors and fuel system status. OBD-II has relatively easy access to the mentioned sensors, which is enough to prove the concept. In the future, research conducted with vehicular sensors can utilise direct access to the vehicle’s controller area network (i.e. CAN bus) and standardised architectures such as AUTOSAR for wider data access.</p>
          </sec>
          <sec>
            <title>2.2. Automotive Simulations</title>
            <p>Driving and traffic simulators are used in the automotive industry as an alternative to costly and potentially dangerous real-life testing [21]. The advantages of such practices include effectiveness in analysing human driving behaviour and essential traffic situations that are often too hazardous to test in real-life scenarios, such as extreme weather, congestion, and accidents [22]. This is especially evident in the increasing reliance on simulation technologies for assessing human driving factors. The more real-life, photo-realistic simulations enable simultaneous testing of vehicle dynamics and stochastic pedestrian, driver, and vehicle interactions in various scenarios [23].</p>
            <p>However, traditional simulators often have limitations in emulating real-life behaviour and perception. This has led to a growing interest in game engines as simulation platforms for developing and testing autonomous vehicle control systems [24]. Several vehicular simulators, such as CARLA, AirSim and CarSim, provide simulation capabilities and environments to support vehicle research and development. These platforms have been used to study vehicle autonomy, safety and performance. CARLA is a free open-source simulator to support autonomous vehicle systems’ development, training, and validation. AirSim is a simulator for drones and cars developed by Microsoft. It also makes it possible to experiment with deep learning, computer vision and reinforcement learning algorithms in autonomous vehicles and to create complex and changeable environments and additional sensor modalities [25]. CarSim is a vehicle dynamics simulation platform that allows the simulation of vehicle behaviour in different conditions and environments, including motor dynamics, through Simulink models. It can be used to create accurate models of vehicles and simulate their behaviour under different road surfaces, weather conditions, and traffic situations [26], but is not open-sourced.</p>
            <p>In this study, we use Unreal Engine, renowned for its versatility, high-quality graphics and realistic physics simulation, which is useful for simulating vehicles [21]. Competing game engines include Unity and CryEngine, of which CryEngine is the smaller project. The main arguments in favour of Unreal Engine are that it is free of cost for research and for commercial projects until they make one million in revenue, and that its source code is open, although strictly speaking under a source-available license, making it suitable for various applications in autonomous and driving-support test cases [22]. The key differences between Unity and Unreal Engine are summarised in Table 1. Unreal Engine has typically been considered a better choice for 3D games, while Unity has been considered a strong choice for 2D games.</p>
            <p>The previous literature emphasises the importance of determinism in simulation environments to ensure repeatability, allowing for trustworthy and easily debuggable results. Game engines may still come with challenges of non-deterministic behaviours. For example, the investigation by Chance et al. [24] reveals significant simulation variance in CARLA, particularly due to actor collisions and system-level resource utilisation. As such, accuracy investigation is one of the key goals in our preliminary work presented in this paper.</p>
          </sec>
          <sec>
            <title>2.3. Automotive Digital Twins</title>
            <p>Digital twins (DT) in the context of vehicles are an emerging field that has attracted significant attention in both industry and academia [27]. Digital twins are virtual representations of physical entities, such as vehicles, that aim to mirror the lives and behaviours of their real-world counterparts [28]. These digital replicas use the best available physical models, sensor updates, and other data sources to simulate and predict the behaviour of the corresponding physical twin [29, 30]. One area where digital twins have shown great potential is in the automotive industry, particularly for electric vehicles [31]. Digital twins can greatly benefit electric vehicles, which have gained greater market share in recent years. By creating a digital twin of an electric vehicle, manufacturers and researchers can simulate and optimise its performance, energy consumption and other key parameters. Unlike traditional simulators, digital twins provide additional capabilities for human-machine interaction and for performing data-driven actions in real-world scenarios.</p>
            <p>Digital twins also play a key role in the design and development of autonomous vehicles [32]. The concept of digital twins is closely related to the transition to data-driven vehicles, as it enables the analysis and validation of autonomous vehicle designs [33]. By exploiting digital twin technologies, researchers can assess the safety and security of autonomous vehicles and identify potential risks and vulnerabilities. Furthermore, combining digital twins with connected vehicle technology and cloud computing has led to the development of the Mobility Digital Twin (MDT) framework [34]. These frameworks consist of digital representations of people, vehicles, and transport, which enable the analysis and optimisation of mobility and large-scale traffic systems. By exploiting real-time data and simulations, MDT frameworks can support decision-making processes and improve the</p>
          </sec>
        </sec>
        <sec id="sec-2-2-1">
          <table-wrap>
            <label>Table 1</label>
            <table>
              <thead>
                <tr>
                  <th/>
                  <th>Unreal Engine</th>
                  <th>Unity</th>
                </tr>
              </thead>
              <tbody>
                <tr>
                  <td/>
                  <td>Epic Games</td>
                  <td>Unity Technologies</td>
                </tr>
                <tr>
                  <td/>
                  <td>C++, Blueprint</td>
                  <td>C#</td>
                </tr>
                <tr>
                  <td/>
                  <td>Open source</td>
                  <td>Not open source</td>
                </tr>
                <tr>
                  <td/>
                  <td>Free for research and for commercial use up to 1 million revenue, 5% commission after that</td>
                  <td>Free version available</td>
                </tr>
                <tr>
                  <td>Learning Curve</td>
                  <td>Steep</td>
                  <td>Easy to learn with intuitive user interface</td>
                </tr>
                <tr>
                  <td>Graphics</td>
                  <td>Photorealistic graphics, used in AAA games</td>
                  <td>High-quality graphics, but not as refined as Unreal</td>
                </tr>
                <tr>
                  <td>Physics and Simulation</td>
                  <td>Ragdoll physics, physics-based destruction, fluid simulation</td>
                  <td>Easily integrated and well-rounded with other engine features</td>
                </tr>
                <tr>
                  <td>2D vs. 3D</td>
                  <td>Excellent 3D-development, especially for creating photorealistic environments and visual effects</td>
                  <td>Strong 2D development capability, excellent choice for 2D game projects</td>
                </tr>
              </tbody>
            </table>
          </table-wrap>
        </sec>
        <sec id="sec-2-2-18">
          <p>efficiency and safety of transportation systems.</p>
          <p>Digital twins enable the simulation, optimisation, and analysis of vehicle performance, energy consumption, and safety and security. Combining digital twins with connected vehicle technology and cloud computing will extend their capabilities to optimise mobility systems. As technology advances, digital twins can be expected to play a key role in shaping the future of vehicles and transport systems. As such, current technologies aim to create models for distributed multi-agent cyber-physical systems using co-simulation [35]. Such large-scale digital twins should be able to make predictions about the future condition and behaviour of the vehicle [36]. However, AI-based digital twin capabilities require data cooperation and load-balancing, scheduling, and network security schemes over the vehicle-to-cloud computing continuum [37].</p>
          <sec>
            <title>2.4. Open-sourced Automotive Software</title>
            <p>Open source software refers to software that has a publicly available and editable source code. This allows collaborative development and innovation. One of the most remarkable benefits of open-source software is its flexibility and customizability, as user communities can adapt the software to their specific needs. Open source is also cost-effective as it is free and reduces dependencies on specific software providers. The use of open source also offers opportunities for innovation in automotive software development and promotes the use of new technologies and solutions [38, 39].</p>
            <p>For the automotive industry, open-source software presents some unique challenges as vehicular software, by default, has life-critical safety and reliability requirements [40]. Technically, anyone can modify the source code, which may create unwanted surprises and vulnerabilities. The maintenance and support of open-source software can be uncertain if its developers and community are not active or committed. While open-source software is dynamic and constantly changing, vehicles purchased today will remain in traffic for decades. In addition, there is a need for precise quality control and software certification in the automotive industry, which can be challenging to implement in an open-source environment because access to representative designs and industry-standard methodologies is limited. This limitation challenges researchers, as automotive companies do not openly share their development life-cycles and verification methods, each maintaining proprietary techniques. Given this scenario, there is a growing demand for open-source solutions to support the development and research of automotive applications, emphasizing the need for open-source benchmarks to facilitate research across various aspects of automotive application development [41].</p>
          </sec>
          <sec>
            <title>3. Kuura Implementation</title>
            <sec>
              <title>3.1. Design Principles</title>
              <p>The cornerstone of our framework is grounded in the principle of open-source development, ensuring transparency and collaborative potential. Simplicity is at the core, paving the way for effortless future evolution. Our design philosophy revolves around creating a system that is not just functional today but remains adaptable and maintainable for tomorrow’s innovations. The essence of this framework is to avoid complexity, instead embracing a minimalist approach that prioritises ease of understanding and operation. One must consider the life cycle of software components, as updates and dependencies are inevitable. The framework architecture is designed to handle these, avoiding obsolescence and incompatibility.</p>
            </sec>
            <sec>
              <title>3.2. Kuura Architecture Design</title>
              <p>The general architecture of Kuura is shown in Figure 2. We chose the Unreal Engine 5 game engine because of its versatility in creating realistic simulations. This is an essential part of the research objective to verify the consistency of the data between the simulation and the real test runs. The MQTT protocol was chosen to collect and transfer the data to the cloud server and the game engine, as it is reliable and efficient for real-time data transfer, which might be the next step in the research and is thus a critical requirement in our study. The MQTT protocol operates asynchronously and is considered an ideal choice for IoT applications that often operate on limited devices or low-bandwidth networks.</p>
              <p>The Kuura framework presents a cohesive suite of components, each selected for robustness and simplicity. At its foundation lies the integration of Kuksa, SMAD, and Kuura, delineating a timeline of iterative progress as detailed in Table 2. Each iteration is a response to the evolving needs and challenges encountered. Kuksa, initially misaligned with its focus on automotive app stores and firmware updates, has since been archived [8, 2]. SMAD was unsustainable due to its complexity and poor documentation [42]. Our simplified stack emerges as a response, stripping away the superfluous to focus on functionality. It leverages OpenShift (run on CSC Rahti container cloud<sup>1</sup>) for its cost-efficiency compared to Microsoft Azure. This pragmatic approach is engineered to reduce complexity, cost, and maintenance overhead,</p>
            </sec>
          </sec>
          <sec>
            <title>Purpose</title>
          </sec>
        </sec>
        <sec id="sec-2-2-19">
          <title>Cloud Service Provider</title>
        </sec>
        <sec id="sec-2-2-20">
          <title>Deployment Platform</title>
        </sec>
        <sec id="sec-2-2-21">
          <title>Client-Server Messaging Infrastructure Broker</title>
        </sec>
        <sec id="sec-2-2-22">
          <title>Serverside Messaging Infrastructure</title>
        </sec>
        <sec id="sec-2-2-23">
          <title>Client Message Persistence</title>
        </sec>
        <sec id="sec-2-2-24">
          <title>Client Message Data Modelling</title>
        </sec>
        <sec id="sec-2-2-25">
          <title>Client Firmware Updates</title>
        </sec>
        <sec id="sec-2-2-26">
          <title>Client Appstore</title>
        </sec>
        <sec id="sec-2-2-27">
          <title>Messaging Telemetry Storage</title>
        </sec>
        <sec id="sec-2-2-28">
          <title>Data Visualization</title>
        </sec>
        <sec id="sec-2-2-29">
          <title>Deployment Monitoring</title>
        </sec>
        <sec id="sec-2-2-30">
          <title>Message Tracing</title>
        </sec>
        <sec id="sec-2-2-31">
          <title>InfluxDB</title>
        </sec>
        <sec id="sec-2-2-32">
          <title>Kuksa.VAL</title>
        </sec>
        <sec id="sec-2-2-33">
          <title>Eclipse hawkBit</title>
        </sec>
        <sec id="sec-2-2-34">
          <title>Kuksa Appstore</title>
          <p>SMAD stack</p>
        </sec>
        <sec id="sec-2-2-35">
          <title>Microsoft Azure</title>
        </sec>
        <sec id="sec-2-2-36">
          <title>Kubernetes</title>
        </sec>
        <sec id="sec-2-2-37">
          <title>Eclipse Hono</title>
        </sec>
        <sec id="sec-2-2-38">
          <title>Ambassador and Kafka with Zookeeper</title>
        </sec>
        <sec id="sec-2-2-39">
          <title>MongoDB</title>
        </sec>
        <sec id="sec-2-2-40">
          <title>Kuksa.VAL</title>
        </sec>
        <sec id="sec-2-2-41">
          <title>MongoDB</title>
        </sec>
        <sec id="sec-2-2-42">
          <title>Node-RED</title>
        </sec>
        <sec id="sec-2-2-43">
          <title>Prometheus Monitoring,</title>
        </sec>
        <sec id="sec-2-2-44">
          <title>InfluxDB, and Grafana</title>
        </sec>
        <sec id="sec-2-2-45">
          <title>Jaeger Trace</title>
          <p>Kuura (this paper)</p>
        </sec>
        <sec id="sec-2-2-46">
          <title>OpenShift OKD</title>
        </sec>
        <sec id="sec-2-2-47">
          <title>Eclipse Mosquitto</title>
        </sec>
        <sec id="sec-2-2-48">
          <title>Python script</title>
        </sec>
        <sec id="sec-2-2-49">
          <title>InfluxDB</title>
        </sec>
        <sec id="sec-2-2-50">
          <title>Client implementation</title>
        </sec>
        <sec id="sec-2-2-51">
          <title>Grafana</title>
          <p>streamlining operations without compromising capabil- car. A laptop computer running Linux was connected to
ity. the adapter, and a script was run to record data from the</p>
          <p>Each framework iteration — Kuksa, SMAD, and Kuura vehicle in a log file. The successful log file collection was
— brings new insights. Kuksa’s archival signals a further important in developing the auto-client script
pivot away from its original automotive-centric fo- for future larger tests and ensuring the whole system’s
cus. SMAD’s downfall was its complexity and reliance functionality. Practical testing in the first phase was
on now-inaccessible Kubernetes Helm charts. Kuura carried out by driving the car and ensuring the data was
emerges as the distilled essence of its predecessors, em- stored correctly and its format was manageable.
bodying simplicity and sustainability. By eliminating
non-essential components, Kuura adapts existing func- 3.4. Data Transmission
tionalities with more straightforward tools, significantly
reducing cost and complexity and enabling an
environment conducive to continuous development and
operation.</p>
        </sec>
      </sec>
      <sec id="sec-2-3">
        <title>MQTT makes it trivial to multi-cast the collected data if</title>
        <p>we want to enable multiple clients to listen to the
generated data simultaneously. One example of such a
scenario is live visualisation of the data while saving it to a
3.3. Vehicle Data Reader database without additional latency. While we could also
save the data and then fetch it from the database, this
The OBD-II is a port designed for diagnostic purposes. would add latency to the visualisation. MQTT also has
It has multiple buses available. These buses include the built-in, easy-to-configure security mechanisms. Setting
CAN bus, SAE-1850 and ISO-9141-2. The automotive up MQTT with SSL is very easy, and configuring the
manufacturers can also provide other networks at their MQTT broker to require client certificates for
communidiscretion [43]. The bus we are most interested in is cation is also very easy. The connection can also be set
the CAN bus. On some vehicles, the CAN bus available up to require a username and password.
at the OBD connector can be protected by a gateway We could also use HTTP or raw TCP/UDP sockets as
device restricting access to some data from the OBD port. an alternative for MQTT. While HTTP ofers security
Unlike the CAN bus inside the car, you must poll the measures similar to MQTT, it does not have multi-cast
OBD port to receive any data. While we could get most by default. While it is not hard to implement, MQTT has
of the data we wanted from the OBD port, some data, it built in and is most likely already done correctly. One
like the steering wheel position, was unavailable. This advantage HTTP has over MQTT is the ability to
commakes the OBD port unsuitable as a data source for our municate directly between two applications, eliminating
purposes, as it would make it quite dificult to drive the the need for a broker in cases where there is only one
virtual car in Unreal accurately. client.</p>
        <p>In the evaluation phase, we collected data from an
OBD-II Bluetooth adapter connected to a Toyota RAV4</p>
        <p>Raw sockets are the most basic option, and they don’t
come with any of the advanced features included in
MQTT out of the box. However, they are very versatile
and can be used for various purposes. One advantage
sockets would offer is the ability to write raw CAN
data as is to the socket. This would enable saving
raw CAN dumps in a database with minimal overhead
if we ever needed to support it. One problem
with multicast solutions is that the provider has no idea
whether any clients are listening for the sent data unless the
clients have been programmed to provide feedback when
they are listening. This makes it harder to implement the
provider so that it holds the messages in memory
or saves them locally when nobody is receiving the data.</p>
        <p>In the experiments, a laptop running Ubuntu 22.04 LTS
was used as the in-vehicle client, and the script collecting
the data was written in Python using an OBD library
[44]. The script writes the values it reads into a local CSV file
and publishes them using the MQTT protocol. The backend
was deployed on CSC’s Rahti as Red Hat OpenShift
deployments. On the server side, a Mosquitto MQTT broker
forwards the published messages to subscribers. The
most important subscriber is a Python service that stores
received messages in an InfluxDB instance. As an additional
demonstration, Grafana was deployed to provide
a real-time dashboard for the published and stored data.
The sequence diagram is provided in Figure 3.</p>
        <p>3.5. Cloud Environment</p>
        <p>The cloud environment receives data from the MQTT
broker. A Python script in the environment
connects to the broker to receive the data from the vehicle
and stores every received message in InfluxDB.
The point name and field are derived from the MQTT
topic, and the timestamp is taken from the MQTT
message payload. Since the timestamp is included in the
message, we could use any database solution to store the
data. If the timestamp were missing, however, then a
time series database would be our only option since the
message times are crucial for playback at a later time.
By getting the message time from the provider, we can
ensure that network conditions do not affect the accuracy
of the recorded timestamps.</p>
        <p>InfluxDB, a time series database suited to storing large
amounts of time-stamped data due to its high
performance and scalability, was used for storage from the onset of the
process. Storage is essential in handling the large amounts of
data that emanate from driving vehicles. A Python script
was then used in the next stage of the data-processing
workflow. This script had two main functions. First,
it reads GPS point data pre-recorded into a JSON file,
which is vital in mapping out the routes of vehicles.
Second, it establishes a connection with InfluxDB
to retrieve useful information within a particular range.</p>
        <p>This retrieval is critical for evaluating the vehicle’s
performance and environmental conditions during the various
stages of the experiment.</p>
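        <p>As a minimal sketch of the storing step described in Section 3.5, the function below turns one MQTT message into an InfluxDB line-protocol record, taking the point name and field from the topic and the timestamp from the payload. The three-level topic layout (for example smad/toyota/speed), the JSON payload keys value and ts_ms, and the function names are illustrative assumptions, not the format used by the actual service.</p>
        <preformat>
```python
import json


def _escape(text, chars):
    """Backslash-escape characters that are special in line protocol."""
    for ch in chars:
        text = text.replace(ch, "\\" + ch)
    return text


def to_line_protocol(topic, payload):
    """Build one line-protocol record from an MQTT topic and JSON payload.

    Assumed (hypothetical) conventions:
      topic   -- "source/vehicle/signal", e.g. "smad/toyota/speed"
      payload -- JSON with a numeric "value" and an integer "ts_ms"
                 (provider-side Unix time in milliseconds)
    """
    parts = topic.split("/")
    if len(parts) == 1:
        raise ValueError("topic needs a point name and a field: " + topic)
    # The last topic level is the field; everything before it is the point name.
    measurement = _escape("/".join(parts[:-1]), ", ")
    field = _escape(parts[-1], ",= ")
    message = json.loads(payload)
    value = float(message["value"])
    # Line protocol timestamps default to nanoseconds; integer math keeps them exact.
    timestamp_ns = int(message["ts_ms"]) * 1_000_000
    return f"{measurement} {field}={value} {timestamp_ns}"


record = to_line_protocol(
    "smad/toyota/speed", b'{"value": 42.5, "ts_ms": 1700000000500}'
)
print(record)
```
</preformat>
        <p>Because the record carries the provider-side timestamp, a message that arrives late over a slow link is still stored at the time it was measured, which is what later playback depends on.</p>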
        <p>At this point, a Python script publishes the processed data to an
MQTT broker. Once more, this
protocol provides lightweight messaging with fast and
reliable real-time transmission, which
the simulation environment needs.</p>
        <p>3.6. Simulation Environment</p>
        <p>Multiple reasons contributed to the choice of the Unreal
Engine 5 game engine, including the capacity to create
realistic simulations of real car driving and the possibility of
driving a car in the simulation, thereby generating
corresponding data. The research aimed to ensure uniformity
between the simulation and actual driving, thus requiring
realistic simulations. The source code of Unreal Engine 5 is also
available to licensees, which supports one of the implementation principles of the
study, making future development as easy as possible.</p>
        <p>The research utilised the MQTT protocol, a
key component of IoT connectivity and data collection.
Unreal Engine does not have native MQTT support. For this
reason, we used the NinevaStudios MQTT-utilities
extension with some modifications. This extension enabled
MQTT communication, which is essential for
collecting data from the simulation, and the minor adjustments
allowed the data to be transferred to cloud storage securely. Through
this connection, it was possible to develop a dynamic and
interactive simulation environment.</p>
        <p>Lastly, we simulate a car running along the received GPS
points, as shown in Figure 4. In the simulation, the
vehicle’s movement was driven by speed data acquired from
the MQTT broker. As a result, real-time
synchronisation between the GPS points and the speed data gave a
faithful representation of the journey made by the
vehicle, making it possible to examine
its performance in different circumstances. Such a
holistic approach to data storage, processing, transmission,
and visualisation shows how diverse technologies can
be integrated into high-level vehicular data analysis and
simulation.</p>
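        <p>The idea of moving the virtual car along GPS points at a pace set by speed samples can be sketched as follows. This is only an illustration under stated assumptions: the route is a list of (latitude, longitude) pairs, speeds arrive in km/h at a fixed sampling interval, and the helper names are hypothetical. The actual implementation runs inside Unreal Engine and may differ.</p>
        <preformat>
```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius


def haversine_m(a, b):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


def travelled_distance_m(speeds_kmh, dt_s):
    """Integrate speed samples (km/h) taken every dt_s seconds into metres."""
    return sum(v / 3.6 * dt_s for v in speeds_kmh)


def position_after(route, distance_m):
    """Return the (lat, lon) reached after travelling distance_m along route.

    Within a segment the position is interpolated linearly in lat/lon,
    which is a reasonable approximation for closely spaced GPS points.
    """
    remaining = max(distance_m, 0.0)
    for start, end in zip(route, route[1:]):
        segment = haversine_m(start, end)
        if segment > remaining:
            t = remaining / segment
            return (start[0] + t * (end[0] - start[0]),
                    start[1] + t * (end[1] - start[1]))
        remaining -= segment
    return route[-1]  # past the end of the route: stay on the last point


route = [(65.0, 25.0), (65.001, 25.0)]          # two illustrative GPS points
distance = travelled_distance_m([36.0] * 5, 1)  # five 1 s samples at 36 km/h
print(distance, position_after(route, distance))
```
</preformat>
        <p>Separating distance integration from route lookup keeps the speed stream and the GPS trace independent, so either can come from the real car or from the simulation.</p>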
        <p>The initial version of the Kuura presented in this paper
generates the road dynamically as the car moves around,
which simplifies testing by making it independent of
environmental conditions. This method makes the testing
process more flexible because it does not
require a predefined route or special environmental
circumstances. Generating dynamic roads is essential to
ensure the reliability of the data collection system. This
phase, built on the multiple approaches used in the study,
emphasises adaptability and precision. By generating
the road during runs, the accuracy of the collected data could be
evaluated instantly. Working within this dynamic environment is
particularly advantageous for
identifying and solving prospective issues in the
data-processing workflow, making it robust
and efficient. This technique also makes the Unreal Engine
simulation more versatile: different scenarios
can be run on the platform without sticking to a single static
map, giving the evaluation process more flexibility.</p>
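        <p>The paper does not detail how the dynamic road is built, so the following is only one plausible sketch: each incoming GPS fix is projected into a local metric frame around a chosen origin and appended to the road when it is far enough from the previous point. The equirectangular projection, the one-metre spacing threshold, and all names are assumptions made for illustration.</p>
        <preformat>
```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius


def to_local_xy(origin, point):
    """Project a (lat, lon) point in degrees to metres east/north of origin.

    Uses an equirectangular approximation, adequate over a test-track-sized area.
    """
    lat0, lon0 = map(math.radians, origin)
    lat, lon = map(math.radians, point)
    x_east = EARTH_RADIUS_M * (lon - lon0) * math.cos(lat0)
    y_north = EARTH_RADIUS_M * (lat - lat0)
    return (x_east, y_north)


def extend_road(road_xy, origin, gps_point, min_spacing_m=1.0):
    """Append the projected fix unless it is closer than min_spacing_m
    to the last road point; returns True when the road grew."""
    candidate = to_local_xy(origin, gps_point)
    if not road_xy or math.dist(road_xy[-1], candidate) >= min_spacing_m:
        road_xy.append(candidate)
        return True
    return False


origin = (65.0, 25.0)  # illustrative origin, not a surveyed location
road = []
for fix in [(65.0, 25.0), (65.001, 25.0), (65.001, 25.0)]:
    extend_road(road, origin, fix)
print(road)
```
</preformat>
        <p>Skipping fixes that barely move avoids degenerate road segments when the car is standing still.</p>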
      </sec>
    </sec>
    <sec id="sec-3">
      <title>4. Experimentation</title>
      <sec id="sec-3-1">
        <title>4.1. Real-life Experiment</title>
        <p>The real-world tests were conducted in the OuluZone
vehicle testing area using a 2019 Toyota RAV4 Hybrid.
A closed area, such as OuluZone, was chosen
because it allows the drives and their safety to be assessed.
The site made it possible to gather and
analyse information in real-life scenarios, thus allowing
comparison and verification with data collected from
virtual and actual driving instances. Besides being a
recreational driving and sports centre, OuluZone is also
a notable site for research and learning, especially on
autonomous cars and related technologies.</p>
        <p>Several laps were driven during the tests, some with
the cruise control set at different speeds (30 km/h, 40 km/h,
and 50 km/h) to facilitate the validation of results in the
simulation with data collected at a constant speed. Laps
were also driven without cruise control at varying speeds.
Driving data was collected during the test via an OBD-II
Bluetooth adapter connected to a laptop running Linux.
This allowed the vehicle data to be logged and its
format managed. Towards the end of the tests, a USB
adapter was used to enhance data collection.</p>
      </sec>
      <sec id="sec-3-2">
        <title>4.2. Virtual Experiment</title>
        <p>Our virtual experiment utilised Unreal Engine 5.3.2 to
run test-drive scenarios comparable to our real-world
data collection efforts. In this experiment, we used the
same logger used during actual test drives with a real car,
ensuring a uniform approach to data acquisition and
proving that the logger could be used without changes in both
environments. We gathered data on speed and time from
the virtual test drive, which can be cross-verified with the
real car’s outputs. The current limitation of real-world
data collection stems from the OBD-II interface’s
inability to provide comprehensive vehicle diagnostics. In the
virtual setting, we collected additional data such as gear,
throttle, brake application, and steering angle. These
were predominantly included for illustrative purposes,
aiming to demonstrate the extensive data collection
possibilities within a simulated environment. It is important
to note that verifying these additional parameters will
become feasible with future access to the CAN bus,
allowing for a more detailed and accurate comparison between
virtual and real vehicle data.</p>
        <p>4.3. Experimentation Results</p>
        <p>In our validation process, we specifically focused on
comparing the collected GPS data and speed data between the
actual driving tests and the virtual driving tests conducted in Unreal
Engine 5. As shown in Figure 5, the same InfluxDB database
successfully contains both real-world data (smad/toyota)
and virtually collected data (unreal/toyota). This design
will further allow simultaneous analysis of both virtual
and real-world data sets, allowing us to expand the digital
twin creation capabilities with virtual realities and actual
real-life test runs, independently of the data source.</p>
        <p>As illustrated in Figure 6, we successfully mapped the
collected GPS data onto the 3D model of the racetrack at
runtime from the cloud and verified its accuracy. This
demonstrates that our virtual environment can accurately
replicate real driving conditions. The speed data collected in
the database corresponded with the data obtained in the
Unreal Engine 5 simulation, confirming the consistency
of data in both real and virtual driving scenarios. While
the data transmitted from the game engine to the server
was also accurate, at this stage, our primary focus was
on verifying the accuracy of speed and time information.
Expanding this experimentation to cover a wider range
of variables is possible in future research.</p>
        <p>5. Discussion and Conclusions</p>
        <p>In this study, we have aimed to bring new insights into
vehicular data collection and the creation of digital twins
by using the Eclipse Kuksa platform and Unreal Engine
5 to simulate driving scenarios. Our main focus was
providing an overview of a simplified vehicular data
collection architecture that can be easily developed for
further projects and verifying the consistency between
real and simulated vehicular data through practical
real-world experimentation.</p>
        <p>Using the MQTT protocol for sending data and Unreal
Engine 5 for simulation has allowed us to compare real
driving data with simulated data. This method makes
digital twins more reliable and allows later use for testing
in many conditions that are hard or expensive to create
in real life, like very bad weather or different kinds of
traffic situations.
We encountered challenges in data collection via the
OBD-II protocol because it is filtered and does not allow
the collection of all possible data. This limitation
highlighted the need for more comprehensive data acquisition
methods like the CAN bus. The data collection
limitations prompted us to consider future enhancements in
our methodology to achieve a more accurate and
encompassing digital representation of the vehicle.
Our findings open up possibilities for future research
directions, including optimising data transmission
methods for improved efficiency and exploring bi-directional
data flow between the digital twin and the vehicle. Such
advancements could potentially enable real-time vehicle
control based on digital twin data.
By integrating additional simulation models and
considering more sophisticated data collection interfaces, we
anticipate that future iterations of this work will address
the current limitations and unlock new capabilities for
digital twins in automotive research and development.
The potential for these technologies to improve vehicle
safety, efficiency, and innovation is immense, paving the
way for a more interconnected and intelligent
transportation ecosystem.</p>
        <p>Future efforts should use the CAN bus
instead of OBD-II to improve accuracy and completeness
and to gain access to all the data the vehicle
provides. Reconsidering data transmission methods, like
MQTT, for more efficient data multicasting is also a
possible future direction. In the future, we are also looking
into sending data from the game engine to the car instead
of just storing it in the cloud, having the car drive in real
life and in the game engine simultaneously with as little
latency as possible, and integrating Eclipse Arrowhead to
extend the possibilities with simulation models, such as using
the architecture with MATLAB Simulink or corresponding
open-source physics modelling software.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Acknowledgments</title>
      <sec id="sec-4-1">
        <p>The work has been supported by the EU HORIZON
project CHIPS-JU CIA FEDERATE (grant number
101139749), Business Finland project 6G Visible (grant
number 10743/31/2022), and the Finnish Research
Council project Northern Utility Vehicle Laboratory
Consortium GO!-RI (grant number 352726).
</p>
        <p>[20] B. Schoettle, Sensor fusion: A comparison of sensing capabilities of human drivers and highly automated vehicles, University of Michigan (2017).</p>
        <p>[21] D. Michalík, M. Jirgl, J. Arm, P. Fiedler, Developing an unreal engine 4-based vehicle driving simulator applicable in driver behavior analysis—a technical perspective, Safety 7 (2021) 25.</p>
        <p>[22] S. Malik, M. A. Khan, H. El-Sayed, Carla: Car learning to act — an inside out, Procedia Computer Science 198 (2022) 742–749. 12th International Conference on Emerging Ubiquitous Systems and Pervasive Networks / 11th International Conference on Current and Future Trends of Information and Communication Technologies in Healthcare.</p>
        <p>[23] A. Dubs, V. C. Andrade, M. Ellis, S. Ganley, B. Karaman, O. Toker, A photo-realistic simulation and test platform for autonomous vehicles research (????).</p>
        <p>[24] G. Chance, A. Ghobrial, K. McAreavey, S. Lemaignan, T. Pipe, K. Eder, On determinism of game engines used for simulation-based autonomous vehicle verification, IEEE Transactions on Intelligent Transportation Systems (2022).</p>
        <p>[25] W. Jansen, E. Verreycken, A. Schenck, J.-E. Blanquart, C. Verhulst, N. Huebel, J. Steckel, Cosys-airsim: A real-time simulation framework expanded for complex industrial applications, in: 2023 Annual Modeling and Simulation Conference (ANNSIM), IEEE, 2023, pp. 37–48.</p>
        <p>[26] Q. Liu, D. Xie, S. Hu, J. Wu, Research on dynamic performance simulation of in-wheel motor electric vehicle based on carsim-simulink, in: Journal of Physics: Conference Series, volume 1820, IOP Publishing, 2021, p. 012109.</p>
        <p>[27] A. Fuller, Z. Fan, C. Day, C. Barlow, Digital twin: Enabling technologies, challenges and open research, IEEE access 8 (2020) 108952–108971.</p>
        <p>[28] J. A. Ross, K. Tam, D. J. Walker, K. D. Jones, Towards a digital twin of a complex maritime site for multi-objective optimization, in: 2022 14th International Conference on Cyber Conflict: Keep Moving! (CyCon), volume 700, IEEE, 2022, pp. 331–345.</p>
        <p>[29] S. Maulik, D. Riordan, J. Walsh, Dynamic reduction-based virtual models for digital twins—a comparative study, Applied Sciences 12 (2022) 7154.</p>
        <p>[30] A. M. Madni, C. C. Madni, S. D. Lucero, Leveraging digital twin technology in model-based systems engineering, Systems 7 (2019) 7.</p>
        <p>[31] D. Piromalis, A. Kantaros, Digital twins in the automotive industry: The road toward physical-digital convergence, Applied System Innovation 5 (2022) 65.</p>
        <p>[32] S. Almeaibed, S. Al-Rubaye, A. Tsourdos, N. P. Avdelidis, Digital twin analysis to promote safety and security in autonomous vehicles, IEEE Communications Standards Magazine 5 (2021) 40–46.</p>
        <p>[33] T. Fuchs, M. Zinser, K. Renatus, B. Bäker, Data model of automotive digital twins, ATZelectronics worldwide 16 (2021) 52–57.</p>
        <p>[34] Z. Wang, R. Gupta, K. Han, H. Wang, A. Ganlath, N. Ammar, P. Tiwari, Mobility digital twin: Concept, architecture, case study, and future challenges, IEEE Internet of Things Journal 9 (2022) 17452–17467.</p>
        <p>[35] M. Palmieri, C. Quadri, A. Fagiolini, C. Bernardeschi, Co-simulated digital twin on the network edge: A vehicle platoon, Computer Communications 212 (2023) 35–47.</p>
        <p>[36] G. Bhatti, H. Mohan, R. R. Singh, Towards the future of smart electric vehicles: Digital twin technology, Renewable and Sustainable Energy Reviews 141 (2021) 110801.</p>
        <p>[37] D. Chen, Z. Lv, Artificial intelligence enabled digital twins for training autonomous cars, Internet of Things and Cyber-Physical Systems 2 (2022) 31–41.</p>
        <p>[38] S. Kochanthara, Y. Dajsuren, L. Cleophas, M. van den Brand, Painting the landscape of automotive software in github, in: Proceedings of the 19th International Conference on Mining Software Repositories, 2022, pp. 215–226.</p>
        <p>[39] S. Niaeetin, R. Šandor, G. Stupar, N. Tesliae, Maximizing the efficiency of automotive software development environment using open source technologies, in: 2018 IEEE 8th International Conference on Consumer Electronics-Berlin (ICCE-Berlin), IEEE, 2018, pp. 1–3.</p>
        <p>[40] Y. Zhang, Y. Ning, C. Ma, L. Yu, Z. Guo, Empirical study for open source libraries in automotive software systems, IEEE Access (2023).</p>
        <p>[41] F. A. da Silva, A. C. Bagbaba, A. Ruospo, R. Mariani, G. Kanawati, E. Sanchez, M. S. Reorda, M. Jenihhin, S. Hamdioui, C. Sauer, Special session: Autosoc-a suite of open-source automotive soc benchmarks, in: 2020 IEEE 38th VLSI Test Symposium (VTS), IEEE, 2020, pp. 1–9.</p>
        <p>[42] H. Hirvonsalo, P. Seppänen, On deployment of eclipse kuksa as a framework for an intelligent moving test platform for research of autonomous vehicles, in: Proceedings of the 2nd Eclipse Research International Conference on Security, Artificial Intelligence, Architecture and Modelling for Next Generation Mobility, RWTH Aachen University, 2021.</p>
        <p>[43] K. McCord, Automotive Diagnostic Systems: Understanding OBD I and OBD II, CarTech Inc, 2011.</p>
        <p>[44] Obd library for python 3, https://github.com/brendan-w/python-OBD, 2023. Accessed: 2023-11-12.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list />
  </back>
</article>