=Paper=
{{Paper
|id=Vol-3028/D1-03-ESAAMM_2021_paper_7
|storemode=property
|title=APP4MC RaceCar: A Practical ADAS Demonstrator for Evaluating and Verifying Timing Behavior
|pdfUrl=https://ceur-ws.org/Vol-3028/D1-03-ESAAMM_2021_paper_7.pdf
|volume=Vol-3028
|authors=Anand Prakash,Lukas Krawczyk,Carsten Wolff
}}
==APP4MC RaceCar: A Practical ADAS Demonstrator for Evaluating and Verifying Timing Behavior==
Anand Prakash, Lukas Krawczyk, Carsten Wolff
IDiAL Institute, Dortmund University of Applied Sciences and Arts, 44227 Dortmund, Germany
anand.prakash@fh-dortmund.de, lukas.krawczyk@fh-dortmund.de, carsten.wolff@fh-dortmund.de
Abstract—The computational demands of safety-critical ADAS applications on autonomous vehicles have been ever-increasing. As a result, high-performance computing nodes with varying operating frequencies, together with a growing set of sensors, have been introduced, which has resulted in heterogeneous architectures of high complexity. This complexity has led to challenges in analyzing a system's timing behavior, such as determining the end-to-end response time of a high-level functionality as well as a real-time application's latency. Although several approaches to tackle this issue have been proposed, their practical verification on real-life applications is still an open issue. Accordingly, this work proposes an automotive demonstrator that will be used for evaluating the timing behavior of ADAS applications in a real-life environment using methodologies such as tracing, profiling, and static analysis. The APP4MC RaceCar is a work-in-progress four-wheel-drive demonstrator built on a Traxxas 1/10-scale RC car platform. It is equipped with state-of-the-art sensors such as a LiDAR and a ZED2 stereo camera, and hosts multiple heterogeneous on-board computers, such as the NVIDIA AGX Xavier, to replicate a full-size autonomous vehicle. In this paper, we describe the need for such a demonstrator and give an overview of the heterogeneous components used in it. Moreover, we describe the system architecture as well as the data flow through the event-chain task model of the ADAS application, which is based on the WATERS Challenge 2019 industrial case study.

Index Terms—Heterogeneous System, Radio-Controlled Cars, Electronic Speed Controller, RT-Linux Kernel.

Copyright © 2021 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
I. INTRODUCTION

From an abstract point of view, a typical Advanced Driver Assistance System (ADAS) has to perform three tasks: perception, planning, and control. As part of the perception task, various sensors are used in an ADAS application. The vehicle status is provided as feedback, which, along with the processed sensor data, is used to plan and control the path of the autonomous vehicle. Each of these tasks is computationally expensive and requires high-performance processing cores. At the same time, these applications are safety-critical in nature and must follow hard real-time constraints. Adaptive cruise control, anti-lock braking systems, lane keep assistance, obstacle detection/avoidance systems, and traffic sign recognition are just a few examples of ADAS applications with high computational requirements.

Automotive OEMs are gradually moving towards higher levels of driving automation, thereby increasing the complexity of these applications. The number of different sensors, such as LiDARs, stereo cameras, and radars, installed on a vehicle has also increased significantly in recent times. This results in a huge amount of data from different sensors that needs to be processed in real time, which leads to a computational bottleneck. To deal with this bottleneck, heterogeneous platforms are used to improve the overall performance: hardware accelerators such as GPUs and FPGAs are used in coordination with CPUs to increase the computational power. Semiconductor companies provide various hardware platforms for such ADAS applications. For example, the Renesas R-Car H3 [1] System-on-Chip (SoC) is a high-performance platform specifically designed for in-vehicle infotainment and driving safety support, and the NVIDIA Drive [2] platform provides a range of developer kits for autonomous vehicles along with a sensor suite.

From the architecture point of view, an ADAS application can be divided into two categories. In a centralized computing architecture, the raw data from the sensors is passed to a centrally located high-performance computer that performs the data processing. However, the cost, performance, and power requirements for such a processing unit are usually very high. Alternatively, the sensors have their own processing cores where the initial data is filtered before being sent to the main processing unit for further computation. This kind of distributed computing architecture is the more conventional approach: the application does not rely on a single main processing core, which provides system redundancy for functional safety along with reduced processing requirements.
Each task in an ADAS application has different computational demands. For example, the computational requirement of an object detection algorithm working on camera input is much higher than that of object range detection from ultrasonic sensors. Therefore, in order to reduce the latency of the overall application, it becomes necessary to map these tasks to the optimal processing cores. The latency of these applications can be further reduced by using parallel programming models such as MPI, CUDA, etc. These tasks should not only follow hard real-time constraints but must also be deterministic. Using a Real-Time Operating System (RTOS) for such applications provides better control over the handling of tasks, based on the scheduling and preemption model used in it.

Development of autonomous driving applications [3] poses several challenges in terms of timing analysis, efficient mapping and scheduling of tasks, and maintaining their deterministic behavior. These applications undergo rigorous testing and must pass all related safety standards, such as Automotive Safety Integrity Level (ASIL), before going into production. An efficient way to conceptualize a new feature or functionality is to implement it in the early design phases on an RC (Radio-Controlled) car model. A model-based approach can be used to design a heterogeneous application on these platforms. This not only reduces the development cost and risk but also helps in determining the feasibility of applications. Besides, the RC platform provides the flexibility to evaluate and benchmark various performance metrics such as latency, end-to-end response time, memory contention, execution time, and so on. Since multiple sensors and ECUs interact with each other over different communication interfaces, communication interference may add to the latency, which affects the overall performance of the application in real-life scenarios. These metrics can be analyzed to further improve the software/hardware design of the system.
In this work, we design and implement an ADAS application on a 1/10-scale RC car. The demonstrator provides a platform to measure the response time of the implemented ADAS application, memory contention, and the latency caused by communication interference in real-life environments. The remainder of the paper is structured as follows: Section II provides an overview of related work. Section III describes the existing challenges with respect to the timing analysis of an ADAS. Section IV gives an overview of the heterogeneous components and sensors used in the proposed demonstrator. The system architecture is explained in Section V. Section VI provides a brief overview of evaluating the timing behavior based on the generated trace data. Finally, Section VII draws the conclusion and a road map for future work.

II. RELATED WORK

Numerous RC cars have been developed for education and research purposes, with their complexity depending on their particular use case. A typical RC car consists of a chassis, motor controller, actuators, processing units, sensors, and a power supply. However, if a system is designed for a specific purpose such as parallel parking, the platform may consist of only the minimum required components. The automatic parallel parking RC car platform [4] hosts a custom-made circuit board consisting of an IC chip, amplifier, and radio receiver, along with electric motors and an antenna. It implements a simple parallel parking application on a single IC chip with no need for high-performance processing nodes. A more complex application would require a heterogeneous system. For instance, Duckietown [5] provides minimal autonomy and basic features such as lane following by utilizing e.g. a Raspberry Pi attached to a monocular camera. Another example of a low-cost, low-power autonomous robot is WolfBot [6], which is based on the BeagleBone Black development platform. Even though a realistic computer vision pipeline is implemented on these platforms, the GPU capability available on the on-board computer is not utilized. At the same time, the sensor technology used in these platforms is not very advanced.
The higher the complexity of an ADAS application, the higher its computational requirement. Such applications require dedicated accelerators to process the data in real time. The RC platform Go-CHART [7] makes use of external GPU capability to overcome the computational bottleneck: the sensor data is transmitted over a wireless link to a Jetson TX2 board for further processing. For better performance and a more reliable ADAS application, it is recommended to have an on-board high-performance computer on the RC platform. Several open-source self-driving RC car platforms, such as MuSHR [8], JetRacer [9], and Donkey Car [10], come with on-board Jetson Nano processors and advanced sensors. MuSHR has an additional Electronic Speed Controller (ESC) component, which provides better control over the actuators. MuSHR, JetRacer, and Donkey Car are based on a centralized compute architecture in which a single processing node is responsible for processing all sensor data as well as controlling the actuators. The performance of the NVIDIA Jetson Nano is sufficient for applications involving computer vision algorithms. However, relying on a single processing node might not be efficient for an application having multiple tasks with real-time constraints.

Some autonomous miniature car models use two processing nodes: one with GPU capability, dedicated to executing machine learning algorithms, and the other for motor control. AutoRally [11] is a high-end RC car built on a 1/5-scale platform which uses an Intel processing unit along with an NVIDIA GTX accelerator for scaled autonomous driving. It is based on a distributed compute architecture where each processing node is responsible for a specific task. However, the platform does not include a LiDAR sensor, and instead of hosting a stereo camera it comes with two monocular cameras, which makes the system more complex. The MIT RACECAR [12] platform houses state-of-the-art sensors and computing hardware placed on top of a powerful 1/10-scale mini race car.

A model-based approach can be taken in designing a system based on the Operator-Controller Module (OCM) [13] architecture. This helps in a better analysis of the performance metrics at each architectural level. It is worth mentioning that none of the above-mentioned RC mini car platforms is based on the OCM architecture. The industrial WATERS Challenge 2019 [14] provides a case study of a prototypical ADAS application modelled on a heterogeneous platform using Amalthea [15]. The implementation of this model can be used to determine the performance metrics of a heterogeneous ADAS application. The MIT RACECAR platform satisfies the requirements for designing an ADAS application based on the WATERS Challenge 2019. However, its compute capabilities can be further enhanced by using next-generation, more powerful development boards, and the heterogeneity of the system can be further exploited by designing the system based on the OCM architecture. This has led us to the development of a new RC car platform for applying the methods to measure the end-to-end latency of an ADAS application on a heterogeneous system.
III. PROBLEM STATEMENT

Researchers have come up with many novel solutions to analyze and benchmark the performance of ADAS applications on the basis of several performance metrics. The existing work in this area can be categorized on the basis of architecture, high-computation algorithms, sensor fusion, and the scheduling and mapping of tasks on processing cores. New functions in an ADAS require access to different communication interfaces, which makes the system more complex. AUTOSAR (AUTomotive Open System ARchitecture) is based on the OSEK specifications and provides a three-layer architecture to develop an automotive application.

Since most ADAS applications rely on computer vision and image processing algorithms, which require parallel processing, General-Purpose GPUs (GP-GPUs) have started playing an important role. Parallel portions of an application are executed on GP-GPUs in terms of e.g. the kernel programming model [16]. Therefore, the response time of an application can be determined by the execution time of a kernel [17] on a given computing unit. However, GPUs are proprietary systems, which restricts the knowledge of their internal workings. This makes the prediction of the latency caused by GPUs uncertain. Another way to determine the execution time of a task on a GPU is presented in [18]. In recent times, reconfigurable platforms such as FPGAs are also being exploited as accelerators for image processing algorithms. PYNQ [19] is an FPGA-based platform that hides the underlying hardware details and exposes a Python interface to use any computer vision framework such as OpenCV [20].
Typically, an ADAS application consists of multiple tasks executing on dedicated cores. These programs consist of multiple event-chain sequences in which the input of the next task depends on the previous task's output. Each task has its own execution time, and the varying speeds of the processors must be taken into account while scheduling a task on a processor. At the same time, an optimal scheduling sequence of these tasks needs to be determined to make the overall application more efficient, and the availability of a processor at a given point in time must be considered while scheduling these tasks. This makes the scheduling of these tasks non-trivial and can affect both the hard real-time constraints and the efficiency of the algorithm. For example, Wang et al. [21] provided insight into how effective parallelism can improve the performance of a lane detection algorithm (LDA) compared to a naive parallel approach. RTOSs such as QNX Neutrino and SAFERTOS use different scheduling techniques for executing tasks, which affects the latency of the application [22].
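On the measurement side, the execution time of a single task instance can at least be captured on the CPU with a monotonic clock. The following is a minimal sketch under stated assumptions: lane_detection_step is a hypothetical task body, and the returned value is strictly a response-time sample (it includes preemption delays) unless the thread is pinned and given a real-time priority.

<syntaxhighlight lang="c">
#include <stdint.h>
#include <time.h>

/* Hypothetical body of one job of an ADAS task (placeholder). */
extern void lane_detection_step(void);

/* Measure one task instance in nanoseconds using the monotonic clock.
   Note: this includes any preemption delay, so it over-approximates the
   pure execution time unless the task runs undisturbed. */
uint64_t measure_task_ns(void) {
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    lane_detection_step();
    clock_gettime(CLOCK_MONOTONIC, &t1);
    return (uint64_t)(t1.tv_sec - t0.tv_sec) * 1000000000ULL
         + (uint64_t)(t1.tv_nsec - t0.tv_nsec);
}
</syntaxhighlight>

For kernels offloaded to a GP-GPU, the analogous measurement has to rely on vendor-specific instrumentation (e.g. CUDA events), which is exactly where the uncertainty discussed above comes from.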
Another major aspect of an ADAS application is the efficient mapping of tasks to the processing units. The performance of an application is highly dependent on the optimal utilization of the resources available on a heterogeneous platform. The WATERS Challenge 2019 [14] focused on developing an initial model based on Amalthea [15], which can be further used to derive the performance metrics of the application. It is also worth mentioning that the model is derived for a specific hardware platform (the NVIDIA Jetson TX2 SoM). To further optimize the application, evolutionary optimization approaches such as genetic algorithms can be used for the allocation of tasks [23]. In order to determine the worst-case response time of an application, the event chain on the critical path must be considered. A tracing format such as BTF (Best Trace Format) [24] can be used to analyze the timing, performance, and reliability of the system. A similar approach can be used for the mapping of tasks and the calculation of response times in a real-life environment for an application running on an RC car platform.

IV. OVERVIEW OF HETEROGENEOUS COMPONENTS

In order to precisely replicate the real-life scenario, the demonstrator must be modeled similarly to existing vehicles with autonomous functionality. Therefore, it has been designed with multiple sensors and processing units. The remainder of this section briefly describes the architecture of the components used in the system.

A. NVIDIA Jetson AGX Xavier

The NVIDIA Jetson AGX Xavier SoM [25] includes a compact carrier board and the Jetson Xavier module. It is a powerful AI computer designed for autonomous machines. It provides the performance to handle the sensor fusion, localization and mapping, obstacle detection, and path planning algorithms critical for autonomous driving. A 40-pin expansion header supports standard communication interfaces such as I2C, UART, SPI, and CAN. An M.2 Key E slot can be used to add WiFi/LTE capability to the board. An overview of the components of the NVIDIA Jetson AGX Xavier is illustrated in Figure 1, followed by an in-depth description in the following subsections.

[Fig. 1. NVIDIA Jetson AGX Xavier Architecture — Carmel CPU complex with four dual-core ARMv8.2 clusters (64 KB D-cache + 128 KB I-cache per core, 2 MB L2 per cluster, shared 4 MB L3), Volta GPU with SMs containing CUDA and Tensor cores (128 KB L1, shared 512 KB L2), and 32 GB 256-bit LPDDR4x DRAM.]

1) Processing Unit: The CPU complex (CCPLEX) is divided into four clusters. Each cluster contains two identical 64-bit Carmel processors compliant with ARM's v8.2 ISA. A high-performance System Coherency Fabric (SCF) connects all CPU clusters, thus enabling the simultaneous operation of all CPU cores for a true heterogeneous multi-processing (HMP) environment. The SCF also connects the CPU clusters to the DRAM through a Memory Controller Fabric (MCF), and to the I/O blocks in the Memory-Mapped I/O (MMIO) space through an ARM Advanced eXtensible Interface (AXI).

2) Hardware Accelerators: The NVIDIA GV10B GPU is based on the Volta architecture, which features 512 shading units, 32 texture mapping units, and 16 render output units (ROPs). It also includes 64 Tensor cores, which help to improve the speed of machine learning applications. Additionally, the Volta GPU architecture features a new Streaming Multiprocessor (SM) which allows energy-efficient, high-performance computation of tasks that have to process large and complex data streams. Each SM is partitioned into four separate blocks referred to as Streaming Multiprocessor Partitions (SMPs). Each SMP contains its own instruction buffer, scheduler, CUDA cores, and Tensor cores. The GPU's core graphics functions are performed inside the Graphics Processing Cluster (GPC), a dedicated hardware block for computation, rasterization, shading, and texturing. The SoC is further augmented with an image signal processor (ISP), a multimedia engine, programmable vision accelerators (PVAs), and a pair of NVIDIA deep-learning accelerators (NVDLAs). These accelerators can be used in parallel or in conjunction with the CPU and GPU cores.

3) Memory: The Xavier platform features a distributed shared memory architecture. It comes with a system memory of 32GB 256-bit LPDDR4x and provides 32GB of eMMC storage. The CPU as well as the GPU have direct access to the system memory. Each CPU core includes 128 KB instruction (I-cache) and 64 KB data (D-cache) level-1 caches, whereas a 2 MB L2 cache is shared by both cores of a single cluster. All clusters share a common 4 MB L3 cache. Each SM in the GPU has an additional L1 cache of 128 KB as well as access to a common 512 KB L2 cache shared by all SMs.

4) Speed: Each CPU can operate at a maximum frequency of 2265 MHz. The GPU operates at a frequency of 854 MHz, which can be boosted up to 1377 MHz. The GPU provides workstation-class performance with up to 32 TeraOPS (TOPS) of peak compute and 750 Gbps of high-speed I/O. The maximum system memory bandwidth is 137 GB/s, providing low memory access latency.

5) Power utilization: The Jetson AGX Xavier enables new levels of power efficiency. Users can configure operating modes for their applications at 10 W, 15 W, or 30 W.

B. Beaglebone AI

The Beaglebone AI [26] is used as the secondary processing unit in our RaceCar platform. The board is built around a Texas Instruments (TI) AM5729 system-on-chip (SoC). Its computing capabilities enable the user to develop machine learning applications with ease. It supports all standard communication interfaces over a 46-pin header on either side of the board. Additionally, it also supports a 16-bit LCD interface. The board was developed specifically for AI applications and has an integrated neural engine that processes complex algorithms at the hardware level. The block diagram of the AM5729 SoC is depicted in Figure 2.

[Fig. 2. AM5729 SoC Architecture — main processing unit (2× ARM Cortex-A15, 32 KB L1 caches, 2 MB L2), image processing units (2× ARMv7E-M4), a video co-processor, 4× Embedded Vision Engines, 2× TMS320C66x DSPs (32 KB L1 caches, 288 KB L2), a dual PowerVR SGX544 GPU, and 16-bit 1 GB DDR3L memory attached via a high-speed interconnect.]

1) Processing Unit: The Beaglebone AI comes with an ARM Cortex-A15 based dual-core processor. It supports the standard ARM instruction set with hardware virtualization support. Two dual-core Programmable Real-Time Units (PRUs) are present to provide ultra-low latency; access to these PRUs is enabled via the expansion headers. These dual-core PRUs are based on the ARMv7E-M architecture for general-purpose usage, particularly real-time control.
2) Hardware Accelerators: The platform comes with a 3D graphics processing unit (GPU) subsystem based on dual POWERVR SGX544 cores to support general embedded applications. The GPU can process different data types simultaneously, such as pixel data, vertex data, video data, and general-purpose data. Additionally, the platform is equipped with a Vivante-based 2D graphics accelerator. There is separate dedicated hardware for machine learning libraries, called the Embedded Vision Engine (EVE), which functions as a programmable image and vision processing engine. The platform also has two DSP subsystems for audio processing as well as general-purpose image and video processing.

3) Memory: Similar to the AGX platform, the Beaglebone AI has a distributed shared memory architecture. It comes with a 16GB eMMC device and features an SD card slot. The system memory is a 16-bit 1GB DDR3L device. Each core of the main processing unit has a 32KB instruction and 32KB data L1 cache along with a shared 2MB L2 cache. The unit also features a 48KB bootable ROM. Finally, the platform provides 32KB data and 32KB instruction L1 caches for the DSP unit, 32KB of shared L1 cache memory for the PRUs, and a system-level cache of 128KB for the GPUs.

4) Speed: The system memory operates at a frequency of 553MHz, yielding an effective rate of 1066Mb/s on the DDR3L bus and allowing for 4GB/s of DDR3L memory bandwidth. The main processing unit operates at a frequency of 1.5GHz, whereas the GPU operates at a maximum frequency of 532MHz.

C. Vedder Electronic Speed Controller

Most modern Electronic Speed Controllers (ESCs) consist of a microcontroller, which takes input signals to regulate the speed of an electric motor. The VESC [27] is an open-source ESC that enables advanced customization options with multiple interface support. The VESC 6 MkIV uses an STM32F4 microcontroller. It operates within the voltage range of 11.1V to 60V and provides a continuous current of 80A, which can reach up to 120A in burst mode. Moreover, it can read its 3D orientation in space, 3-axis acceleration values, and directions via the built-in Inertial Measurement Unit (IMU).

It is equipped with multiple communication interfaces and sensor ports. Hall sensors allow precise and powerful rotation of the motor rotor from a random position. Single Wire Debug (SWD) provides an interface for debugging and diagnosis of real-time data on the STM controller. Along with the connectors for the brushless DC (BLDC) motor and servo motor, it also provides standard interfaces such as I2C and UART, which allow its integration with other microcontrollers such as the Beaglebone AI. An additional CAN bus interface allows integrating multiple VESC devices into an array.

D. ZED2 Camera

Cameras serve as a crucial component in enabling machine vision and surroundings awareness. Based on their vision, cameras can be classified as monocular or stereo. In automated driving, a monocular vision camera can detect only the classified objects, whereas stereo cameras replicate human vision, thus allowing accurate extraction of depth information, such as the distance of a moving object.

The ZED2 [28] is a stereo camera that provides high-definition 3D video and neural depth perception of the environment. It has been designed for a variety of challenging applications, from autonomous navigation and mapping to augmented reality and 3D analytics. It supports video streaming with a maximum field of view of 120 degrees and a maximum resolution of 2.2K at 15 frames per second (fps). For applications which require a higher frame rate, the camera can provide a maximum rate of 100 fps at a correspondingly reduced resolution. Any object within the depth range of 0.3m to 20m can be detected by the camera. It has a built-in Inertial Measurement Unit (IMU), barometric pressure sensor, and magnetic sensor, and can acquire inertial, elevation, and magnetic field data in real time.

The ZED2 camera is compatible with NVIDIA GPU platforms. Therefore, the computational power of the Jetson AGX platform can be leveraged in creating a real-time application.

E. Slamtech Lidar Sensor

Although cameras provide much of the sensing capability for an autonomous vehicle, they suffer various limitations when dealing with e.g. shadows or bright lights, which may cause confusion in decision-making. Moreover, calculating an object's distance from raw images usually comes at a high computational cost and requires correspondingly powerful computers. A viable solution to reduce these limitations is using other sensing technologies such as LiDAR or RADAR. The RaceCar uses the capability of a LiDAR along with the stereo camera for sensing the environment in decision-making.

A LiDAR uses lasers to sense the surrounding environment. The concept for distance determination remains similar to RADAR, where the distance is calculated based on the duration between the transmitted signal and the reflected signal received from the object. LiDARs have extremely fast response times, which gives the processing units on autonomous cars ample time to react to the changing environment. One of their primary advantages is precision and accuracy.

The RPLIDAR A3M1 [29] used in the RaceCar is a next-generation, low-cost 360-degree 2D laser scanner (LiDAR) developed by SLAMTEC. It can take up to 16000 samples of laser ranging per second at high rotation speed. The system can perform a 2D 360-degree scan within a 25-meter range; it must be noted that the distance range for dark or less reflective objects is limited to 10m. The generated 2D point cloud data can be used in mapping, localization, and object/environment modeling. The typical scanning frequency of the LiDAR is 10Hz (600rpm), and the frequency can be freely adjusted within a range from 5 to 20Hz according to the specific requirements. At the 10Hz scanning frequency, the sampling rate is 16kHz and the angular resolution is 0.225 degrees (16000 samples/s at 10 revolutions/s yields 1600 samples per revolution, i.e., 360°/1600 = 0.225°). It is worth mentioning that the sensor provides rotation speed detection and adapts to it, adjusting the angular resolution automatically according to the actual rotation speed.
The LiDAR is augmented by a DSP unit which takes the input from the vision acquisition system, processes the sampled data, and outputs the distance and angle values between the object and the LiDAR.

The RPLiDAR A3M1 can be operated either in enhanced mode or in outdoor mode. The enhanced mode is meant for indoor environments and provides greater performance compared to the outdoor mode, whereas the outdoor mode offers increased reliability. The RPLiDAR needs a 5V supply for powering the range scanner core and the motor system.

V. SYSTEM ARCHITECTURE

The system architecture is coarsely based on the concept of the Operator-Controller Module (OCM) [13]. The OCM architecture helps in the realization of a self-optimizing complex system with adaptive behavior under changing environmental conditions. It is structured into three levels: Controller, Reflective Operator, and Cognitive Operator. The block diagram in Figure 3 depicts the overall system architecture. The VESC acts as the controller, as it has direct access to the actuators and operates under hard real-time conditions. The Beaglebone AI acts as the Reflective Operator: it receives the information about the motor speed, steering, and acceleration values from the VESC; it does not have direct access to the actuators, but regulates and supervises the VESC. At the same time, it acts as a communication interface between the VESC and the NVIDIA Jetson AGX Xavier. The Jetson AGX Xavier board forms the Cognitive Operator, as it processes the sensor data such that the system adapts to changing environmental conditions.

[Fig. 3. APP4MC RaceCar Block diagram — ZED2 camera, RP LiDAR, and a USB hub attached to the NVIDIA Jetson AGX Xavier carrier board; the Jetson and the Beaglebone AI are linked via MCP2515 CAN-over-SPI; the Beaglebone AI connects to the VESC, which drives the DC motor and the servo motor; power is supplied by a LiPo battery and a power bank.]
A. Hardware Architecture

The Traxxas chassis comes with a built-in ESC. However, for better control and customization, the built-in ESC is replaced with the VESC.

a) Power Management: The platform consists of two power sources: a 3S LiPo battery and a power bank. The LiPo battery is dedicated to supplying power to the VESC via an XT-90 connector. The Jetson AGX Xavier board is powered with a 19V supply from a Patona power bank. An active USB hub is also connected to the power bank; the hub provides four USB ports, one of which is used to power the Beaglebone AI. The power supply to the RPLiDAR is also provided by the USB hub.
b) Sensing: The sensing technology installed on the platform consists of the ZED2 stereo camera and the RPLiDAR A3M1. Both sensors provide a USB interface and are directly connected to the Jetson AGX board. Additionally, the VESC as well as the ZED2 camera come with a built-in IMU sensor, which can be used to determine the orientation and acceleration of the RC car.

c) Actuators: The Traxxas platform already comes with a high-speed, high-performance Velineon brushless DC motor. The DC motor has a 3.5mm bullet connection interface to integrate with any standard ESC. Additionally, the platform provides a high-torque digital steering servo with Futaba connectors. The Traxxas 2075 steering servo has a transit time of 0.16 seconds, which delivers responsive steering.

d) Communication Interfaces: Our demonstrator uses standard communication protocols to interact with the different components on the RaceCar. The data transfer between the Beaglebone AI and the VESC must be full-duplex; since the amount of data exchanged between them is not very high, a UART interface with a baud rate of 115200 is sufficient for the application. The CAN interface is one of the standard protocols used in automotive applications. The data transfer between the Beaglebone AI and the Jetson AGX Xavier must also be full-duplex, and a CAN bus over SPI is established between them for data exchange. For this purpose, an MCP2515 breakout board is used, which provides CAN bus connectivity over an SPI interface.
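Since the MCP2515 is handled on Linux by the in-kernel mcp251x driver, the CAN link typically appears as an ordinary SocketCAN interface once configured, e.g. via `ip link set can0 up type can bitrate 500000` (the interface name and bitrate here are assumptions of this sketch, not our final configuration). A minimal read of one raw frame could then look as follows:

<syntaxhighlight lang="c">
#include <linux/can.h>
#include <linux/can/raw.h>
#include <net/if.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <unistd.h>

/* Read one raw CAN frame from the given interface (e.g. "can0"). */
int read_vehicle_status(const char *ifname, struct can_frame *frame) {
    int s = socket(PF_CAN, SOCK_RAW, CAN_RAW);
    if (s < 0) return -1;

    struct ifreq ifr = {0};
    strncpy(ifr.ifr_name, ifname, IFNAMSIZ - 1);
    ioctl(s, SIOCGIFINDEX, &ifr);               /* resolve interface index */

    struct sockaddr_can addr = {0};
    addr.can_family  = AF_CAN;
    addr.can_ifindex = ifr.ifr_ifindex;
    bind(s, (struct sockaddr *)&addr, sizeof(addr));

    ssize_t n = read(s, frame, sizeof(*frame)); /* blocking read */
    close(s);
    return (n == (ssize_t)sizeof(*frame)) ? 0 : -1;
}
</syntaxhighlight>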
B. Software Architecture

A modular software stack for the individual components is depicted in Figure 4.

[Fig. 4. APP4MC RaceCar Software Architecture — VESC: BLDC application on ChibiOS/RT with ChibiOS/HAL and low-level drivers; Beaglebone AI and NVIDIA Jetson AGX Xavier: application layer on system libraries and an RT-Linux kernel (device drivers, RT scheduler) running on ARM cores; UART between the VESC and the Beaglebone AI, CAN over SPI between the Beaglebone AI and the Jetson.]

1) Jetson AGX Xavier Software Stack: To address the real-time constraints of the application, we have enabled RT-Linux on the NVIDIA Jetson Xavier platform. This means that, among all threads ready for execution, the one with the highest priority is executed. The Linux kernel provides two real-time scheduling policies (SCHED_FIFO and SCHED_RR) that apply individual arbitration in the case of tasks having the same priority. Non-real-time tasks are scheduled following the SCHED_NORMAL policy.
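As an illustration of how a task can be placed under one of these policies, the sketch below creates a SCHED_FIFO thread; the task body localization_task and the priority value 80 are placeholders for this example, not our actual implementation.

<syntaxhighlight lang="c">
#include <pthread.h>
#include <sched.h>
#include <string.h>
#include <sys/mman.h>

/* Hypothetical task body; stands in for e.g. the Localization task. */
static void *localization_task(void *arg) {
    (void)arg;
    /* ... read LiDAR point cloud, estimate pose, loop periodically ... */
    return NULL;
}

int spawn_rt_thread(void) {
    pthread_t tid;
    pthread_attr_t attr;
    struct sched_param prio;

    /* Lock current and future pages into RAM to avoid paging jitter. */
    mlockall(MCL_CURRENT | MCL_FUTURE);

    pthread_attr_init(&attr);
    /* Use the policy set below instead of inheriting the creator's. */
    pthread_attr_setinheritsched(&attr, PTHREAD_EXPLICIT_SCHED);
    pthread_attr_setschedpolicy(&attr, SCHED_FIFO);
    memset(&prio, 0, sizeof(prio));
    prio.sched_priority = 80;   /* placeholder priority, range 1..99 */
    pthread_attr_setschedparam(&attr, &prio);

    return pthread_create(&tid, &attr, localization_task, NULL);
}
</syntaxhighlight>

Locking the address space with mlockall() avoids page-fault-induced jitter, which matters as much as the scheduling class itself.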
The application is implemented on top of the Operating System (OS) layer, further broken down into several tasks, and mapped onto different cores. Tasks with high computational demands are offloaded to the GP-GPU. The CPUs apply a fully preemptive fixed-priority scheduling policy, whereas the GPU follows weighted round-robin scheduling. The system memory is shared between the CPU cluster and the GP-GPU for better performance. Figure 5 depicts the task model along with the data flow from sensors to actuators; a simplified sketch of this chain in C follows the list below. The task definitions of the ADAS application described in this paper are mainly derived from the WATERS Challenge 2019 [14].

• Localization - The localization task is responsible for determining the relative position of the RC car on a given environmental map. It takes the point cloud data from the LiDAR input and merges it with the RC car motion status to estimate the demonstrator's position.
• CAN Polling - This task gets the key information about the demonstrator's motion parameters from the on-board CAN bus and sends it to the Localization and Planner tasks.
• Structure From Motion - This task is responsible for estimating the depth of an object based on the stereo vision camera images. The distance of the object is passed to the Planner task for further processing.
• Lane Detection - This task provides accurate locations of the road boundaries and the shape of each lane. Its output is a matrix of points representing the lane boundaries within the road, which is sent to the Planner task.
• Detection - The detection task is responsible for detecting and classifying the objects within the visual range of the camera. Its output is sent to the Planner task.
• Planner - The main purpose of this component is to calculate and follow a vehicle trajectory. The targeted vehicle motion parameters are passed to the Car Controller task.
• Car Controller - The main purpose of this task is to get the steering angle, speed, and acceleration values from the Planner task and provide them to the reflective operator over the CAN bus.

[Fig. 5. APP4MC RaceCar Task Model with dataflow — camera input feeds Structure From Motion (depth estimation) and Lane Detection (lane boundaries) on the GP-GPU accelerator; LiDAR input feeds Localization (pose); CAN polling provides vehicle_status; the Planner fuses these and passes steer/speed to the Controller.]
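The following sketch mirrors the data flow of Figure 5 in C; all type and function names are hypothetical stand-ins, since the actual task implementations are part of the ongoing work. It shows the event-chain structure: each task consumes its predecessor's output, ending with the command handed to the reflective operator.

<syntaxhighlight lang="c">
/* Hypothetical types mirroring the labels of Figure 5. */
typedef struct { float x, y, heading; }        pose_t;           /* Localization   */
typedef struct { float distance_m; }           depth_t;          /* SFM            */
typedef struct { float pts[32][2]; int n; }    lane_bounds_t;    /* Lane Detection */
typedef struct { int class_ids[16]; int n; }   objects_t;        /* Detection      */
typedef struct { float speed, steer, accel; }  vehicle_status_t; /* CAN Polling    */
typedef struct { float steer, speed; }         command_t;        /* Planner output */
typedef struct { float ranges[1600]; }         scan_t;           /* LiDAR input    */
typedef struct { const void *left, *right; }   frames_t;         /* camera input   */

extern scan_t           lidar_scan(void);
extern frames_t         camera_frames(void);
extern vehicle_status_t can_polling(void);
extern pose_t           localization(scan_t scan, vehicle_status_t vs);
extern depth_t          structure_from_motion(frames_t f);
extern lane_bounds_t    lane_detection(frames_t f);
extern objects_t        detection(frames_t f);
extern command_t        planner(pose_t p, depth_t d, lane_bounds_t lb,
                                objects_t obj, vehicle_status_t vs);
extern void             car_controller(command_t cmd); /* steer/speed over CAN */

/* One iteration of the event chain from sensor inputs to actuator command. */
void event_chain_step(void) {
    vehicle_status_t vs = can_polling();              /* on-board CAN bus   */
    pose_t pose  = localization(lidar_scan(), vs);    /* LiDAR + motion     */
    frames_t f   = camera_frames();
    depth_t d    = structure_from_motion(f);          /* depth estimation   */
    lane_bounds_t lb = lane_detection(f);             /* lane boundaries    */
    objects_t obj    = detection(f);                  /* classified objects */
    command_t cmd    = planner(pose, d, lb, obj, vs); /* trajectory         */
    car_controller(cmd);                              /* steer/speed out    */
}
</syntaxhighlight>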
2) Beaglebone AI Software Stack: The RT-Linux kernel has been ported to the Beaglebone AI. The application layer consists of two threads: one interacting with the VESC to configure it and send commands to it, the other communicating with the Jetson AGX Xavier board. The Beaglebone AI provides feedback regarding the current speed, steering angle, acceleration, and orientation of the RaceCar to the Jetson AGX Xavier board over the CAN bus.

3) VESC Application: As the VESC directly interacts with the actuators, it must also conform to the real-time constraints required for controlling them. The VESC firmware is built on the ChibiOS RTOS, a lightweight operating system providing deterministic behavior for real-time multi-threaded applications. Threads on the VESC can be scheduled in two ways: round-robin scheduling and cooperative scheduling. A brushless DC (BLDC) motor application is implemented on top of ChibiOS. This application receives commands from the Beaglebone AI over the UART interface and performs the respective operation.
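For illustration, here is a minimal sketch of opening the Beaglebone-side UART at the baud rate stated above; the device path is an assumption for this sketch, and a real exchange would additionally have to implement the VESC's packet framing.

<syntaxhighlight lang="c">
#include <fcntl.h>
#include <termios.h>
#include <unistd.h>

/* Open the UART to the VESC at 115200 baud, 8N1, raw mode.
   The device path (e.g. "/dev/ttyO1") is an assumption of this sketch. */
int open_vesc_uart(const char *dev) {
    int fd = open(dev, O_RDWR | O_NOCTTY);
    if (fd < 0) return -1;

    struct termios tio;
    tcgetattr(fd, &tio);
    cfmakeraw(&tio);                 /* raw bytes, no line discipline */
    cfsetispeed(&tio, B115200);
    cfsetospeed(&tio, B115200);
    tio.c_cflag |= CLOCAL | CREAD;   /* ignore modem lines, enable RX */
    tcsetattr(fd, TCSANOW, &tio);
    return fd;
}
</syntaxhighlight>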
VI. TIMING ANALYSIS USING TRACING FRAMEWORK

Tracing the software application is one of the most efficient approaches for determining its timing behavior. BTF [24] is a CSV (Comma-Separated Values) based format used to record, at system level and in chronological order, the events triggered on the entities of the system. The integrated BTF tracing framework can be utilized to capture the events generated by each task at runtime in real-life scenarios.
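To illustrate the format, the sketch below writes start/terminate records for one task instance, assuming BTF's comma-separated `timestamp,source,sourceInstance,type,target,targetInstance,event` column layout; the full specification [24] additionally defines a header and entity declarations, which are omitted here.

<syntaxhighlight lang="c">
#include <stdio.h>

/* Append one BTF record: timestamp (ns), source entity, source instance,
   target type ("T" = task), target entity, target instance, and event.
   Simplified sketch; a real trace also needs the BTF header section. */
static void btf_event(FILE *f, unsigned long long ts_ns,
                      const char *core, const char *task,
                      int instance, const char *event) {
    fprintf(f, "%llu,%s,0,T,%s,%d,%s\n", ts_ns, core, task, instance, event);
}

int main(void) {
    FILE *f = fopen("racecar.btf", "w");
    if (!f) return 1;
    btf_event(f, 1000000, "Core_0", "Lane_Detection", 0, "start");
    btf_event(f, 4500000, "Core_0", "Lane_Detection", 0, "terminate");
    fclose(f);
    return 0;
}
</syntaxhighlight>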
I. F. Okuyama, J. Pazis, G. Rosman, V. Varricchio, H. Wang, D. Yershov,
H. Zhao, M. Benjamin, C. Carr, M. Zuber, S. Karaman, E. Frazzoli,
D. Del Vecchio, D. Rus, J. How, J. Leonard, and A. Censi, “Duckietown:
recording events that are triggered on entities in a chrono- An open, inexpensive and flexible platform for autonomy education
logical order, on a system level. The integrated BTF tracing and research,” in 2017 IEEE International Conference on Robotics and
framework can be utilized to capture the events generated on Automation (ICRA), 2017, pp. 1497–1504.
[6] J. Betthauser, D. Benavides, J. Schornick, N. O’Hara, J. Patel, J. Cole,
each task at runtime in real-life scenarios. An overview of the and E. Lobaton, “Wolfbot: A distributed mobile sensing platform for
workflow for timing analysis of the application using tracing research and education,” in Proceedings of the 2014 Zone 1 Conference
capabilities is illustrated in Figure 6. The generated trace of the American Society for Engineering Education, 2014, pp. 1–8.
[7] S. Kannapiran and S. Berman, “Go-chart: A miniature remotely accessi-
file can be viewed on any standard BTF trace visualization ble self-driving car robot,” in 2020 IEEE/RSJ International Conference
tool, for example Eclipse Trace Compass [30]. The timing on Intelligent Robots and Systems (IROS), 2020, pp. 2265–2272.
performance metrics can be derived from the generated trace [8] S. S. Srinivasa, P. Lancaster, J. Michalove, M. Schmittle, C. Summers,
M. Rockett, J. R. Smith, S. Choudhury, C. Mavrogiannis, and F. Sadeghi,
file by converting it to the Eclipse APP4MC Amalthea Trace “Mushr: A low-cost, open-source robotic racecar for education and
Database (ATDB) [31] format. The ATDB file determines the research,” 2019.
execution time of each task and runnables which includes the [9] J ETRACER. [Online]. Available: https://github.com/NVIDIA-AI-IOT/
jetracer
average, best-case and worst-case execution time on a specific
[10] D ONKEYCAR. [Online]. Available: https://www.hackster.io/wallarug/
core. The event-chain metrics in ATDB provides the latency donkey-car-with-jetson-nano-robo-hat-mm1-e53e21
of all the event-chain tasks in the application. At the same [11] B. Goldfain, P. Drews, C. You, M. Barulic, O. Velev, P. Tsiotras, and
time, the trace data also provides the information about the J. M. Rehg, “Autorally: An open platform for aggressive autonomous
driving,” IEEE Control Systems Magazine, vol. 39, no. 1, pp. 26–55,
resource utilization of the processing unit, thereby assisting in 2019.
efficient mapping of the tasks on processing cores. [12] S. Karaman, A. Anders, M. Boulet, J. Connor, K. Gregson, W. Guerra,
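The per-task part of these metrics reduces to a simple aggregation over matched start/terminate pairs. Here is a sketch, assuming the timestamps have already been parsed out of the trace and that tasks run to completion without preemption (otherwise the preempt/resume intervals must be subtracted first):

<syntaxhighlight lang="c">
/* Aggregate best-/worst-/average-case execution time of one task from
   paired start/terminate timestamps (ns), as derived from a BTF trace. */
typedef struct { unsigned long long bcet, wcet, sum; int n; } exec_stats_t;

exec_stats_t task_exec_stats(const unsigned long long *start,
                             const unsigned long long *terminate, int n) {
    exec_stats_t s = { ~0ULL, 0, 0, n };
    for (int i = 0; i < n; i++) {
        unsigned long long e = terminate[i] - start[i];
        if (e < s.bcet) s.bcet = e;
        if (e > s.wcet) s.wcet = e;
        s.sum += e;
    }
    return s;   /* average-case execution time = s.sum / s.n */
}
</syntaxhighlight>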
VII. FUTURE WORK AND CONCLUSION

This paper describes the state of the art in RC car platforms and identifies the need for developing a new demonstrator. The APP4MC RaceCar provides a practical prototype of a full-size autonomous vehicle with its heterogeneous architecture and sensing capabilities. The paper briefly describes the architecture of the heterogeneous components and sensors used in our platform. It discusses the system architecture and the data-flow event-chain task model for an ADAS application on a heterogeneous platform. The computing capabilities of GP-GPUs can be used to implement and test an ADAS application in a real-life environment.

Future work involves the mechanical design and assembly of the RaceCar components. Further implementation involves integrating each component as well as developing the Amalthea task model based on the described architecture. In addition, porting a deterministic RTOS such as QNX to the Beaglebone AI and the Jetson AGX Xavier will further enhance the real-time capability of the system. Finally, we will implement a BTF tracing framework that allows us to use the demonstrator to verify timing analysis results and the efficient mapping of tasks onto the processing nodes.
REFERENCES

[1] Renesas R-Car H3. [Online]. Available: https://www.renesas.com/jp/en/products/automotive-products/automotive-system-chips-socs/r-car-h3-m3-starter-kit
[2] NVIDIA Drive. [Online]. Available: https://developer.nvidia.com/drive
[3] R. Okuda, Y. Kajiwara, and K. Terashima, “A survey of technical trend of ADAS and autonomous driving,” in Technical Papers of 2014 International Symposium on VLSI Design, Automation and Test, 2014, pp. 1–4.
[4] B. Xiao, C. Xu, and L. Xu, “Notice of violation of IEEE publication principles: Automatic parallel parking of RC car using distance sensors,” in 2009 Second International Conference on Future Information Technology and Management Engineering, 2009, pp. 525–528.
[5] L. Paull, J. Tani, H. Ahn, J. Alonso-Mora, L. Carlone, M. Cap, Y. F. Chen, C. Choi, J. Dusek, Y. Fang, D. Hoehener, S. Liu, M. Novitzky, I. F. Okuyama, J. Pazis, G. Rosman, V. Varricchio, H. Wang, D. Yershov, H. Zhao, M. Benjamin, C. Carr, M. Zuber, S. Karaman, E. Frazzoli, D. Del Vecchio, D. Rus, J. How, J. Leonard, and A. Censi, “Duckietown: An open, inexpensive and flexible platform for autonomy education and research,” in 2017 IEEE International Conference on Robotics and Automation (ICRA), 2017, pp. 1497–1504.
[6] J. Betthauser, D. Benavides, J. Schornick, N. O’Hara, J. Patel, J. Cole, and E. Lobaton, “WolfBot: A distributed mobile sensing platform for research and education,” in Proceedings of the 2014 Zone 1 Conference of the American Society for Engineering Education, 2014, pp. 1–8.
[7] S. Kannapiran and S. Berman, “Go-CHART: A miniature remotely accessible self-driving car robot,” in 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2020, pp. 2265–2272.
[8] S. S. Srinivasa, P. Lancaster, J. Michalove, M. Schmittle, C. Summers, M. Rockett, J. R. Smith, S. Choudhury, C. Mavrogiannis, and F. Sadeghi, “MuSHR: A low-cost, open-source robotic racecar for education and research,” 2019.
[9] JetRacer. [Online]. Available: https://github.com/NVIDIA-AI-IOT/jetracer
[10] Donkey Car. [Online]. Available: https://www.hackster.io/wallarug/donkey-car-with-jetson-nano-robo-hat-mm1-e53e21
[11] B. Goldfain, P. Drews, C. You, M. Barulic, O. Velev, P. Tsiotras, and J. M. Rehg, “AutoRally: An open platform for aggressive autonomous driving,” IEEE Control Systems Magazine, vol. 39, no. 1, pp. 26–55, 2019.
[12] S. Karaman, A. Anders, M. Boulet, J. Connor, K. Gregson, W. Guerra, O. Guldner, M. Mohamoud, B. Plancher, R. Shin, and J. Vivilecchia, “Project-based, collaborative, algorithmic robotics for high school students: Programming self-driving race cars at MIT,” in 2017 IEEE Integrated STEM Education Conference (ISEC), 2017, pp. 195–203.
[13] J. Gausemeier, U. Frank, J. Donoth, and S. Kahl, “Specification technique for the description of self-optimizing mechatronic systems,” Research in Engineering Design, vol. 20, pp. 201–223, 2009.
[14] F. Wurst, D. Dasari, A. Hamann, D. Ziegenbein, I. Sañudo, N. Capodieci, M. Bertogna, and P. Burgio, “System performance modelling of heterogeneous HW platforms: An automated driving case study,” in 2019 22nd Euromicro Conference on Digital System Design (DSD), 2019, pp. 365–372.
[15] Eclipse APP4MC. [Online]. Available: https://www.eclipse.org/app4mc/
[16] D. A. Jamsek, “Designing and optimizing compute kernels on NVIDIA GPUs,” in 2009 Asia and South Pacific Design Automation Conference, 2009, pp. 224–229.
[17] R. Saussard, B. Bouzid, M. Vasiliu, and R. Reynaud, “Optimal performance prediction of ADAS algorithms on embedded parallel architectures,” in 2015 IEEE 17th International Conference on High Performance Computing and Communications, 2015 IEEE 7th International Symposium on Cyberspace Safety and Security, and 2015 IEEE 12th International Conference on Embedded Software and Systems, 2015, pp. 213–218.
[18] C. Widerspick, W. Bauer, and D. Fey, “Latency measurements for an emulation platform on autonomous driving platform NVIDIA Drive PX2,” in ARCS Workshop 2018; 31st International Conference on Architecture of Computing Systems, 2018, pp. 1–8.
[19] K. Haeublein, W. Brueckner, S. Vaas, S. Rachuj, M. Reichenbach, and D. Fey, “Utilizing PYNQ for accelerating image processing functions in ADAS applications,” in ARCS Workshop 2019; 32nd International Conference on Architecture of Computing Systems, 2019, pp. 1–8.
[20] OpenCV. [Online]. Available: https://docs.opencv.org/master/index.html
[21] X. Wang, M. Cui, K. Huang, A. Knoll, and L. Chen, “Improving the performance of ADAS application in heterogeneous context: A case of lane detection,” in 2017 IEEE 20th International Conference on Intelligent Transportation Systems (ITSC), 2017, pp. 1–6.
[22] M. Hammond, G. Qu, and O. A. Rawashdeh, “Deploying and scheduling vision based advanced driver assistance systems (ADAS) on heterogeneous multicore embedded platform,” in 2015 Ninth International Conference on Frontier of Computer Science and Technology, 2015, pp. 172–177.
[23] L. Krawczyk, M. Bazzal, R. P. Govindarajan, and C. Wolff, “An analytical approach for calculating end-to-end response times in autonomous driving applications,” 2019.
[24] Vector Informatik GmbH, “Best Trace Format (BTF) Technical Specification v2.2.0,” 2020.
[25] NVIDIA Jetson AGX Xavier. [Online]. Available: https://www.nvidia.com/de-de/autonomous-machines/embedded-systems/jetson-agx-xavier/
[26] BeagleBone AI System Reference Manual. [Online]. Available: https://github.com/beagleboard/beaglebone-ai/wiki/System-Reference-Manual
[27] Vedder Electronic Speed Controller. [Online]. Available: https://vesc-project.com/
[28] ZED2 Camera Specifications. [Online]. Available: https://cdn.stereolabs.com/assets/datasheets/zed2-camera-datasheet.pdf
[29] RPLIDAR A3. [Online]. Available: https://www.slamtec.com/en/Lidar/A3
[30] Eclipse Trace Compass. [Online]. Available: https://www.eclipse.org/tracecompass/
[31] Eclipse APP4MC Amalthea Trace Database. [Online]. Available: https://www.eclipse.org/app4mc/help/latest/index.html#section4.9