APP4MC RaceCar: A Practical ADAS Demonstrator for Evaluating and Verifying Timing Behavior

Anand Prakash, Lukas Krawczyk, Carsten Wolff
IDiAL Institute, Dortmund University of Applied Sciences and Arts, 44227 Dortmund, Germany
anand.prakash@fh-dortmund.de, lukas.krawczyk@fh-dortmund.de, carsten.wolff@fh-dortmund.de


Abstract—The computational demands of safety-critical ADAS applications on autonomous vehicles have been ever-increasing. As a result, high performance computing nodes with varying operating frequencies and a growing number of different sensors have been introduced, resulting in highly complex heterogeneous architectures. This complexity has led to challenges in analyzing a system's timing behavior, such as determining the end-to-end response time of a high-level functionality as well as a real-time application's latency. Although several approaches to tackle this issue have been proposed, their practical verification on real-life applications is still an open issue. Accordingly, this work proposes an automotive demonstrator that will be used for evaluating the timing behavior of ADAS applications in a real-life environment using methodologies such as tracing, profiling, and static analysis. The APP4MC RaceCar is a work-in-progress four-wheel drive demonstrator built on a Traxxas 1/10 scale RC car platform. It is equipped with state-of-the-art sensors like a LiDAR and the ZED2 stereo camera, and hosts multiple heterogeneous on-board computers such as the NVIDIA AGX Xavier to replicate a full-size autonomous vehicle. In this paper, we describe the need for such a demonstrator and give an overview of the heterogeneous components used in it. Moreover, we describe the system architecture as well as the data flow through an event-chain task model for the ADAS application, which is based on the Waters Challenge 2019 industrial case study.

Index Terms—Heterogeneous System, Radio-Controlled Cars, Electronic Speed Controller, RT-Linux Kernel.

Copyright © 2021 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

I. INTRODUCTION

From an abstract point of view, a typical Advanced Driver Assistance System (ADAS) has to perform three tasks - perception, planning, and control. As part of the perception task, various sensors are used in an ADAS application. The vehicle status is provided as feedback, which, along with the processed sensor data, is used to plan and control the path of an autonomous vehicle. Each of these tasks is computationally expensive, requiring high performance processing cores. At the same time, these applications are safety-critical in nature and must follow hard real-time constraints. Adaptive cruise control, anti-lock braking systems, lane keep assistance, obstacle detection/avoidance systems, and traffic sign recognition are just a few examples of ADAS applications with high computational requirements.

Automotive OEMs are gradually moving towards higher levels of driving automation, thereby increasing the complexity of these applications. The number of different sensors, such as LiDARs, stereo cameras, and radars, installed on a vehicle has also increased significantly in recent times. This results in a huge amount of data from different sensors that needs to be processed in real time, which leads to a computational bottleneck. In order to deal with this bottleneck, heterogeneous platforms are used to improve the overall performance. Hardware accelerators such as GPUs and FPGAs are used in coordination with CPUs to increase the computational power. Semiconductor companies provide various hardware platforms for such ADAS applications. For example, the Renesas R-Car H3 [1] System-on-Chip (SoC) is a high performance platform specifically designed for in-vehicle infotainment and driving safety support. The NVIDIA Drive [2] platform provides a range of developer kits for autonomous vehicles along with a sensor suite.

From the architecture point of view, an ADAS application can be divided into two categories. In a centralized computing architecture, the raw data from sensors is passed to a centrally located high performance computer that performs the data processing. However, the cost, performance, and power requirements for such a processing unit are usually very high. Generally, the sensors have their own processing cores where the initial data is filtered and then sent to the main processing unit for further computation. This kind of distributed computing architecture is the more conventional approach. The application does not rely on a single main processing core, which provides system redundancy for functional safety along with reduced processing requirements.
Each task in an ADAS application has different computational demands. For example, the computational requirement of an object detection algorithm working on camera input is much higher than that of object range detection from ultrasonic sensors. Therefore, in order to reduce the latency of the overall application, it becomes necessary to map these tasks to optimal processing cores. The latency of these applications can be further reduced by using parallel programming models such as MPI, CUDA, etc. These tasks should not only follow hard real-time constraints but must also be deterministic at the same time. Using a Real-Time Operating System (RTOS) for such applications provides better control over the handling of tasks based on the scheduling and preemption model used in it.

Development of autonomous driving applications [3] poses several challenges in terms of timing analysis, efficient mapping and scheduling of tasks, and maintaining their deterministic behavior. These applications undergo rigorous testing and must pass all related safety standards such as Automotive Safety Integrity Level (ASIL) before going into production. An efficient way to conceptualize a new feature or functionality is to implement it in early design phases on an RC (Radio-Controlled) car model. A model based approach can be used to design a heterogeneous application on these platforms. This not only reduces the development cost and risk factor but also helps in determining the feasibility of applications. Besides, the RC platform provides the flexibility to evaluate and benchmark various performance metrics such as latency, end-to-end response time, memory contention, execution time, and so on. Since there are multiple sensors and ECUs interacting with each other over different communication interfaces, there is a possibility of communication interference adding to the latency, which affects the overall performance of the application in real-life scenarios. These metrics can be analyzed to further improve the software/hardware design of the system.

In this work, we design and implement an ADAS application on a 1/10 scale RC car. The demonstrator model will provide a platform to measure the response time of the implemented ADAS application, memory contention, and latency caused by communication interference in real-life environments. The remainder of the paper is structured as follows: Section II provides an overview of related work. Section III describes the existing challenges with respect to the timing analysis of an ADAS. Section IV gives an overview of the heterogeneous components and sensors used in the proposed demonstrator model. The system architecture is explained in Section V. Section VI provides a brief overview of evaluating the timing behavior based on the generated trace data. Finally, Section VII draws the conclusion and a road map for future work.

II. RELATED WORK

Numerous RC cars have been developed for education and research purposes, with their complexity depending upon their particular use case. A typical RC car consists of a chassis, motor controller, actuators, processing units, sensors, and power supply. However, if a system is designed for a specific purpose such as parallel parking, the platform may consist of only the minimum required components. The automatic parallel parking RC car platform [4] hosts a custom-made circuit board consisting of an IC chip, amplifier, and radio receiver along with electric motors and an antenna. It implements a simple parallel parking application on a single IC chip with no need for high performance processing nodes. A more complex application would require a heterogeneous system. For instance, Duckietown [5] provides minimal autonomy and basic features such as lane following by utilizing e.g. a Raspberry Pi that is attached to a monocular camera. Another example of a low cost, low power autonomous robot can be found in WolfBot [6], which is based on the Beaglebone Black development platform. Even though a realistic computer vision pipeline is implemented on these platforms, the GPU capability available on the on-board computer is not utilized. At the same time, the sensor technology used in these platforms is not very advanced.

The higher the complexity of an ADAS application, the higher its computational requirement. Such applications require dedicated accelerators to process the data in real time. The RC platform Go-CHART [7] makes use of external GPU capability to overcome the computational bottleneck: the sensor data is transmitted for further processing to a Jetson TX2 board using wireless communication. For a better performing and more reliable ADAS application, it is recommended to have an on-board high performance computer on the RC platform. Several open source self-driving RC car platforms such as MuSHR [8], JetRacer [9], and Donkey Car [10] come with onboard Jetson Nano processors and advanced sensors. MuSHR has an additional Electronic Speed Controller (ESC) component, which provides better control over the actuators. MuSHR, JetRacer, and Donkey Car are based on a centralized compute architecture where a single processing node is responsible for processing all sensor data as well as controlling the actuators. The performance of the NVIDIA Jetson Nano is sufficient for applications involving computer vision algorithms. However, relying on a single processing node might not be efficient for an application having multiple tasks with real-time constraints.

Some autonomous miniature car models use two processing nodes: one with GPU capability dedicated to executing machine learning algorithms, and the other for motor control. AutoRally [11] is a high-end RC car built on a 1/5-scale platform which uses an Intel processing unit along with an Nvidia GTX accelerator for scaled autonomous driving. It is based on a distributed compute architecture where each processing node is responsible for a specific task. However, the platform does not include a LiDAR sensor, and instead of hosting a stereo camera, it comes with two monocular cameras, which makes the system more complex. The MIT Racecar [12] platform houses state-of-the-art sensors and computing hardware, placed on top of a powerful 1/10-scale mini race car.

A model based approach can be taken in designing a system based on the Operator-Controller Module (OCM) [13] architecture. This helps in better analysis of the performance metrics at each architectural level. It is worth mentioning that none of the above mentioned RC mini car platforms are based on the OCM architecture. The industrial Waters Challenge 2019 [14] provides a case study on a prototypical ADAS application modelled on a heterogeneous platform using Amalthea [15]. The implementation of this model can be used to determine the performance metrics of a heterogeneous ADAS application. The MIT Racecar platform satisfies the requirements for designing an ADAS application based on the Waters Challenge 2019. However, its compute capabilities can be further enhanced by using next-generation, more powerful development boards, and the heterogeneity of the system can be further enhanced by designing the system based on the OCM architecture. This has led us to the development of a new RC car platform for applying the methods to measure the end-to-end latency of an ADAS application on a heterogeneous system.
III. PROBLEM STATEMENT

Researchers have come up with many novel solutions to analyze and benchmark the performance of ADAS applications on the basis of several performance metrics. The existing work in this area can be categorized on the basis of architecture, high computation algorithms, sensor fusion, and the scheduling and mapping of tasks onto processing cores. New functions in an ADAS require access to different communication interfaces, which makes the system more complex. AUTOSAR (AUTomotive Open System ARchitecture) is based on the OSEK specifications and provides a three layer architecture to develop an automotive application.

Since most ADAS applications rely on computer vision and image processing algorithms which require parallel processing, General Purpose GPUs (GP-GPUs) have started playing an important role. Parallel portions of an application are executed on GP-GPUs in terms of e.g. a kernel programming model [16]. Therefore, the response time of an application can be determined by the execution time of a kernel [17] on a given computing unit. However, GPUs are proprietary systems, which restricts the knowledge of their internal workings. This makes the prediction of latency caused by GPUs uncertain. Another way to determine the execution time of a task on a GPU is presented in [18]. In recent times, reconfigurable platforms such as FPGAs are also being exploited as accelerators for image processing algorithms. PYNQ [19] is an FPGA based platform that hides the underlying hardware details and exposes a Python interface to use any computer vision framework such as OpenCV [20].
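As an illustration of this measurement principle, the following C sketch measures the wall-clock execution time of a single task instance. The task is an invented stand-in; on a GPU, the measured function would wrap the kernel launch plus the wait for its completion:

#include <stdio.h>
#include <time.h>

/* Invented stand-in for a task offloaded to some compute unit. */
static void lane_detection_stub(void)
{
    volatile double acc = 0.0;
    for (long i = 0; i < 5000000L; i++)
        acc += (double)i * 1e-9;
}

/* Measure the wall-clock execution time of one task instance using the
 * monotonic clock, which is immune to system time adjustments. */
static double execution_time_ms(void (*task)(void))
{
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    task();                                   /* synchronous execution */
    clock_gettime(CLOCK_MONOTONIC, &t1);
    return (t1.tv_sec - t0.tv_sec) * 1e3 + (t1.tv_nsec - t0.tv_nsec) / 1e6;
}

int main(void)
{
    printf("execution time: %.3f ms\n", execution_time_ms(lane_detection_stub));
    return 0;
}

Repeating such a measurement over many activations yields the average-, best-, and worst-case execution times that the later analysis sections rely on.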
Typically, an ADAS application consists of multiple tasks executing on dedicated cores. These programs consist of multiple event chain sequences where the input of the next task depends on the previous task's output. Each task has its own execution time. The varying speed of processors must be taken into account while scheduling a task on a processor, and at the same time, an optimum scheduling sequence of these tasks needs to be determined to make the overall application more efficient. The availability of a processor at a given point in time must also be considered while scheduling these tasks. This makes the scheduling of these tasks non-trivial, as it can affect the hard real-time constraints and the efficiency of the algorithm. For example, Wang et al. [21] provided insight into how effective parallelism can improve the performance of an LDA (Lane Detection Algorithm) compared to a naive parallel approach. RTOSs such as QNX Neutrino and SAFERTOS use different scheduling techniques in executing the tasks, which affects the latency of the application [22].

Another major aspect of an ADAS application is efficient mapping of tasks to the processing units. The performance of an application is highly dependent on optimum utilization of the resources available on a heterogeneous platform. The Waters Challenge 2019 [14] focused on developing an initial model based on Amalthea [15] which can be further used in deriving the performance metrics of the application. It is also worth mentioning that the model is derived for a specific hardware platform (the NVIDIA Jetson TX2 SoM). To further optimize the application, evolutionary optimization approaches such as genetic algorithms can be used for the allocation of tasks [23]. In order to determine the worst case response time of an application, the event chain on the critical path must be considered. Tracing formats such as BTF (Best Trace Format) [24] can be used to analyze the timing, performance, and reliability of the system. A similar approach can be used for the mapping of tasks and for the calculation of the response time in a real-life environment for an application running on an RC car platform.

IV. OVERVIEW OF HETEROGENEOUS COMPONENTS

In order to precisely replicate the real-life scenario, the demonstrator must be modeled similar to existing vehicles with autonomous functionality. Therefore, it has been designed with multiple sensors and processing units. The remainder of this section briefly describes the architecture of the components used in the system.

A. NVIDIA Jetson AGX Xavier

The NVIDIA Jetson AGX Xavier SoM [25] includes a compact carrier board and the Jetson Xavier module. It is a powerful AI computer designed for autonomous machines. It provides the performance to handle sensor fusion, localization and mapping, obstacle detection, and path planning algorithms critical for autonomous driving. A 40-pin expansion header supports standard communication interfaces such as I2C, UART, SPI, and CAN. An M.2 Key E slot can be used to add WiFi/LTE capability to the board. An overview of the components of the NVIDIA Jetson AGX Xavier is illustrated in Figure 1, followed by an in-depth description in the following subsections.

1) Processing Unit: The CPU complex (CCPLEX) is divided into four clusters. Each cluster contains two identical 64-bit Carmel processors that are compliant with ARM's v8.2 ISA. A high performance System Coherency Fabric (SCF) connects all CPU clusters, thus enabling simultaneous operation of all CPU cores for a true heterogeneous multiprocessing (HMP) environment. The SCF also connects the CPU clusters to DRAM through a Memory Controller Fabric (MCF), and to I/O blocks in the Memory Mapped I/O (MMIO) space through an ARM Advanced eXtensible Interface (AXI).
[Fig. 1. NVIDIA Jetson AGX Xavier Architecture: a Carmel CPU complex with four clusters, each holding two ARMv8.2 cores (64KB D-cache + 128KB I-cache per core) and a 2MB L2 cache, a shared 4MB L3 cache, a Volta GPU with SMs containing CUDA cores and Tensor cores (128KB L1 cache, 512KB L2 cache), and 32GB of 256-bit LPDDR4x DRAM.]

2) Hardware Accelerators: The NVIDIA GV10B GPU is based on the Volta architecture, which features 512 shading units, 32 texture mapping units, and 16 render output units (ROPs). It also includes 64 Tensor cores, which help to improve the speed of machine learning applications. Additionally, the Volta GPU architecture features a new Streaming Multiprocessor (SM) which allows energy efficient, high performance computation of tasks that have to process large and complex data streams. Each SM is partitioned into four separate blocks referred to as Streaming Multiprocessor Partitions (SMPs). Each SMP contains its own instruction buffer, scheduler, CUDA cores, and Tensor cores. The GPU's core graphics functions are performed inside the Graphics Processing Cluster (GPC), which is a dedicated hardware block for computation, rasterization, shading, and texturing. The SoC is further augmented with an image signal processor (ISP), a multimedia engine, programmable vision accelerators (PVAs), and a pair of NVIDIA deep-learning accelerators (NVDLAs). These accelerators can be used in parallel or in conjunction with the CPU and GPU cores.

3) Memory: The Xavier platform features a distributed shared memory architecture. It comes with a system memory of 32GB 256-bit LPDDR4x and provides an eMMC storage of 32GB. The CPUs as well as the GPU have direct access to the system memory. Each CPU core includes 128 KB instruction (I-cache) and 64 KB data (D-cache) Level 1 caches, whereas a 2 MB L2 cache is shared by both cores in a single cluster. All clusters share a common 4MB L3 cache. Each SM in the GPU has an additional L1 cache of 128KB as well as access to a common 512KB L2 cache that is shared by all SMs.

4) Speed: Each CPU can operate at a maximum frequency of 2265 MHz. The GPU operates at a frequency of 854 MHz, which can be boosted up to 1377 MHz. The GPU provides workstation-class performance with up to 32 TeraOPS (TOPS) of peak compute and 750 Gbps of high-speed I/O. The maximum system memory bandwidth is 137GB/s, providing low access latency.

5) Power utilization: Jetson AGX Xavier enables new levels of power efficiency. Users can configure operating modes for their applications at 10W, 15W, or 30W.

B. Beaglebone AI

The Beaglebone AI [26] is used as the secondary processing unit in our RaceCar platform. The board is built around a Texas Instruments (TI) AM5729 system-on-chip (SoC). Its computing capabilities enable the user to develop machine learning applications with ease. It supports all standard communication interfaces over a 46-pin header on either side of the board. Additionally, it also supports a 16-bit LCD interface. The board was developed specifically for AI applications and has an integrated neural engine that processes complex algorithms at the hardware level. The block diagram of the AM5729 SoC is depicted in Figure 2.

[Fig. 2. AM5729 SoC Architecture: a main processing unit with two ARM Cortex-A15 cores (32KB D-cache + 32KB I-cache each, shared 2MB L2 cache, 48KB bootable ROM), an image processing unit with two ARMv7E-M4 cores, four Embedded Vision Engines as video co-processors, a digital signal processing unit with two TMS320C66x DSPs, a graphics processing unit with two PowerVR SGX544 cores (288KB L2 cache) plus a graphics accelerator, and a high speed interconnect to 16-bit 1GB DDR3L memory.]

1) Processing Unit: The Beaglebone AI comes with an ARM Cortex-A15 based dual-core processor. It supports the standard ARM instruction set with hardware virtualization support. Two dual-core Programmable Real-Time Units (PRUs) are present to provide ultra low latency; access to these PRUs is enabled via the expansion headers. These dual-core PRUs are based on the ARMv7E-M architecture for general purpose usage, particularly real-time control.
2) Hardware Accelerators: The platform comes with a 3D graphics processing unit (GPU) subsystem based on dual PowerVR SGX544 cores to support general embedded applications. The GPU can process different data types simultaneously, such as pixel data, vertex data, video data, and general-purpose data. Additionally, the platform is equipped with a Vivante based 2D graphics accelerator. There is separate dedicated hardware for the machine learning libraries, called the Embedded Vision Engine (EVE), which functions as a programmable image and vision processing engine. The platform also has two DSP subsystems for audio processing as well as general purpose image and video processing.

3) Memory: Similar to the AGX platform, the Beaglebone AI has a distributed shared memory architecture. It comes with a 16GB eMMC device and features an SD card slot. The system memory is a 16-bit 1GB DDR3L device. Each core of the main processing unit has a 32KB instruction and 32KB data L1 cache along with a shared 2MB L2 cache. The unit also features a 48KB bootable ROM. Finally, the platform provides 32KB data and 32KB instruction L1 caches for the DSP unit, 32KB of shared L1 cache memory for the PRUs, and a system level cache of 128KB for the GPUs.

4) Speed: The system memory operates at a frequency of 533MHz, yielding an effective rate of 1066Mb/s on the DDR3L bus and allowing for 4GB/s of DDR3L memory bandwidth. The main processing unit operates at a frequency of 1.5GHz, whereas the GPU operates at a maximum frequency of 532MHz.

C. Vedder Electronic Speed Controller

Most modern Electronic Speed Controllers (ESCs) consist of a microcontroller, which takes input signals to regulate the speed of an electric motor. The VESC [27] is an open source ESC that enables advanced customization options with multiple interface support. The VESC 6 MKIV uses an STM32F4 microcontroller chip. It operates within the voltage range of 11.1V to 60V and provides a continuous current of 80A, which can reach up to 120A in burst mode. Moreover, it can read its 3D orientation in space, 3-axis acceleration values, and directions via the in-built Inertial Measurement Unit (IMU).

It is equipped with multiple communication interfaces and sensor ports. Hall sensors allow precise and powerful rotation of the motor rotors from a random position. Single Wire Debug (SWD) provides an interface for debugging and diagnosis of real-time data on the STM controller. Along with the connectors for the brushless DC (BLDC) motor and servo motor, it also provides standard interfaces such as I2C and UART, which allow its integration with other microcontrollers such as the Beaglebone AI. An additional CAN bus interface allows integrating multiple VESC devices into an array.

D. ZED2 Camera

Cameras serve as a crucial component in enabling machine vision and surroundings awareness. Based on the vision, cameras can be classified as monocular vision or stereo vision. In automated driving, a monocular vision camera can detect only the classified objects, whereas stereo cameras replicate human vision, thus allowing accurate extraction of depth information, such as the distance of a moving object.
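The underlying geometry is standard for stereo cameras and is stated here for clarity rather than taken from the ZED2 documentation: the depth Z of a point follows from the focal length f (in pixels), the baseline B between the two lenses, and the disparity d of the point between the left and right images as

Z = \frac{f \cdot B}{d}

Since the disparity shrinks with distance, depth resolution degrades for far away objects, which is one reason the usable depth range of a stereo camera is bounded.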
The ZED2 [28] is a stereo camera that provides high definition 3D video and neural depth perception of the environment. It has been designed for a variety of challenging applications, ranging from autonomous navigation and mapping to augmented reality and 3D analytics. It supports video streaming with a maximum field of view of 120 degrees and a maximum resolution of 2.2K at 15 frames per second (fps). For applications which require a higher frame rate, the camera can provide a maximum rate of 100 fps at a correspondingly reduced resolution. Any object within the depth range of 0.3m to 20m can be detected through the camera. It has a built-in Inertial Measurement Unit (IMU), barometric pressure sensor, and magnetic sensor, and can acquire inertial, elevation, and magnetic field data in real time.

The ZED2 camera is compatible with NVIDIA GPU platforms. Therefore, the computation power of the Jetson AGX platform can be leveraged in creating a real-time application.

E. Slamtech Lidar Sensor

Although cameras provide much of the sensing capabilities for an autonomous vehicle, they suffer various limitations when dealing with e.g. shadows or bright lights, which may cause confusion in decision-making. Moreover, calculating an object's distance from raw images usually comes at a high computational cost and requires correspondingly powerful computers. A viable solution to reduce these limitations is to use other sensing technologies such as LiDAR or RADAR. The RaceCar uses the capability of a LiDAR along with the stereo camera for sensing the environment in decision-making.

A LiDAR uses lasers to sense the surrounding environment. The concept for distance determination remains similar to RADAR, where the distance is calculated based on the duration between the transmitted signal and the reflected signal received from the object. LiDARs have extremely fast response times, which gives the processing units on autonomous cars ample time to react to the changing environment. One of their primary advantages is precision and accuracy.

The RPLIDAR A3M1 [29] used in the RaceCar is a next generation low cost 360 degree 2D laser scanner (LiDAR) developed by SLAMTEC. It can take up to 16000 samples of laser ranging per second at a high rotation speed. The system can perform a 2D 360-degree scan within a 25-meter range. It must be noted that the distance range for dark or less reflective objects is limited to 10m. The generated 2D point cloud data can be used in mapping, localization, and object/environment modeling. The typical scanning frequency of the LiDAR is 10Hz (600rpm), and the frequency can be freely adjusted within a range from 5 to 20Hz according to the specific requirements. With the 10Hz scanning frequency, the sampling rate is 16kHz and the angular resolution is 0.225 degrees. It is worth mentioning that the device provides rotation speed detection and adapts the angular resolution automatically according to the actual rotating speed. The LiDAR is augmented by a DSP unit which takes the input from the vision acquisition system, processes the sampled data, and provides as output the distance and angle values between the object and the LiDAR.
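Turning such distance and angle samples into the 2D point cloud mentioned above is a small computation. The following C sketch shows the conversion for a single sample; the structure layout is illustrative and not the vendor SDK's:

#include <math.h>
#include <stdio.h>

/* One RPLiDAR-style sample: bearing in degrees, range in metres.
 * Field names are illustrative, not the vendor SDK's structures. */
typedef struct { double angle_deg; double dist_m; } scan_sample_t;
typedef struct { double x; double y; } point2d_t;

/* Convert a polar scan sample into a 2D point in the sensor frame. */
static point2d_t to_cartesian(scan_sample_t s)
{
    double a = s.angle_deg * (3.14159265358979323846 / 180.0);
    point2d_t p = { s.dist_m * cos(a), s.dist_m * sin(a) };
    return p;
}

int main(void)
{
    /* At 10Hz and 16k samples/s, consecutive beams are ~0.225 deg apart. */
    scan_sample_t s = { 45.0, 2.5 };
    point2d_t p = to_cartesian(s);
    printf("x=%.3f m, y=%.3f m\n", p.x, p.y);
    return 0;
}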
The RPLiDAR A3M1 can be operated either in enhanced mode or outdoor mode. The enhanced mode is meant for indoor environments and provides greater performance compared to outdoor mode, whereas the outdoor mode comes with increased reliability. The RPLiDAR needs a 5V supply for powering the range scanner core and motor system.

V. SYSTEM ARCHITECTURE

The system architecture is coarsely based on the concept of the Operator-Controller Module (OCM) [13]. The OCM architecture helps in the realization of a self-optimizing complex system with adaptive behavior under changing environmental conditions. It can be structured into three levels - Controller, Reflective Operator, and Cognitive Operator. The block diagram in Figure 3 depicts the overall system architecture. The VESC acts as the controller operator, as it has direct access to the actuators and operates under hard real-time conditions. The Beaglebone AI acts as the reflective operator. It receives the information about motor speed, steering, and acceleration values from the VESC. It does not have direct access to the actuators, but regulates and supervises the VESC. At the same time, it acts as a communication interface between the VESC and the NVIDIA Jetson AGX Xavier. The Jetson AGX Xavier board is part of the cognitive operator, as it deals with processing the sensor data such that the system adapts to the changing environment conditions.

[Fig. 3. APP4MC RaceCar block diagram: the ZED2 camera and RP LiDAR connect to the NVIDIA Jetson AGX Xavier on its carrier board, together with a USB hub and a power bank; the Jetson and the Beaglebone AI are linked via a pair of MCP2515 boards; the Beaglebone AI connects to the VESC, which drives the DC motor and servo motor and is powered by a LiPo battery.]

A. Hardware Architecture

The Traxxas chassis comes with a built-in ESC. However, for the purpose of better control and customization, the built-in ESC is replaced with the VESC.

a) Power Management: The platform consists of two power sources - a 3S LiPo battery and a power bank. The LiPo battery is dedicated to providing the power supply to the VESC using an XT-90 connector. The Jetson AGX Xavier board is powered with a 19V power supply from a Patona power bank. An active USB hub is also connected to the power bank. The active USB hub consists of four USB ports, one of which is used to power up the Beaglebone AI. The power supply to the RPLiDAR is also provided by the USB hub.

b) Sensing: The sensing technology installed on the platform consists of the ZED2 stereo camera and the RPLiDAR A3M1. Both sensors provide a USB interface and are directly connected to the Jetson AGX board. Additionally, the VESC as well as the ZED2 camera comes with an in-built IMU sensor, which can be used to determine the orientation and acceleration of the RC car.

c) Actuators: The Traxxas platform already comes with a high-speed performance Velineon brushless DC motor. The DC motor has a 3.5mm bullet connection interface to integrate with any standard ESC. Additionally, the platform also provides a high-torque digital steering servo with Futaba connectors. The Traxxas 2075 steering servo has a transit time of 0.16 seconds, which delivers responsive steering.

d) Communication Interfaces: Our demonstrator uses standard communication protocols to interact with the different components on the RaceCar. The data transfer between the Beaglebone AI and the VESC must be in full-duplex mode. Since the amount of data exchanged between them is not very high, a UART interface with a baud rate of 115200 is sufficient for the application. The CAN interface is one of the standard protocols used in automotive applications. The data transfer between the Beaglebone AI and the Jetson AGX Xavier must also be in full-duplex mode; a CAN bus over SPI interface is established between them for data exchange. For this purpose, an MCP2515 breakout board is used, which provides a CAN bus transceiver over an SPI interface.
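As a rough illustration of the UART side, the following C sketch configures a Linux serial device for raw, full-duplex communication at 115200 baud using termios. The device path and payload are assumptions, not taken from the actual implementation:

#include <fcntl.h>
#include <stdio.h>
#include <termios.h>
#include <unistd.h>

/* Open and configure a UART for raw full-duplex 115200 8N1 I/O. */
static int open_uart(const char *dev)
{
    int fd = open(dev, O_RDWR | O_NOCTTY);
    if (fd < 0) { perror("open"); return -1; }

    struct termios tio;
    if (tcgetattr(fd, &tio) != 0) { close(fd); return -1; }

    cfmakeraw(&tio);                  /* raw mode: no echo, no line editing */
    cfsetispeed(&tio, B115200);
    cfsetospeed(&tio, B115200);
    tio.c_cflag |= CLOCAL | CREAD;    /* ignore modem lines, enable receiver */
    tio.c_cc[VMIN]  = 1;              /* block until at least one byte */
    tio.c_cc[VTIME] = 0;

    if (tcsetattr(fd, TCSANOW, &tio) != 0) { close(fd); return -1; }
    return fd;
}

int main(void)
{
    int fd = open_uart("/dev/ttyS1");           /* assumed device path */
    if (fd < 0) return 1;
    const unsigned char cmd[] = { 0x02, 0x00 }; /* placeholder payload */
    write(fd, cmd, sizeof cmd);
    close(fd);
    return 0;
}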
B. Software Architecture

A modular software stack for the individual components is depicted in Figure 4.

1) Jetson AGX Xavier Software Stack: To address the real-time constraints of the application, we have enabled RT-Linux on the NVIDIA Jetson Xavier platform. This means that, among all the threads ready for execution, the one with the highest priority will be executed. The Linux kernel provides two real-time scheduling policies (SCHED_FIFO and SCHED_RR) that apply an individual arbitration in case of tasks having the same priority. Non-real-time tasks are scheduled following the SCHED_NORMAL policy.
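A minimal sketch of how an application thread can be attached to one of these real-time policies is shown below; the priority value and the task body are placeholders, and running it requires appropriate privileges (e.g. root or CAP_SYS_NICE):

#include <pthread.h>
#include <sched.h>
#include <stdio.h>

/* Placeholder body for a time-critical task. */
static void *control_task(void *arg)
{
    (void)arg;
    /* time-critical work would run here */
    return NULL;
}

int main(void)
{
    pthread_attr_t attr;
    struct sched_param prio = { .sched_priority = 80 }; /* 1..99 for SCHED_FIFO */

    pthread_attr_init(&attr);
    /* Do not inherit the creator's SCHED_NORMAL policy. */
    pthread_attr_setinheritsched(&attr, PTHREAD_EXPLICIT_SCHED);
    pthread_attr_setschedpolicy(&attr, SCHED_FIFO);
    pthread_attr_setschedparam(&attr, &prio);

    pthread_t tid;
    int rc = pthread_create(&tid, &attr, control_task, NULL);
    if (rc != 0) {
        fprintf(stderr, "pthread_create failed: %d\n", rc);
        return 1;
    }
    pthread_join(tid, NULL);
    return 0;
}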
[Fig. 4. APP4MC RaceCar Software Architecture: the Nvidia Jetson AGX Xavier runs the application layer on a system library and an RT-Linux kernel (device drivers, RT scheduler) on its ARM cores, with the high computation algorithms offloaded to the GP-GPU accelerator and a host memory shared between them that receives the sensor data; the Beaglebone AI runs a similar RT-Linux stack on its ARM core and connects to the Jetson via CAN over SPI and to the VESC via UART; the VESC runs the BLDC application on ChibiOS/RT with ChibiOS/HAL low level drivers on its ARM core.]

The application is implemented on top of the Operating System (OS) layer, further broken down into several tasks, and mapped onto different cores. Tasks that have high computational demands are offloaded to the GP-GPUs. The CPUs apply a fully preemptive fixed priority scheduling policy, whereas the GPUs follow weighted round-robin scheduling. The system memory is shared between the CPU cluster and the GP-GPUs for better performance. Figure 5 depicts the task model along with the data flow from sensors to actuators. The task definition of the ADAS application described in this paper is mainly derived from the Waters Challenge 2019 [14].

• Localization - The localization task is responsible for determining the relative position of the RC car on a given environmental map. It takes the point cloud data from the LiDAR input and merges it with the RC car motion status to estimate the demonstrator's position.
• CAN Polling - This task gets the key information about the demonstrator's motion parameters from the on-board CAN bus and sends it to the Localization and Planner tasks (a sketch of one way to implement such a periodic task is given after Figure 5).
• Structure From Motion - This task is responsible for estimating the depth of an object based on the stereo vision camera images. The distance of the object is passed to the Planner task for further processing.
• Lane Detection - This task provides accurate locations of the road boundaries and the shape of each lane. The output of this task is a matrix of points representing the lane boundaries within the road, which is sent to the Planner task.
• Detection - The detection task is responsible for detecting and classifying the objects within the visual range of the camera. The output of this task is sent to the Planner task.
• Planner - The main purpose of this component is to calculate and follow a vehicle trajectory. The targeted vehicle motion parameters are passed to the Car Controller task.
• Car Controller - The main purpose of this task is to get the steering angle, speed, and acceleration values from the Planner task and provide them to the reflective operator over the CAN bus.

[Fig. 5. APP4MC RaceCar task model with data flow: the camera input feeds the SFM task (depth estimation) and the Lane Detection task (lane boundaries), the LiDAR input feeds the Localization and Detection tasks, CAN Polling distributes the vehicle_status to Localization and Planner, Localization passes the pose to the Planner, and the Planner sends steer and speed values to the Controller; the tasks are mapped across CPU and GPU.]
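As referenced in the CAN Polling bullet above, the following C sketch shows one way such a periodic task could be structured under RT-Linux: it pins itself to one core, mirroring a fixed task-to-core mapping, and sleeps until absolute deadlines so that the period does not drift. The period, core id, and polling body are invented placeholders, not the project's actual parameters:

#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <time.h>

#define PERIOD_NS 10000000L   /* 10 ms polling period (illustrative) */

/* Invented placeholder for reading the vehicle status from the CAN bus. */
static void poll_can_bus(void) { /* read vehicle_status frame here */ }

static void *can_polling_task(void *arg)
{
    (void)arg;

    /* Pin the task to one core, mirroring a fixed task-to-core mapping. */
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(2, &set);                 /* core id chosen arbitrarily */
    pthread_setaffinity_np(pthread_self(), sizeof set, &set);

    struct timespec next;
    clock_gettime(CLOCK_MONOTONIC, &next);
    for (;;) {
        poll_can_bus();

        /* Sleep until an absolute deadline so the period does not drift. */
        next.tv_nsec += PERIOD_NS;
        while (next.tv_nsec >= 1000000000L) {
            next.tv_nsec -= 1000000000L;
            next.tv_sec  += 1;
        }
        clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &next, NULL);
    }
    return NULL;
}

int main(void)
{
    pthread_t tid;
    pthread_create(&tid, NULL, can_polling_task, NULL);
    pthread_join(tid, NULL);
    return 0;
}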
2) Beaglebone AI Software Stack: The RT-Linux kernel has been ported to the Beaglebone AI.

The application layer consists of two threads - one interacting with the VESC to configure it and send commands to it, the other communicating with the Jetson AGX Xavier board. The Beaglebone AI provides feedback regarding the current speed, steering angle, acceleration, and orientation of the RaceCar to the Jetson AGX Xavier board over the CAN bus.
     to estimate the demonstrator’s position.                                         Cooperative Scheduling.
   • Can Polling - This task gets the key information about
                                                                                         A brushless DC (BLDC) motor application is implemented
     the demonstrator motion parameters from the on-board                             on top of ChibiOS RTOS. This application receives commands
     CAN bus and sends it to the Localization and Planner                             from Beaglebone AI over UART interface and performs the
     task.                                                                            respective operation.
   • Structure From Motion - This task is responsible for
                                                                                         VI. T IMING A NALYSIS USING T RACING FRAMEWORK
     estimating the depth of an object based on the stereo
     vision camera images. The distance of the object is passed                          Tracing the software application is one of the efficient
     to Planner task for further processing.                                          approaches in determining its timing behavior. BTF [24]
   • Lane Detection - This task provides accurate locations                           is a CSV (Comma-Separated Values) based format used in
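To make the format concrete, the following C sketch writes events in a simplified seven-column BTF-style layout (timestamp, source, source instance, target type, target, target instance, event). The column semantics follow our reading of the BTF specification and should be checked against it; the task, core, and timestamps are invented:

#include <stdio.h>

/* Append one event line in a simplified seven-column BTF-style layout. */
static void btf_emit(FILE *f, unsigned long long ts_ns,
                     const char *core, const char *task,
                     unsigned inst, const char *event)
{
    fprintf(f, "%llu,%s,0,T,%s,%u,%s\n", ts_ns, core, task, inst, event);
}

int main(void)
{
    FILE *f = fopen("racecar.btf", "w");
    if (!f) return 1;

    /* A miniature trace covering one Planner activation. */
    btf_emit(f, 1000000, "Core_1", "Task_Planner", 0, "activate");
    btf_emit(f, 1010000, "Core_1", "Task_Planner", 0, "start");
    btf_emit(f, 1900000, "Core_1", "Task_Planner", 0, "terminate");

    fclose(f);
    return 0;
}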
[Fig. 6. Timing analysis workflow using a BTF trace: the ADAS application, instrumented with the BTF tracing framework, is executed on the heterogeneous platform to generate a BTF trace file; the trace is then visualized in a BTF trace visualization tool (Eclipse Trace Compass), and performance metrics are derived from it (ATDB).]

The integrated BTF tracing framework can be utilized to capture the events generated by each task at runtime in real-life scenarios. An overview of the workflow for the timing analysis of the application using these tracing capabilities is illustrated in Figure 6. The generated trace file can be viewed in any standard BTF trace visualization tool, for example Eclipse Trace Compass [30]. The timing performance metrics can be derived from the generated trace file by converting it to the Eclipse APP4MC Amalthea Trace Database (ATDB) [31] format. The ATDB file captures the execution time of each task and runnable, including the average, best-case, and worst-case execution times on a specific core. The event-chain metrics in the ATDB provide the latency of all the event chains in the application. At the same time, the trace data also provides information about the resource utilization of the processing units, thereby assisting in an efficient mapping of the tasks onto the processing cores.
                                                                                     O. Guldner, M. Mohamoud, B. Plancher, R. Shin, and J. Vivilec-
                                                                                     chia, “Project-based, collaborative, algorithmic robotics for high school
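To make the derivation step concrete, the following minimal sketch computes per-task execution-time metrics of the kind stored in an ATDB (best-case, average, and worst-case runtimes) from a raw BTF trace. It is not the APP4MC toolchain itself; the comma-separated column layout (timestamp, source, source instance, type, target, target instance, event), the task-state event names, and the file name racecar_run.btf are assumptions based on the BTF specification [24] and should be adjusted to the concrete BTF version in use.

```python
# Sketch: derive per-task execution-time metrics (BCET/ACET/WCET) from a BTF
# trace. Column layout and event names follow the BTF specification [24];
# the trace file name is a hypothetical example.
from collections import defaultdict

def btf_execution_metrics(path):
    running_since = {}              # (task, instance) -> timestamp of last start/resume
    accumulated = defaultdict(int)  # (task, instance) -> runtime accumulated so far
    finished = defaultdict(list)    # task -> runtimes of completed instances

    with open(path) as trace:
        for line in trace:
            line = line.strip()
            if not line or line.startswith("#"):
                continue  # skip empty lines and comment/header lines
            ts, _src, _si, typ, target, inst, event = line.split(",")[:7]
            if typ != "T":
                continue  # this sketch only considers task events
            ts, key = int(ts), (target, inst)
            if event in ("start", "resume"):
                running_since[key] = ts           # task instance begins consuming CPU
            elif event in ("preempt", "terminate"):
                accumulated[key] += ts - running_since.pop(key, ts)
                if event == "terminate":          # instance done: record its runtime
                    finished[target].append(accumulated.pop(key))

    return {task: (min(rt), sum(rt) / len(rt), max(rt))
            for task, rt in finished.items()}

if __name__ == "__main__":
    for task, (bcet, acet, wcet) in btf_execution_metrics("racecar_run.btf").items():
        print(f"{task}: BCET={bcet} ACET={acet:.0f} WCET={wcet}")
```

The same traversal can be extended to runnable-level events and per-core accounting, which mirrors what the ATDB conversion provides out of the box.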
VII. FUTURE WORK AND CONCLUSION
The paper describes the state of the art in RC car platforms and identifies the need for developing a new demonstrator. The APP4MC RaceCar provides a practical prototype of a full-size autonomous vehicle with its heterogeneous architecture and sensing capabilities. The paper briefly describes the architecture of the heterogeneous components and sensors used in our platform. It discusses the system architecture and the data-flow event-chain task model for an ADAS application on a heterogeneous platform. The computing capabilities of GP-GPUs can be used to implement and test an ADAS application in a real-life environment.

Future work involves the mechanical design and assembly of the RaceCar components. Further implementation involves integrating each component as well as developing the Amalthea task model based on the described architecture. In addition, porting a deterministic RTOS such as QNX to the BeagleBone AI and the Jetson AGX Xavier will further enhance the real-time capability of the system. Finally, we will implement a BTF tracing framework that allows us to use the demonstrator to verify timing analysis results and the efficient mapping of tasks onto the processing nodes.
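As a sketch of how the demonstrator could be used for such verification, the snippet below measures the observed end-to-end latency of an event chain from the same kind of BTF trace and compares it against an analytically derived bound (e.g., one obtained with the approach from [23]). The task names, the 50 ms bound, and the simple FIFO pairing of stimulus and response are illustrative assumptions, not values or methods from the paper.

```python
# Sketch: observed end-to-end latency of an event chain from a BTF trace,
# checked against an analytical bound. Chain, bound, file name, and the
# FIFO stimulus/response pairing are illustrative assumptions.
CHAIN = ["Task_Camera", "Task_LaneDetection", "Task_Steering"]  # hypothetical chain
BOUND_NS = 50_000_000  # assumed 50 ms end-to-end bound

def chain_latencies(path, chain):
    first, last = chain[0], chain[-1]  # only the chain ends matter end to end
    starts, latencies = [], []
    with open(path) as trace:
        for line in trace:
            line = line.strip()
            if not line or line.startswith("#"):
                continue
            ts, _, _, typ, target, _, event = line.split(",")[:7]
            if typ != "T":
                continue
            if target == first and event == "start":
                starts.append(int(ts))                     # stimulus enters the chain
            elif target == last and event == "terminate" and starts:
                latencies.append(int(ts) - starts.pop(0))  # FIFO pairing
    return latencies

if __name__ == "__main__":
    measured = chain_latencies("racecar_run.btf", CHAIN)
    if measured:
        worst = max(measured)
        verdict = "within" if worst <= BOUND_NS else "violates"
        print(f"worst observed latency: {worst} ns ({verdict} the {BOUND_NS} ns bound)")
```

Note that FIFO pairing assumes every stimulus produces exactly one in-order response; real event chains with under- or oversampling require the more careful latency semantics handled by the analysis tooling.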
[3] R. Okuda, Y. Kajiwara, and K. Terashima, “A survey of technical trend of ADAS and autonomous driving,” in Technical Papers of 2014 International Symposium on VLSI Design, Automation and Test, 2014, pp. 1–4.
[4] B. Xiao, C. Xu, and L. Xu, “Notice of violation of IEEE publication principles: Automatic parallel parking of RC car using distance sensors,” in 2009 Second International Conference on Future Information Technology and Management Engineering, 2009, pp. 525–528.
[5] L. Paull, J. Tani, H. Ahn, J. Alonso-Mora, L. Carlone, M. Cap, Y. F. Chen, C. Choi, J. Dusek, Y. Fang, D. Hoehener, S. Liu, M. Novitzky, I. F. Okuyama, J. Pazis, G. Rosman, V. Varricchio, H. Wang, D. Yershov, H. Zhao, M. Benjamin, C. Carr, M. Zuber, S. Karaman, E. Frazzoli, D. Del Vecchio, D. Rus, J. How, J. Leonard, and A. Censi, “Duckietown: An open, inexpensive and flexible platform for autonomy education and research,” in 2017 IEEE International Conference on Robotics and Automation (ICRA), 2017, pp. 1497–1504.
[6] J. Betthauser, D. Benavides, J. Schornick, N. O’Hara, J. Patel, J. Cole, and E. Lobaton, “WolfBot: A distributed mobile sensing platform for research and education,” in Proceedings of the 2014 Zone 1 Conference of the American Society for Engineering Education, 2014, pp. 1–8.
[7] S. Kannapiran and S. Berman, “Go-CHART: A miniature remotely accessible self-driving car robot,” in 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2020, pp. 2265–2272.
[8] S. S. Srinivasa, P. Lancaster, J. Michalove, M. Schmittle, C. Summers, M. Rockett, J. R. Smith, S. Choudhury, C. Mavrogiannis, and F. Sadeghi, “MuSHR: A low-cost, open-source robotic racecar for education and research,” 2019.
[9] JetRacer. [Online]. Available: https://github.com/NVIDIA-AI-IOT/jetracer
[10] DonkeyCar. [Online]. Available: https://www.hackster.io/wallarug/donkey-car-with-jetson-nano-robo-hat-mm1-e53e21
[11] B. Goldfain, P. Drews, C. You, M. Barulic, O. Velev, P. Tsiotras, and J. M. Rehg, “AutoRally: An open platform for aggressive autonomous driving,” IEEE Control Systems Magazine, vol. 39, no. 1, pp. 26–55, 2019.
[12] S. Karaman, A. Anders, M. Boulet, J. Connor, K. Gregson, W. Guerra, O. Guldner, M. Mohamoud, B. Plancher, R. Shin, and J. Vivilecchia, “Project-based, collaborative, algorithmic robotics for high school students: Programming self-driving race cars at MIT,” in 2017 IEEE Integrated STEM Education Conference (ISEC), 2017, pp. 195–203.
[13] J. Gausemeier, U. Frank, J. Donoth, and S. Kahl, “Specification technique for the description of self-optimizing mechatronic systems,” Research in Engineering Design, vol. 20, pp. 201–223, Nov. 2009.
[14] F. Wurst, D. Dasari, A. Hamann, D. Ziegenbein, I. Saudo, N. Capodieci, M. Bertogna, and P. Burgio, “System performance modelling of heterogeneous HW platforms: An automated driving case study,” in 2019 22nd Euromicro Conference on Digital System Design (DSD), 2019, pp. 365–372.
[15] Eclipse APP4MC. [Online]. Available: https://www.eclipse.org/app4mc/
[16] D. A. Jamsek, “Designing and optimizing compute kernels on NVIDIA GPUs,” in 2009 Asia and South Pacific Design Automation Conference, 2009, pp. 224–229.
[17] R. Saussard, B. Bouzid, M. Vasiliu, and R. Reynaud, “Optimal performance prediction of ADAS algorithms on embedded parallel architectures,” in 2015 IEEE 17th International Conference on High Performance Computing and Communications, 2015 IEEE 7th International Symposium on Cyberspace Safety and Security, and 2015 IEEE 12th International Conference on Embedded Software and Systems, 2015, pp. 213–218.
[18] C. Widerspick, W. Bauer, and D. Fey, “Latency measurements for an emulation platform on autonomous driving platform NVIDIA Drive PX2,” in ARCS Workshop 2018; 31st International Conference on Architecture of Computing Systems, 2018, pp. 1–8.
[19] K. Haeublein, W. Brueckner, S. Vaas, S. Rachuj, M. Reichenbach, and D. Fey, “Utilizing PYNQ for accelerating image processing functions in ADAS applications,” in ARCS Workshop 2019; 32nd International Conference on Architecture of Computing Systems, 2019, pp. 1–8.
[20] OpenCV. [Online]. Available: https://docs.opencv.org/master/index.html
[21] X. Wang, M. Cui, K. Huang, A. Knoll, and L. Chen, “Improving the performance of ADAS application in heterogeneous context: A case of lane detection,” in 2017 IEEE 20th International Conference on Intelligent Transportation Systems (ITSC), 2017, pp. 1–6.
[22] M. Hammond, G. Qu, and O. A. Rawashdeh, “Deploying and scheduling vision based advanced driver assistance systems (ADAS) on heterogeneous multicore embedded platform,” in 2015 Ninth International Conference on Frontier of Computer Science and Technology, 2015, pp. 172–177.
[23] L. Krawczyk, M. Bazzal, R. P. Govindarajan, and C. Wolff, “An analytical approach for calculating end-to-end response times in autonomous driving applications,” Jun. 2019.
[24] Vector Informatik GmbH, “Best Trace Format (BTF) technical specification v2.2.0,” 2020.
[25] NVIDIA Jetson AGX Xavier. [Online]. Available: https://www.nvidia.com/de-de/autonomous-machines/embedded-systems/jetson-agx-xavier/
[26] BeagleBone AI System Reference Manual. [Online]. Available: https://github.com/beagleboard/beaglebone-ai/wiki/System-Reference-Manual
[27] Vedder Electronic Speed Controller. [Online]. Available: https://vesc-project.com/
[28] ZED 2 Camera Specifications. [Online]. Available: https://cdn.stereolabs.com/assets/datasheets/zed2-camera-datasheet.pdf
[29] RPLIDAR A3. [Online]. Available: https://www.slamtec.com/en/Lidar/A3
[30] Eclipse Trace Compass. [Online]. Available: https://www.eclipse.org/tracecompass/
[31] Eclipse APP4MC Amalthea Trace Database. [Online]. Available: https://www.eclipse.org/app4mc/help/latest/index.html#section4.9