=Paper= {{Paper |id=Vol-3381/paper_33 |storemode=property |title=Domain-centric ADAS Datasets |pdfUrl=https://ceur-ws.org/Vol-3381/33.pdf |volume=Vol-3381 |authors=Vaclav Divis,Tobias Schuster,Marek Hruz |dblpUrl=https://dblp.org/rec/conf/aaai/DivisSH23 }}
Domain-centric ADAS datasets
Václav Diviš¹,*, Tobias Schuster² and Marek Hrúz³
¹ University of West Bohemia, Sedláčkova 214, Pilsen 3 301 00, Czech Republic
² Siemens Technology, Software Systems & Processes, Research in Verification & Test
³ University of West Bohemia, Faculty of Applied Sciences, Department of Cybernetics and New Technologies for the Information Society


                                             Abstract
                                             Since the rise of Deep Learning methods in the automotive field, multiple initiatives have been collecting datasets in order
                                             to train neural networks on different levels of autonomous driving. This requires collecting relevant data and precisely
                                             annotating objects, which should represent uniformly distributed features for each specific use case. In this paper, we analyze
                                             several large-scale autonomous driving datasets with 2D and 3D annotations in regard to their statistics of appearance and
                                             their suitability for training robust object detection neural networks. We discovered that despite the huge effort spent
                                             on driving hundreds of hours in different regions of the world, hardly any attention is paid to analyzing the quality of the
                                             collected data from an operational domain perspective. The analysis of safety-relevant aspects of autonomous driving functions, in
                                             particular trajectory planning with relation to time-to-collision feature, showed that most datasets lack annotated objects at
                                             further distances and that the distributions of bounding boxes and object positions are unbalanced. We therefore propose a
                                             set of rules which help find objects or scenes with inconsistent annotation styles. Lastly, we questioned the relevance of mean
                                             Average Precision (mAP) without relation to the object size or distance.

                                             Keywords
                                             Advanced Driver-Assistance Systems, Trajectory Planning, Domain-centric Datasets, Object Detection, mean Average
                                             Precision


SafeAI 2023: The AAAI's Workshop on Artificial Intelligence Safety, Feb 13-14, 2023, Washington, D.C., US
* Corresponding author.
divisvaclav@gmail.com (V. Diviš); tobias.schuster@siemens.com (T. Schuster); mhruz@ntis.zcu.cz (M. Hrúz)
https://gitlab.com/divisvaclav/ (V. Diviš)
ORCID: 0000-0001-9935-7824 (V. Diviš); 0000-0002-9421-8566 (M. Hrúz)
© 2023 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org), ISSN 1613-0073.


1. Introduction and Motivation

The shift from classical programming paradigms to Machine Learning-driven (ML) approaches is significant. Humans nowadays often rely on decisions and recommendations made by Artificial Neural Networks (ANNs); however, most of these applications are harmless. This is not the case for Advanced Driver-Assistance Systems (ADAS), where wrong decisions can lead to severe injuries [1]. The development of an ML-driven application within this field follows strict processes [2, 3], from acquiring the data necessary for training to deploying and evaluating the models. These processes comply with functional safety standards [4, 5] but propose neither specific measures nor concrete thresholds which the system should pass before being publicly released. Moreover, it is generally not feasible to collect the full variation of information in a stochastic environment such as public roads. This is why it is important to look deeper into the possibilities of diagnosing and analyzing unbalanced or missing information within large-scale datasets related to the operational domain, i.e. a data-centric approach [6].

A data-centric approach lays the focus on ensuring that the data clearly conveys what the AI must learn. In the case of higher levels of ADAS [7], the vehicle can control the movement in longitudinal and lateral directions, while relying predominantly on cameras, radars, and nowadays more frequently on LIDAR systems. One example of an ADAS function is Adaptive Cruise Control (ACC). The task of ACC is to drive within the lane at a certain speed and keep a safe distance from any potential obstacle. For this purpose, the Time-To-Collision (TTC) is calculated for each object. If the TTC value decreases below a defined threshold, a braking or evasive maneuver is initiated. Based on linear kinematic equations, where distance equals speed times time, the greater the speed difference between the subject automated vehicle (ego vehicle) and the object in front of it, the shorter the TTC will be; hence the ACC needs to plan in advance. As a consequence, on highways the ACC must incorporate objects at a further distance (which appear smaller on camera images) into the trajectory planning [8]. On the contrary, in cities, where the maximum speed is limited to 50 km/h, the average area of an encountered object will be larger.

Motivated by these physical dependencies, we took a deeper look at the annotated objects and analyzed their statistical appearance. The examination was done in regard to the functionality of ACC, in state-of-the-art (SOTA) large-scale automotive datasets. Our contributions are as follows:
• We define a minimum safe distance s_min = f(v, a_max) for a variety of scenarios (ego speed v and weather-dependent deceleration a_max). We calculate the safe distance based on German legislation, but the process can easily be adapted to any other legislation.

• We analyze the distribution of objects' bounding boxes' (BB) relative sizes, distances to the ego vehicle, and positions in the datasets.

• We define a set of standardizable sanity checkers which help verify the quality of the collected data and mark ambiguously labeled data.

• We highlight the concrete missing information which is not part of the datasets and diagnose the cause.

• We propose an automotive mean Average Precision (amAP) metric, which relates the metric to the distance to the object or to the relative BB size.


2. Datasets and Related Work

2.1. Automotive Datasets

As mentioned in Section 1, the prerequisite for the correct functionality of the ACC is information about object classes, sizes (used in case of an overtaking or evasive maneuver), and the distances of the objects to the ego vehicle. Several common automotive datasets are therefore not suitable for such a task, despite some of them providing LIDAR data, namely BDD100K [9] (no LIDAR data), CityScapes [10] (no bounding box annotations), Perl [11] (no 3D annotations) or ApolloScape [12] (no images, only LIDAR). The other group of datasets contains images and LIDAR point clouds, including 2D as well as 3D annotations. It is for this reason that we decided to take the following large-scale automotive datasets into account: KITTI [13], the Audi Autonomous Driving Dataset (A2D2) [14], the Lyft Level 5 dataset [15], the nuScenes dataset [16], the Waymo Open Dataset [17] and the ONCE dataset [18].

Each dataset contains a varying number of labeled objects such as small vehicles (vehicle, ego vehicle, SUV, motorcycle, etc.), large vehicles (truck, bus, tram), pedestrians, and cyclists. Especially KITTI and nuScenes show high class imbalance for some classes due to fine-grained class definitions. Furthermore, the selected datasets contain labeled camera images with 2D and 3D bounding boxes and the corresponding LIDAR point cloud information, which provides distance information for each object. However, the size of the datasets, in terms of the number of labeled frames and captured ambient conditions, varies. For instance, A2D2 provides a dataset of 2D labeled images, but only a small part contains 3D bounding boxes. KITTI, nuScenes, LyftLevel5 and Waymo reflect only urban areas, whereas A2D2 and ONCE also contain highways, country roads, tunnels, etc. On top of that, the sensor setups vary as well, e.g. different camera resolutions. For the KITTI dataset, the authors used a LIDAR sensor and two stereo cameras (left and right), whereas for Waymo five LIDARs (restricted to 75m) and five cameras were used. nuScenes used six cameras and one LIDAR sensor as well as five radars, ONCE uses one LIDAR and seven cameras, and A2D2 five LIDARs and six cameras. However, for the ACC only the front cameras are taken into account, resulting in a smaller amount of usable images.

The KITTI dataset is the smallest in terms of scenes and the least diverse, containing only sunny and cloudy daytime scenes. For some time, the Waymo and nuScenes datasets provided the largest variety and amount of data and annotations; they are among the most widely used autonomous driving datasets. Although the ONCE dataset recently set a new benchmark for the amount of driving hours and frames, Waymo contains the highest amount of 3D bounding boxes, because ONCE focuses on self-supervised learning without labels. Table 1 gives an overview of important general information per dataset.

2.2. Dataset Analysis

The authors of the above-mentioned datasets compared their works based on the common aspects of the datasets, as shown in Table 1. General properties like the number of driving hours are often used to compare datasets and to state an improvement. Moreover, the number of scenes, images, or annotations is often used to determine the quality of the datasets.

For instance, the authors of the ONCE [18] and A2D2 [14] datasets focus on the number of annotations, the amount of driving hours, the adverse weather conditions, the time of day (day/night) and different locations (urban, highway, country roads) as well as the countries/cities where the data was captured. However, specific requirements of driver assistance systems were not considered while creating or evaluating any of those datasets. The intent was rather to generate general datasets for a wide range of supervised and unsupervised learning tasks as well as driving functions.

The authors of A2D2, as well as nuScenes, focused on statistics relevant to the ACC and other assistance systems. Their work provides information about the distribution of the object distances for different classes as well as the absolute number of objects within the dataset. Additionally, the authors of nuScenes analyzed the distributions of the velocities of common objects like vehicles and bikes as well as bounding box dimensions.
Table 1
Comparison of analyzed datasets. Cells with "-" indicate not mentioned in the original paper.

  Dataset         | Scenes | # images | # 3D Bounding Boxes | Ambient conditions                                          | # classes | Year of release
  ONCE [18]       | 1M     | 7M       | 417k                | urban, highways, country roads; day/night; various weather  | 5         | 2021
  nuScenes [16]   | 1k     | 1.4M     | 1.4M                | urban; day/night; various weather                           | 23        | 2020
  A2D2 [14]       | -      | -        | -                   | urban, highways, country roads; day; various weather        | 14        | 2020
  LyftLevel5 [15] | 366    | 323k     | 1.3M                | urban; day; various weather                                 | 9         | 2019
  Waymo [17]      | 1150   | 1M       | 12M                 | urban; day/night; various weather                           | 4         | 2019
  KITTI [13]      | 22     | 15k      | 200k                | urban; day; sunny, cloudy                                   | 8         | 2012
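One way to read Table 1 beyond raw counts is to relate the number of 3D boxes to the number of images, which gives a rough annotation density. The sketch below uses the table's rounded figures; this is our own derivation for illustration, not a statistic from any of the cited papers:

```python
# Rounded figures taken from Table 1 (k = thousand, M = million).
datasets = {
    "ONCE":       {"images": 7_000_000, "boxes_3d": 417_000},
    "nuScenes":   {"images": 1_400_000, "boxes_3d": 1_400_000},
    "LyftLevel5": {"images": 323_000,   "boxes_3d": 1_300_000},
    "Waymo":      {"images": 1_000_000, "boxes_3d": 12_000_000},
    "KITTI":      {"images": 15_000,    "boxes_3d": 200_000},
}

# 3D boxes per image: a crude proxy for how densely a dataset is labeled.
density = {name: d["boxes_3d"] / d["images"] for name, d in datasets.items()}

for name, ratio in sorted(density.items(), key=lambda kv: -kv[1]):
    print(f"{name:10s} {ratio:6.2f} 3D boxes per image")
```

The very low ONCE ratio is consistent with the observation above that ONCE targets self-supervised learning and leaves most of its frames unlabeled.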



The KITTI benchmark [13] is based on the performance analysis of neural networks on the size of BBs in pixels as a proxy for the distance of the ego vehicle to the object. Their work in general follows the COCO evaluation methodology [19], but no physical distance information is used. The authors of nuScenes and Waymo did set a baseline for various detection tasks, yet without considering the distances to the different objects explicitly. The analysis closest to ours is done by the authors of the ONCE dataset. They analyzed the collected data regarding distance-wise mean Average Precision performance for 3D object detection using only point clouds. However, their distance thresholds were selected rather intuitively, whereas we specifically derive the distance from the domain safety requirements. Additionally, we analyze the spatial distribution of objects within the images as well as the bounding box/object size compared to the image size.


3. Background

Since the majority of related works only analyze datasets from a general ML perspective, omitting the data-centric paradigm, we decided to verify the SOTA automotive datasets in regard to a trajectory planning task. Part of our motivation is that trajectory forecasting is conditioned by the ego vehicle's velocity: the higher the velocity, the more the further-distance objects have to be taken into account. In order to evaluate the sufficiency and quality of annotated objects within the datasets, we chose the Time-To-Collision as an instrument to calculate the minimal safe distance from the ego vehicle.

For the sake of simplicity, we do not consider any obstacle heading from the opposite direction (on a collision course), since we are working with static images and thus don't have information about the relative motion of the objects. As described in [20]: "The TTC value at instant t is defined as the time for two objects to collide if they continue at their present velocity and on the same path". Let us define the first object to be an obstacle (anything other than the ego vehicle) and the second object to be the ego vehicle. We consider the velocity of the obstacle to be equal to 0 km/h (representing a stand-still object and therefore the worst-case scenario), and the ego vehicle's deceleration to be 7 m/s² (it can vary within a range from 7 m/s² to 10 m/s² on dry roads [21, 22]). Vehicle deceleration can be seen as a function of adhesion between the tires and the road, which depends on the material used in the tires, the material of the road, temperature, weather conditions, the mounted braking system, and the mass of the vehicle.

Based on the definition of TTC, let us consider three driving scenarios:

• highway (recommended speed 130 km/h ≈ 36 m/s)
• country road (maximum speed 100 km/h ≈ 28 m/s)
• city (maximum speed 50 km/h ≈ 14 m/s)

We now compute the minimal safe distance which needs to be ensured in order to brake in time (without initiating any evasive maneuver) for the following case: the ego vehicle is driving on the highway, the possible deceleration is equal to 7 m/s², and the reaction delay is 0.0 s. Based on the kinematic equations of a linearly decelerating object, the distance which the object will travel is a function of time, s = s₀ + v₀t − ½at², where the time to standstill is a function of deceleration, t = (v₀ − v)/a. With a linear deceleration of 7 m/s², a vehicle moving within the legal limits will reach its standstill state in time t = (v₀ − v)/a = (36 − 0)/7 = 5.14 s. Within this time frame, the ego vehicle will travel a distance of s = 0 + 36 · 5.14 − ½ · 7 · 5.14² = 185.04 − 92.46 = 92.58 m. For completeness, the braking distance under the same weather conditions is 55.11 m on country roads and 14 m in the city, as can be seen in Table 2.
Table 2
All scenarios with calculated minimum safe distances.

  Scenario     | Max. allowed speed [km/h] | Time to stand still [s] | Min. safe distance d_s [m]
  highway      | 130                       | 5.14                    | 92.58
  country road | 100                       | 4.00                    | 55.11
  city         | 50                        | 2.00                    | 14

As mentioned earlier, this process can be generalized and repeated for any ambient conditions, type of vehicle, and speed limitations.

It is noticeable that the highway's maximum foresight boundary will in reality be limited by the physical properties of the camera or by the maximum speed difference between the ego vehicle and the object. But most importantly, objects within those safe ranges must be part of a dataset (training and testing); otherwise, the system will deal with an epistemic uncertainty [23]. In order to investigate the statistics of objects' appearances, we need a dataset that provides information about the object's distance. As mentioned in Section 2, some datasets, such as Perl and BDD100K, are not suitable for this task. Consequently, we have chosen the following large-scale automotive datasets, which contain 3D annotations and distances to the objects: KITTI, the Waymo Open Dataset, A2D2 from Audi, nuScenes, LyftLevel5 and ONCE.

In regard to the functionality of trajectory planning, we focused on the following in-dataset object characteristics:

• distribution of a BB's relative size: in order to verify that a variety of object sizes is captured within the dataset,

• distribution of the distance between obstacles and the ego vehicle (with relation to minimum safe distances): in order to verify that objects at further distances are incorporated within the dataset,

• relation between a BB's relative size and the object's distance from the ego vehicle: to discover abnormalities within this dependency,

• heatmap of an object's appearance density: to visualize the potentially asymmetrical appearance of objects with relation to the ego vehicle,

• optical flow between consecutive images: in order to identify series of static images, which can lead to a class-imbalanced dataset.

As originally presented in [19] and further explored in [18], it seems reasonable to observe the mean Average Precision with relation to a specific object's size. Since the original authors clustered the object groups rather generally (small, middle, big), we propose to have clearly specified operational domain dependencies and to incorporate the minimum safe distances as thresholds. Furthermore, relating the AP to the objects' relative BB sizes or distances highlights detailed discrepancies between the performances of models trained on the same dataset. We therefore incorporate the minimal safe distance d_s of each scenario as a threshold for the creation of test subsets 𝒜 ⊂ ℬ ⊂ 𝒞 from the original test dataset 𝒟. For instance, 𝒜 contains only objects at distance > d_s(highway). For each subset the average precision can be calculated, representing a concrete value for a specific operational domain, e.g. driving in the city. Equation 1 represents this different perspective on the average precision metric, which we call automotive mean AP:

    automotive mAP = (1/n_c) · Σ_{i=1}^{n_c} AP_i,        (1)

where n_c is the number of domain-specific test subsets.


4. Analysis of datasets

In this chapter we analyze all datasets from Table 1 for the characteristics mentioned in Section 3. In order to evaluate a model's generalization ability, the collected data has to be divided into two parts, namely training and validation. The training part is used for extracting the relevant features and finding a reasonable combination in order to build a high-level feature representation, whereas the validation part is used to evaluate the loss after each training cycle (epoch). The training loop usually ends when the loss on the validation set stagnates for several epochs [24]. Logic demands that both the training and validation parts should have uniformly distributed object densities and properties (e.g. size, distance). We therefore decided to evaluate the correlation between the theoretical uniform distribution and the observed one by calculating the Wasserstein distance [25].

4.1. Analysis of outliers

There is no reason to assume that all data are flawlessly annotated, since the task is usually done by several people, which increases the risk of an inconsistent annotation style. We therefore designed plausibility check functions which allow us to flag potentially wrongly annotated data and analyze them later on.

• f₁: return (rel.bb.size ≤ 0.0 ∨ rel.bb.size > 1.0)
• f₂: return (e^(−dist/d + long.cam.off))
• f₃: return (intersec(bb_i, bb_j) > thr_overlap)
• f₄: return (opt.flow(img_{i−1}, img_i) < thr_static)
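The four checkers can be sketched as plain predicates. This is our illustrative implementation with hypothetical threshold values, not the authors' code; f₂ is shown as a deviation test against the fitted exponential:

```python
import math

def f1_size_outlier(rel_bb_size: float) -> bool:
    # Relative BB size must lie in the range (0.0, 1.0].
    return rel_bb_size <= 0.0 or rel_bb_size > 1.0

def f2_decay_outlier(rel_bb_size: float, dist: float,
                     d: float, long_cam_off: float, tol: float = 2.0) -> bool:
    # Expected size decays exponentially with distance; d and long_cam_off
    # are fitted per dataset and class via least squares.
    expected = math.exp(-dist / d + long_cam_off)
    return rel_bb_size > tol * expected or rel_bb_size < expected / tol

def f3_overlap_outlier(bb_i, bb_j, thr_overlap: float = 0.65) -> bool:
    # bb = (x1, y1, x2, y2); flag pairs whose intersection covers most of
    # the smaller box.
    ix = max(0.0, min(bb_i[2], bb_j[2]) - max(bb_i[0], bb_j[0]))
    iy = max(0.0, min(bb_i[3], bb_j[3]) - max(bb_i[1], bb_j[1]))
    area = lambda b: (b[2] - b[0]) * (b[3] - b[1])
    return ix * iy / min(area(bb_i), area(bb_j)) > thr_overlap

def f4_static_scene(flow_magnitude: float, thr_static: float) -> bool:
    # Mean dense optical-flow magnitude below the threshold => static scene.
    return flow_magnitude < thr_static
```

In practice f₄ would be fed by a dense optical-flow estimator over consecutive frames; here only the thresholding step is shown.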
                                                                                                                       Magnitude of dense optical flow over dataset: nuScenes
   The function 𝑓1 returns a bounding box whose rela-                                                                                                                      actual value
                                                                                                     7
tive size is out of the range (0.0, 1.0]. The outliers of                                                                                                                  median value, win_size = 4




                                                                   Magnitude of dense optical flow
                                                                                                     6                                                                     static scene threshold
otherwise exponentially decayed objects’ size with re-                                               5
spect to the distance to the ego are marked by function                                              4
𝑓2 . This function covers the quadratic dependency of the                                            3
BB area and the mapping of the captured object within                                                2
the real world to the pixel coordinates. The parameter                                               1
𝑙𝑜𝑛𝑔.𝑐𝑎𝑚.𝑜𝑓 𝑓 and the denominator 𝑑 were found with                                                  0
                                                                                                         3635       7271   10907     14543       18179      21815      25451      29087      32723
the least squares optimization from the analyzed data                                                                                        Index of images
and therefore are unique for each dataset and each class.
                                                                   Figure 2: As visible on the figure, 2.3% of the nuScenes dataset
The function 𝑓3 highlights objects whose bounding boxes
                                                                   are "stop and go" scenarios, where the magnitude of the dense
significantly overlap. The threshold of the relative inter-
                                                                   optical flow drops below a static scene threshold. The thresh-
section area can be defined by 𝑡ℎ𝑟𝑜𝑣𝑒𝑟𝑙𝑎𝑝 .                        old was recalculated for each dataset based on the image
   Examples of the outcomes for functions 𝑓1 − 𝑓3 are              resolution.
given in Figure 1. We further found that most of the
datasets contain sequences of "stop and go" in a traffic
jam, or idling at crossroads. These situations result in
the recording of many similar images without objects or
                                                                   4.2. Overall results
surrounding variation. Therefore we added 𝑓4 , which               Exemplary results of the analysis of the objects’ distances
calculates a dense optical flow [26] from previous and             distribution as well as the relation between the relative
actual images in the sequence and returns a positive flag          BB size and object distance can be seen in Figures 3 and 4.
in case the magnitude sinks bellow empirically defined             Moreover, we show an example of object appearance
threshold as seen in Figure 2. The higher the value of the         variation (heatmap) of class Vehicle in the A2D2’s dataset
magnitude, the more objects were moving from frame                 in Figure 5.
to frame. With this method, we could even identify if
the data were recorded repeatedly in the same place [27]                                                        A2D2's dataset distribution of distance to ego for objects: Car
                                                                                                     1400              14.0 Mean: 18.76          55.11                      92.58
(when recorded in one session), but we only used it to
                                                                                                     1200
discover static scenarios.
[Figure 3 plot: histogram of the distance to the ego for class Car in A2D2 (mean 18.76 m), with the minimum safe distances for 50 km/h (14.0 m), 100 km/h (55.11 m), and 130 km/h (92.58 m) marked.]

Figure 3: The information from the subset (0.0 to 14.0) meters will be used in scenarios where a vehicle is driving at less than 50 km/h (idling at a crossroads, for instance). However, the A2D2 dataset doesn't contain any objects at the minimal safe distance necessary for highway driving (92.58 m and more).
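The minimum-safe-distance thresholds above are consistent with a constant-deceleration braking model; the following sketch roughly reproduces them (the 7.0 m/s² deceleration is an assumed round value — the paper derives its thresholds from the cited brake tables [21, 22]):

```python
def braking_distance_m(speed_kmh, deceleration=7.0):
    """Braking distance from `speed_kmh` down to standstill, assuming a
    constant deceleration in m/s^2 (7.0 m/s^2 is an assumed round value;
    the paper takes its thresholds from published brake tables [21, 22])."""
    v = speed_kmh / 3.6  # km/h -> m/s
    return v * v / (2.0 * deceleration)

# Roughly reproduces the thresholds of 14.0 m, 55.11 m, and 92.58 m:
for speed in (50, 100, 130):
    print(speed, round(braking_distance_m(speed), 1))
```

Note that this sketch omits the driver/system reaction time; adding a reaction-distance term v·t_react would shift all thresholds upwards.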




Figure 1: Examples of outliers (potentially wrongly annotated data) within the analysed datasets. Images (a, b) capture overlapping BBs, whereas images (c, d) contain objects where the relative BB size doesn't correlate with the object distance. We further encourage analysing such a subset of filtered images and adapting the parameters of the outlier functions 𝑓1–𝑓4 to make them suitable for the target domain.

5. Concluding Remarks

To summarize, most of the SOTA autonomous driving datasets are generated with a focus on a large number of scenes, driving hours, and frames while considering different weather, location, and daylight conditions. However, as we discovered, none of the datasets contains a sufficient amount of information for safe autonomous driving on the highway (described in Section 3). Every analyzed dataset was lacking high-distance annotated objects, as can be seen in Table 3. Such a gap can be explained by the physical limitations of the camera (too low resolution),
Table 3
Comparison of the analysed datasets, highlighting the best results in regard to the training of an object detector used for trajectory planning.

Research Question                                      Class   ONCE  nuScenes   A2D2  LyftLevel5   Waymo  KITTI
Which dataset contains the most distant objects        Ped.       0         6      0          55       0      0
(driving ≥ 130 km/h)?                                  Veh.       0      2464      4        1574       0      0
Which dataset has the most uniform distribution        Ped.    0.86      0.82   0.80        0.60    0.65   0.89
according to the Wasserstein distance?                 Veh.    0.74      0.57   0.81        0.50    0.53   0.70
Which dataset has the least outliers based on          All     212      4742    563         146    6935    288
our sanity check filters (f1–f3)?
Which dataset contains the least static images         All     358       786    157          59     895     77
according to optical flow f4?


the annotation style (objects under a certain pixel area were excluded from the annotation process), and the ambient conditions in which the dataset was recorded. In addition, a system trained on such a dataset would have to deal with epistemic uncertainty and look for additional sources of information (namely LIDAR).
Furthermore, all datasets contain predominantly small-sized objects (the highest MEAN value of the relative BB size of the class Person was 0.091, in the case of the KITTI dataset). For comparison, the same can be stated for the well-known COCO dataset [19], where the class Person has a MEAN relative BB size equal to 0.089. By generating heatmaps, we discovered that 99.8% of the objects appear only in the two lower quadrants of the image. Such information can lead to a significant downsizing of the field of view and thereby to an acceleration of the detectors'

[Figure 4 plot: relative BB size as a function of the distance to the object for class Truck, with the minimum safe distances for 50/100/130 km/h marked at 14.0/55.11/92.58 m.]

Figure 4: The black curve indicates the outliers' decision boundary of the otherwise exponentially decaying object size with respect to the distance to the ego. Outliers can indicate rotated or wrongly annotated (with an unnecessarily big margin) objects.
inference time. The majority of overlapping BBs, with potentially wrong annotation styles, were extracted from a sequence of streams at crossroads. Such a static data sequence (9.55% in the case of the nuScenes dataset) contains a lot of similar features (the majority of the surrounding objects are not moving) and could be removed from the dataset.
Finally, we defined and evaluated a reasonable set of rules, described in Section 4.1, which automatically prove the quality of the collected data from a domain-related perspective. We encourage the community to use our "domain-centric" approach in order to create a dataset under concrete functional constraints and train detectors on it. Our code and additional results are published on GitLab and can be publicly accessed.¹
This work deepened our vision of a domain-centric ML approach in the automotive industry. To conclude, we outline some research directions which we are currently investigating: (a) Analysis of automotive mAP based on

[Figure 5 plot: heatmap of annotated objects with label Car on a 1920×1208 pixel grid; quadrant counts 0, 2, 11632, and 13382; colour scale in %.]

Figure 5: The number in the top left corner of each quadrant indicates the number of objects' appearances in the respective area. Contrary to the expected symmetrical heatmap, the vehicle appears with different quantities and sizes on both sides of the ego. The distribution is clearly unbalanced in the vertical and horizontal directions, with the majority of the objects in the lower half of the image. Such statistical information can be used in post-processing for a plausibility check of object appearance.

¹ https://gitlab.com/arrk-fi/ObjectDetectionCriticality/-/tree/dependency_branch.
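The per-quadrant appearance counts behind the Figure 5 heatmap can be gathered from the BB centres; a minimal numpy sketch (the function and its input layout are our own illustration, not the paper's published code):

```python
import numpy as np

def quadrant_counts(centers, width, height):
    """Count object-centre appearances per image quadrant.

    `centers` is an (N, 2) array-like of (x, y) BB centres in pixels;
    returns a 2x2 array laid out as
    [[top-left, top-right], [bottom-left, bottom-right]].
    """
    centers = np.asarray(centers, dtype=float)
    col = (centers[:, 0] >= width / 2).astype(int)   # 0 = left half, 1 = right half
    row = (centers[:, 1] >= height / 2).astype(int)  # 0 = upper half, 1 = lower half
    counts = np.zeros((2, 2), dtype=int)
    np.add.at(counts, (row, col), 1)                 # unbuffered scatter-add
    return counts

# One centre in the upper-left quadrant, two in the lower half.
print(quadrant_counts([(100, 100), (300, 900), (1500, 900)], 1920, 1208))
```

A finer grid (e.g. per-cell instead of per-quadrant) gives the smooth heatmap shown in the figure; the quadrant split suffices for the 99.8% lower-half statistic.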
[Figure 6 plots — left, distribution of relative BB sizes for class Small_Vehicle: A2D2 (𝑛𝑠𝑎𝑚𝑝 = 25043, 𝑤𝑑𝑖𝑠𝑡 = 0.96), Kitti (28742, 0.98), ONCE (33292, 0.98), nuScenes (139575, 0.99), Waymo (1280729, 0.99), LyftLevel5 (190029, 0.99); right, distribution of distances for class Pedestrian: LyftLevel5 (8751, 0.60), Waymo (767542, 0.65), A2D2 (4388, 0.80), nuScenes (2686, 0.82), ONCE (3006, 0.86), Kitti (4487, 0.89).]

Figure 6: Left: We picked the Small Vehicle class, where a varying relative BB size density can be seen. It is visible that the majority of large objects have a very small relative BB size. Right: We show the distance-to-object distributions of the class Pedestrian for each dataset, where 𝑛𝑠𝑎𝑚𝑝 is the number of samples and 𝑤𝑑𝑖𝑠𝑡 is the Wasserstein distance between the uniform distribution 𝒰 and the dataset's distance-to-object distribution. The lower the value, the closer the two distributions are.
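A 𝑤𝑑𝑖𝑠𝑡-style score can be computed from the CDF formulation of the one-dimensional Wasserstein-1 distance; a numpy-only sketch (our own illustration — the paper's exact normalisation, which maps the scores into [0, 1], is not reproduced here, so only relative comparisons are meaningful):

```python
import numpy as np

def wdist_to_uniform(values, lo, hi, grid_size=2048):
    """Approximate 1-D Wasserstein-1 distance between the empirical
    distribution of `values` and Uniform(lo, hi), using the identity
    W1 = integral over [lo, hi] of |F_emp(x) - F_uniform(x)| dx."""
    x = np.linspace(lo, hi, grid_size)
    sorted_vals = np.sort(np.asarray(values, dtype=float))
    emp_cdf = np.searchsorted(sorted_vals, x, side="right") / len(sorted_vals)
    uni_cdf = (x - lo) / (hi - lo)
    dx = x[1] - x[0]
    return float(np.sum(np.abs(emp_cdf - uni_cdf)) * dx)

rng = np.random.default_rng(0)
spread_out = rng.uniform(0.0, 120.0, 5000)                    # close to uniform -> small score
clustered = np.clip(rng.normal(20.0, 5.0, 5000), 0.0, 120.0)  # most objects near the ego
print(wdist_to_uniform(spread_out, 0.0, 120.0)
      < wdist_to_uniform(clustered, 0.0, 120.0))  # → True
```

scipy.stats.wasserstein_distance offers the same sample-to-sample computation; the explicit CDF form above makes the comparison against a continuous uniform reference visible.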



the relative BB size or the distance to the object with SOTA object detectors. (b) Object detector performance analysis on cleaned data (without outliers). (c) Dataset creation with respect to our domain-centric approach. (d) Combination of datasets in order to achieve a more uniform data distribution. (e) Data augmentation to compensate for weak aspects of the datasets.


Acknowledgments

This work is partly funded by ARRK Engineering GmbH. The work has also been supported by the grant of the University of West Bohemia, project No. SGS-2022-017, and by the Technology Agency of the Czech Republic, project No. CK03000179.


References

 [1] E. Schwalb, Analysis of safety of the intended use (SOTIF) (2019).
 [2] W. W. Royce, Managing the development of large software systems: concepts and techniques, in: Proceedings of the 9th International Conference on Software Engineering, 1987, pp. 328–338.
 [3] K. Petersen, C. Wohlin, D. Baca, The waterfall model in large-scale development, in: International Conference on Product-Focused Software Process Improvement, Springer, 2009, pp. 386–400.
 [4] ISO 26262:2018, Road vehicles — Functional safety (ISO 26262), Standard, International Organization for Standardization, 2018.
 [5] ISO/PAS 21448:2019, Road vehicles — Safety of the intended functionality, Standard, International Organization for Standardization, 2019.
 [6] A. Ng, Deep Learning AI data-centric AI competition, https://https-deeplearning-ai.github.io/data-centric-comp/, 2021. Accessed: 2022-08-16.
 [7] SAE International, Taxonomy and definitions for terms related to on-road motor vehicle automated driving systems, volume J3016, 2014.
 [8] A. Gasparetto, P. Boscariol, A. Lanzutti, R. Vidoni, Path planning and trajectory planning algorithms: A general overview, Motion and Operation Planning of Robotic Systems (2015) 3–27.
 [9] F. Yu, W. Xian, Y. Chen, F. Liu, M. Liao, V. Madhavan, T. Darrell, BDD100K: A diverse driving video database with scalable annotation tooling, CoRR abs/1805.04687 (2018). URL: http://arxiv.org/abs/1805.04687. arXiv:1805.04687.
[10] M. Cordts, M. Omran, S. Ramos, T. Rehfeld, M. Enzweiler, R. Benenson, U. Franke, S. Roth, B. Schiele, The Cityscapes dataset for semantic urban scene understanding, in: Proc. of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 3213–3223.
[11] G. Pandey, J. R. McBride, R. M. Eustice, Ford campus vision and lidar data set, The International Journal of Robotics Research 30 (2011) 1543–1552.
[12] Y. Ma, X. Zhu, S. Zhang, R. Yang, W. Wang, D. Manocha, TrafficPredict: Trajectory prediction for heterogeneous traffic-agents, in: Proceedings of the AAAI Conference on Artificial Intelligence, volume 33, 2019, pp. 6120–6127.
[13] A. Geiger, P. Lenz, C. Stiller, R. Urtasun, Vision meets robotics: The KITTI dataset, The International Journal of Robotics Research 32 (2013) 1231–1237.
[14] J. Geyer, Y. Kassahun, M. Mahmudi, X. Ricou, R. Durgesh, A. S. Chung, L. Hauswald, V. H. Pham, M. Mühlegg, S. Dorn, T. Fernandez, M. Jänicke, S. Mirashi, C. Savani, M. Sturm, O. Vorobiov, M. Oelker, S. Garreis, P. Schuberth, A2D2: Audi autonomous driving dataset, 2020. URL: https://arxiv.org/abs/2004.06320. doi:10.48550/ARXIV.2004.06320.
[15] R. Kesten, M. Usman, J. Houston, T. Pandya, K. Nadhamuni, A. Ferreira, M. Yuan, B. Low, A. Jain, P. Ondruska, S. Omari, S. Shah, A. Kulkarni, A. Kazakova, C. Tao, L. Platinsky, W. Jiang, V. Shet, Level 5 perception dataset 2020, https://level-5.global/level5/data/, 2019.
[16] H. Caesar, V. Bankiti, A. H. Lang, S. Vora, V. E. Liong,
     Q. Xu, A. Krishnan, Y. Pan, G. Baldan, O. Beijbom,
     nuscenes: A multimodal dataset for autonomous
     driving, in: CVPR, 2020, pp. 11621–11631.
[17] P. Sun, H. Kretzschmar, X. Dotiwalla, A. Chouard,
     V. Patnaik, P. Tsui, J. Guo, Y. Zhou, Y. Chai, B. Caine,
     et al., Scalability in perception for autonomous
     driving: Waymo open dataset, in: Proceedings of
     the IEEE/CVF conference on computer vision and
     pattern recognition, 2020, pp. 2446–2454.
[18] J. Mao, M. Niu, C. Jiang, X. Liang, Y. Li, C. Ye,
     W. Zhang, Z. Li, J. Yu, C. Xu, et al., One million
     scenes for autonomous driving: Once dataset, 2021.
[19] T.-Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona,
     D. Ramanan, P. Dollár, C. L. Zitnick, Microsoft coco:
     Common objects in context, in: ECCV, Springer,
     2014, pp. 740–755.
[20] R. Van Der Horst, J. Hogema, Time-to-collision and
     collision avoidance systems (1993).
[21] Institut für Unfallanalysen Hamburg, Bremstabelle, https://unfallanalyse.hamburg/index.php/ifu-lexikon/bremsen/bremstabelle-a/, 2022. Accessed: 2022-09-14.
[22] A. Erd, M. Jaśkiewicz, G. Koralewski, D. Rutkowski,
     J. Stokłosa, Experimental research of effectiveness
     of brakes in passenger cars under selected condi-
     tions, in: 2018 Xi International Science-Technical
     Conference Automotive Safety, IEEE, 2018, pp. 1–5.
[23] A. Kendall, Y. Gal, What uncertainties do we need
     in bayesian deep learning for computer vision?, in:
     NIPS, 2017, pp. 5574–5584.
[24] L. Prechelt, Early stopping-but when?, in: Neural
     Networks: Tricks of the trade, Springer, 1998, pp.
     55–69.
[25] C. Villani, Optimal transport, old and new. notes for
     the 2005 saint-flour summer school, Grundlehren
     der mathematischen Wissenschaften [Fundamental
     Principles of Mathematical Sciences]. Springer 3
     (2008).
[26] G. Farnebäck, Two-frame motion estimation based
     on polynomial expansion, in: Scandinavian confer-
     ence on Image analysis, Springer, 2003, pp. 363–370.
[27] A. J. Davison, Real-time simultaneous localisation
     and mapping with a single camera, in: Computer
     Vision, IEEE International Conference on, volume 3,
     IEEE Computer Society, 2003, pp. 1403–1403.