<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>ReBound: An Open-Source 3D Bounding Box Annotation Tool for Active Learning</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Wesley Chen</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Andrew Edgley</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Raunak Hota</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Joshua Liu</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ezra Schwartz</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Aminah Yizar</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Neehar Peri</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>James Purtilo</string-name>
        </contrib>
      </contrib-group>
      <abstract>
<p>In recent years, supervised learning has become the dominant paradigm for training deep-learning based methods for 3D object detection. Lately, the academic community has studied 3D object detection in the context of autonomous vehicles (AVs) using publicly available datasets such as nuScenes and Argoverse 2.0. However, these datasets may have incomplete annotations, often only labeling a small subset of objects in a scene. Although commercial services exist for 3D bounding box annotation, these are often prohibitively expensive. To address these limitations, we propose ReBound, an open-source 3D visualization and dataset re-annotation tool that works across different datasets. In this paper, we detail the design of our tool and present survey results that highlight the usability of our software. Further, we show that ReBound is effective for exploratory data analysis and can facilitate active learning. Our code and documentation are available on GitHub.</p>
      </abstract>
      <kwd-group>
<kwd>Autonomous Driving</kwd>
        <kwd>Active Learning</kwd>
        <kwd>3D Annotation Tools</kwd>
        <kwd>Human Computer Interaction</kwd>
        <kwd>Data Visualization</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>1. Introduction
3D object detection is a critical component of the
autonomous vehicle (AV) perception stack [1, 2]. To
facilitate research in 3D object detection, the AV
industry has released large-scale 3D annotated multimodal
datasets [2, 3, 4]. These datasets include LiDAR sweeps
and multi-camera RGB images from diverse driving logs,
which captures detailed information about the
surrounding environment. Crucially, objects of interest are
annotated by drawing 3D bounding boxes and labeling them
as part of a particular category. Contemporary 3D
detectors [5, 6, 7, 8, 9] are trained using supervised learning,
and are limited by the annotations provided with the
dataset. For example, objects in the nuScenes dataset Figure 1: We present ReBound, an open-source 3D
annotaare inconsistently labeled beyond 50m, making it chal- tion tool that allows users to add, delete, and modify
annotalenging to evaluate long-range detection. Further, the tions from existing datasets or model predictions to support
nuScenes dataset does not label street signs and trafic active learning.
lights, which are critical for safe navigation.</p>
      <p>Manually re-annotating datasets with new 3D bounding box annotations is particularly challenging. Annotating 3D bounding boxes using 2D RGB images is difficult because it is not possible to accurately estimate bounding-box depth. Similarly, annotating 3D bounding boxes using LiDAR point clouds is difficult because LiDAR returns are sparse, making it tough to identify individual objects at long range and in cluttered scenes, as shown in Fig. 2. Although commercial services can annotate 3D data at scale, they are often prohibitively expensive. As a result, several tools have been created for quick, efficient, and easy point cloud annotation [10, 11, 12, 13], with the intention of making it simpler for researchers to create their own datasets. However, these tools do not support a wide variety of data formats, multi-modal exploratory data analysis, and active learning.</p>
      <p>In this paper, we introduce ReBound, an open-source annotation tool which aims to simplify the process of multi-modal 3D visualization and bounding box annotation. Our framework is designed to provide users with the ability to import and modify annotations from various datasets. To achieve this, we propose a generic data type that can be extended to accommodate different data formats. Additionally, ReBound enables active learning by allowing users to correct bounding box predictions and labels, create new custom labels, analyze model predictions, and export new annotations back to dataset-specific formats for model re-training. We measure the effectiveness of our tool through user studies, focusing on the ease of using the data conversion and annotation editing features. Our results show that the tool is both intuitive to use and useful for rapidly adding new annotations.</p>
    </sec>
    <sec id="sec-1a">
      <title>2. Related Work</title>
      <p>In this section we present a brief overview of existing 3D annotation tools and active learning methods for object detection.</p>
      <p>3D Annotation Tools. Modern 3D detectors require diverse, large-scale 3D annotations for supervised learning. A variety of tools have been created to address this challenge [15], including labelCloud [10], SAnE [11], LATTE [12], and 3D BAT [13], which all allow for efficient manual or semi-automatic dataset annotation. However, unlike these existing tools, ReBound also allows users to manually update annotations from existing datasets for evolving use cases like detecting a new category or updating annotations for far-field objects. Our tool currently supports re-annotation for the nuScenes [16], Waymo [17], and Argoverse 2.0 [18] datasets.</p>
      <p>Active Learning for 3D Object Detection. Although manually labeling tens of thousands of images or LiDAR sweeps can be prohibitively expensive, recent work in active learning suggests that we can bootstrap deep learning models by iteratively annotating and training on a targeted subset of informative examples to significantly improve model performance. Selecting the most relevant data to be annotated by human experts is a primary challenge for active learning. Recent works define a scoring function, using either an uncertainty-based [19, 20, 21, 22, 23, 24, 25, 26] or a diversity-based approach [27, 26], to determine the most informative training samples. When selecting samples based on uncertainty, informativeness is measured by the predictive uncertainty, and the samples with the highest uncertainty are provided to the human annotators. When selecting samples based on diversity, scores are assigned based on spatial and temporal diversity [27]. Both the uncertainty-based and diversity-based approaches have been used for bootstrapping 3D object detection, but the uncertainty-based approach has been shown to be more effective in practice. More recently, [26] attempts to combine both strategies. Despite substantial work in active learning for image recognition and 2D object detection, exploration into active learning for 3D object detection is limited. ReBound helps support active learning by allowing users to filter predicted annotations based on detection confidence score, which is useful for measuring uncertainty.</p>
    </sec>
    <sec id="sec-1b">
      <title>3. ReBound: 3D Bounding-Box Re-Annotation</title>
      <p>In this section, we describe the functionality and software architecture of ReBound. Our code, documentation, and videos demonstrating our tool are available on GitHub.</p>
      <p>Data Format. ReBound currently supports three different datasets: nuScenes, Waymo Open Dataset, and Argoverse 2.0. Annotations for these three datasets are converted into our generic ReBound format using a separate command line tool built in Python. Specifically, the generic data format captures the minimum information required to annotate a 3D bounding box (e.g. object center, box size, box rotation, and sensor extrinsics), which allows us to scale this format across multiple datasets. Users can support new datasets by extending our command line tool for their use case, as sketched below.</p>
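      <p>The paper fixes only what a ReBound record must carry, not its exact schema; the following is a minimal sketch of one plausible encoding, with field names of our own choosing:</p>
      <preformat>
from dataclasses import dataclass, field

@dataclass
class ReBoundAnnotation:
    """Minimum information to place a 3D bounding box (illustrative)."""
    frame_id: str
    category: str
    center: tuple    # (x, y, z) box center
    size: tuple      # (length, width, height)
    rotation: tuple  # orientation quaternion (w, x, y, z)
    confidence: float = 1.0  # 1.0 for ground truth, model score otherwise

@dataclass
class ReBoundFrame:
    """One LiDAR sweep plus its annotations and sensor extrinsics."""
    lidar_path: str
    camera_paths: dict  # camera name -> image file
    extrinsics: dict    # camera name -> 4x4 sensor-to-ego matrix
    annotations: list = field(default_factory=list)
      </preformat>
      <p>A dataset-specific converter script then only has to map its native records into objects like these; this is the extension point for supporting a new dataset.</p>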
      <p>Data Visualization. The visualization tool has three windows: a control window, a point cloud viewer, and an RGB image viewer, as shown in Figure 1. The control window allows users to navigate through different frames in a driving log, switch between different camera views in the RGB image window, and filter through both ground truth annotations and model predictions. The point cloud viewer displays the point cloud corresponding to a frame, as well as ground truth annotations and predictions (if available) as cuboids. The user can rotate, translate, and zoom in/out to interact with the scene. We use Open3D as our rendering back-end as this natively supports 3D rendering on top of LiDAR sweeps and RGB images.</p>
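      <p>Since Open3D is the rendering back-end, drawing a sweep with one cuboid reduces to a few calls. A minimal sketch (the file path and box values are placeholders, not ReBound internals):</p>
      <preformat>
import numpy as np
import open3d as o3d

# Load an N x 3 LiDAR sweep (placeholder path).
points = np.load("sweep.npy")
pcd = o3d.geometry.PointCloud()
pcd.points = o3d.utility.Vector3dVector(points[:, :3])

# Render one annotation as an oriented cuboid: center, 3x3 rotation, extent.
box = o3d.geometry.OrientedBoundingBox(
    center=np.array([10.0, 2.0, 0.5]),
    R=o3d.geometry.get_rotation_matrix_from_xyz((0.0, 0.0, 0.3)),  # yaw only
    extent=np.array([4.5, 1.9, 1.6]),  # length, width, height
)
box.color = (0.0, 1.0, 0.0)

o3d.visualization.draw_geometries([pcd, box])
      </preformat>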
      <p>Figure 3: Our tool supports data conversion, visualization, and modification of the nuScenes, Waymo and Argoverse 2.0 datasets. First, dataset-specific RGB images, LiDAR sweeps, sensor extrinsics, and annotations are converted to the ReBound data format using our conversion scripts and visualized in the GUI. Using this GUI, we can add, edit, or delete existing annotations. Lastly, we can export data from the ReBound data format back to the respective dataset-specific formats using the provided export scripts.</p>
      <p>Editing Tool. Users can directly click on a location in the point cloud window to add, edit, or delete an annotation. To edit a box’s properties, users must first click on a desired cuboid in the point cloud window. This will highlight the bounding box and display the position, rotation, size, and annotation class of the selected box in the control window. These fields can all be directly updated, allowing the user to make precise changes. Users can also make coarse changes to the location of the selected box by clicking and dragging the box within the LiDAR viewer. To make it easier to interact with 3D objects, the mouse tool is restricted to two different control modes (a sketch of this constraint logic follows the list):</p>
      <p>• Horizontal Translation Mode: The user can articulate objects along the x and y axes with the z axis locked.</p>
      <p>• Vertical Translation Mode: The user can click and drag objects along the z axis and rotate objects about the z axis.</p>
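      <p>A minimal sketch of how these two modes might constrain a drag, written as plain Python for illustration (this is our reading of the described behavior, not ReBound's actual event handler):</p>
      <preformat>
import math

def apply_drag(center, yaw, delta, mode):
    """Constrain a mouse drag (dx, dy, dz, dyaw) to the active mode.

    HORIZONTAL: translate in the x-y plane with z locked.
    VERTICAL:   translate along z and rotate about the z axis (yaw).
    """
    x, y, z = center
    dx, dy, dz, dyaw = delta
    if mode == "HORIZONTAL":
        return (x + dx, y + dy, z), yaw
    if mode == "VERTICAL":
        return (x, y, z + dz), (yaw + dyaw) % (2 * math.pi)
    raise ValueError(f"unknown control mode: {mode}")
      </preformat>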
      <p>Finer-grained modifications can be manually entered in the control window. Users can also delete the selected bounding box, and create new bounding boxes with a single mouse-click. All bounding box transformations are updated in both the LiDAR and RGB image viewers in real-time. In practice, the editing tool can be used to update bounding boxes that are too large, are misaligned with the true object location (c.f. Figure 4), update misclassified objects, and add new categories like stop signs and traffic lights.</p>
      <p>Exporting. Users can export updated annotations back to the original dataset-specific formats using a command line tool. Importantly, ReBound only exports the modified bounding box annotations back to the original format. Since ReBound does not modify the LiDAR sweeps or RGB images, we can directly use the data from the original dataset. We find that this dramatically increases data export speed. Interestingly, since we convert all datasets into a unified format, we can easily export annotations between datasets. Concretely, this means we can convert nuScenes annotations into the Argoverse 2.0 format using the ReBound format as an intermediary, making it easier to evaluate models across different datasets. Further, this can facilitate future research in model robustness and domain adaptation between different datasets. Similar to the data conversion tool that converts from dataset-specific formats to the ReBound format, users that wish to support a new dataset can extend our command line tool for their use case.</p>
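      <p>Because every dataset round-trips through the same intermediate format, cross-dataset export is just the composition of an import script and an export script. A schematic sketch (the converter names are hypothetical stand-ins for ReBound's scripts):</p>
      <preformat>
def nuscenes_to_rebound(frame):
    ...  # parse nuScenes records into ReBound annotations

def rebound_to_argoverse(frame, boxes):
    ...  # write ReBound annotations in the Argoverse 2.0 layout

def convert_nuscenes_to_argoverse(frames):
    # The unified format is the intermediary, so no pairwise
    # nuScenes -> Argoverse converter needs to exist. Only the
    # annotations are rewritten; LiDAR sweeps and RGB images are
    # reused from the source dataset, which keeps export fast.
    for frame in frames:
        rebound_to_argoverse(frame, nuscenes_to_rebound(frame))
      </preformat>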
    </sec>
    <sec id="sec-2">
      <title>4. Experiments</title>
      <sec id="sec-2-1">
        <title>In this section we present the results of our user survey to evaluate the efectiveness of our tool.</title>
        <p>Ease of Use. We conducted surveys to evaluate the
tool’s ease of use in facilitating data conversion between
data formats and bounding box adjustments among both
people familiar and unfamiliar with autonomous vehicle
datasets and 3D annotation tools.</p>
        <p>We asked ten participants to perform 13 tasks (shown
in Table 1) over a video conference. Participants were
shown a demonstration of the tool before being asked
to complete a new, but related task using the tool.</p>
        <p>This method of demonstration allows us to avoid re-downloading the datasets and re-installing the software on a new computer for each trial. After completing all 13 tasks, users were asked to complete a questionnaire designed to rate different parts of the user experience on a scale of one to five (five is highest).</p>
        <p>Table 1: Our survey results show that users found it easier to create, delete, and modify annotations but found it more difficult to rotate, translate, and export bounding boxes from the ReBound data format to dataset-specific formats.</p>
        <p>Task List:
Convert nuScenes data to the ReBound generic type
Enter annotation mode
Add a new annotation bounding box
Add a custom annotation type
Select an existing bounding box
Translate a bounding box
Rotate a bounding box
Change the label of a bounding box
Delete a bounding box
Use the control viewer to edit an annotation
Save modified annotations
Exit the application
Export data back to the nuScenes data format</p>
        <p>Based on our survey, we identified that the greatest challenge for users was translating and rotating bounding boxes. Both of these operations require fine-grained controls and navigating in a 3D space, which may not be intuitive for some users. However, these features also had a high standard deviation compared to the other tasks, indicating that the utility of our tool may depend on prior use of 3D visualization and annotation tools. In general, participants stated that the application can be useful for autonomous vehicle research.</p>
        <p>Bounding Box Adjustments. We visualize ground truth annotations in both the LiDAR and RGB image viewers. We are able to identify examples where the ground truth annotation is misaligned with the real object, as shown in Figure 4, highlighting a potential application of our tool. This is practically meaningful, as training 3D detectors with noisy labels can result in degraded model performance.</p>
        <p>Figure 4: We visualize a ground truth annotation from the Argoverse 2.0 dataset (left). We find that the annotation is misaligned with the true car location. We adjust this ground truth annotation using ReBound (right). Visually inspecting the updated bounding box in both the LiDAR and RGB image viewers confirms that the updated annotation correctly localizes the car.</p>
      </sec>
    </sec>
    <sec id="sec-2b">
      <title>5. Conclusion</title>
      <p>The academic community studies 3D object detection using publicly available AV datasets, which often provide LiDAR sweeps, RGB images, and 3D bounding box annotations. However, these datasets are limited by the annotations provided by their curators. In some cases, we find that these annotations may be incorrect or incomplete. Existing annotation tools cannot be easily used to update annotations for existing datasets, which makes it difficult to fix incorrect annotations, annotate new categories of interest, or iteratively improve 3D object detection models through active learning. In this paper, we propose ReBound, an open-source alternative that simplifies the process of bounding box re-annotation for existing datasets. We validate the utility of our tool through user surveys and find that users are able to rapidly add, modify, and delete 3D bounding box annotations.</p>
    </sec>
    <sec id="sec-3">
      <title>Acknowledgments</title>
      <sec id="sec-3-1">
        <title>Author James Purtilo was supported by N000142112821</title>
        <p>while working on this project. This work is supported by
Table 1 the SEAM Lab at the University of Maryland and the Argo
Our survey results show that users found it easier to create, AI Center for Autonomous Vehicle Research at Carnegie
delete, and modify annotations but found it more dificult Mellon University.
to rotate, translate, and export bounding boxes from the
ReBound data format to dataset-specific formats.</p>
      </sec>
      <sec id="sec-3-2">
        <title>IEEE Intelligent Transportation Systems Confer</title>
        <p>ence, ITSC 2019, Auckland, New Zealand, October
[1] A. Geiger, P. Lenz, R. Urtasun, Are we ready for 27-30, 2019, IEEE, 2019, pp. 265–272.
autonomous driving? the kitti vision benchmark [13] W. Zimmer, A. Rangesh, M. M. Trivedi, 3d BAT:
suite, in: IEEE Conference on Computer Vision and A semi-automatic, web-based 3d annotation
toolPattern Recognition, 2012. box for full-surround, multi-modal data streams,
[2] H. Caesar, V. Bankiti, A. H. Lang, S. Vora, V. E. Liong, in: 2019 IEEE Intelligent Vehicles Symposium, IV
Q. Xu, A. Krishnan, Y. Pan, G. Baldan, O. Beijbom, 2019, Paris, France, June 9-12, 2019, IEEE, 2019, pp.
nuscenes: A multimodal dataset for autonomous 1816–1821.
driving, in: Proceedings of the IEEE/CVF Confer- [14] S. Gupta, J. Kanjani, M. Li, F. Ferroni, J. Hays, D.
Raence on Computer Vision and Pattern Recognition, manan, S. Kong, Far3det: Towards far-field 3d
de2020. tection, in: NeurIPS, 2022.
[3] B. Wilson, W. Qi, T. Agarwal, J. Lambert, J. Singh, [15] Y. Li, J. Ibañez-Guzmán, Lidar for autonomous
S. Khandelwal, B. Pan, R. Kumar, A. Hartnett, J. K. driving: The principles, challenges, and trends for
Pontes, D. Ramanan, P. Carr, J. Hays, Argoverse 2: automotive lidar and perception systems, IEEE
Next generation datasets for self-driving perception Signal Process. Mag. 37 (2020) 50–61.
and forecasting, in: Neural Information Processing [16] H. Caesar, V. Bankiti, A. H. Lang, S. Vora, V. E. Liong,
Systems Datasets and Benchmarks Track, 2021. Q. Xu, A. Krishnan, Y. Pan, G. Baldan, O. Beijbom,
[4] P. Sun, H. Kretzschmar, X. Dotiwalla, A. Chouard, nuscenes: A multimodal dataset for autonomous
V. Patnaik, P. Tsui, J. Guo, Y. Zhou, Y. Chai, B. Caine, driving, in: 2020 IEEE/CVF Conference on
ComV. Vasudevan, W. Han, J. Ngiam, H. Zhao, A. Tim- puter Vision and Pattern Recognition, CVPR 2020,
ofeev, S. Ettinger, M. Krivokon, A. Gao, A. Joshi, Seattle, WA, USA, June 13-19, 2020, Computer
ViY. Zhang, J. Shlens, Z. Chen, D. Anguelov, Scalabil- sion Foundation / IEEE, 2020, pp. 11618–11628.
ity in perception for autonomous driving: Waymo [17] P. Sun, H. Kretzschmar, X. Dotiwalla, A. Chouard,
open dataset, in: IEEE/CVF Conference on Com- V. Patnaik, P. Tsui, J. Guo, Y. Zhou, Y. Chai, B. Caine,
puter Vision and Pattern Recognition (CVPR), 2020. V. Vasudevan, W. Han, J. Ngiam, H. Zhao, A.
Tim[5] T. Yin, X. Zhou, P. Krähenbühl, Center-based ofeev, S. Ettinger, M. Krivokon, A. Gao, A. Joshi,
3d object detection and tracking, arXiv preprint Y. Zhang, J. Shlens, Z. Chen, D. Anguelov,
ScalabilarXiv:2006.11275 (2020). ity in perception for autonomous driving: Waymo
[6] N. Peri, A. Dave, D. Ramanan, S. Kong, Towards open dataset, in: 2020 IEEE/CVF Conference on
long-tailed 3d detection, in: Conference on Robot Computer Vision and Pattern Recognition, CVPR
Learning (CoRL), 2022. 2020, Seattle, WA, USA, June 13-19, 2020, Computer
[7] A. H. Lang, S. Vora, H. Caesar, L. Zhou, J. Yang, Vision Foundation / IEEE, 2020, pp. 2443–2451.</p>
        <p>O. Beijbom, Pointpillars: Fast encoders for ob- [18] B. Wilson, W. Qi, T. Agarwal, J. Lambert, J. Singh,
ject detection from point clouds, in: IEEE Confer- S. Khandelwal, B. Pan, R. Kumar, A. Hartnett, J. K.
ence on Computer Vision and Pattern Recognition Pontes, D. Ramanan, P. Carr, J. Hays, Argoverse
(CVPR), 2019. 2: Next generation datasets for self-driving
percep[8] B. Zhu, Z. Jiang, X. Zhou, Z. Li, G. Yu, Class- tion and forecasting, in: J. Vanschoren, S. Yeung
balanced grouping and sampling for point cloud 3d (Eds.), Proceedings of the Neural Information
Proobject detection, arXiv preprint arXiv:1908.09492 cessing Systems Track on Datasets and Benchmarks
(2019). 1, NeurIPS Datasets and Benchmarks 2021,
Decem[9] N. Peri, J. Luiten, M. Li, A. Osep, L. Leal-Taixe, D. Ra- ber 2021, virtual, 2021.</p>
        <p>manan, Forecasting from lidar via future object [19] E. Haussmann, M. Fenzi, K. Chitta, J. Ivanecky,
detection, arXiv:2203.16297 (2022). H. Xu, D. Roy, A. Mittel, N. Koumchatzky, C.
Fara[10] C. Sager, P. Zschech, N. Kühl, labelcloud: A bet, J. M. Alvarez, Scalable active learning for object
lightweight domain-independent labeling tool for detection, in: IEEE Intelligent Vehicles Symposium,
3d object detection in point clouds, CoRR IV 2020, Las Vegas, NV, USA, October 19 -
Novemabs/2103.04970 (2021). ber 13, 2020, IEEE, 2020, pp. 1430–1435.
[11] H. A. Arief, M. Arief, G. Zhang, Z. Liu, M. Bhat, U. G. [20] T. Yuan, F. Wan, M. Fu, J. Liu, S. Xu, X. Ji, Q. Ye,
MulIndahl, H. Tveite, D. Zhao, Sane: Smart annotation tiple instance active learning for object detection,
and evaluation tools for point cloud data, IEEE in: CVPR, 2021.</p>
        <p>Access 8 (2020) 131848–131858. [21] D. Feng, X. Wei, L. Rosenbaum, A. Maki, K.
Diet[12] B. Wang, V. Wu, B. Wu, K. Keutzer, LATTE: ac- mayer, Deep active learning for eficient training of
celerating lidar point cloud annotation via sensor a lidar 3d object detector, in: 2019 IEEE Intelligent
fusion, one-click annotation, and tracking, in: 2019 Vehicles Symposium, IV 2019, Paris, France, June
9-12, 2019, IEEE, 2019, pp. 667–674.
[22] S. Roy, A. Unmesh, V. P. Namboodiri, Deep active
learning for object detection, in: British Machine
Vision Conference 2018, BMVC 2018, Newcastle,</p>
        <p>UK, September 3-6, 2018, BMVA Press, 2018, p. 91.
[23] A. Hekimoglu, M. Schmidt, A. Marcos-Ramiro,</p>
        <p>G. Rigoll, Eficient active learning strategies for
monocular 3d object detection, in: 2022 IEEE
Intelligent Vehicles Symposium, IV 2022, Aachen,
Germany, June 4-9, 2022, IEEE, 2022, pp. 295–302.
[24] S. Schmidt, Q. Rao, J. Tatsch, A. C. Knoll, Advanced
active learning strategies for object detection, in:
IEEE Intelligent Vehicles Symposium, IV 2020, Las
Vegas, NV, USA, October 19 - November 13, 2020,</p>
        <p>IEEE, 2020, pp. 871–876.
[25] M. Meyer, G. Kuschk, Automotive radar dataset for
deep learning based 3d object detection, in: 2019
16th European Radar Conference (EuRAD), 2019.
[26] Y. Luo, Z. Chen, Z. Wang, X. Yu, Z. Huang, M.
Baktashmotlagh, Exploring active 3d object
detection from a generalization perspective, CoRR
abs/2301.09249 (2023).
[27] Z. Liang, X. Xu, S. Deng, L. Cai, T. Jiang, K. Jia,</p>
        <p>Exploring diversity-based active learning for 3d
object detection in autonomous driving, CoRR
abs/2205.07708 (2022).</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list />
  </back>
</article>