<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>ReBound: An Open-Source 3D Bounding Box Annotation Tool for Active Learning</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Wesley Chen</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Andrew Edgley</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Raunak Hota</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Joshua Liu</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ezra Schwartz</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Aminah Yizar</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Neehar Peri</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>James Purtilo</string-name>
        </contrib>
      </contrib-group>
      <abstract>
<p>In recent years, supervised learning has become the dominant paradigm for training deep-learning based methods for 3D object detection. Lately, the academic community has studied 3D object detection in the context of autonomous vehicles (AVs) using publicly available datasets such as nuScenes and Argoverse 2.0. However, these datasets may have incomplete annotations, often only labeling a small subset of objects in a scene. Although commercial services exist for 3D bounding box annotation, these are often prohibitively expensive. To address these limitations, we propose ReBound, an open-source 3D visualization and dataset re-annotation tool that works across different datasets. In this paper, we detail the design of our tool and present survey results that highlight the usability of our software. Further, we show that ReBound is effective for exploratory data analysis and can facilitate active learning. Our code and documentation are available on GitHub.</p>
      </abstract>
      <kwd-group>
<kwd>Autonomous Driving</kwd>
        <kwd>Active Learning</kwd>
        <kwd>3D Annotation Tools</kwd>
        <kwd>Human Computer Interaction</kwd>
        <kwd>Data Visualization</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>1. Introduction
3D object detection is a critical component of the
autonomous vehicle (AV) perception stack [1, 2]. To
facilitate research in 3D object detection, the AV
industry has released large-scale 3D annotated multimodal
datasets [2, 3, 4]. These datasets include LiDAR sweeps
and multi-camera RGB images from diverse driving logs,
which captures detailed information about the
surrounding environment. Crucially, objects of interest are
annotated by drawing 3D bounding boxes and labeling them
as part of a particular category. Contemporary 3D
detectors [5, 6, 7, 8, 9] are trained using supervised learning,
and are limited by the annotations provided with the
dataset. For example, objects in the nuScenes dataset Figure 1: We present ReBound, an open-source 3D
annotaare inconsistently labeled beyond 50m, making it chal- tion tool that allows users to add, delete, and modify
annotalenging to evaluate long-range detection. Further, the tions from existing datasets or model predictions to support
nuScenes dataset does not label street signs and trafic active learning.
lights, which are critical for safe navigation.</p>
      <p>Manually re-annotating datasets with new 3D bounding box annotations is particularly challenging. Annotating 3D bounding boxes using 2D RGB images is difficult because it is not possible to accurately estimate bounding-box depth. Similarly, annotating 3D bounding boxes using LiDAR point clouds is difficult because LiDAR returns are sparse, making it tough to identify individual objects at long range and in cluttered scenes, as shown in Fig. 2. Although commercial services can annotate 3D data at scale, they are often prohibitively expensive. As a result, several tools have been created for quick, efficient, and easy point cloud annotation [10, 11, 12, 13], with the intention of making it simpler for researchers to create their own datasets. However, these tools do not support a wide variety of data formats, multi-modal exploratory data analysis, and active learning.</p>
      <p>In this paper, we introduce ReBound, an open-source annotation tool which aims to simplify the process of multi-modal 3D visualization and bounding box annotation. Our framework is designed to provide users with the ability to import and modify annotations from various datasets. To achieve this, we propose a generic data type that can be extended to accommodate different data formats. Additionally, ReBound enables active learning by allowing users to correct bounding box predictions and labels, create new custom labels, analyze model predictions, and export new annotations back to dataset-specific formats for model re-training. We measure the effectiveness of our tool through user studies, focusing on the ease of using the data conversion and annotation editing features. Our results show that the tool is both intuitive to use and useful for rapidly adding new annotations.</p>
    </sec>
    <sec id="sec-1a">
      <title>2. Related Work</title>
      <p>In this section we present a brief overview of existing 3D annotation tools and active learning methods for object detection.</p>
      <p>3D Annotation Tools. Modern 3D detectors require diverse, large-scale 3D annotations for supervised learning. A variety of tools have been created to address this challenge [15], including labelCloud [10], SAnE [11], LATTE [12], and 3D BAT [13], which all allow for efficient manual or semi-automatic dataset annotation. However, unlike these existing tools, ReBound also allows users to manually update annotations from existing datasets for evolving use cases like detecting a new category or updating annotations for far-field objects. Our tool currently supports re-annotation for the nuScenes [16], Waymo [17], and Argoverse 2.0 [18] datasets.</p>
      <p>Active Learning for 3D Object Detection. Although manually labeling tens of thousands of images or LiDAR sweeps can be prohibitively expensive, recent work in active learning suggests that we can bootstrap deep learning models by iteratively annotating and training on a targeted subset of informative examples to significantly improve model performance. Selecting the most relevant data to be annotated by human experts is a primary challenge for active learning. Recent works define a scoring function, using either an uncertainty-based [19, 20, 21, 22, 23, 24, 25, 26] or a diversity-based approach [27, 26], to determine the most informative training samples. When selecting samples based on uncertainty, informativeness is measured by the predictive uncertainty, and the samples with the highest uncertainty are provided to the human annotators. When selecting samples based on diversity, scores are assigned based on spatial and temporal diversity [27]. Both the uncertainty-based and diversity-based approaches have been used for bootstrapping 3D object detection, but the uncertainty-based approach has been shown to be more effective in practice. More recently, [26] attempts to combine both strategies. Despite substantial work in active learning for image recognition and 2D object detection, exploration into active learning for 3D object detection is limited. ReBound helps support active learning by allowing users to filter predicted annotations based on detection confidence score, which is useful for measuring uncertainty.</p>
    </sec>
    <sec id="sec-1b">
      <title>3. ReBound: 3D Bounding-Box Re-Annotation</title>
      <p>In this section, we describe the functionality and software architecture of ReBound. Our code, documentation, and videos demonstrating our tool are available on GitHub.</p>
      <p>Data Format. ReBound currently supports three different datasets: nuScenes, Waymo Open Dataset, and Argoverse 2.0. Annotations for these three datasets are converted into our generic ReBound format using a separate command line tool built in Python. Specifically, the generic data format captures the minimum information required to annotate a 3D bounding box (e.g. object center, box size, box rotation, and sensor extrinsics), which allows us to scale this format across multiple datasets. Users can support new datasets by extending our command line tool for their use case, as sketched below.</p>
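      <p>The paper fixes only what a ReBound record must carry, not its exact schema; the following is a minimal sketch of one plausible encoding, with field names of our own choosing:</p>
      <preformat>
from dataclasses import dataclass, field

@dataclass
class ReBoundAnnotation:
    """Minimum information to place a 3D bounding box (illustrative)."""
    frame_id: str
    category: str
    center: tuple    # (x, y, z) box center
    size: tuple      # (length, width, height)
    rotation: tuple  # orientation quaternion (w, x, y, z)
    confidence: float = 1.0  # 1.0 for ground truth, model score otherwise

@dataclass
class ReBoundFrame:
    """One LiDAR sweep plus its annotations and sensor extrinsics."""
    lidar_path: str
    camera_paths: dict  # camera name -> image file
    extrinsics: dict    # camera name -> 4x4 sensor-to-ego matrix
    annotations: list = field(default_factory=list)
      </preformat>
      <p>A dataset-specific converter script then only has to map its native records into objects like these; this is the extension point for supporting a new dataset.</p>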
      <p>Data Visualization. The visualization tool has three windows: a control window, a point cloud viewer, and an RGB image viewer, as shown in Figure 1. The control window allows users to navigate through different frames in a driving log, switch between different camera views in the RGB image window, and filter through both ground truth annotations and model predictions. The point cloud viewer displays the point cloud corresponding to a frame, as well as ground truth annotations and predictions (if available) as cuboids. The user can rotate, translate, and zoom in/out to interact with the scene. We use Open3D as our rendering back-end as this natively supports 3D rendering on top of LiDAR sweeps and RGB images.</p>
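      <p>Since Open3D is the rendering back-end, drawing a sweep with one cuboid reduces to a few calls. A minimal sketch (the file path and box values are placeholders, not ReBound internals):</p>
      <preformat>
import numpy as np
import open3d as o3d

# Load an N x 3 LiDAR sweep (placeholder path).
points = np.load("sweep.npy")
pcd = o3d.geometry.PointCloud()
pcd.points = o3d.utility.Vector3dVector(points[:, :3])

# Render one annotation as an oriented cuboid: center, 3x3 rotation, extent.
box = o3d.geometry.OrientedBoundingBox(
    center=np.array([10.0, 2.0, 0.5]),
    R=o3d.geometry.get_rotation_matrix_from_xyz((0.0, 0.0, 0.3)),  # yaw only
    extent=np.array([4.5, 1.9, 1.6]),  # length, width, height
)
box.color = (0.0, 1.0, 0.0)

o3d.visualization.draw_geometries([pcd, box])
      </preformat>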
      <p>Figure 3: Our tool supports data conversion, visualization, and modification of the nuScenes, Waymo and Argoverse 2.0 datasets. First, dataset-specific RGB images, LiDAR sweeps, sensor extrinsics, and annotations are converted to the ReBound data format using our conversion scripts and visualized in the GUI. Using this GUI, we can add, edit, or delete existing annotations. Lastly, we can export data from the ReBound data format back to the respective dataset-specific formats using the provided export scripts.</p>
      <p>Editing Tool. Users can directly click on a location in the point cloud window to add, edit, or delete an annotation. To edit a box’s properties, users must first click on a desired cuboid in the point cloud window. This will highlight the bounding box and display the position, rotation, size, and annotation class of the selected box in the control window. These fields can all be directly updated, allowing the user to make precise changes. Users can also make coarse changes to the location of the selected box by clicking and dragging the box within the LiDAR viewer. To make it easier to interact with 3D objects, the mouse tool is restricted to two different control modes (a sketch of this constraint logic follows the list):</p>
      <p>• Horizontal Translation Mode: The user can articulate objects along the x and y axes with the z axis locked.</p>
      <p>• Vertical Translation Mode: The user can click and drag objects along the z axis and rotate objects about the z axis.</p>
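      <p>A minimal sketch of how these two modes might constrain a drag, written as plain Python for illustration (this is our reading of the described behavior, not ReBound's actual event handler):</p>
      <preformat>
import math

def apply_drag(center, yaw, delta, mode):
    """Constrain a mouse drag (dx, dy, dz, dyaw) to the active mode.

    HORIZONTAL: translate in the x-y plane with z locked.
    VERTICAL:   translate along z and rotate about the z axis (yaw).
    """
    x, y, z = center
    dx, dy, dz, dyaw = delta
    if mode == "HORIZONTAL":
        return (x + dx, y + dy, z), yaw
    if mode == "VERTICAL":
        return (x, y, z + dz), (yaw + dyaw) % (2 * math.pi)
    raise ValueError(f"unknown control mode: {mode}")
      </preformat>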
      <p>Finer-grained modifications can be manually entered in the control window. Users can also delete the selected bounding box, and create new bounding boxes with a single mouse-click. All bounding box transformations are updated in both the LiDAR and RGB image viewers in real-time. In practice, the editing tool can be used to update bounding boxes that are too large, are misaligned with the true object location (c.f. Figure 4), update misclassified objects, and add new categories like stop signs and traffic lights.</p>
      <p>Exporting. Users can export updated annotations back to the original dataset-specific formats using a command line tool. Importantly, ReBound only exports the modified bounding box annotations back to the original format. Since ReBound does not modify the LiDAR sweeps or RGB images, we can directly use the data from the original dataset. We find that this dramatically increases data export speed. Interestingly, since we convert all datasets into a unified format, we can easily export annotations between datasets. Concretely, this means we can convert nuScenes annotations into the Argoverse 2.0 format using the ReBound format as an intermediary, making it easier to evaluate models across different datasets. Further, this can facilitate future research in model robustness and domain adaptation between different datasets. Similar to the data conversion tool that converts from dataset-specific formats to the ReBound format, users that wish to support a new dataset can extend our command line tool for their use case.</p>
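      <p>Because every dataset round-trips through the same intermediate format, cross-dataset export is just the composition of an import script and an export script. A schematic sketch (the converter names are hypothetical stand-ins for ReBound's scripts):</p>
      <preformat>
def nuscenes_to_rebound(frame):
    ...  # parse nuScenes records into ReBound annotations

def rebound_to_argoverse(frame, boxes):
    ...  # write ReBound annotations in the Argoverse 2.0 layout

def convert_nuscenes_to_argoverse(frames):
    # The unified format is the intermediary, so no pairwise
    # nuScenes -> Argoverse converter needs to exist. Only the
    # annotations are rewritten; LiDAR sweeps and RGB images are
    # reused from the source dataset, which keeps export fast.
    for frame in frames:
        rebound_to_argoverse(frame, nuscenes_to_rebound(frame))
      </preformat>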
    </sec>
    <sec id="sec-2">
      <title>4. Experiments</title>
      <sec id="sec-2-1">
        <title>In this section we present the results of our user survey to evaluate the efectiveness of our tool.</title>
        <p>Ease of Use. We conducted surveys to evaluate the
tool’s ease of use in facilitating data conversion between
data formats and bounding box adjustments among both
people familiar and unfamiliar with autonomous vehicle
datasets and 3D annotation tools.</p>
        <p>We asked ten participants to perform 13 tasks (shown
in Table 1) over a video conference. Participants were
shown a demonstration of the tool before being asked
to complete a new, but related task using the tool.</p>
        <p>This method of demonstration allows us to avoid re-downloading the datasets and re-installing the software on a new computer for each trial. After completing all 13 tasks, users were asked to complete a questionnaire designed to rate different parts of the user experience on a scale of one to five (five is highest).</p>
        <p>Table 1: Our survey results show that users found it easier to create, delete, and modify annotations but found it more difficult to rotate, translate, and export bounding boxes from the ReBound data format to dataset-specific formats.</p>
        <p>Task List:
Convert nuScenes data to the ReBound generic type
Enter annotation mode
Add a new annotation bounding box
Add a custom annotation type
Select an existing bounding box
Translate a bounding box
Rotate a bounding box
Change the label of a bounding box
Delete a bounding box
Use the control viewer to edit an annotation
Save modified annotations
Exit the application
Export data back to the nuScenes data format</p>
        <p>Based on our survey, we identified that the greatest challenge for users was translating and rotating bounding boxes. Both of these operations require fine-grained controls and navigating in a 3D space, which may not be intuitive for some users. However, these features also had a high standard deviation compared to the other tasks, indicating that the utility of our tool may depend on prior use of 3D visualization and annotation tools. In general, participants stated that the application can be useful for autonomous vehicle research.</p>
        <p>Bounding Box Adjustments. We visualize ground truth annotations in both the LiDAR and RGB image viewers. We are able to identify examples where the ground truth annotation is misaligned with the real object, as shown in Figure 4, highlighting a potential application of our tool. This is practically meaningful, as training 3D detectors with noisy labels can result in degraded model performance.</p>
        <p>Figure 4: We visualize a ground truth annotation from the Argoverse 2.0 dataset (left). We find that the annotation is misaligned with the true car location. We adjust this ground truth annotation using ReBound (right). Visually inspecting the updated bounding box in both the LiDAR and RGB image viewers confirms that the updated annotation correctly localizes the car.</p>
      </sec>
    </sec>
    <sec id="sec-2b">
      <title>5. Conclusion</title>
      <p>The academic community studies 3D object detection using publicly available AV datasets, which often provide LiDAR sweeps, RGB images, and 3D bounding box annotations. However, these datasets are limited by the annotations provided by their curators. In some cases, we find that these annotations may be incorrect or incomplete. Existing annotation tools cannot be easily used to update annotations for existing datasets, which makes it difficult to fix incorrect annotations, annotate new categories of interest, or iteratively improve 3D object detection models through active learning. In this paper, we propose ReBound, an open-source alternative that simplifies the process of bounding box re-annotation for existing datasets. We validate the utility of our tool through user surveys and find that users are able to rapidly add, modify, and delete 3D bounding box annotations.</p>
    </sec>
    <sec id="sec-3">
      <title>Acknowledgments</title>
      <sec id="sec-3-1">
        <title>Author James Purtilo was supported by N000142112821</title>
        <p>while working on this project. This work is supported by
Table 1 the SEAM Lab at the University of Maryland and the Argo
Our survey results show that users found it easier to create, AI Center for Autonomous Vehicle Research at Carnegie
delete, and modify annotations but found it more dificult Mellon University.
to rotate, translate, and export bounding boxes from the
ReBound data format to dataset-specific formats.</p>
      </sec>
      <sec id="sec-3-2">
        <title>IEEE Intelligent Transportation Systems Confer</title>
        <p>ence, ITSC 2019, Auckland, New Zealand, October
[1] A. Geiger, P. Lenz, R. Urtasun, Are we ready for 27-30, 2019, IEEE, 2019, pp. 265–272.
autonomous driving? the kitti vision benchmark [13] W. Zimmer, A. Rangesh, M. M. Trivedi, 3d BAT:
suite, in: IEEE Conference on Computer Vision and A semi-automatic, web-based 3d annotation
toolPattern Recognition, 2012. box for full-surround, multi-modal data streams,
[2] H. Caesar, V. Bankiti, A. H. Lang, S. Vora, V. E. Liong, in: 2019 IEEE Intelligent Vehicles Symposium, IV
Q. Xu, A. Krishnan, Y. Pan, G. Baldan, O. Beijbom, 2019, Paris, France, June 9-12, 2019, IEEE, 2019, pp.
nuscenes: A multimodal dataset for autonomous 1816–1821.
driving, in: Proceedings of the IEEE/CVF Confer- [14] S. Gupta, J. Kanjani, M. Li, F. Ferroni, J. Hays, D.
Raence on Computer Vision and Pattern Recognition, manan, S. Kong, Far3det: Towards far-field 3d
de2020. tection, in: NeurIPS, 2022.
[3] B. Wilson, W. Qi, T. Agarwal, J. Lambert, J. Singh, [15] Y. Li, J. Ibañez-Guzmán, Lidar for autonomous
S. Khandelwal, B. Pan, R. Kumar, A. Hartnett, J. K. driving: The principles, challenges, and trends for
Pontes, D. Ramanan, P. Carr, J. Hays, Argoverse 2: automotive lidar and perception systems, IEEE
Next generation datasets for self-driving perception Signal Process. Mag. 37 (2020) 50–61.
and forecasting, in: Neural Information Processing [16] H. Caesar, V. Bankiti, A. H. Lang, S. Vora, V. E. Liong,
Systems Datasets and Benchmarks Track, 2021. Q. Xu, A. Krishnan, Y. Pan, G. Baldan, O. Beijbom,
[4] P. Sun, H. Kretzschmar, X. Dotiwalla, A. Chouard, nuscenes: A multimodal dataset for autonomous
V. Patnaik, P. Tsui, J. Guo, Y. Zhou, Y. Chai, B. Caine, driving, in: 2020 IEEE/CVF Conference on
ComV. Vasudevan, W. Han, J. Ngiam, H. Zhao, A. Tim- puter Vision and Pattern Recognition, CVPR 2020,
ofeev, S. Ettinger, M. Krivokon, A. Gao, A. Joshi, Seattle, WA, USA, June 13-19, 2020, Computer
ViY. Zhang, J. Shlens, Z. Chen, D. Anguelov, Scalabil- sion Foundation / IEEE, 2020, pp. 11618–11628.
ity in perception for autonomous driving: Waymo [17] P. Sun, H. Kretzschmar, X. Dotiwalla, A. Chouard,
open dataset, in: IEEE/CVF Conference on Com- V. Patnaik, P. Tsui, J. Guo, Y. Zhou, Y. Chai, B. Caine,
puter Vision and Pattern Recognition (CVPR), 2020. V. Vasudevan, W. Han, J. Ngiam, H. Zhao, A.
Tim[5] T. Yin, X. Zhou, P. Krähenbühl, Center-based ofeev, S. Ettinger, M. Krivokon, A. Gao, A. Joshi,
3d object detection and tracking, arXiv preprint Y. Zhang, J. Shlens, Z. Chen, D. Anguelov,
ScalabilarXiv:2006.11275 (2020). ity in perception for autonomous driving: Waymo
[6] N. Peri, A. Dave, D. Ramanan, S. Kong, Towards open dataset, in: 2020 IEEE/CVF Conference on
long-tailed 3d detection, in: Conference on Robot Computer Vision and Pattern Recognition, CVPR
Learning (CoRL), 2022. 2020, Seattle, WA, USA, June 13-19, 2020, Computer
[7] A. H. Lang, S. Vora, H. Caesar, L. Zhou, J. Yang, Vision Foundation / IEEE, 2020, pp. 2443–2451.</p>
        <p>O. Beijbom, Pointpillars: Fast encoders for ob- [18] B. Wilson, W. Qi, T. Agarwal, J. Lambert, J. Singh,
ject detection from point clouds, in: IEEE Confer- S. Khandelwal, B. Pan, R. Kumar, A. Hartnett, J. K.
ence on Computer Vision and Pattern Recognition Pontes, D. Ramanan, P. Carr, J. Hays, Argoverse
(CVPR), 2019. 2: Next generation datasets for self-driving
percep[8] B. Zhu, Z. Jiang, X. Zhou, Z. Li, G. Yu, Class- tion and forecasting, in: J. Vanschoren, S. Yeung
balanced grouping and sampling for point cloud 3d (Eds.), Proceedings of the Neural Information
Proobject detection, arXiv preprint arXiv:1908.09492 cessing Systems Track on Datasets and Benchmarks
(2019). 1, NeurIPS Datasets and Benchmarks 2021,
Decem[9] N. Peri, J. Luiten, M. Li, A. Osep, L. Leal-Taixe, D. Ra- ber 2021, virtual, 2021.</p>
        <p>manan, Forecasting from lidar via future object [19] E. Haussmann, M. Fenzi, K. Chitta, J. Ivanecky,
detection, arXiv:2203.16297 (2022). H. Xu, D. Roy, A. Mittel, N. Koumchatzky, C.
Fara[10] C. Sager, P. Zschech, N. Kühl, labelcloud: A bet, J. M. Alvarez, Scalable active learning for object
lightweight domain-independent labeling tool for detection, in: IEEE Intelligent Vehicles Symposium,
3d object detection in point clouds, CoRR IV 2020, Las Vegas, NV, USA, October 19 -
Novemabs/2103.04970 (2021). ber 13, 2020, IEEE, 2020, pp. 1430–1435.
[11] H. A. Arief, M. Arief, G. Zhang, Z. Liu, M. Bhat, U. G. [20] T. Yuan, F. Wan, M. Fu, J. Liu, S. Xu, X. Ji, Q. Ye,
MulIndahl, H. Tveite, D. Zhao, Sane: Smart annotation tiple instance active learning for object detection,
and evaluation tools for point cloud data, IEEE in: CVPR, 2021.</p>
        <p>Access 8 (2020) 131848–131858. [21] D. Feng, X. Wei, L. Rosenbaum, A. Maki, K.
Diet[12] B. Wang, V. Wu, B. Wu, K. Keutzer, LATTE: ac- mayer, Deep active learning for eficient training of
celerating lidar point cloud annotation via sensor a lidar 3d object detector, in: 2019 IEEE Intelligent
fusion, one-click annotation, and tracking, in: 2019 Vehicles Symposium, IV 2019, Paris, France, June
9-12, 2019, IEEE, 2019, pp. 667–674.
[22] S. Roy, A. Unmesh, V. P. Namboodiri, Deep active
learning for object detection, in: British Machine
Vision Conference 2018, BMVC 2018, Newcastle,</p>
        <p>UK, September 3-6, 2018, BMVA Press, 2018, p. 91.
[23] A. Hekimoglu, M. Schmidt, A. Marcos-Ramiro,</p>
        <p>G. Rigoll, Eficient active learning strategies for
monocular 3d object detection, in: 2022 IEEE
Intelligent Vehicles Symposium, IV 2022, Aachen,
Germany, June 4-9, 2022, IEEE, 2022, pp. 295–302.
[24] S. Schmidt, Q. Rao, J. Tatsch, A. C. Knoll, Advanced
active learning strategies for object detection, in:
IEEE Intelligent Vehicles Symposium, IV 2020, Las
Vegas, NV, USA, October 19 - November 13, 2020,</p>
        <p>IEEE, 2020, pp. 871–876.
[25] M. Meyer, G. Kuschk, Automotive radar dataset for
deep learning based 3d object detection, in: 2019
16th European Radar Conference (EuRAD), 2019.
[26] Y. Luo, Z. Chen, Z. Wang, X. Yu, Z. Huang, M.
Baktashmotlagh, Exploring active 3d object
detection from a generalization perspective, CoRR
abs/2301.09249 (2023).
[27] Z. Liang, X. Xu, S. Deng, L. Cai, T. Jiang, K. Jia,</p>
        <p>Exploring diversity-based active learning for 3d
object detection in autonomous driving, CoRR
abs/2205.07708 (2022).</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list />
  </back>
</article>