                                   Object-Centric Camera Drone Control
                                   for Unconstrained Telepresence

Jiannan Li
University of Toronto
40 St George St, Toronto, ON, Canada
jiannanli@dgp.toronto.edu

Ravin Balakrishnan
University of Toronto
40 St George St, Toronto, ON, Canada
ravin@dgp.toronto.edu

Tovi Grossman
University of Toronto
40 St George St, Toronto, ON, Canada
tovi@dgp.toronto.edu

Abstract
Camera drones, a rapidly emerging technology, offer people the ability to remotely inspect an environment with a high degree of mobility and agility. However, manual remote piloting of a drone is prone to errors, while autopilot systems are not necessarily designed to support flexible visual inspection. We propose the object-centric control paradigm for efficient camera drone navigation, in which a user directly specifies the navigation of the drone camera relative to a specified object of interest. We demonstrate the strengths of this approach through our first prototype, StarHopper, and discuss future research opportunities.

Author Keywords
Drone; telepresence; object-centric

This paper is published under the Creative Commons Attribution 4.0 International (CC-BY 4.0) license. Authors reserve their rights to disseminate the work on their personal and corporate Web sites with the appropriate attribution.
Interdisciplinary Workshop on Human-Drone Interaction (iHDI 2020), CHI '20 Extended Abstracts, 26 April 2020, Honolulu, HI, US.

Introduction
Researchers in telepresence have long envisioned 'beyond being there' [1]. Replicating all relevant local experiences while remote should not be the only goal of telepresence; rather, we should also strive to create telepresence systems that enable benefits not possible when the person is physically present. Telepresence then goes from replication to augmentation. One particular instance of this vision is enabled by camera drones: our local bodies can only walk on the ground, but our remote bodies can fly.
Researchers have noted a number of social and functional issues due to the insufficient mobility of current remote robotic presence platforms [15]. As drones become more affordable and reliable, they hold the potential to enable more flexible remote presence and visual inspection experiences (e.g. [2]) for the general population.

While drones offer promise for such telepresence applications, they are challenging to control manually from a distance, due to numerous factors including their high degrees of freedom, narrow camera fields of view, and network delays [7]. Their control interfaces, typically virtual or physical joysticks for consumer drones, are also unfamiliar to many users and take extended training to master [9].

Figure 2: StarHopper system components.

To relieve the burden of manual piloting, autopilot techniques have been applied to drone control. Most existing drone autopilot interfaces are based on specifying a series of planned waypoints in a 2D or 3D global map (e.g. [9]). However, when a user wishes to perform a real-time inspection, setting waypoints a priori may not be an efficient way to produce the desired viewpoints. Some autonomous systems avoid waypoints and execute higher-level plans, such as following a subject to form canonical shots [3], but they typically do not offer the flexibility needed for exploring remote environments.

The difficulty of drone piloting poses a significant barrier to the widespread adoption of free-flying robots. The goal of this research is to design a camera drone control interface that supports efficient and flexible telepresence experiences. Our work is inspired by decades of research in interactive graphics, where many camera navigation techniques have been established (e.g. [5]). Most relevant, we build upon object-centric techniques, in which zooming, panning, and orbiting occur relative to the location of a 3D object of interest. We demonstrated the potential of this approach through our first prototype, StarHopper [6], and illustrate future research opportunities.

Figure 1: Operating a camera drone remotely to inspect an apartment. (a) The user specifies a desired view of the coffee machine by dragging on the drone's camera view. (b) The drone flies towards the specified viewpoint.

Previous Work: An Object-Centric Interface for Remote Inspection
StarHopper is a remote object-centric camera drone navigation interface that is operated through familiar touch interactions and relies on minimal geometric information about the environment (Figure 1). It consists of an overhead camera view for context and a 3D-tracked drone's first-person view for focus (Figure 2). New objects of interest can be specified through simple touch gestures on both camera views. We combine automatic and manual control via four navigation mechanisms that complement each other with unique strengths, to support efficient and flexible visual inspection. The system focuses on indoor environments, representative of tasks such as remote warehouse inspection [9] and museum visits [11], and where positional tracking technology is more reliable.
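The paper does not give StarHopper's pose equations, but the core of object-centric control can be sketched: a viewpoint is parameterized relative to the object's position by an azimuth, elevation, and distance, with the camera's yaw and pitch chosen so it faces the object. A minimal illustrative sketch (function and frame conventions are our own, not the authors'):

```python
import math

def object_centric_pose(obj_xyz, azimuth, elevation, distance):
    """Place the camera on a sphere around the object and aim it
    at the object's center. Angles in radians; world frame is
    x-forward, y-left, z-up (an assumed convention)."""
    ox, oy, oz = obj_xyz
    cx = ox + distance * math.cos(elevation) * math.cos(azimuth)
    cy = oy + distance * math.cos(elevation) * math.sin(azimuth)
    cz = oz + distance * math.sin(elevation)
    yaw = math.atan2(oy - cy, ox - cx)    # turn towards the object
    pitch = math.atan2(oz - cz, math.hypot(ox - cx, oy - cy))
    return (cx, cy, cz), yaw, pitch

# Orbiting the object is then simply a change of azimuth,
# zooming a change of distance, with aim maintained for free.
pose, yaw, pitch = object_centric_pose((2.0, 0.0, 1.0),
                                       azimuth=0.0, elevation=0.3,
                                       distance=1.5)
```

Under this parameterization, pan, zoom, and orbit all reduce to incrementing one parameter, which is what makes the object-centric remapping of joystick axes described later possible.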
Design Guidelines
We base our design for remote object-centric drone navigation on a set of guidelines grounded in our review of prior literature: (1) support situation awareness; (2) minimize reliance on environmental information; (3) combine automated and manual control; (4) support simple touch interactions; (5) respect physical constraints.

Figure 3: The StarHopper user interface. (a) Remote drone camera view. (b) Overview camera view. (c) Virtual joysticks. (d) Object-of-interest list. (e) Icon for object-centric mode.

Figure 4: Interaction with the 360 viewpoint widget. (a) The user touches the area around the ring to activate the widget. (b) The user drags the finger to adjust the viewing angle and camera height. Upon releasing the drag, the drone navigates to the specified viewpoint.

User Interface and Navigation Mechanisms
StarHopper provides a touch screen interface for users to view the drone's live video stream and to perform drone navigation (Figure 3). The drone camera feed fills the screen; the overview camera video and two virtual joysticks sit at the bottom of the interface.

The user can obtain the approximate position and dimensions of an object through a simple two-step procedure, without pre-built maps or expensive real-time 3D reconstruction. She selects the object of interest through a drag gesture, first in the overview camera view and then in the drone camera view. A computer vision algorithm triangulates the position of the object from these two regions and estimates the dimensions of a bounding cylinder around the object (see [6] for more technical details).

Inspired by camera control mechanisms in interactive graphics, we designed three object-centric physical camera navigation mechanisms for viewing an object of focus: the 360 viewpoint widget, delayed through-the-lens control, and object-centric joysticks.

360 viewpoint widget
The 360 viewpoint widget supports quickly navigating to, and focusing on, an object of interest from a user-specified viewing angle. The widget takes the shape of a semi-transparent 3D ring surrounding the focus object (Figure 4a). A 3D arrow aimed at the ring appears upon touch, indicating the desired viewing direction. The user drags a finger along the ring to set the desired viewpoint position (Figure 4b). Once the user releases the finger, the autopilot system moves the drone to the calculated viewpoint. The algorithm determines a reasonable default viewing distance based on the size of the bounding cylinder.

Delayed through-the-lens control
To use this technique, the user first rests two fingers on the drone camera view to freeze the current frame (Figure 5a). The user then performs a two-finger pinch-and-pan gesture to transform the current frame into the desired viewpoint (Figure 5b). The system then calculates a new drone position that can produce the desired viewpoint, towards which the drone navigates.
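The paper leaves the gesture-to-motion mapping to [6]; under a simple pinhole-camera assumption, one plausible version of it is that the pinch scale moves the drone along the view axis and the pan translates it parallel to the image plane. A hypothetical sketch (names and conventions are ours, not StarHopper's actual implementation):

```python
def through_the_lens_offset(pan_px, zoom_scale, depth, focal_px):
    """Map a pinch-and-pan on a frozen frame to a camera offset.

    pan_px:     (dx, dy) screen pan in pixels
    zoom_scale: pinch factor (>1 means the user enlarged the object)
    depth:      estimated distance to the object of interest, in m
    focal_px:   camera focal length in pixels
    """
    # Enlarging the image by a factor s is approximated by flying
    # to a distance of depth / s.
    new_depth = depth / zoom_scale
    forward = depth - new_depth
    # A pan of dx pixels corresponds to a lateral shift of about
    # dx * depth / f, in the opposite direction: moving the camera
    # right shifts the image content left.
    dx, dy = pan_px
    right = -dx * depth / focal_px
    up = dy * depth / focal_px
    return forward, right, up

# e.g. pinching to 2x on an object 3 m away: fly 1.5 m forward.
```

The "delayed" aspect, freezing the frame before gesturing, decouples the interaction from video latency: the drone only moves once the user commits to a target view.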
Object-centric joysticks
We remap the axes of traditional drone control joysticks to object-centric commands and add constraints to prevent manipulation errors. More specifically, under the object-centric constraints, the drone keeps the object of interest in its field of view during pan movements (Figure 7a). In object-centric zoom, the drone aims its camera at the object of interest and moves closer to or further away from it (Figure 7b). In response to orbiting commands, the drone orbits around the object while aiming at its center (Figure 7c).

Manual joysticks
In addition to the three object-centric navigation mechanisms, StarHopper also supports fully manual control. This can be useful when the user wishes to make slight adjustments to a viewpoint that the autopilot system navigated to.

Figure 5: Adjusting the camera view using delayed through-the-lens control. (a) The user rests two fingers on the screen to freeze the current view. (b) A pan-and-zoom gesture on the frozen frame specifies the desired view.

Managing objects of interest
The object-of-interest list on the right of the interface (Figure 3d) records thumbnails of all previously registered objects of interest. The user can tap a thumbnail to set it as the object of interest, and the drone will turn towards it. A double-tap on the thumbnail triggers the drone to approach that object.

Navigation mechanism properties
StarHopper comprises a set of four navigation mechanisms, ranging from fully automated to fully manual. This suite of techniques allows users to perform both flexible and efficient scene inspections by leveraging their contrasting capabilities (Table 1). We recognize the trend that a higher automation level increases efficiency but reduces flexibility. Taken together, the system offers the user both efficient and flexible navigation mechanisms: the 360 viewpoint widget, despite its high efficiency, lacks flexibility and can be complemented by delayed through-the-lens control, object-centric joysticks, and manual control.

User Study
To evaluate the navigation mechanisms of StarHopper, we conducted a user study consisting of a remote object inspection task with 12 volunteers (7 female, Mage = 26.3, SDage = 4.4). A Ryze Tello drone was used in the study. We compared StarHopper to a baseline of conventional manual joystick controls. In each trial, the participant was instructed to fly the drone from the starting position to inspect one of four sides (Left, Right, Front, Back) of an item (Figure 8) using one of the two control interfaces, StarHopper or manual joysticks (Manual). We recorded the completion time of each trial.

A repeated measures analysis of variance showed that the task was completed significantly faster with StarHopper than with egocentric manual control (F1,11 = 23.8, p < 0.001). Overall, StarHopper was 35.4% faster (StarHopper: 20.33 s, Manual: 31.45 s), demonstrating a substantial gain in efficiency (Figure 6).

Figure 6: Mean task completion time of manual control and StarHopper. Error bars represent 95% CI.
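The reported efficiency gain follows directly from the two mean completion times:

```python
manual = 31.45      # mean trial completion time (s), manual joysticks
starhopper = 20.33  # mean trial completion time (s), StarHopper

speedup = (manual - starhopper) / manual
print(f"{speedup:.1%}")  # 35.4%
```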
Future Research Opportunities
The StarHopper prototype demonstrated the potential efficiency advantage of object-centric camera drone control. More importantly, it revealed several future challenges and opportunities for better leveraging the object-centric paradigm for unconstrained telepresence.

Increasing Situation Awareness for Leveraging Greater Mobility
Supporting situation awareness has long been a key theme in robot teleoperation research [14]. With greater mobility, drone operators face a greater risk of getting lost in space [7]. Prior research has shown the effectiveness of a live exocentric overview for enhancing situation awareness in teleoperation (e.g. [8]). StarHopper incorporated a static live overview camera, but this setup reduced the area where the drone could fly. Future research can explore awareness mechanisms that do not sacrifice mobility, for example, a second, spatially coupled camera drone serving as the overhead camera [10].

Richer Interaction Using Objects-of-Interest Semantic Information
Interactions with objects of interest in StarHopper were limited to specifying desirable viewpoints, as StarHopper only exploited simple geometric information. With recent advances in image understanding, a natural next step would be enabling richer and more meaningful interactions using semantic information about objects of interest. For instance, instead of following the single default rule of placing the object at the center of the camera frame, the system could choose a more appropriate camera framing and trajectory depending on the object and relevant context: the drone could focus on the upper body of a person in conversation, or zoom in on the console of an instrument for key readings.

Figure 7: The object-centric joystick controls. Red areas indicate the joystick axes used. (a) Pan. (b) Zoom. (c) Orbit.

Design for Local Users
While tele-operated robots give remote users the ability to control viewpoints, they raise challenges for local users in accurately interpreting remote users' actions and intentions. Such challenges are exacerbated with drones, as their movements and form factors can be very different from humans'. Recent research proposed signaling drone motion intent with augmented reality [12]. However, future flight paths and waypoints can be insufficient for a remote user who operates the drone to establish common ground with a local user, for example, when they want to make sure they are discussing the same object among a number of candidates in the environment. Visualizing objects of interest can complement such signaling and facilitate communication.

Privacy Considerations
A free-roaming viewpoint such as a drone raises privacy concerns about remote users intentionally or unintentionally seeing private visual information of local users. Prior research in video-mediated communication has looked extensively into privacy issues, but largely for fixed cameras (e.g. [4]). Privacy research on drones has mostly studied perceptions of drones operated by strangers (e.g. [13]). Drones for telepresence, especially drones that work closely with humans, call for new privacy mechanisms; for example, local users could define sensitive objects or zones that remotely operated drones should always avert.

Conclusion
Remotely operated camera drones hold potential for unconstrained telepresence 'beyond being there' but require careful control interface design to realize that potential. Through prototyping and evaluating StarHopper, we showed the advantage of the object-centric control paradigm for camera drone teleoperation. We further invite the research community to consider future opportunities in applying the object-centric paradigm to develop more useful and usable camera drone control interfaces for unconstrained telepresence.
Table 1: Properties of the four control mechanisms.

Figure 8: Mean task completion time of manual control and StarHopper. Error bars represent 95% CI.

REFERENCES

[1] Jim Hollan and Scott Stornetta. 1992. Beyond Being There. Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (1992), 119–125. DOI: http://dx.doi.org/10.1145/142750.142769

[2] Brennan Jones, Kody Dillman, Richard Tang, Anthony Tang, Ehud Sharlin, Lora Oehlberg, Carman Neustaedter, and Scott Bateman. 2016. Elevating Communication, Collaboration, and Shared Experiences in Mobile Video Through Drones. Proceedings of the 2016 ACM Conference on Designing Interactive Systems (2016), 1123–1135. DOI: http://dx.doi.org/10.1145/2901790.2901847

[3] Niels Joubert, Jane L. E, Dan B. Goldman, Floraine Berthouzoz, Mike Roberts, James A. Landay, and Pat Hanrahan. 2016. Towards a Drone Cinematographer: Guiding Quadrotor Cameras using Visual Composition Principles. arXiv:1610.01691 [cs] (2016). http://arxiv.org/abs/1610.01691

[4] Tejinder K. Judge, Carman Neustaedter, and Andrew F. Kurtz. 2010. The Family Window: The Design and Evaluation of a Domestic Media Space. Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (2010), 2361–2370. DOI: http://dx.doi.org/10.1145/1753326.1753682

[5] Azam Khan, Ben Komalo, Jos Stam, George Fitzmaurice, and Gordon Kurtenbach. 2005. HoverCam: Interactive 3D Navigation for Proximal Object Inspection. Proceedings of the 2005 Symposium on Interactive 3D Graphics and Games (2005), 73–80. DOI: http://dx.doi.org/10.1145/1053427.1053439

[6] Jiannan Li, Ravin Balakrishnan, and Tovi Grossman. 2020. StarHopper: A Touch Interface for Remote Object-Centric Drone Navigation. Proceedings of Graphics Interface 2020 (2020).

[7] David Pitman and Mary L. Cummings. 2012. Collaborative Exploration with a Micro Aerial Vehicle: A Novel Interaction Method for Controlling a MAV with a Hand-held Device. Advances in Human-Computer Interaction 2012 (2012). DOI: http://dx.doi.org/10.1155/2012/768180

[8] D. Saakes, V. Choudhary, D. Sakamoto, M. Inami, and T. Igarashi. 2013. A Teleoperating Interface for Ground Vehicles Using Autonomous Flying Cameras. 2013 23rd International Conference on Artificial Reality and Telexistence (ICAT) (2013), 13–19. DOI: http://dx.doi.org/10.1109/ICAT.2013.6728900

[9] Daniel Szafir, Bilge Mutlu, and Terrence Fong. 2017. Designing Planning and Control Interfaces to Support User Collaboration with Flying Robots. The International Journal of Robotics Research 36, 5–7 (2017), 514–542. DOI: http://dx.doi.org/10.1177/0278364916688256

[10] Ryotaro Temma, Kazuki Takashima, Kazuyuki Fujita, Koh Sueda, and Yoshifumi Kitamura. 2019. Third-Person Piloting: Increasing Situational Awareness Using a Spatially Coupled Second Drone. Proceedings of the 32nd Annual ACM Symposium on User Interface Software and Technology (2019), 507–519. DOI: http://dx.doi.org/10.1145/3332165.3347953

[11] S. Thrun, M. Beetz, M. Bennewitz, W. Burgard, A. B. Cremers, F. Dellaert, D. Fox, D. Hähnel, C. Rosenberg, N. Roy, J. Schulte, and D. Schulz. 2000. Probabilistic Algorithms and the Interactive Museum Tour-Guide Robot Minerva. The International Journal of Robotics Research 19, 11 (2000), 972–999. DOI: http://dx.doi.org/10.1177/02783640022067922

[12] Michael Walker, Hooman Hedayati, Jennifer Lee, and Daniel Szafir. 2018. Communicating Robot Motion Intent with Augmented Reality. Proceedings of the 2018 ACM/IEEE International Conference on Human-Robot Interaction (2018), 316–324. DOI: http://dx.doi.org/10.1145/3171221.3171253

[13] Yang Wang, Huichuan Xia, Yaxing Yao, and Yun Huang. 2016. Flying Eyes and Hidden Controllers: A Qualitative Study of People's Privacy Perceptions of Civilian Drones in the US. Proceedings on Privacy Enhancing Technologies 2016, 3 (2016), 172–190. DOI: http://dx.doi.org/10.1515/popets-2016-0022

[14] H. A. Yanco and J. Drury. 2004. "Where am I?" Acquiring Situation Awareness Using a Remote Robot Platform. 2004 IEEE International Conference on Systems, Man and Cybernetics 3 (2004), 2835–2840. DOI: http://dx.doi.org/10.1109/ICSMC.2004.1400762

[15] Lillian Yang, Brennan Jones, Carman Neustaedter, and Samarth Singhal. 2018. Shopping Over Distance Through a Telepresence Robot. Proc. ACM Hum.-Comput. Interact. 2, CSCW (2018), 191:1–191:18. DOI: http://dx.doi.org/10.1145/3274460