=Paper=
{{Paper
|id=Vol-1716/WSICC_2016_paper_9
|storemode=property
|title=WorldViews: Connecting the World Through Easy Sharing of Views
|pdfUrl=https://ceur-ws.org/Vol-1716/WSICC_2016_paper_9.pdf
|volume=Vol-1716
|authors=Don Kimber,Enock Glidden,Jennifer Marlow
|dblpUrl=https://dblp.org/rec/conf/tvx/KimberGM16
}}
==WorldViews: Connecting the World Through Easy Sharing of Views==
WorldViews: Connecting the world through easy sharing of views
Don Kimber, FXPAL & WorldViews, 3174 Porter Drive, Palo Alto, CA 94304, kimber@fxpal.com
Enock Glidden, WorldViews, 3174 Porter Drive, Palo Alto, CA 94304, enockglidden@hotmail.com
Jennifer Marlow, FXPAL, 3174 Porter Drive, Palo Alto, CA 94304, marlow@fxpal.com
ABSTRACT
The confluence of technologies such as telepresence,
immersive imaging, model based virtual mirror worlds,
mobile live streaming, etc. give rise to a capability for
people anywhere to view and connect with present or past
events nearly anywhere on earth. This capability properly
belongs to a public commons, available as a birthright of all
humans, and can be seen as part of an evolutionary
transition supporting a global collective mind. We
describe examples and elements of this capability, and
suggest how they can be better integrated through a tool we
call TeleViewer and a framework called WorldViews,
which supports easy sharing of views as well as connecting
of providers and consumers of views all around the world.
Author Keywords
Telepresence; interactive media; GIS systems; multimedia
browsing; social media
ACM Classification Keywords
H.5.1 Multimedia information systems, H.5.2 User interfaces, H.m Miscellaneous.

Figure 1. TeleViewer: a virtual earth based viewer portal providing easy views of any place on earth.
INTRODUCTION
This paper examines trends in digital media related to how people see the world, starting from a global perspective of human evolution. Pierre Teilhard de Chardin described evolution in broad terms as having given rise not only to the geologic structure of our planet and its atmosphere, and then to a living biosphere, but ultimately to a noosphere - the space of mind and thought. [1] Human intelligence is not primarily an intelligence of the individual, but a collective intelligence. Everything we know and think as individuals lies within a cultural matrix tying us together, and all our achievements, whether as grandiose as walking on the moon or as mundane as meeting our basic daily needs, are part of a collective process.

Teilhard de Chardin's noosphere is now more commonly termed cyberspace. It entails not only the online computational and informational world, such as all the data and knowledge resident in the cloud, but the association of human minds and consciousness with and through that space. John Perry Barlow writes, "We will create a civilization of the Mind in Cyberspace." [2] The development of language, writing, and printing were all important steps in the evolution of the noosphere. But the development of electronic media, and ultimately of the internet, is to the evolution of human collective intelligence what the transition from purely chemical signaling between cells to electrically based neural signaling was to the development of intelligent organisms. In the same sense that the development of nervous systems, and then of ever more elaborate brains, gave rise to intelligence, the deployment of information technology enables an awakening and blossoming of human collective intelligence.

(4th International Workshop on Interactive Content Consumption at TVX'16, June 22, 2016, Chicago, IL, USA. Copyright is held by the author/owner(s).)
In considering universal aspects of mind as they apply to the human collective mind, we begin to appreciate their connection with social and digital media. A full discussion of this is off topic for this paper, but briefly, key aspects of mind include memory, perception, proprioception, emotion, cognition, intention for action, and facets of consciousness, such as mechanisms for control of attention. Digital media, and its integration through shared models of the world (e.g. digital maps and globes and our shared images of the earth), can be thought of as supporting humanity's collective cognitive and emotional systems - of how we think about and understand ourselves and each other. Streaming media are a key element of our collective perception and proprioception - of how humans together perceive what is happening on Earth.

Figure 2: Panoramic Video Map. Somewhat like Google Street View, video is collected along paths, indoor and outdoor. The paths are shown on the map, and a user may drag anywhere along a path to see the view from that position. The user can also press "Go" to watch continuous motion along the path, while freely choosing the view direction. Video can be collected by a robot or by a person carrying the special camera.

People can now increasingly view all kinds of events and places all over the planet. The ways people see remote events and places fall roughly into several categories:

Image Based: Recorded images played back as images or video, possibly panoramic video, possibly along constrained paths, or possibly using image based rendering. This approach also includes lightfields and lumigraphs.

Model Based: Computer graphics generated "mirror worlds" that show what a place looks like - as far as the model is correct - from nearly any view.

Telerobotic: Live views from fixed or controllable cameras, possibly on robotic devices - telerobots, drones, UAVs, etc.

Human Guided: Streamed views, in which people carry and stream from cameras, typically on their phones, or with wearable or assistive devices such as Google Glass, or "Polly", described below.

Rather than asking which of these means is better than the others, we consider them complementary and synergistic, with some overlap. For example, live video feeds can be embedded into a "mirror world" to help give the spatial context of the views, and can also be recorded as a supplement to the stored collection of images for image based viewing. Stored images can in turn help in generating and updating the virtual models used for model based viewing. The models are helpful in planning live events and in coordinating during those events.
We are undertaking a project called WorldViews, with the goal of better supporting the use of all these methods for viewing the world. One aspect of this is a prototype tool we call TeleViewer, which seeks to integrate these methods and provide a single portal through which to access them. A second is what we call the 'coordination layer', the means of using social media not only to push views, but to request or organize produced views.

PREVIOUS AND RELATED WORK
In this section we describe examples of each of the types of viewers described above. Although we give other examples, for reasons of familiarity most of the examples will be of projects carried out at FXPAL.

Image Based Views
Many projects at FXPAL and elsewhere have shaped our views about live video streaming. For the FlyCam project in 2000 we built a panoramic video camera that could be used to live stream or record. [5] We investigated various usage scenarios, such as meetings and recording of events, but found that one of the more interesting uses, inspired by the "Aspen Movie Map" [4], was "spatial capture", in which the camera is moved along interesting paths such as roads or paths through a garden or park. Using this system, called FlyAbout, viewers could later "virtually move" along those paths and get a sense of moving through the space. [6] This essentially provided a means by which people could make a virtual visit to a space, as well as capturing images that could be used as a backdrop for seeing an event. Figure 2 shows a web based video map of the FXPAL research center created with such a system. Google Street View has done something similar, on an impressively comprehensive scale, although it still does not provide video giving a smooth sense of motion through the space. Current efforts to crowdsource the collection of this kind of comprehensive imaging from around the world include the startup Mapillary, which collects images taken by smartphones during drives.

Panoramic cameras, indexed images captured while moving, and other methods for collecting large numbers of images that are in some sense comprehensive or encyclopedic, can be viewed very generally as sampling a sort of "God's-eye-view from everywhere" - the so-called plenoptic function. [21] In its full glory, this is a seven dimensional function of position, direction, wavelength, and time, giving the intensity of illumination of any wavelength viewed from a given position in a given direction at a given time. While it is not possible to determine the plenoptic function exactly, many methods that try to comprehensively capture and support rich views may be thought of as sampling the plenoptic function. [3] Lightfields and lumigraphs are advanced methods for richer sampling and playback of the plenoptic function. [7,8]
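Written out in one common parameterization (the ordering of the arguments varies by author; Adelson and Bergen [21] give the original definition), the plenoptic function is

\[ P(x,\; y,\; z,\; \theta,\; \phi,\; \lambda,\; t), \]

the intensity of light of wavelength \(\lambda\) arriving at viewing position \((x, y, z)\) from direction \((\theta, \phi)\) at time \(t\): three positional dimensions, two directional, one spectral, and one temporal, seven in all.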
Another project that influenced our thinking about event streaming was FlySPEC, which investigated the live streaming and recording of presentations and meetings in a room with 4 special cameras, each combining a panoramic view with a PTZ camera to gather close-ups. [9] The system used a hybrid human/automatic control scheme designed to work sensibly with no human direction, direction from one person, or multiple inputs. It used the low resolution panoramas and a model of the information value of the various possible close-ups to choose views. If one person made a view request, it would be honored by moving the camera accordingly. If many people made view requests, the system would try to best satisfy the overall requests in terms of the information value model.
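To make the control policy concrete, here is a minimal sketch of such a hybrid selection rule. All names and the scoring details are invented for illustration; the paper describes FlySPEC's behavior, not its code.

```typescript
// Sketch of a FlySPEC-style hybrid view selection policy (invented names).
interface View { pan: number; tilt: number; zoom: number; }

// Rough measure in [0, 1] of how well candidate view c satisfies request r;
// a real system would compare projected fields of view instead.
function satisfaction(c: View, r: View): number {
  const d = Math.hypot(c.pan - r.pan, c.tilt - r.tilt);
  return Math.max(0, 1 - d / 90); // crude angular proximity
}

// With exactly one request, honor it directly. Otherwise (zero or many
// requests) pick the candidate close-up that maximizes total requested
// satisfaction plus modeled information value. Assumes candidates is
// non-empty.
function chooseView(
  requests: View[],
  candidates: View[],
  infoValue: (v: View) => number
): View {
  if (requests.length === 1) return requests[0];
  let best = candidates[0];
  let bestScore = -Infinity;
  for (const c of candidates) {
    const score =
      requests.reduce((s, r) => s + satisfaction(c, r), 0) + infoValue(c);
    if (score > bestScore) { bestScore = score; best = c; }
  }
  return best;
}
```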
Figure 3: Mirror World showing FXPAL and the ACM Multimedia conference in Nara. A Mirror World viewer running in the Google Earth web plugin. The left view shows the FXPAL research center, with colors indicating which rooms are occupied, billboards above rooms showing information about offices, etc. The right view shows the ACM Multimedia conference in Nara, Japan, with billboards showing session information.
Model Based Views – Mirror Worlds
We worked for several years at FXPAL on a system with many cameras around our lab that could track the movement of people, and also provide views of the locations where most activity was happening. [10] Although this was conceived partly as a possible surveillance system, we also considered how it might be used at venues such as conference centers. We combined the system with 3D modeling of the location, to create viewers that helped users understand the spatial layout and camera placement. [11] We also built viewers we called 'Magic Mirrors' that showed the 3D model of the location, with the camera views, on large displays. These displays could be placed locally at locations around our space (e.g. we had one in the lobby, one in a main conference room, and one in our kitchen), giving people awareness of the other locations, but could also be placed at remote locations such as our sister lab in Japan. Having the local displays of the views provided by the cameras made the space feel less "creepy", but did not fully address reciprocity when remote people were watching. That issue is addressed in the socially mediated systems discussed below.

The culmination of our work combining multiple cameras with 3D models was our "Mirror World" project [12], in which we had detailed models of several locations, including our own lab, our sister lab in Yokohama, and a conference venue in Nara, Japan. (Figure 3.) These models could all be viewed in a web browser using the Google Earth plugin with its Javascript API. Using this system, someone could see what was happening in our lab, including live views from several cameras, views of presentation screen content updated live during presentations, and images on some whiteboards, updated when pictures of them were taken with a special smartphone app. The user could then "fly over to Yokohama" and get a similar view there, or to the conference venue model to see how that looked. The tool could be used beforehand by prospective attendees and conference planners for planning. For example, they could see the models of the rooms presentations would take place in, and could also see nearby temples in Google Earth. Although the Nara venue did not have advanced camera, screen capture, and display facilities, we anticipate that many venues in the future will, as well as having telerobots for remote participants.

Telerobotic Views
Increasingly, telepresence robots are used at many locations to make those locations available to remote viewers. Some of these are private, for enterprise use, but many are starting to be available at public locations like museums. [16] Also, drones are increasingly used to provide views of many locations. For example, the sites travelbydrone.com and travelwithdrone.com each provide thousands of curated videos showing scenery from all over the world, with scatter maps showing the locations where videos are available. [17] Makers of popular consumer drones, like DJI and Parrot, provide sites for sharing videos taken by customers, and Periscope recently announced a partnership with DJI to allow streaming of drone video through Periscope.
Human Provided and Socially Mediated Views
The projects described above mostly address the three aspects of rich image capture, use of models, and controllable live streams, but except for some of the telepresence robots, they do not address some of the more important social issues. These include the involvement of local participants as guides providing views to remote participants, and reciprocity of view, so that local people do not feel 'spied on' and remote people do not feel 'disembodied'.

The Polly project takes a different approach to providing remote views of places or events, following ideas from the tele-actor project [13] of letting a person with a camera act on behalf of remote participants. A 'guide' carries a device which remote viewers may connect to and control. The metaphor is of a "parrot on the shoulder" serving as an avatar for the remote participant. The device includes a remotely controllable camera, with a stabilization gimbal that reduces the annoyance of shaking but also provides a measure of view independence for the remote participant. [14]

Figure 5: Polly telepresence device. The shoulder worn metaphorical "parrot on shoulder" provides telepresence to a remote person.

We implemented various Polly prototypes, but most of our investigation was with versions that used a smartphone for the camera, with the phone display providing a view of the remote participant. (Figure 5.) This allows social interaction with the guide, or with other people present locally. We mostly looked at scenarios with a single guide and single remote viewer, but also tried a few events, such as a tour of the Stanford University campus, with multiple remote participants joined together in a Google hangout, which could also have more than one Polly available. A hangout app allowed anyone in the hangout to control any Polly. (Figure 4.)

Figure 4: Polly Hangout interface for remote participants. Remote users controlled Polly through a Google hangout and app. Here there are 3 remote participants and two Polly devices, which could be controlled using a GUI on the right. A map showing positions of the guides is also shown on the right.

Contention was controlled in a simple manner and did not appear to be a problem with, say, 5 people in the hangout. When a Polly device was available for control, any user could perform UI actions to remotely control it. The interface would then indicate to everyone that Polly was under that viewer's control, and only that viewer would be able to control it. After a timeout of a few seconds without any control actions, the Polly became available again to anyone wishing to control it. Meanwhile, everyone could see the views from the camera of whichever Polly they chose to watch. A map also showed the positions of the Polly devices and guides. We consider this a prototype of what could be called a "coordination map" for much larger events, or for access to guides and views.
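The timeout-based floor control just described is simple enough to sketch. The following illustration uses invented names; the actual hangout app's implementation is not given in this paper.

```typescript
// Minimal sketch of Polly's timeout-based control handoff (invented names;
// the paper describes the behavior, not the implementation).
class PollyFloorControl {
  private controller: string | null = null;          // current controlling viewer
  private timer: ReturnType<typeof setTimeout> | null = null;

  constructor(private readonly timeoutMs = 5000) {}   // "a few seconds"

  // A control action succeeds if this Polly is free or already held by
  // the same viewer; every action restarts the release timeout.
  tryControl(viewerId: string): boolean {
    if (this.controller !== null && this.controller !== viewerId) {
      return false;                                   // someone else holds it
    }
    this.controller = viewerId;                       // UI shows who holds it
    if (this.timer !== null) clearTimeout(this.timer);
    this.timer = setTimeout(() => { this.controller = null; }, this.timeoutMs);
    return true;
  }
}
```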
Figure 6: Coordination Layer: Google Earth as one possible coordination map. Many layers can be listed, showing different kinds of source views (e.g. drone, robot, human carried, etc.) as well as positions of guides, events, questions, etc.

The Polly prototype was interesting in that it gave remote users some autonomy, and some local presence through the display, but it was bulky, unwieldy, and fragile. We expect that over the next few years much better Polly-like devices will be built, with varying capabilities. The Parrot Bebop drone, for example, could form the basis for a lightweight Polly that pushes the parrot metaphor further, not only sitting on the shoulder but flying under remote control. That drone has a large field of view 14 megapixel camera with the capability for digital image stabilization in real time, eliminating the weight and fragility of a mechanical gimbal.
However, we felt the greatest opportunity for a short term contribution to the advancement of 'guided telepresence' scenarios is not in building the more sophisticated kinds of devices we anticipate, but in choosing simpler scenarios using widely available devices. A good example of this kind of approach is the Virtual Photo Walks project [15], which pairs up photographers on photo shoots at interesting locations or events with remote participants, in many cases people with limited mobility (such as through disabilities) who are not able to visit many places. The rigs used are simply a DSLR camera together with an iPhone running a hangout app. The remote participants communicate with the photographers about what they would like to see, which views they would like pictures of, etc. They cannot directly control views or take pictures, but communicate these intentions through the hangout.

This inspired another project, SharedCam, which lets a local guide with a smartphone pair up with a remote participant in taking high resolution pictures on the smartphone. Either the guide or the remote viewer may 'click the shutter' to take a picture, which is then automatically uploaded and made available at full resolution. The system uses a version of the jumpch.at WebRTC app for streaming, modified to allow full resolution pictures to be taken at the same time.
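SharedCam's implementation is not detailed here, but the combination it describes - a live WebRTC stream plus occasional full resolution stills from the same camera - can be sketched with standard browser APIs. The ImageCapture API below stands in for whatever the jumpch.at modification actually does.

```typescript
// Sketch only: stream live video over WebRTC while grabbing full-resolution
// stills from the same camera track. ImageCapture is a standard browser API,
// used here as a stand-in for the paper's modified jumpch.at app.
declare class ImageCapture {                 // not in all TS lib versions
  constructor(track: MediaStreamTrack);
  takePhoto(): Promise<Blob>;                // full sensor resolution
}

async function startSharedCam(peer: RTCPeerConnection) {
  const stream = await navigator.mediaDevices.getUserMedia({ video: true });
  const track = stream.getVideoTracks()[0];
  peer.addTrack(track, stream);              // live view for the remote side

  const capture = new ImageCapture(track);
  // Either the guide or the remote viewer triggers this 'shutter'; the
  // resulting blob would then be uploaded and shared at full resolution.
  return async function shutter(): Promise<Blob> {
    return capture.takePhoto();
  };
}
```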
ISSUES, OPPORTUNITIES & RESEARCH QUESTIONS

Rise of a New Role: The Virtual Guide
Just as blogging provided a new and greatly expanded channel for writers and other creative individuals to share their content, cultivate a following, and take on the new role of "blogger", the new streaming technologies will enable a new role of "virtual guide." People with the right mix of personality, knowledge, and access to interesting locations or events may thrive as "local experts" or "guides." They may choose their own schedule, streaming at their whim, or at high value locations and events. But others will be "on call", running apps that notify them when remote people have questions or want a quick view: "Who can show me the Golden Gate bridge now, and chat with me a bit about it?"

An interesting question is how the economics for guides will work out. We anticipate that there will be cases of people making money as virtual guides - indeed, it may even become a full time profession for some. Others will do it for fun, to build up reputation, or motivated by a sense of service, such as cultural sharing or providing experiences to disadvantaged individuals who would not otherwise have access. Mechanisms like earning hearts in Periscope support the non-monetary and reputation aspects of motivation. We anticipate tourism scenarios whereby people make money as independent guides for visiting tourists, but serve as virtual guides as a way of getting business.
Figure 7: Coordination Layer: This shows a coordination map as it might be seen zoomed in to a small area.
As people embark on personal adventures, they will be able to use this technology to bring people along with them. They will have the opportunity to monetize their own content by using the videos on crowdfunding sites to show people what they are doing and why they need help with funding. For example, the OpenROV project set up a site called OpenExplorer to let people list planned explorations and seek followers and support. [19] In conjunction with blogging and other social media, adventurers will be able to give people an immersive experience almost as if they were on the adventure with them. This will allow the adventurer to engage followers in an exciting way and may encourage those followers to help with funding, so they can see more of this content. An adventurer may even be able to have their own site offering a pay-per-view-like experience, where the follower pays for access to be on the adventure in a virtual way. Professional adventure guides may also take advantage of this technology to sell hybrid guided expeditions, combining in-person guiding and virtual guiding. When people get more funding, more content will become available, and more people may be inspired to go on their own adventures.

WorldViews Coordination Layer
A key component of a good event streaming system, or of the overall ecosystem of tools and people involved in live streaming, is a coordination mechanism for seeing what events are happening, which views are available, what views or information are requested, where views are taken from, etc. This could be presented in a variety of ways, but one natural way is as layers and markings on 2D maps or 3D virtual globes. These can show where prospective guides are located, where automatic or remotely controllable views are available, and what people want to see or learn about. It also supports scheduling, and a market, so that people can make arrangements with guides. Furthermore, the statistics at this level, which may include "likes", "hearts", the number of followers of a stream, etc., contribute to the attention focus mechanism by which large numbers of remote participants may be directed to the most interesting or relevant content. This coordination layer can also be used to show which video devices, particularly remotely controllable devices such as telerobots, drones, or UAVs, are available, and to help people schedule access to such devices. Overall, this layer can act as a sort of "Uber for views", where views are provided by humans, robots, or hybrid systems such as Polly.
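As one concrete illustration, a coordination layer of this kind might carry entries shaped roughly as follows. These type names and fields are our invention; no schema is specified here.

```typescript
// Hypothetical data shapes for coordination-map entries; the paper
// describes the layer's role but specifies no concrete data model.
type SourceKind = "drone" | "telerobot" | "human-guide" | "fixed-camera";

interface CoordinationEntry {
  id: string;
  kind: SourceKind | "view-request" | "event";
  lon: number;               // where the marking appears on the map or globe
  lat: number;
  title: string;             // e.g. "Who can show me the Golden Gate bridge?"
  live: boolean;             // currently streaming or available
  followers?: number;        // attention statistics ("likes", "hearts", ...)
  hearts?: number;
  availableAt?: string;      // ISO time, supporting scheduling and a market
}

// A layer is a named collection of entries that a geo browser such as
// TeleViewer could subscribe to and render as map markings.
interface CoordinationLayer {
  name: string;              // e.g. "drone videos", "guides on call"
  entries: CoordinationEntry[];
}
```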
WorldViews TeleViewer
The state of the ecosystem for accessing views of remote parts of the world is reminiscent of the state of the internet before the development of the web. A variety of information sources were accessible through various tools such as ftp, gopher, etc., but a single, simple to use access tool was lacking, as was an integrative framework for easily interconnecting sources in useful ways. The advent of HTML, together with open source browsers and HTTP servers, quickly changed that. The situation today seems similar. Digital earths such as Google Earth provide a "geo browser", which is a good start. But what is needed are good open source equivalents, standards for integrating sources into those tools, and a convergence of methods for accumulating and sharing layers of GIS data.

WorldViews is dedicated to more unified access to all the methods described above for seeing the world. We are developing a prototype browser called TeleViewer, built upon Cesium, which is based on WebGL. [20] Information layers corresponding to all kinds of views may be added or subscribed to, and appear in the browser or as placed links. Figure 1 shows a view of the whole world, with a set of layer choices on the left. These layers correspond to sources such as geotagged videos associated with various topics, drone videos, travel blogs, live sources such as Periscope and Meerkat, access to SharedCam apps, etc. There is also a layer corresponding to dynamic queries for geotagged videos matching given keywords.
The Production Value Chain
As we have said, the production value of video available live during an event is typically lower than what is available after a period of post-production. One reason is bandwidth or connection limitations during the event. It is generally easy to capture an event with many high resolution cameras that record locally, such as onto SD cards in the cameras, even in remote areas with little or no connectivity. More typically, some level of connection is possible, but not enough for HD video, or for more immersive media such as panoramic 4K video. Another issue is that during a large event with many video channels, only a small fraction of the overall video may be interesting, and editing after the fact can produce a much higher value summary.

Of course, one typically makes best efforts to provide the highest value at each stage, but an interesting question is whether remote participants can provide input before or during the event which has impact on the final, higher value video, and which gives the participants a greater sense of involvement or ownership of the event. For example, could some involvement at low fidelity during the event, in which input is solicited about what to see and where to go, combined with a later viewing at higher quality or greater immersion (e.g. on a head mounted display), give an overall stronger sense of having been at the event?
CONCLUSION
We believe it is time for a convergence and integration of
the methods available for people to “virtually explore” the
earth, see other places and connect with other people. This
integration is a further step in the evolution of human collective intelligence, particularly of our collective "perception" and "proprioception", whereby we see and share views of what is happening in our world. A step in that direction is provided by open source viewers, such as TeleViewer, that integrate sources through one tool and that allow for discussion and bidirectional requests for views. We believe it is essential that the core technologies for this new capability exist in open source form, and that a core level of content belongs to the creative commons.

After working for many years on the kinds of technologies described in this paper - mostly guided by a technology perspective - we have now shifted towards a more humanistic perspective. Is it really helpful to humanity to create this kind of system, or is it just the next level of distraction to come along? We believe it is of value to humanity, not only as a crucial element of an emerging global collective mind, but also as a means to connect people around the world and to promote shared understanding and solidarity as global citizens. We anticipate ever greater ease with which people will be able to see and understand other cultures and other parts of the world, and to connect through what might be called virtual tourism. We believe the connections formed in this way, which in many cases will become "real life" friendships, will play a crucial role in creating a more beautiful world.
REFERENCES
1. Teilhard de Chardin, P. (1959). The Phenomenon of Man (Vol. 19). London: Collins.
2. Barlow, J. P. (1996). A Declaration of the Independence of Cyberspace.
3. Zhang, C. (2004). On Sampling of Image Based Rendering Data. Ph.D. Thesis, Carnegie Mellon University, Pittsburgh, PA.
4. Lippman, A. (1980). Movie-maps: An application of the optical videodisc to computer graphics. ACM SIGGRAPH Computer Graphics. ACM.
5. Foote, J., & Kimber, D. (2000). FlyCam: Practical panoramic video and automatic camera control. Multimedia and Expo, 2000 (ICME 2000).
6. Kimber, D., Foote, J., & Lertsithichai, S. (2001). FlyAbout: Spatially indexed panoramic video. Proceedings of the ninth ACM international conference on Multimedia. ACM.
7. Gortler, S. J., Grzeszczuk, R., Szeliski, R., & Cohen, M. F. (1996). The lumigraph. Proceedings of the 23rd annual conference on Computer graphics and interactive techniques. ACM.
8. Levoy, M. (2006). Light fields and computational imaging. Computer, (8), 46-55.
9. Liu, Q., Kimber, D., Foote, J., Wilcox, L., & Boreczky, J. (2002). FlySPEC: A multi-user video camera system with hybrid human and automatic control. Proceedings of the tenth ACM international conference on Multimedia. ACM.
10. Girgensohn, A., Kimber, D., Vaughan, J., Yang, T., Shipman, F., Turner, T., et al. (2007). DOTS: Support for effective video surveillance. Proceedings of the 15th international conference on Multimedia. ACM.
11. Rieffel, E. G., Girgensohn, A., Kimber, D., Chen, T., & Liu, Q. (2007). Geometric tools for multicamera surveillance systems. ICDSC'07, ACM/IEEE International Conference on Distributed Smart Cameras.
12. Kimber, D., Shingu, J., Vaughan, J., Arendash, D., Lee, D., Back, M., & Uchihashi, S. (2012). Through the Looking-Glass: Mirror Worlds for Augmented Awareness & Capability. Proceedings of the twentieth ACM international conference on Multimedia. ACM.
13. Goldberg, K. Y., Song, D., Khor, Y.-N., Pescovitz, D., Levandowski, A., Himmelstein, J. C., Shih, J., Ho, A., Paulos, E., & Donath, J. S. (2002). Collaborative online teleoperation with spatial dynamic voting and a human "tele-actor". In ICRA, IEEE (2002), 1179-1184.
14. Kimber, D., Proppe, P., Kratz, S., Vaughan, J., Liew, B., Severns, D., & Su, W. (2014). Polly: Telepresence from a Guide's Shoulder. Computer Vision - ECCV 2014 Workshops, Lecture Notes in Computer Science, Vol. 8927, pp. 509-523.
15. Virtual Photo Walks. Organization founded by John Butterill. http://www.virtualphotowalks.org/
16. Virtual Museum tours. http://www.nma.gov.au/engage-learn/robot-tours and http://futureofmuseums.blogspot.com/2014/05/exploring-robots-for-accessibility-in.html
17. Drone touring sites. http://travelbydrone.com/, http://travelwithdrone.com/, http://www.droneworldvideo.com/, http://www.airpano.com/google_map.php
18. Pierce, Dennis. "Google Seeks Teachers to Pilot 3D Virtual Field Trips", The Journal, 28 Sept. 2015. Web. 22 May 2016. https://thejournal.com/articles/2015/09/28/google-seeks-teachers-to-pilot-3d-virtual-field-trips.aspx
19. OpenExplorer. https://openexplorer.com/home
20. Cesium, "An open-source JavaScript library for world-class 3D globes and maps". https://cesiumjs.org
21. Adelson, E., & Bergen, J. (1991). The plenoptic function and the elements of early vision. In Landy & Movshon (Eds.), Computational Models of Visual Processing (pp. 3-20). Cambridge, MA: MIT Press.