Towards Declarative 3D in Web Architecture

Jean Le Feuvre
Telecom ParisTech; Institut Telecom; CNRS LTCI
46, rue Barrault 75634 PARIS CEDEX 13
jean.lefeuvre@telecom-paristech.fr

ABSTRACT
The recent WebGL integration in major web browsers has opened the way to many 3D applications as well as high-level libraries targeting 3D content developers. While most of these libraries provide solid grounds for interoperable 3D on web browsers, one might wonder if their use could not be simplified, both in terms of processing overhead and 3D description syntax; looking beyond these issues, if there is room for a declarative 3D language for the web architecture, its features should be well defined to ensure its success. In this paper, we review some use cases, some existing technologies and some drawbacks of existing tools in order to derive requirements for the upcoming declarative 3D language for the HTML ecosystem.

Categories and Subject Descriptors
H.5.2 [INFORMATION INTERFACES AND PRESENTATION]: User Interfaces – Graphical user interfaces (GUI), Standardization, Windowing Systems.

General Terms
Standardization, Languages.

Keywords
Declarative, 3D, mixed 2D and 3D, WebGL, Stereoscopic Displays.

Copyright © 2012 for the individual papers by the papers' authors. Copying permitted only for private and academic purposes. This volume is published and copyrighted by its editors. Dec3D2012 workshop at WWW2012, Lyon, France.

1. INTRODUCTION
Over the last twenty years, a growing number of technologies for describing, animating and controlling 3D objects or 3D worlds have appeared, and sometimes disappeared. Whether imperative or declarative, most of these technologies have had success in some market areas, but it is hard to identify the "big winner": the one technology to be used in any business environment.

With the growing importance of the Web architecture as an underlying platform for many applications and marketplaces, enabling 3D on the web has become a major part of recent web developments. The most noticeable 3D "newcomer" on the web is without doubt WebGL [1], giving web browsers fast yet simple access to the device's GPU through the OpenGL ES 2.0 API [2]. Many interesting projects have been launched around this powerful API, using imperative approaches through JavaScript (JS), like the promising GLGE, SceneJS, Three.js or PhiloGL. Declarative approaches have also surfaced; we can cite X3DOM, an X3D implementation in JavaScript, or XML3D, a JS implementation of a 3D scene graph closely related to the web concepts of HTML and CSS. It is worth noting that even imperative approaches, such as game engines, usually require some declarative way of expressing the 3D models or level design, and declarative approaches can already be seen in most systems, using XML or JSON parsing with XMLHttpRequest [3].

This paper does not aim at describing the different solutions already available [4] for integrating the Web and 3D, nor at starting yet another discussion on declarative versus imperative approaches: each solution has its pros and cons, but each might be needed depending on the application requirements. This paper will therefore attempt to focus on requirements that would make browser-native declarative 3D support more appropriate than existing JS-based solutions.

As part of its research work on scene description technologies, the Telecom ParisTech multimedia lab has developed GPAC [5], an open-source multimedia player. The research topics cover mainly 2D scene descriptions such as SVG or BIFS; they also cover some 3D aspects, through VRML-based technologies such as X3D or BIFS. One specific topic of this work was the integration of these different scene representation technologies within a single graphics engine, mixing them in one multimedia presentation. This work was demonstrated in [6]. The purpose of this paper is to share some of the experience acquired during the development of this hybrid 2D/3D renderer, along with some more requirements derived from academic work related to this topic. These requirements are intended to be generic and uncorrelated with final syntax and future design choices such as the handling of animations or the usage of CSS.

This paper is organized as follows: in Section 2, we will briefly advocate for declarative 3D versus existing tools, and draft a first set of basic requirements. In Section 3, we will investigate some specific requirements around the topic of mixed 2D and 3D; in Section 4, we briefly investigate some aspects of multi-view rendering for auto-stereoscopic displays and derive some requirements for the upcoming Declarative 3D task. Section 5 finally concludes this paper.
2. Advocating for Declarative 3D

2.1 On Scene Graphs
Virtual worlds are usually complex 3D environments with a large number of independent objects presented together on the screen. Whether each object is made of a single data structure (or node) or of a collection of structures, all 3D engines manipulate the collection of objects as a graph representing the scene to display, or scene graph in the usual terminology. This graph describes the relationships between objects, with more or less detail. The basic level is a description of spatial relationships (transformation matrices), but complex scene graphs may also include interactivity relationships (scripting), temporal relationships (animations) and physics relationships (collision, material elasticity...). Obviously, the more information a scene graph provides on the objects in the scene, the more complex and time-consuming the rendering of the scene may become. The scene graph is an important part of the interactive application logic, as it is usually the place where all software optimizations are done, such as matrix stack handling, object picking, partial traversal of the graph... Benchmarks done in [7] show that existing JS 3D engines have a hard time competing with a native scene graph and OpenGL implementation, but we should keep in mind that the JS libraries tested are general-purpose libraries, rather than purpose-built ones. This is maybe one of the most challenging areas for DEC3D: obviously, declarative 3D implies the usage of a scene graph, but it shall have clear advantages over JS ones, whether JSON, XML or binary, in order to be attractive to application designers. Indeed, a native scene graph is "frozen", and its implementation is not in the hands of the developer: if some design aspect of this scene graph does not suit his needs, he will likely move to a script-based approach. One way to avoid such a situation is to ensure modularity of the scene graph design. The focus of this paper is not to dig into the specific features supported by the scene graph, as existing standards such as X3D already cover a broad set of common features for DEC3D. It should however be noted that DEC3D is intended for integration with Web technologies, and as such could use a CSS-oriented design for styling, transformations and script-less animations such as SVG animations. Such features should typically be made configurable in the scene tree to optimize rendering routines, discarding for example the CSS inheritance phase or the animation module. From this remark, the following requirements are derived:

REQ1: DEC3D scene graph shall be modular; in particular, it shall allow an author to turn off unneeded features from the graph itself during the traversal of the scene tree (e.g. lighting, color transformations, collision detection, animations...), while still allowing for dynamic modification of desired features.

REQ2: DEC3D scene graph shall be extensible; in particular, it shall allow an author to design his own nodes, either programmatically or through Proto/XBL concepts.
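As an illustration of what REQ1 implies for an implementation, the following JavaScript sketch shows a traversal where per-subtree feature flags let the renderer skip entire processing phases; the node structure and flag names are hypothetical, not part of any DEC3D proposal.

    // Minimal sketch of a modular scene-graph traversal (hypothetical
    // node structure): feature flags are inherited down the tree, so a
    // whole subtree can switch off lighting, collision or animation.
    function Node(children) {
      this.children = children || [];
      this.features = { lighting: true, collision: true, animation: true };
    }

    function traverse(node, active) {
      // A feature stays active only if every ancestor keeps it enabled.
      var state = {
        lighting:  active.lighting  && node.features.lighting,
        collision: active.collision && node.features.collision,
        animation: active.animation && node.features.animation
      };
      if (state.animation) { /* run the animation module on this node */ }
      if (state.collision) { /* update collision-detection structures */ }
      if (state.lighting)  { /* set up light sources for this subtree */ }
      for (var i = 0; i < node.children.length; i++)
        traverse(node.children[i], state);
    }

    // Example: disable collision handling for the whole scene.
    var sceneRoot = new Node([new Node()]);
    traverse(sceneRoot, { lighting: true, collision: false, animation: true });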
2.2 On 3D Models
While WebGL provides direct and fast access to the GPU, most existing WebGL frameworks need to handle the objects they render by themselves. This includes object geometry (polygons, triangle sets/fans/...), appearance (material and texture), positions (camera and model transformations), lighting and other graphical effects (shadows, particle systems...). Once these properties are assigned to an object, rendering is achieved through WebGL at near-native speed. Most if not all of these properties are loaded and manipulated in JavaScript, which can cost time. The loading of these properties from a model description (OBJ, COLLADA...) relies, when done in JS, on XHR [3] for text-based descriptions (JSON, XML...), and additionally on ByteArray objects for binary-coded models such as MPEG-4 3DMC ones [8]. Reaching high performance with such JS APIs remains challenging; as shown in [9], JS increases the load time of very complex models such as those used in CAD or medical applications. Native support for model importing would drastically reduce the loading times of many models; such a feature should however retain compatibility with pure WebGL imperative programming, in order to respect the specific needs of the application developer. We can therefore derive the following requirements:

REQ3: DEC3D shall support native loading of various model types, either textual or binary, from any local or remote location; an appropriate MIME type should identify model formats.

REQ4: DEC3D shall define ways for natively loaded models to be used in a WebGL environment; for example, an API/ID to retrieve the WebGLBuffer from the model and reference it in a shader program.
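To make the cost discussed above tangible, here is a minimal sketch of what model loading typically looks like in a JS/WebGL application today; the model format ({ "vertices": [...] }) is a made-up example, and gl is assumed to be an existing WebGL context. The WebGLBuffer created at the end is exactly the kind of object REQ4 would expose for natively loaded models.

    // Sketch: fetch a JSON model over XHR, convert it to a typed array
    // and upload it to the GPU. Parsing and copying happen in JS, which
    // is the overhead that native model loading would remove.
    var gl = document.querySelector('canvas').getContext('webgl');
    var xhr = new XMLHttpRequest();
    xhr.open('GET', 'model.json', true);
    xhr.onload = function () {
      var model = JSON.parse(xhr.responseText);    // JS-side parsing
      var data = new Float32Array(model.vertices); // JS-side copy
      var buffer = gl.createBuffer();              // the WebGLBuffer of REQ4
      gl.bindBuffer(gl.ARRAY_BUFFER, buffer);
      gl.bufferData(gl.ARRAY_BUFFER, data, gl.STATIC_DRAW);
    };
    xhr.send();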
2.3 On WebGL and DEC3D
As stated previously, relying only on a declarative scene graph is likely not to suit the designer's needs, for example when some default rendering algorithm in the DEC3D language cannot be easily expressed in a declarative way (dynamic shader design...). In the same way that OpenGL ES moved from a hard-wired graphics pipeline interface to programmable-only GPU control, we believe that DEC3D should take into account the possibility of unthought-of use cases and provide a WebGL fallback to the developer; this will ensure a future-proof, flexible design and will encourage authors to use the language. This can be expressed by the following requirements:

REQ5: DEC3D shall allow an author to use only some native functionalities of the scene graph, for example object picking, while overriding other functionalities with WebGL code, for example drawing.

REQ6: DEC3D shall allow an author to use some native functionalities of the scene graph in parts of the scene tree while using custom behavior in other parts through WebGL callbacks.

3. Integration of 2D and 3D
One thrilling aspect of DEC3D is its usage in scenarios where 2D (HTML, SVG) and 3D (DEC3D, WebGL) objects are used at the same time and communicate with each other. When designing an integrated renderer for SVG/BIFS/X3D, we faced some issues that DEC3D could be confronted with, which are detailed in this section.

3.1 Rendering Contexts for 2D and 3D
Integrating 2D and 3D descriptions in an HTML scene can seem straightforward at first glance, but it raises the same design issues as the integration of SVG in HTML: HTML is a flow-layout scene description based on relative positioning of blocks or boxes, and is not well suited to host absolute-positioning languages in its flow. The usual approach to solve this problem is to define a rendering region, similar to canvas, where the hosted language paints itself. This is for example the case when integrating SVG in HTML: one cannot simply insert an svg element in the flow; it has to be inside an element assigning a local coordinate system and bounds for the drawing area, in order to perform the HTML flow layout. Note that the bounds do not necessarily have to define a clipping area, e.g. the hosted content could be drawn outside this area. This approach is very similar to the canvas approach, where the size of the canvas region is exactly defined in terms of CSS dimensions so that flow layout can happen, as sketched below.

REQ7: DEC3D shall support drawing of 3D shapes and scene elements within the HTML flow layout, and shall not enforce the entire scene management to be in a 3D context.

On the other hand, some applications may wish to be full-window or full-screen 3D applications, with no HTML layout above the 3D part. This is typical of games and virtual worlds, but other use cases may require it as well.

REQ8: DEC3D shall support using the entire HTML window as its 3D rendering area.
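The rendering-region approach referred to above can be sketched as follows: the canvas is given explicit CSS dimensions so that HTML flow layout can place it, and the hosted 3D content paints itself inside that region. This is plain WebGL usage, shown only to illustrate the layout contract REQ7 builds upon.

    // Sketch of a 3D rendering region in the HTML flow: CSS dimensions
    // define the bounds used by the flow layout, width/height define
    // the backing-store resolution of the drawing area.
    var canvas = document.createElement('canvas');
    canvas.style.width  = '320px';
    canvas.style.height = '240px';
    canvas.width  = 320;
    canvas.height = 240;
    document.body.appendChild(canvas);
    // Browsers of the WebGL 1.0 era may only expose the experimental name.
    var gl = canvas.getContext('webgl') ||
             canvas.getContext('experimental-webgl');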
3.2 Events and Coordinate Systems
The major inconvenience when handling a document mixing 2D and 3D content is the event system. The event system defines how user events (mouse, keyboard, HMI devices), network events or other notification events are handled in a scene graph. Unfortunately, each standard has its own way of defining its own event system, and most of the time these are not compatible. VRML/X3D uses typed events following the node field data types, and a ROUTE mechanism to copy events from their source to any desired destination; events are generated by dedicated UI sensor nodes, such as TouchSensor or ProximitySensor. SVG and HTML use the DOM Event model, where events are generated with no explicit sensor but rather "appear" at any visible/geometry node and bubble up the scene graph from this node to the root node. These events are not typed in terms of XML data types, but have an IDL definition allowing manipulation of these events in script. Without scripting, interactivity is much more limited. In order to allow a simple design of applications mixing 2D and 3D content, we can add the following requirement:

REQ9: DEC3D shall use the DOM event model in order to cohabit with SVG or HTML applications.

Note that this requirement does not exclude the usage of existing VRML/X3D sensors such as ProximitySensor or SphereSensor, but will rather transform them into grouping nodes catching simple mouse or keyboard events and firing new, 3D-specific events if desired.

Another issue faced with 2D/3D integration is the handling of coordinate systems. By default, most 2D languages use a raster-aligned coordinate system, with the origin (0,0) at the top-left of the canvas and the Y-axis going downwards; on the opposite, most 3D languages use a 3D Cartesian coordinate system, with the origin (0,0) at the center of the canvas and the Y-axis going upwards. While the handling of such differences is annoying for the implementation (Y scaling and translations happening all over the place), it becomes even trickier for the application designer. DOM-based 2D scene representations do not expose the hit coordinate at the hit point; on the opposite, 3D scene representation events usually carry much more information than the screen and client coordinates. Getting the hit point coordinate in 3D space is a basic use case, and getting the value of the normal or the texture coordinate at the hit point is also common when dealing with interactive textures. Scripting approaches such as getScreenCTM in SVG are clearly not sufficient to compute these details, as they would require computing the mouse ray and shape intersection in JS, and inserting flip/translation matrices when switching between DEC3D and SVG, as illustrated below. In order to simplify the handling of clicking on / picking of shapes in an application mixing 2D and 3D, a unified system for retrieving hit coordinates in the local coordinate system is needed:

REQ10: DEC3D shall use a coordinate system for events aligned with the DOM Event coordinate system and provide a simple way of accessing pointing device coordinates in the local coordinate system.

REQ11: DEC3D shall have support for hit point coordinates, texture coordinates and the normal value at the hit point in a DOM Event compatible way; this feature should only be enabled when advanced interaction is required.
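The following sketch shows the first step of the bookkeeping REQ10 and REQ11 would remove: converting DOM event coordinates (origin at the top-left, Y down) into WebGL normalized device coordinates (origin at the center, Y up) before any ray/shape intersection can even start. The canvas variable is assumed to be the WebGL canvas element.

    // Sketch of the coordinate flip every picking implementation must
    // perform today when mixing DOM events and 3D content.
    var canvas = document.querySelector('canvas');
    canvas.addEventListener('click', function (evt) {
      var rect = canvas.getBoundingClientRect();
      var x = evt.clientX - rect.left;        // raster coordinates,
      var y = evt.clientY - rect.top;         // origin top-left, Y down
      var ndcX = (2 * x / rect.width) - 1;    // [-1,1], left to right
      var ndcY = 1 - (2 * y / rect.height);   // Y axis flipped
      // A full picker would now unproject (ndcX, ndcY) through the
      // inverse projection and view matrices, build a mouse ray and
      // intersect it with every pickable shape, all in JS.
    });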
3.3 Offscreen Rendering
One of the most compelling use cases for 2D and 3D integration is getting more and more widespread in the window managers of various operating systems under the terminology "compositing" or "composite desktop": being able to use the output of any application as a texture for another application. WebGL allows for this by using the HTML Canvas object in 2D mode for texture creation, then passing the texture to the GPU through WebGL's glTexImage2D (a sketch of this path is given below). Note however that drawing web content into a 2D canvas is not allowed in most browsers, hence not yet interoperable.

There are endless possibilities with the ability to transform part of a sub-tree into a texture usable in 2D or 3D contexts, especially for non-linear transformations. Having a declarative means to define such textures / offscreen rendering areas feels quite intuitive, as using WebGL and JS to implement such simple data transfers to GPU texture units seems quite an overhead. It should be noted that such features are present in MPEG-4 BIFS, through CompositeTexture nodes, as shown in Figure 1. These elements also allow for interaction and react to mouse and keyboard events. Existing layering elements such as the HTML div or an inner svg element could be a base for such a design.

Figure 1 - Integration of SVG menu, X3D model and MPEG-4

REQ12: DEC3D shall have support for simple definition of offscreen rendering areas for 2D or 3D DOM content, and reuse of these areas as 3D textures or 2D patterns in SVG.

REQ13: DEC3D shall have support for offscreen rendering of part of the DOM tree, with support for DOM events in these sub-trees.
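A sketch of the Canvas-to-texture path described above, using only standard WebGL calls (gl is assumed to be an existing WebGL context); this is the amount of imperative code a declarative offscreen-rendering element, such as the one targeted by REQ12, would replace.

    // Sketch: draw 2D content into a canvas, then pass the pixels to
    // the GPU as a texture through texImage2D.
    var gl = document.querySelector('canvas').getContext('webgl');
    var src = document.createElement('canvas');
    src.width = 256;                       // power-of-two sizes avoid
    src.height = 256;                      // WebGL NPOT restrictions
    var ctx = src.getContext('2d');
    ctx.fillStyle = 'red';
    ctx.fillRect(0, 0, 256, 256);          // any 2D drawing, e.g. a menu
    var tex = gl.createTexture();
    gl.bindTexture(gl.TEXTURE_2D, tex);
    gl.texImage2D(gl.TEXTURE_2D, 0, gl.RGBA, gl.RGBA, gl.UNSIGNED_BYTE, src);
    gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_MIN_FILTER, gl.LINEAR);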
4. 3D Displays
The past few years have seen a regain of interest in 3D entertainment using the human binocular vision system. 3D displays are becoming more and more widespread, whether for mobile devices (phones, portable gaming devices) or for home entertainment (TVs, picture frames...).

The current focus of the industry is to achieve interoperable playback of video on these devices, through a varied set of standards ranging from frame packing in AVC (two views in side-by-side or top-and-bottom packed in one frame), to more modern approaches combining video and depth / disparity maps, in order to generate multiple views from arbitrary viewer positions, as shown in Figure 2.

Figure 2 - Five-view synthesis from video and depth

The next logical step will be to achieve interoperability of applications using such displays, for any possible characteristics of the display such as the number of views available or the optimal viewing distance. We therefore think the following requirement is reasonable:

REQ14: DEC3D shall have support for 3D displays and auto-stereoscopic interactive services.

4.1 Depth for 2D
Existing 2D scene descriptions such as HTML or SVG usually work with a fixed z-order in the scene tree, which can be altered through scripting mechanisms by removing objects and inserting them back at the desired layer. These languages typically follow the painter's algorithm when drawing their shapes, and do not take into account any depth information: the nodes are drawn in the order they are found in the scene tree. While this model is fine in a 2D space, it is no longer appropriate when designing interfaces for 3D displays, where depth (or z) is an inherent dimension of the service, as are the horizontal and vertical positions. On the other hand, defining a complete 3D rendering context for the sole purpose of displaying an HTML button with a depth effect (screen pop-out, back and forth bouncing at the screen surface) seems quite an overhead for the author. This situation will only get worse if a 2D area with a depth effect also has a 2D sub-area with another depth effect. Simple extensions such as a depth / z offset and scaling for 2D objects will be sufficient for most effects, but more powerful tools such as CSS 3D transforms could also help here.

REQ15: DEC3D shall support simple ways of assigning a depth or z value to a 2D HTML or SVG area; depth values shall be accumulated in a hierarchical way, as are regular 3D transformation matrices.

Another interesting feature in the years to come will be the ability of the device hardware to use a depth image along with texture data to generate image-dependent viewpoints. Depth-data handling is also making its way into UI systems with devices such as the Microsoft Kinect, and it won't be long until TVs are equipped with such cameras. This naturally leads us to believe that introducing DIBR (Depth-Image Based Rendering) into DEC3D is an interesting path.

Figure 3 - Synthesizing a depth map from SVG gradients

As explained in [10], we believe that using SVG or canvas 2D to generate texture data that could be used as depth data for other 2D objects, through simple component transfer rules as shown in Figure 3, is a powerful way of authoring transition effects for 3D displays.

REQ16: DEC3D shall have support for Depth-Image Based Representation, in order to allow for multiple-view generation of 2D objects or areas in the content.

REQ17: DEC3D shall be able to generate synthetic depth maps from the different graphical primitives in the content, whether 2D or 3D, and whether defined in DEC3D or external to its namespace.

4.2 Virtual Camera Calibration
One important aspect of rendering for 3D displays is that the depth effect may not be exactly tied to the perspective settings of the 3D environment, and authors may decide to center a dedicated object on the screen plane, or before or behind the screen, without changing the virtual camera settings. In other words, an author may decide to change the vergence point of the different cameras used during the multi-view rendering passes. Moreover, most existing 3D languages do not take camera parameters for multi-view rendering into account, such as the camera displacement between views (circular, linear, off-axis), which may be modified by an author depending on the application type. This leads to non-interoperability between implementations. If DEC3D includes support for multi-view displays, it must therefore fulfill the following requirement:

REQ18: DEC3D shall be able to define the camera parameters used during multi-view generation, such as the vergence point (screen plane) location or the camera displacement type.
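To illustrate the kind of parameters REQ18 refers to, here is a small sketch computing per-view camera settings under one possible convention: linear displacement with an asymmetric frustum converging at the vergence plane. This is only one of the displacement types listed above, not a normative algorithm, and all names are ours.

    // Sketch: derive per-view camera settings from author-controlled
    // multi-view parameters (eye spacing, vergence distance).
    // halfWidth/halfHeight are the frustum half-extents at the near
    // plane for the centered view.
    function viewCamera(viewIndex, numViews, eyeSpacing, vergenceDist,
                        near, far, halfWidth, halfHeight) {
      // Linear displacement, centered around the original viewpoint.
      var offset = (viewIndex - (numViews - 1) / 2) * eyeSpacing;
      // Skew the frustum so all views converge at the vergence plane:
      // objects at vergenceDist appear exactly on the screen surface.
      var shift = -offset * near / vergenceDist;
      return {
        eyeOffsetX: offset,                // translate the view by -offset
        left:  -halfWidth + shift,         // asymmetric frustum bounds,
        right:  halfWidth + shift,         // to plug into a frustum matrix
        bottom: -halfHeight, top: halfHeight,
        near: near, far: far
      };
    }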
5. Conclusion
In this paper, we have exposed our views on some aspects a declarative 3D language for the web architecture should cover. More specifically, we have reviewed some of the difficulties encountered during the development of a mixed 2D and 3D multimedia renderer. We have also exposed some limitations of existing declarative technologies when designing content for auto-stereoscopic displays. Based on this analysis, we have derived some requirements for such a language and hope to contribute, in the near future, to the DEC3D activity, both in terms of requirements and developments.

6. ACKNOWLEDGMENTS
Part of this work has been financed by the French-funded ANR project CALDER.

7. REFERENCES
[1] WebGL, http://www.khronos.org/webgl/
[2] OpenGL ES 2.0, http://www.khronos.org/registry/gles/
[3] XHR, http://www.w3.org/TR/XMLHttpRequest/
[4] http://www.khronos.org/webgl/wiki/User_Contributions
[5] Le Feuvre, J., Concolato, C., and Moissinac, J. 2007. GPAC: open source multimedia framework. In Proceedings of the 15th International Conference on Multimedia (Augsburg, Germany, September 25-29, 2007). MULTIMEDIA '07.
[6] Concolato, C. and Le Feuvre, J. 2008. Playback of mixed multimedia documents. In Proceedings of the Eighth ACM Symposium on Document Engineering (Sao Paulo, Brazil, September 16-19, 2008). DocEng '08. ACM, New York, NY, 219-220. DOI= http://doi.acm.org/10.1145/1410140.1410185
[7] http://granular.cs.umu.se/browserphysics/?p=7
[8] Jovanova, B., Preda, M., and Preteux, F. 2009. MPEG-4 part 25: a graphics compression framework for XML-based scene graph formats. Signal Processing: Image Communication, vol. 24, pp. 101-114.
[9] http://blog.n01se.net/?p=248
[10] SVG Extensions for 3D displays, http://svgopen.org/2010/papers/54-SVG_Extensions_for_3D_displays/