Towards Declarative 3D in Web Architecture

Jean Le Feuvre
Telecom ParisTech; Institut Telecom; CNRS LTCI
46, rue Barrault 75634 PARIS CEDEX 13
jean.lefeuvre@telecom-paristech.fr

ABSTRACT
The recent WebGL integration in major web browsers has opened the way to many 3D applications as well as high-level libraries targeting 3D content developers. While most of these libraries provide solid grounds for interoperable 3D on web browsers, one might wonder if their use could not be simplified, both in terms of processing overhead and 3D description syntax; looking beyond these issues, if there is room for a declarative 3D language for the web architecture, its features should be well defined to ensure its success. In this paper, we review some use cases, some existing technologies and some drawbacks of existing tools in order to derive requirements for the upcoming declarative 3D language for the HTML ecosystem.

Categories and Subject Descriptors
H.5.2 [INFORMATION INTERFACES AND PRESENTATION]: User Interfaces – Graphical user interfaces (GUI), Standardization, Windowing Systems.

General Terms
Standardization, Languages.

Keywords
Declarative, 3D, mixed 2D and 3D, WebGL, Stereoscopic Displays.

Copyright © 2012 for the individual papers by the papers' authors. Copying permitted only for private and academic purposes. This volume is published and copyrighted by its editors. Dec3D2012 workshop at WWW2012, Lyon, France.

1. INTRODUCTION
Over the last twenty years, a growing number of technologies for describing, animating and controlling 3D objects or 3D worlds have appeared, and sometimes disappeared. Whether imperative or declarative, most of these technologies have had success in some market areas, but it is hard to identify the "big winner": the one technology to be used in any business environment.

With the growing importance of the Web architecture as an underlying platform for many applications and marketplaces, enabling 3D on the web has become a major part of recent web developments. The most noticeable 3D "newcomer" on the web is without doubt WebGL [1], giving web browsers fast yet simple access to the device's GPU through the OpenGL ES 2.0 API [2]. Many interesting projects have been launched around this powerful API, using imperative approaches through JavaScript (JS), like the promising GLGE, SceneJS, Three.js or PhiloGL. Declarative approaches have also surfaced; we can cite X3DOM, an X3D implementation in JavaScript, or XML3D, a JS implementation of a 3D scene graph closely related to the web concepts of HTML and CSS. It is worth noting that even imperative approaches, such as game engines, usually require some declarative way of expressing the 3D models or level design, and declarative approaches can already be seen in most systems, using XML or JSON parsing with XMLHttpRequest [3].

This paper does not aim at describing the different solutions already available [4] for integrating the Web and 3D, nor at starting yet another discussion on declarative versus imperative approaches: each solution has its pros and cons, but each might be needed depending on the application requirements. This paper will therefore attempt to focus on requirements that would make browser-native declarative 3D support more appropriate than existing JS-based solutions.

As part of its research work on scene description technologies, the Telecom ParisTech multimedia lab has developed GPAC [5], an open-source multimedia player. The research topics cover mainly 2D scene descriptions such as SVG or BIFS; they also cover some 3D aspects, through VRML-based technologies such as X3D or BIFS. One specific topic of this work was the integration of these different scene representation technologies within a single graphics engine, mixing them in one multimedia presentation. This work was demonstrated in [6]. The purpose of this paper is to share some of the experience acquired during the development of this hybrid 2D/3D renderer, along with some more requirements derived from academic work related to this topic. These requirements are intended to be generic and uncorrelated with final syntax and future design choices such as the handling of animations or the usage of CSS.

This paper is organized as follows: in Section 2, we will briefly advocate for declarative 3D versus existing tools, and draft a first set of basic requirements. In Section 3, we will investigate some specific requirements around the topic of mixed 2D and 3D; in Section 4, we briefly investigate some aspects of multi-view rendering for auto-stereoscopic displays and derive some requirements for the upcoming Declarative 3D task. Section 5 finally concludes this paper.
2. Advocating for Declarative 3D

2.1 On Scene Graphs
Virtual worlds are usually complex 3D environments with a large number of independent objects presented together on the screen. Whether each object is made of a single data structure (or node) or of a collection of structures, all 3D engines manipulate the collection of objects as a graph representing the scene to display, or scene graph in the usual terminology. This graph describes the relationships between objects, with more or less detail. The basic level is a description of spatial relationships (transformation matrices), but complex scene graphs may also include interactivity relationships (scripting), temporal relationships (animations) and physics relationships (collision, material elasticity...). Obviously, the more information a scene graph provides on the objects in the scene, the more complex and time-consuming the rendering of the scene may become. The scene graph is an important part of the interactive application logic, as it is usually the place where all software optimizations are done, such as matrix stack handling, object picking, partial traversal of the graph... Benchmarks done in [7] show that existing JS 3D engines have a hard time competing with a native scene graph and OpenGL implementation, but we should keep in mind that the JS libraries tested are general-purpose libraries, rather than purpose-built ones. This is maybe one of the most challenging areas for DEC3D: obviously, declarative 3D implies the usage of a scene graph, but it shall have clear advantages over JS ones, whether JSON, XML or binary, in order to be attractive to application designers. Indeed, a native scene graph is "frozen", and its implementation is not in the hands of the developer: if some design aspect of this scene graph does not suit his needs, he will likely move to a script-based approach. One way to avoid such a situation is to ensure modularity of the scene graph design. The focus of this paper is not to dig into the specific features supported by the scene graph, as existing standards such as X3D already cover a broad set of common features for DEC3D. It should however be noted that DEC3D is intended for integration with Web technologies, and as such could use a CSS-oriented design for styling, transformations and script-less animations such as SVG animations. Such features should typically be made configurable in the scene tree to optimize rendering routines, discarding for example the CSS inheritance phase or the animation module. From this remark, the following requirements are derived:

REQ1: DEC3D scene graph shall be modular; in particular, it shall allow an author to turn off unneeded features from the graph itself during the traversal of the scene tree (e.g. lighting, color transformations, collision detection, animations...), while still allowing for dynamic modification of desired features.

REQ2: DEC3D scene graph shall be extensible; in particular, it shall allow an author to design his own nodes, either programmatically or through Proto/XBL concepts.
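As an illustration of what REQ1 implies for an implementation, the following JavaScript sketch shows a traversal where per-subtree feature flags let the renderer skip entire processing phases; the node structure and flag names are hypothetical, not part of any DEC3D proposal.

    // Minimal sketch of a modular scene-graph traversal (hypothetical
    // node structure): feature flags are inherited down the tree, so a
    // whole subtree can switch off lighting, collision or animation.
    function Node(children) {
      this.children = children || [];
      this.features = { lighting: true, collision: true, animation: true };
    }

    function traverse(node, active) {
      // A feature stays active only if every ancestor keeps it enabled.
      var state = {
        lighting:  active.lighting  && node.features.lighting,
        collision: active.collision && node.features.collision,
        animation: active.animation && node.features.animation
      };
      if (state.animation) { /* run the animation module on this node */ }
      if (state.collision) { /* update collision-detection structures */ }
      if (state.lighting)  { /* set up light sources for this subtree */ }
      for (var i = 0; i < node.children.length; i++)
        traverse(node.children[i], state);
    }

    // Example: disable collision handling for the whole scene.
    var sceneRoot = new Node([new Node()]);
    traverse(sceneRoot, { lighting: true, collision: false, animation: true });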
2.2 On 3D Models
While WebGL provides direct and fast access to the GPU, most existing WebGL frameworks need to handle the objects they render by themselves. This includes object geometry (polygons, triangle sets/fans/...), appearance (material and texture), positions (camera and model transformations), lighting and other graphical effects (shadows, particle systems...). Once these properties are assigned to an object, rendering is achieved through WebGL at near-native speed. Most if not all of these properties are loaded and manipulated in JavaScript, which can cost time. The loading of these properties from a model description (OBJ, COLLADA...) relies, when done in JS, on XHR [3] for text-based descriptions (JSON, XML...), and additionally on ByteArray objects for binary-coded models such as MPEG-4 3DMC ones [8]. Reaching high performance with such JS APIs remains challenging; as shown in [9], JS increases the load time of very complex models such as those used in CAD or medical applications. Native support for model importing would drastically reduce the loading times of many models; such a feature should however retain compatibility with pure WebGL imperative programming, in order to respect the specific needs of the application developer. We can therefore derive the following requirements:

REQ3: DEC3D shall support native loading of various model types, either textual or binary, from any local or remote location; an appropriate MIME type should identify model formats.

REQ4: DEC3D shall define ways for natively loaded models to be used in a WebGL environment; for example, an API/ID to retrieve the WebGLBuffer from the model and reference it in a shader program.
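To make the cost discussed above tangible, here is a minimal sketch of what model loading typically looks like in a JS/WebGL application today; the model format ({ "vertices": [...] }) is a made-up example, and gl is assumed to be an existing WebGL context. The WebGLBuffer created at the end is exactly the kind of object REQ4 would expose for natively loaded models.

    // Sketch: fetch a JSON model over XHR, convert it to a typed array
    // and upload it to the GPU. Parsing and copying happen in JS, which
    // is the overhead that native model loading would remove.
    var gl = document.querySelector('canvas').getContext('webgl');
    var xhr = new XMLHttpRequest();
    xhr.open('GET', 'model.json', true);
    xhr.onload = function () {
      var model = JSON.parse(xhr.responseText);    // JS-side parsing
      var data = new Float32Array(model.vertices); // JS-side copy
      var buffer = gl.createBuffer();              // the WebGLBuffer of REQ4
      gl.bindBuffer(gl.ARRAY_BUFFER, buffer);
      gl.bufferData(gl.ARRAY_BUFFER, data, gl.STATIC_DRAW);
    };
    xhr.send();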
2.3 On WebGL and DEC3D
As stated previously, relying only on a declarative scene graph is likely not to suit the designer's needs, for example when some default rendering algorithm in the DEC3D language cannot be easily expressed in a declarative way (dynamic shader design...). In the same way that OpenGL ES moved from a hard-wired graphics pipeline interface to programmable-only GPU control, we believe that DEC3D should take into account the possibility of unthought-of use cases and provide a WebGL fallback to the developer; this will ensure a future-proof, flexible design and will encourage authors to use the language. This can be expressed by the following requirements:

REQ5: DEC3D shall allow an author to use only some native functionalities of the scene graph, for example object picking, while overriding other functionalities with WebGL code, for example drawing.

REQ6: DEC3D shall allow an author to use some native functionalities of the scene graph in parts of the scene tree while using custom behavior in other parts through WebGL callbacks.

3. Integration of 2D and 3D
One thrilling aspect of DEC3D is its usage in scenarios where 2D (HTML, SVG) and 3D (DEC3D, WebGL) objects are used at the same time and communicate with each other. When designing an integrated renderer for SVG/BIFS/X3D, we faced some issues that DEC3D could be confronted with, which are detailed in this section.

3.1 Rendering Contexts for 2D and 3D
Integrating 2D and 3D descriptions in an HTML scene can seem straightforward at first glance, but it raises the same design issues as the integration of SVG in HTML: HTML is a flow-layout scene description based on relative positioning of blocks or boxes, and is not well suited to host absolute-positioning languages in its flow. The usual approach to solve this problem is to define a rendering region, similar to canvas, where the hosted language paints itself. This is for example the case when integrating SVG in HTML: one cannot simply insert an svg element in the flow; it has to be inside an element assigning a local coordinate system and bounds for the drawing area, in order to perform the HTML flow layout. Note that the bounds do not necessarily have to define a clipping area, e.g. the hosted content could be drawn outside this area. This approach is very similar to the canvas approach, where the size of the canvas region is exactly defined in terms of CSS dimensions so that flow layout can happen, as sketched below.

REQ7: DEC3D shall support drawing of 3D shapes and scene elements within the HTML flow layout, and shall not enforce the entire scene management to be in a 3D context.

On the other hand, some applications may wish to be full-window or full-screen 3D applications, with no HTML layout above the 3D part. This is typical of games and virtual worlds, but other use cases may require it as well.

REQ8: DEC3D shall support using the entire HTML window as its 3D rendering area.
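The rendering-region approach referred to above can be sketched as follows: the canvas is given explicit CSS dimensions so that HTML flow layout can place it, and the hosted 3D content paints itself inside that region. This is plain WebGL usage, shown only to illustrate the layout contract REQ7 builds upon.

    // Sketch of a 3D rendering region in the HTML flow: CSS dimensions
    // define the bounds used by the flow layout, width/height define
    // the backing-store resolution of the drawing area.
    var canvas = document.createElement('canvas');
    canvas.style.width  = '320px';
    canvas.style.height = '240px';
    canvas.width  = 320;
    canvas.height = 240;
    document.body.appendChild(canvas);
    // Browsers of the WebGL 1.0 era may only expose the experimental name.
    var gl = canvas.getContext('webgl') ||
             canvas.getContext('experimental-webgl');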
3.2 Events and Coordinate Systems
The major inconvenience when handling a document mixing 2D and 3D content is the event system. The event system defines how user events (mouse, keyboard, HMI devices), network events or other notification events are handled in a scene graph. Unfortunately, each standard has its own way of defining its own event system, and most of the time these are not compatible. VRML/X3D uses typed events following the node field data types, and a ROUTE mechanism to copy events from their source to any desired destination; events are generated by dedicated UI sensor nodes, such as TouchSensor or ProximitySensor. SVG and HTML use the DOM Event model, where events are generated with no explicit sensor but rather "appear" at any visible/geometry node and bubble up the scene graph from this node to the root node. These events are not typed in terms of XML data types, but have an IDL definition allowing manipulation of these events in script. Without scripting, interactivity is much more limited. In order to allow a simple design of applications mixing 2D and 3D content, we can add the following requirement:

REQ9: DEC3D shall use the DOM event model in order to cohabit with SVG or HTML applications.

Note that this requirement does not exclude the usage of existing VRML/X3D sensors such as ProximitySensor or SphereSensor, but will rather transform them into grouping nodes catching simple mouse or keyboard events and firing new, 3D-specific events if desired.

Another issue faced with 2D/3D integration is the handling of coordinate systems. By default, most 2D languages use a raster-aligned coordinate system, with the origin (0,0) at the top-left of the canvas and the Y-axis going downwards; on the opposite, most 3D languages use a 3D Cartesian coordinate system, with the origin (0,0) at the center of the canvas and the Y-axis going upwards. While the handling of such differences is annoying for the implementation (Y scaling and translations happening all over the place), it becomes even trickier for the application designer. DOM-based 2D scene representations do not expose the hit coordinate at the hit point; on the opposite, 3D scene representation events usually carry much more information than the screen and client coordinates. Getting the hit point coordinate in 3D space is a basic use case, and getting the value of the normal or the texture coordinate at the hit point is also common when dealing with interactive textures. Scripting approaches such as getScreenCTM in SVG are clearly not sufficient to compute these details, as they would require computing the mouse ray and shape intersection in JS, and inserting flip/translation matrices when switching between DEC3D and SVG, as illustrated below. In order to simplify the handling of clicking on / picking of shapes in an application mixing 2D and 3D, a unified system for retrieving hit coordinates in the local coordinate system is needed:

REQ10: DEC3D shall use a coordinate system for events aligned with the DOM Event coordinate system and provide a simple way of accessing pointing device coordinates in the local coordinate system.

REQ11: DEC3D shall have support for hit point coordinates, texture coordinates and the normal value at the hit point in a DOM Event compatible way; this feature should only be enabled when advanced interaction is required.
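The following sketch shows the first step of the bookkeeping REQ10 and REQ11 would remove: converting DOM event coordinates (origin at the top-left, Y down) into WebGL normalized device coordinates (origin at the center, Y up) before any ray/shape intersection can even start. The canvas variable is assumed to be the WebGL canvas element.

    // Sketch of the coordinate flip every picking implementation must
    // perform today when mixing DOM events and 3D content.
    var canvas = document.querySelector('canvas');
    canvas.addEventListener('click', function (evt) {
      var rect = canvas.getBoundingClientRect();
      var x = evt.clientX - rect.left;        // raster coordinates,
      var y = evt.clientY - rect.top;         // origin top-left, Y down
      var ndcX = (2 * x / rect.width) - 1;    // [-1,1], left to right
      var ndcY = 1 - (2 * y / rect.height);   // Y axis flipped
      // A full picker would now unproject (ndcX, ndcY) through the
      // inverse projection and view matrices, build a mouse ray and
      // intersect it with every pickable shape, all in JS.
    });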
3.3 Offscreen Rendering
One of the most compelling use cases for 2D and 3D integration is getting more and more widespread in the window managers of various operating systems under the terminology "compositing" or "composite desktop": being able to use the output of any application as a texture for another application. WebGL allows for this by using the HTML Canvas object in 2D mode for texture creation, then passing the texture to the GPU through WebGL's glTexImage2D (a sketch of this path is given below). Note however that drawing web content into a 2D canvas is not allowed in most browsers, hence not yet interoperable.

There are endless possibilities with the ability to transform part of a sub-tree into a texture usable in 2D or 3D contexts, especially for non-linear transformations. Having a declarative means to define such textures / offscreen rendering areas feels quite intuitive, as using WebGL and JS to implement such simple data transfers to GPU texture units seems quite an overhead. It should be noted that such features are present in MPEG-4 BIFS, through CompositeTexture nodes, as shown in Figure 1. These elements also allow for interaction and react to mouse and keyboard events. Existing layering elements such as the HTML div or an inner svg element could be a base for such a design.

Figure 1 - Integration of SVG menu, X3D model and MPEG-4

REQ12: DEC3D shall have support for simple definition of offscreen rendering areas for 2D or 3D DOM content, and reuse of these areas as 3D textures or 2D patterns in SVG.

REQ13: DEC3D shall have support for offscreen rendering of part of the DOM tree, with support for DOM events in these sub-trees.
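A sketch of the Canvas-to-texture path described above, using only standard WebGL calls (gl is assumed to be an existing WebGL context); this is the amount of imperative code a declarative offscreen-rendering element, such as the one targeted by REQ12, would replace.

    // Sketch: draw 2D content into a canvas, then pass the pixels to
    // the GPU as a texture through texImage2D.
    var gl = document.querySelector('canvas').getContext('webgl');
    var src = document.createElement('canvas');
    src.width = 256;                       // power-of-two sizes avoid
    src.height = 256;                      // WebGL NPOT restrictions
    var ctx = src.getContext('2d');
    ctx.fillStyle = 'red';
    ctx.fillRect(0, 0, 256, 256);          // any 2D drawing, e.g. a menu
    var tex = gl.createTexture();
    gl.bindTexture(gl.TEXTURE_2D, tex);
    gl.texImage2D(gl.TEXTURE_2D, 0, gl.RGBA, gl.RGBA, gl.UNSIGNED_BYTE, src);
    gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_MIN_FILTER, gl.LINEAR);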
4. 3D Displays
The past few years have seen a regain of interest in 3D entertainment using the human binocular vision system. 3D displays are becoming more and more widespread, whether for mobile devices (phones, portable gaming devices) or for home entertainment (TVs, picture frames...).

The current focus of the industry is to achieve interoperable playback of video on these devices, through a varied set of standards ranging from frame packing in AVC (two views in side-by-side or top-and-bottom packed in one frame), to more modern approaches combining video and depth / disparity maps, in order to generate multiple views from arbitrary viewer positions, as shown in Figure 2.

Figure 2 - Five-view synthesis from video and depth

The next logical step will be to achieve interoperability of applications using such displays, for any possible characteristics of the display such as the number of views available or the optimal viewing distance. We therefore think the following requirement is reasonable:

REQ14: DEC3D shall have support for 3D displays and auto-stereoscopic interactive services.

4.1 Depth for 2D
Existing 2D scene descriptions such as HTML or SVG usually work with a fixed z-order in the scene tree, which can be altered through scripting mechanisms by removing objects and inserting them back at the desired layer. These languages typically follow the painter's algorithm when drawing their shapes, and do not take into account any depth information: the nodes are drawn in the order they are found in the scene tree. While this model is fine in a 2D space, it is no longer appropriate when designing interfaces for 3D displays, where depth (or z) is an inherent dimension of the service, as are the horizontal and vertical positions. On the other hand, defining a complete 3D rendering context for the sole purpose of displaying an HTML button with a depth effect (screen pop-out, back and forth bouncing at the screen surface) seems quite an overhead for the author. This situation will only get worse if a 2D area with a depth effect also has a 2D sub-area with another depth effect. Simple extensions such as a depth / z offset and scaling for 2D objects will be sufficient for most effects, but more powerful tools such as CSS 3D transforms could also help here.

REQ15: DEC3D shall support simple ways of assigning a depth or z value to a 2D HTML or SVG area; depth values shall be accumulated in a hierarchical way, as are regular 3D transformation matrices.

Another interesting feature in the years to come will be the ability of the device hardware to use a depth image along with texture data to generate image-dependent viewpoints. Depth-data handling is also making its way into UI systems with devices such as the Microsoft Kinect, and it won't be long until TVs are equipped with such cameras. This naturally leads us to believe that introducing DIBR (Depth-Image Based Rendering) into DEC3D is an interesting path.

Figure 3 - Synthesizing a depth map from SVG gradients

As explained in [10], we believe that using SVG or canvas 2D to generate texture data that could be used as depth data for other 2D objects, through simple component transfer rules as shown in Figure 3, is a powerful way of authoring transition effects for 3D displays.

REQ16: DEC3D shall have support for Depth-Image Based Representation, in order to allow for multiple-view generation of 2D objects or areas in the content.

REQ17: DEC3D shall be able to generate synthetic depth maps from the different graphical primitives in the content, whether 2D or 3D, and whether defined in DEC3D or external to its namespace.

4.2 Virtual Camera Calibration
One important aspect of rendering for 3D displays is that the depth effect may not be exactly tied to the perspective settings of the 3D environment, and authors may decide to center a dedicated object on the screen plane, or before or behind the screen, without changing the virtual camera settings. In other words, an author may decide to change the vergence point of the different cameras used during the multi-view rendering passes. Moreover, most existing 3D languages do not take camera parameters for multi-view rendering into account, such as the camera displacement between views (circular, linear, off-axis), which may be modified by an author depending on the application type. This leads to non-interoperability between implementations. If DEC3D includes support for multi-view displays, it must therefore fulfill the following requirement:

REQ18: DEC3D shall be able to define the camera parameters used during multi-view generation, such as the vergence point (screen plane) location or the camera displacement type.
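To illustrate the kind of parameters REQ18 refers to, here is a small sketch computing per-view camera settings under one possible convention: linear displacement with an asymmetric frustum converging at the vergence plane. This is only one of the displacement types listed above, not a normative algorithm, and all names are ours.

    // Sketch: derive per-view camera settings from author-controlled
    // multi-view parameters (eye spacing, vergence distance).
    // halfWidth/halfHeight are the frustum half-extents at the near
    // plane for the centered view.
    function viewCamera(viewIndex, numViews, eyeSpacing, vergenceDist,
                        near, far, halfWidth, halfHeight) {
      // Linear displacement, centered around the original viewpoint.
      var offset = (viewIndex - (numViews - 1) / 2) * eyeSpacing;
      // Skew the frustum so all views converge at the vergence plane:
      // objects at vergenceDist appear exactly on the screen surface.
      var shift = -offset * near / vergenceDist;
      return {
        eyeOffsetX: offset,                // translate the view by -offset
        left:  -halfWidth + shift,         // asymmetric frustum bounds,
        right:  halfWidth + shift,         // to plug into a frustum matrix
        bottom: -halfHeight, top: halfHeight,
        near: near, far: far
      };
    }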
5. Conclusion
In this paper, we have exposed our views on some aspects a declarative 3D language for the web architecture should cover. More specifically, we have reviewed some of the difficulties encountered during the development of a mixed 2D and 3D multimedia renderer. We have also exposed some limitations of existing declarative technologies when designing content for auto-stereoscopic displays. Based on this analysis, we have derived some requirements for such a language and hope to contribute, in the near future, to the DEC3D activity, both in terms of requirements and developments.

6. ACKNOWLEDGMENTS
Part of this work has been financed by the French-funded ANR project CALDER.

7. REFERENCES
[1] WebGL, http://www.khronos.org/webgl/
[2] OpenGL ES 2.0, http://www.khronos.org/registry/gles/
[3] XHR, http://www.w3.org/TR/XMLHttpRequest/
[4] http://www.khronos.org/webgl/wiki/User_Contributions
[5] Le Feuvre, J., Concolato, C., and Moissinac, J. 2007. GPAC: open source multimedia framework. In Proceedings of the 15th International Conference on Multimedia (Augsburg, Germany, September 25-29, 2007). MULTIMEDIA '07.
[6] Concolato, C. and Le Feuvre, J. 2008. Playback of mixed multimedia documents. In Proceedings of the Eighth ACM Symposium on Document Engineering (Sao Paulo, Brazil, September 16-19, 2008). DocEng '08. ACM, New York, NY, 219-220. DOI= http://doi.acm.org/10.1145/1410140.1410185
[7] http://granular.cs.umu.se/browserphysics/?p=7
[8] Jovanova, B., Preda, M., and Preteux, F. 2009. MPEG-4 part 25: a graphics compression framework for XML-based scene graph formats. Signal Processing: Image Communication, vol. 24, pp. 101-114.
[9] http://blog.n01se.net/?p=248
[10] SVG Extensions for 3D displays, http://svgopen.org/2010/papers/54-SVG_Extensions_for_3D_displays/