Back to MARS: The unexplored possibilities in query result visualization

Alfredo Ferreira, Pedro B. Pascoal, Manuel J. Fonseca
INESC-ID/IST/TU Lisbon, Lisbon, Portugal
alfredo.ferreira@ist.utl.pt, pmbp@ist.utl.pt, mjf@inesc-id.pt

ABSTRACT
A decade ago, Nakazato proposed 3D MARS, an immersive virtual reality environment for content-based image retrieval. Even so, the idea of taking advantage of post-WIMP interfaces for multimedia retrieval was not explored further for content-based retrieval. Considering the latest low-cost, off-the-shelf hardware for visualization and interaction, we believe it is time to explore immersive virtual environments for multimedia retrieval. In this paper we highlight the advantages of such an approach, identifying possibilities and challenges. Focusing on a specific field, we introduce a preliminary immersive virtual reality prototype for 3D object retrieval. However, the concepts behind this prototype can easily be extended to other media.

Categories and Subject Descriptors
H.3.3 [Information Storage and Retrieval]: Information Search and Retrieval; H.5.2 [Information Interfaces and Presentation]: User Interfaces—Interaction Styles, Input Devices and Strategies

Keywords
Multimedia Information Retrieval, 3D Object Retrieval, Immersive Virtual Environment

1. INTRODUCTION
Despite advances in multimedia information retrieval (MIR), the field is still in its infancy, especially when compared to its textual counterpart. Current textual search engines are mature, and their widespread use makes them familiar to most users. The scenario in MIR is quite different: existing content-based MIR solutions are far from being widely used by the common user.

A few exceptional systems were able to thrive with relative success, such as Retrievr¹, a search tool for Flickr² based on visual queries. However, most existing solutions still face major drawbacks and challenges. Among others, extensively identified in Datta's survey [5], we highlight two. First, queries rely mostly on meta-information, often keyword-based, which means that, on closer analysis, searches can be reduced to text-based retrieval of multimedia objects. Second, result visualization follows the traditional paradigm, where results are presented as a list of items on a screen. These items are usually thumbnails, but can be just filenames or metadata. Such a methodology greatly hinders the interpretation of query results on collections of videos or 3D objects.

¹ http://labs.systemone.at/retrievr/
² http://www.flickr.com/

Notably, a decade ago, a new visualization system for content-based image retrieval (CBIR) was proposed by Nakazato and Huang from the University of Illinois. 3D MARS [11] was an immersive virtual reality (VR) environment for image retrieval. It ran on the NCSA CAVE [4], which provided a fully immersive experience, and later on desktop VR systems. However, despite this ground-breaking work and recent developments in the interaction domain, the multimedia information retrieval community has taken little advantage of immersive virtual environments.

In this paper we revisit the work of Nakazato and Huang as a starting point for the exploration of new possibilities for result visualization in multimedia information retrieval. With stereoscopic viewing and latest-generation interaction devices spreading outside lab environments and into our everyday lives, we believe that users will soon expect richer results from multimedia search engines than just a list of thumbnails.
Following this rationale, and although the approach could be applied to any type of media, we focus on 3D object retrieval (3DOR).

2. TRADITIONAL 3DOR APPROACHES
The first and, at least among researchers working in this area, best-known 3D search engine is the Princeton 3D Model Search Engine [8]. This remarkable work provides content-based retrieval of 3D models from a collection of more than 36,000 objects. Four query specification options are available: text-based, by example, by 2D sketch and by 3D sketch. The results of these queries are presented as an array of model thumbnails.

In addition to queries by example and sketch-based queries, the FOX-MIIRE search engine [1] introduced the query by photo. It was the first tool capable of retrieving a 3D model from a photograph of a similar object. However, similarly to the Princeton engine, the results are displayed as a thumbnail list.

Outside the research field, Google 3D Warehouse³ offers a text-based search engine for the common user. This online repository contains a very large number of different models, from monuments to cars, furniture, humans and spaceships. However, searching for models in this collection is limited to textual queries or, when models represent real objects, to their georeference. Results are displayed as a list of model images, with the possibility of manipulating a 3D view of a selected model.

³ http://sketchup.google.com/3dwarehouse/

Generally, query specification and result visualization in commercial tools for 3D object retrieval, usually associated with online 3D model selling sites, do not differ much from those presented above. The query is specified through keywords or by example, and results are presented as a list of model thumbnails.

These traditional approaches to query specification and result visualization do not take advantage of the latest advances in either computer graphics or interaction paradigms. Current hardware and software are capable of handling millions of triangles per frame and generating complex effects in real time. Additionally, the increasingly common use of new human-computer interaction (HCI) paradigms and devices has brought new possibilities for multi-modal systems.

3. NEW PARADIGMS IN HCI
The recent dissemination of new HCI paradigms and devices among common users (e.g. the Nintendo Wiimote⁴ or Microsoft Kinect⁵) has brought new possibilities for multi-modal systems. For decades, the "windows, icons, menus, pointing device" (WIMP) interaction style prevailed outside the research field, while post-WIMP interfaces were being devised and explored [16], but without major impact on the everyday use of computer systems.

⁴ http://www.nintendo.com/wii/console/controllers
⁵ http://www.xbox.com/en-US/kinect

In particular, the use of gestures to interact with systems has been part of the interface scene since the very early days. A pioneering multimodal application was Bolt's "Put-that-there" [2], in which the user commands simple shapes on a large-screen graphics display, combining gestures and voice commands to interact with the system. However, only recently has such an interaction paradigm been introduced in off-the-shelf commodity products.

Recent technological advances have allowed the development of low-cost, lightweight, easy-to-use systems. With limited resources, novel and more natural HCI can be developed and explored. For instance, Lee [10] used a Wiimote and took advantage of its high-resolution infra-red camera to implement a multipoint interactive whiteboard, finger tracking and head tracking for desktop virtual reality displays. Post-WIMP has finally arrived to the masses.

Generally, post-WIMP approaches abandon the traditional mouse and keyboard combination, favouring devices with six degrees of freedom (DoF). Unlike the traditional WIMP interaction style, where inputs from a 2D interaction space must be mapped to a 3D visualization space, six-DoF devices allow a straightforward direct mapping between device movements and rotations and the corresponding effects in three-dimensional space. This represents a huge leap for the concept of direct manipulation, which, according to Shneiderman [14], relies on rapid incremental operations and the immediate visualization of their effects on the manipulated object, making the interaction more comprehensible, predictable and controllable.
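To make this direct mapping concrete, the sketch below applies a six-DoF pose straight to an object's model matrix, with no 2D-to-3D translation layer in between. It is only an illustrative sketch, not part of any system described here: the pose parameters are hypothetical placeholders for whatever readings a real device SDK would provide.

```python
# Illustrative sketch: a six-DoF device pose drives an object transform directly.
import numpy as np

def quat_to_matrix(q):
    """Convert a unit quaternion (w, x, y, z) to a 3x3 rotation matrix."""
    w, x, y, z = q
    return np.array([
        [1 - 2*(y*y + z*z), 2*(x*y - w*z),     2*(x*z + w*y)],
        [2*(x*y + w*z),     1 - 2*(x*x + z*z), 2*(y*z - w*x)],
        [2*(x*z - w*y),     2*(y*z + w*x),     1 - 2*(x*x + y*y)],
    ])

def model_matrix(device_position, device_orientation):
    """Build a 4x4 model matrix straight from the device pose (hypothetical API)."""
    m = np.eye(4)
    m[:3, :3] = quat_to_matrix(device_orientation)  # device rotation maps one-to-one
    m[:3, 3] = device_position                      # device translation maps one-to-one
    return m

# Example: identity orientation, object placed where the device reports its position.
print(model_matrix(np.array([0.5, 1.0, -2.0]), (1.0, 0.0, 0.0, 0.0)))
```

With a mouse, by contrast, every 3D rotation or translation has to be synthesized from 2D cursor deltas plus modifier keys, which is exactly the indirection that six-DoF devices remove.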
Combining six-DoF devices with stereoscopy, it is possible to achieve multi-modal immersive interaction with direct and natural manipulation of object shapes within virtual environments. This can be experienced using immersive displays (e.g., HMDs, CAVEs) [7] or desktop setups [15].

Despite the growing interest around these new HCI paradigms, no relevant efforts have been made to explore the latest technological advances for multimedia information retrieval. Indeed, to the best of our knowledge, no research or solution taking advantage of immersive virtual environments for information retrieval has been presented since Nakazato's 3D MARS [11].

4. 3D MARS
Figure 1: The interface of 3D MARS.

3D MARS demonstrates that the use of 3D visualization in multimedia retrieval has two benefits. First, more content can be displayed at the same time without items occluding one another. Second, by assigning a different meaning to each axis, the user can determine which features are important and examine the query result with respect to three different criteria at the same time.

Nakazato focused his work on query result visualization; thus, 3D MARS supports only a query-by-example mechanism to specify the search. The user selects one image from a list, and the system retrieves and displays the most similar images from the image database in a 3D virtual space. The location of each image in this space is determined by its distance to the query image, with more similar images placed closer to the origin. The distance along each coordinate axis depends on a pre-defined set of features: the X-axis, Y-axis and Z-axis represent color, texture and structure of the images, respectively.

The interaction with the query results is done through a wand that the user holds while freely walking around the CAVE, as depicted in Figure 1. By wearing shutter glasses, the user sees a stereoscopic view of the world, which provides a fully immersive experience. In such a solution, visualizing query results goes far beyond scrolling through a list of thumbnails: the user navigates among the results in a three-dimensional space.
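This placement rule can be summarized in a single expression. The notation below is ours, not taken from the 3D MARS paper: each d stands for the pre-defined distance between image I and query Q under the corresponding feature.

```latex
% Placement of image I relative to query Q (our shorthand): one feature distance per axis.
\[
  \mathbf{p}(I) = \bigl(\, d_{\text{color}}(I,Q),\; d_{\text{texture}}(I,Q),\; d_{\text{structure}}(I,Q) \,\bigr)
\]
% An image identical to the query under all three features lands at the origin;
% a large coordinate on one axis flags dissimilarity in that feature alone.
```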
3D MARS was the catalyst for the proposal put forward in this paper: to explore immersive visualization systems for multimedia information retrieval. Following that idea, we devised an immersive 3D virtual reality system for displaying the results of queries for 3D object retrieval.

5. IMMERSIVE 3DOR
Figure 2: User exploring query results in Im-O-Ret.

Taking advantage of the new paradigms in HCI, we propose an immersive VR system for 3D object retrieval (Im-O-Ret). The version of the system presented in this paper relies on a large-screen display, the LEMe Wall [6], and a six-DoF interaction device, the SpacePoint Fusion, an off-the-shelf device developed by PNI Sensor Corporation. However, minimal effort is required to have the system working with an HMD or stereoscopic glasses, as well as with other input devices, such as the Wiimote or Kinect.

Regardless of the hardware details, Im-O-Ret allows the user to browse the results of a query to a collection of 3D objects in an immersive virtual environment. The objects are distributed in the virtual 3D space according to their similarity, measured by the distance of each result to the query, which stands at the origin of the coordinate system. To each of the three axes is assigned a different shape matching algorithm, and the similarity to the query returned by the corresponding algorithm determines the coordinate along that axis. The current version of Im-O-Ret uses the Lightfield Descriptors [3] on the X-axis, the Coord and Angle Histogram [13] on the Y-axis, and the Spherical Harmonics Descriptor [9] on the Z-axis. Figure 2 illustrates a user browsing the results of a query.
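The sketch below illustrates this placement scheme: each retrieved model is positioned using three per-descriptor dissimilarities to the query. It is a minimal sketch under our own assumptions rather than the actual Im-O-Ret implementation; the three distance functions are hypothetical placeholders for the Lightfield, Coord and Angle Histogram, and Spherical Harmonics comparisons.

```python
# Minimal placement sketch (assumptions, not the Im-O-Ret code): each axis gets
# one dissimilarity measure, so the query sits at the origin and less similar
# models drift outward along the axis of the descriptor that tells them apart.
from typing import Callable, Dict, List, Tuple

Distance = Callable[[str, str], float]  # (model_id, query_id) -> dissimilarity

def place_results(query_id: str,
                  result_ids: List[str],
                  d_lightfield: Distance,      # hypothetical X-axis measure
                  d_coord_angle: Distance,     # hypothetical Y-axis measure
                  d_spherical_harm: Distance   # hypothetical Z-axis measure
                  ) -> Dict[str, Tuple[float, float, float]]:
    """Return a 3D position for every retrieved model."""
    return {
        m: (d_lightfield(m, query_id),
            d_coord_angle(m, query_id),
            d_spherical_harm(m, query_id))
        for m in result_ids
    }

# Toy usage with dummy distances: the query itself lands at (0, 0, 0).
dummy = lambda m, q: 0.0 if m == q else float(abs(hash((m, q))) % 100) / 100.0
print(place_results("chair_042", ["chair_042", "stool_007", "sofa_013"],
                    dummy, dummy, dummy))
```

In practice the distances would likely need to be normalized per descriptor so that the three axes remain visually comparable.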
5.1 Possibilities
Similarly to 3D MARS, this work opens a myriad of new possibilities. By assigning different shape matching algorithms to each axis, one can adapt the query mechanism to specific domains, producing more precise results. By applying transparency to results, it is possible to overlay the results of distinct queries. Effects such as glow or special colors can be added to results in order to convey additional information.

Since query results are not images or thumbnails, but three-dimensional models, it is possible to navigate around them in the virtual environment and even manipulate them. Moreover, instead of a static view of each result, displaying it as a 3D object rotating around one axis offers a better perception of the model. Adding stereoscopy improves the visualization even further, since the user gains depth perception over the environment.

The combined use of virtual environments and devices with six DoF provides a more complete visualization and makes interaction more natural, comprehensible and predictable. Their use will, however, also add some challenges to the implementation of such a system.

5.2 Challenges
While in traditional 3DOR systems the query results are presented as a list of thumbnails ordered by a given similarity measure, when we move to a virtual environment the distribution of results in a 3D space becomes a challenge. How query results should be arranged in 3D space to be meaningful to the user remains an open question. In our approach we selected three shape descriptors and assigned each one to a coordinate axis, but this is a preliminary approach. We believe that a final solution is more complex than this, and further investigation on this topic is clearly required.

On the other hand, the way users navigate in an immersive environment and interact with the objects in it is still an open issue. Norman [12] stated that gesturing is a natural, automatic behaviour, but that unintended interpretations of gestures can create undesirable states. With this in mind, it is important to aim for an interface that is both predictable and easy to learn.

Above all, an important challenge remains open: no easy query specification mechanism has been presented, neither in traditional search engines nor with the new HCI paradigms. Although sketch-based queries apparently provide good results, they greatly depend on the ability of the user to draw a 3D model, which hinders the goal of a widely used, content-based 3D search engine.

6. CONCLUSIONS
We believe that recent advances in low-cost, post-WIMP enabling technology can be seen as an opportunity to overcome some drawbacks of current multimedia information retrieval solutions. Combined with the dissemination of stereoscopic visualization as a commodity, these interaction paradigms will acquaint common users with immersive virtual reality environments.

In this paper we argue that such a scenario is fertile ground to be explored by search engines for multimedia information retrieval. In that context, we identified two major research topics: query result visualization and query specification. While the latter requires further study, we have already started tackling the former.

We developed a novel visualization approach for 3D object retrieval. Im-O-Ret offers users an immersive virtual environment for browsing the results of a query to a collection of 3D objects. The query results are displayed as 3D models in a 3D space, instead of the traditional list of thumbnails. The user can explore the results, navigating in that space and directly manipulating the objects.

Looking back at 3D MARS, the initial work proposed by Nakazato, we realize it was a valid idea that fell almost into oblivion. We expect that our preliminary work, which builds on concepts introduced by 3D MARS, can demonstrate the value of our call to explore the possibilities offered by immersive virtual environments for multimedia information retrieval.

7. ACKNOWLEDGMENTS
The work described in this paper was partially supported by the Portuguese Foundation for Science and Technology (FCT) through the project 3DORuS, reference PTDC/EIA-EIA/102930/2008, and by the INESC-ID multiannual funding, through the PIDDAC Program funds.
8. REFERENCES
[1] T. F. Ansary, J.-P. Vandeborre, and M. Daoudi. 3D-model search engine from photos. In Proceedings of the 6th ACM International Conference on Image and Video Retrieval, CIVR '07, pages 89–92, New York, NY, USA, 2007. ACM.
[2] R. A. Bolt. "Put-that-there": Voice and gesture at the graphics interface. In Proceedings of the 7th Annual Conference on Computer Graphics and Interactive Techniques, SIGGRAPH '80, pages 262–270, New York, NY, USA, 1980. ACM.
[3] D.-Y. Chen, X.-P. Tian, Y.-T. Shen, and M. Ouhyoung. On visual similarity based 3D model retrieval. In EUROGRAPHICS 2003 Proceedings, volume 22, pages 223–232, 2003.
[4] C. Cruz-Neira, D. J. Sandin, and T. A. DeFanti. Surround-screen projection-based virtual reality: the design and implementation of the CAVE. In Proceedings of the 20th Annual Conference on Computer Graphics and Interactive Techniques, SIGGRAPH '93, pages 135–142, New York, NY, USA, 1993. ACM.
[5] R. Datta, D. Joshi, J. Li, and J. Z. Wang. Image retrieval: Ideas, influences, and trends of the new age. ACM Computing Surveys, 40:5:1–5:60, May 2008.
[6] B. R. de Araújo, T. Guerreiro, R. J. Costa, J. A. P. Jorge, and J. M. Pereira. LEMe Wall: Desenvolvendo um sistema de multi-projecção. In 13º Encontro Português de Computação Gráfica, Vila Real, Portugal, 2005.
[7] T. DeFanti, D. Acevedo, R. Ainsworth, M. Brown, S. Cutchin, G. Dawe, K.-U. Doerr, A. Johnson, C. Knox, R. Kooima, F. Kuester, J. Leigh, L. Long, P. Otto, V. Petrovic, K. Ponto, A. Prudhomme, R. Rao, L. Renambot, D. Sandin, J. Schulze, L. Smarr, M. Srinivasan, P. Weber, and G. Wickham. The future of the CAVE. Central European Journal of Engineering, 1:16–37, 2011. doi:10.2478/s13531-010-0002-5.
[8] T. Funkhouser, P. Min, M. Kazhdan, J. Chen, A. Halderman, D. Dobkin, and D. Jacobs. A search engine for 3D models. ACM Transactions on Graphics, 22:83–105, January 2003.
[9] M. Kazhdan, T. Funkhouser, and S. Rusinkiewicz. Rotation invariant spherical harmonic representation of 3D shape descriptors. In Proceedings of the 2003 Eurographics/ACM SIGGRAPH Symposium on Geometry Processing, SGP '03, pages 156–164, Aire-la-Ville, Switzerland, 2003. Eurographics Association.
[10] J. Lee. Hacking the Nintendo Wii Remote. IEEE Pervasive Computing, 7(3):39–45, July–September 2008.
[11] M. Nakazato and T. S. Huang. 3D MARS: Immersive virtual reality for content-based image retrieval. In Proceedings of the 2001 IEEE International Conference on Multimedia and Expo (ICME 2001), 2001.
[12] D. A. Norman. Natural user interfaces are not natural. interactions, 17:6–10, May 2010.
[13] E. Paquet and M. Rioux. Nefertiti: A query by content software for three-dimensional models databases management. In Proceedings of the International Conference on Recent Advances in 3-D Digital Imaging and Modeling (NRC '97), page 345, Washington, DC, USA, 1997. IEEE Computer Society.
[14] B. Shneiderman. Direct manipulation for comprehensible, predictable and controllable user interfaces. In Proceedings of the 2nd International Conference on Intelligent User Interfaces, IUI '97, pages 33–39, New York, NY, USA, 1997. ACM.
[15] B. Sousa Santos, P. Dias, A. Pimentel, J.-W. Baggerman, C. Ferreira, S. Silva, and J. Madeira. Head-mounted display versus desktop for 3D navigation in virtual reality: a user study. Multimedia Tools and Applications, 41:161–181, January 2009.
[16] A. van Dam. Post-WIMP user interfaces. Communications of the ACM, 40:63–67, February 1997.