Playing Around the Eye Tracker: A Serious Game Based Dataset

Michael Riegler(1), Ragnhild Eg(1), Lilian Calvet(1), Mathias Lux(2), Pål Halvorsen(1), Carsten Griwodz(1)
(1) Media Performance Group, Simula Research Laboratory & University of Oslo, Norway
(2) Institute for Information Technology, University of Klagenfurt, Austria
{michael, rage, paalh, lcalvet, griff}@simula.no, mlux@itec.aau.at

Copyright (c) 2015 for the individual papers by the papers' authors. Copying permitted for private and academic purposes. This volume is published and copyrighted by its editors. In: F. Hopfgartner, G. Kazai, U. Kruschwitz, and M. Meder (eds.): Proceedings of the GamifIR'15 Workshop, Vienna, Austria, 29 March 2015, published at http://ceur-ws.org

Abstract

This work applies crowdsourcing and gamification approaches to the study of human visual perception and attention. With the presented dataset, we wish to contribute raw data on the salience of image segments. The data collection takes place in the designed game, where players are tasked with guessing the content of a gradually uncovered image. Because the image is uncovered tile-by-tile, the game mechanics allow us to collect information on the image segments that are most important to identifying the image content. The dataset can be applied to both computer vision and image retrieval algorithms, aiming to build on the current understanding of human visual perception and attention. Moreover, the end objective is to test the game as a potential substitute for professional eye tracking systems.

Figure 1: The standard difficulty mode of the game, shown as an example round with two incorrect attempts (red buttons) and the final correct response (green button). Accumulated points and remaining time are presented in the upper right corner.

1 Introduction

In the ongoing quest to understand how humans think, perceive, and behave, human computation and related fields contribute with new methodologies and models that can shed more light on the complex workings of the human mind. Researchers make use of human computation to train machines, such as in semi-supervised learning, but also to collect data on tasks that can only be completed by humans. Crowdsourcing is often applied to this kind of data collection, alleviating the burden of running experiments, but at the same time introducing a few methodological concerns. Moving experiments and user studies out of the restricted environment of a laboratory means surrendering control over the test situation. Fortunately, there are means to compensate for the lack of control. Crowdsourcing makes it far easier and less time-consuming to collect data from a large number of people, improving both the internal validity and the generalisability of results [KCS08, NR10]. Another concern is linked to the motivation of crowdsourcing workers and their willingness to adhere strictly to the task at hand. However, this concern can be addressed through the design of the experimental task [KCS08]. Gamification is a fairly recent development in crowdsourcing. Games with a purpose (GWAP) provide an entertaining arena for participants, aiming to enhance player motivation and improve performance. Hence, through a well-designed GWAP, researchers retain the benefit of reaching out to a large pool of participants, while increasing the likelihood of obtaining more reliable data on problems only humans can tackle.

With this approach, researchers have succeeded in turning annotation tasks into enjoyable activities [VAD08], along with a range of other repetitive tasks, ranging from information retrieval to security and exertion issues [JKM+11, BMI14, PMRS14, MBB10]. Furthermore, some games have been designed to tap directly into processes that involve human visual perception and attention. For instance, Peekaboom is a two-player game where one player is asked to guess the content of the image that the other player is gradually revealing [VLB06]; another approach presents players with a short video that is subsequently masked by a character chart [RGSZM12]. In both games, the collected data is used to shed light on where and at what people will look in an image or a video.

Our gamification approach is similarly motivated by questions on how people regard and recognise a depicted object of interest. Human visual attention is typically studied using eye tracking paradigms, with equipment that can map the movements and fixations of the pupil, for instance across a presentation on a screen (e.g., [HRM11]). Researchers have used this technology for decades to study how the eyes move during reading, and this body of work has established important insights on the processing of written information [Ray09]. Eye tracking methods are also used to explore where and how people look when taking in a scene [CMH09], when performing a visual search task [BB09], or when looking at the face of someone talking [BPM08]. Humans are in fact quite adept at recognising people, objects and animals, even with fairly degraded global features [McMM11]. Although human visual perception is facilitated by higher-level cognitive mechanisms, such as prototypes stored in long-term memory, the visual system relies on attended low-level features that may be unique to a particular animal or object [McMM11]. Furthermore, attention is easily captured by visually distinct or unexpected elements within a scene [BB09]. The limited number of studies into the salient regions and features involved in the identification of objects and animals could very well be connected to the time needed for such an undertaking. Running dozens of individual eye-tracking sessions with hundreds of images seems a daunting task, not to mention an expensive one (see for instance [MCBT06, JEDT09, BJD+]). With this in mind, we planned our serious game as a time-efficient and economical alternative to traditional eye-tracking paradigms.

Inspired by research on human vision and attention, computer vision scientists work to overcome the problems of computational complexity in order to replicate the mechanisms of human perception. By building such systems, researchers in this field aim to solve problems related to object recognition and scene interpretation, as well as other related challenges. When addressing human visual attention, one term becomes particularly prominent in both cognitive psychology and computer vision. In psychology, visual saliency can be determined by the low-level features that affect where people move their gaze, such as contrast, colour, intensity, brightness, and spatial frequency [Ray09]. Similar definitions have been proposed by the computer vision community. Saliency, as defined by Borji and colleagues [BI13], "intuitively characterizes some parts of a scene – which could be objects or regions – that appear to an observer to stand out relative to their neighboring parts". Humans are able to identify salient areas in their visual fields with surprising speed and accuracy before performing actual recognition; this remains a critical task in computer vision. To assist in this endeavour, we wish to supply the multimedia and computer vision communities with a dataset that can be useful:

• As input data for machine learning algorithms aiming to detect salient objects/regions.

• As input data for scalable image understanding systems: feeding a few salient regions into thousands of object classifiers (e.g., [NKRP10]) avoids running thousands of expensive object detectors across the entire image.

• To evaluate computational methods of salience (such as [BJD+, JEDT09]).

Our game is designed to gradually reveal parts of an animal picture (although the game can easily be adapted to other types of images), and the player's task is to identify the animal as quickly and as accurately as possible. Because the various elements are revealed in a random pattern, the game makes it possible to analyse response patterns and explore which regions are most vital to the recognition of the animal. Furthermore, the crowdsourcing arena enables comprehensive data collection, securing sufficient data for the analyses of the separate images.

In the provided dataset, we have collected image unveiling patterns and the related subjective responses. Through the design of our crowdsourcing study, we have created a novel single-player game that entertains and engages participants, aiming to increase motivation and divert attention and awareness away from the underlying research question. Along with the game and the stimulus material, we provide data from our first rounds of experimentation. With this material, we wish to:

• Make the Mobile Picture Guess game publicly available as a low-threshold experiment set-up.

• Provide an openly available dataset for investigations into human visual attention and salient image features.

• Provide a dataset that can be compared with results collected from an eye tracker, and in turn explore the feasibility of our approach as an alternative to these costly systems.

• Finalise our investigations by establishing salient features for individual animal images, hopefully building on the current understanding of human visual perception.

The planning, design, launch and analysis of our serious game progressed over several stages that we describe in this paper, beginning with the technical design and the data collection. We then include details on our dataset and outline the experiment we conducted to highlight the application of the game, including a preliminary analysis. Finally, we draw our concluding remarks.

2 Data Collection and Game Design

2.1 The Game

The game, designed to entertain while collecting data, is called Mobile Picture Guess [RELS14]. It involves a puzzle, a gradually revealed image, that must be solved before time runs out. The puzzle is solved by guessing the content of the image, choosing the correct option out of the four presented; in the current set-up, all images portray an animal, thus all response options provide an animal name. Based on feedback from initial user tests, parts of the game were modified over multiple iterations. The end result is a game that is fun to play and that gathers data without intrusion.

2.2 Technical Details

The full data collection system consists of two parts: the game running on Android devices, and the back-end server solution. The main development platform of the game is Java, using libGDX (http://libgdx.badlogicgames.com/) and the Android library (http://developer.android.com/develop/index.html). LibGDX is an open-source framework for cross-platform game development. It provides an easy way to create 2D interactive programs based on OpenGL for MacOS, Windows, Linux, Android, iOS and Web applications. While it is mainly targeted at Android platforms, Mobile Picture Guess can easily be adapted to other platforms. We decided to use the Google Play functionality to distribute the game to a large number of players.

The back-end server is an Apache web server (http://www.apache.org/) hosted by our lab, which also runs a MySQL server. HTTP requests to a PHP-based script are used for the communication between the server and the game. To provide maximal security for the player and the data, we employ several techniques to avoid SQL injection, along with strong data encryption. The server retrieves the data from the game in a JSON format and stores it in the MySQL database. To make this possible, the player has to be online while playing the game. The information is stored after each image's revelation, in order to avoid its loss through cancelled games or interrupted internet connections.
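To illustrate the client-server exchange, the sketch below shows how the game client could post one record as JSON over HTTP. This is a minimal illustration, not the deployed code: the endpoint URL and the class and method names are hypothetical, the field names mirror the 'vote' sub-fields listed in Table 2, and the PHP script, encryption and SQL-injection countermeasures described above are omitted.

    import java.io.OutputStream;
    import java.net.HttpURLConnection;
    import java.net.URL;
    import java.nio.charset.StandardCharsets;

    public class RoundUploader {

        // Hypothetical endpoint; the real PHP script and its URL are not published.
        private static final String ENDPOINT = "http://example.org/mpg/submit.php";

        /** Posts one record as JSON; field names follow the 'vote' sub-fields in Table 2. */
        public static int submitRound(String picture, String answer, String deviceId,
                                      int tilesRemoved, String transformation) throws Exception {
            // Naive JSON construction for illustration; a real client should
            // escape values or use a JSON library.
            String json = String.format(
                "{\"picture\":\"%s\",\"answer\":\"%s\",\"name\":\"%s\"," +
                "\"matrix\":%d,\"transformation\":\"%s\"}",
                picture, answer, deviceId, tilesRemoved, transformation);

            HttpURLConnection conn = (HttpURLConnection) new URL(ENDPOINT).openConnection();
            conn.setRequestMethod("POST");
            conn.setRequestProperty("Content-Type", "application/json");
            conn.setDoOutput(true);
            try (OutputStream out = conn.getOutputStream()) {
                out.write(json.getBytes(StandardCharsets.UTF_8));
            }
            return conn.getResponseCode();
        }
    }

Sending one small record per revelation, rather than batching a whole game, matches the design goal stated above: a cancelled game or a dropped connection costs at most one record.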
2.3 Gameplay

With the overall aim to collect perceptual information, the game task needs to capture the full attention of the player, necessitating a single-player design. The game mechanics involve a puzzle to be solved, play against time, adaptive difficulty for skilled players, and a score expressed in points. A player starts a new game with a contingent of time, and in each round the player is presented with a new image whose content must be guessed. One completed game consists of as many rounds as the player can complete in the given amount of time. At the beginning of each round, the image is completely obscured by black mosaic tiles. These black tiles begin to disappear in a random pattern as time counts down, as illustrated in Figure 1. Thus, the image surface becomes gradually more visible as the mosaic is lifted, and the longer a round runs, the easier it becomes to guess the image content. If players cannot complete the task before time runs out and the image is completely unveiled, they receive no points and the game skips ahead to the next round.

In order to provide a response on the image content, the player chooses one of the four alternatives presented as buttons on the right side of the screen. Three of the buttons display incorrect answers and one holds the right one. The player clicks on the option presumed to be correct, then receives immediate feedback: incorrect answers turn the selected option red, whereas correct answers yield a green button (Figure 1). With the right answer provided, the picture is fully revealed and the player receives points and additional seconds of playtime. The number of points is based on how much of the picture remains concealed, and thus depends on the swiftness of the response.
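These mechanics can be captured in a few lines. The sketch below is our reading of the round logic, not code from the game itself: the class and method names, the per-tick reveal rate, and the point formula (here simply proportional to the number of still-concealed tiles) are assumptions for illustration.

    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.List;

    public class Round {
        private final List<Integer> hiddenTiles = new ArrayList<>();
        private final int totalTiles;

        public Round(int rows, int cols) {
            totalTiles = rows * cols;
            for (int i = 0; i < totalTiles; i++) hiddenTiles.add(i);
            Collections.shuffle(hiddenTiles);   // random reveal pattern
        }

        /** Called on every timer tick: uncover the next tile in the shuffled order. */
        public int revealNext() {
            return hiddenTiles.isEmpty() ? -1 : hiddenTiles.remove(hiddenTiles.size() - 1);
        }

        /** Assumed scoring: the more tiles still concealed, the more points. */
        public int pointsForCorrectAnswer(int pointsPerTile) {
            return hiddenTiles.size() * pointsPerTile;
        }

        public boolean fullyRevealed() { return hiddenTiles.isEmpty(); }
    }

Shuffling the tile order once per round reproduces the property the analysis relies on: across many players, every tile is equally likely to be uncovered early or late.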
Furthermore, wrong answers result in a loss of playtime, and this loss increases steadily with repeated incorrect responses. The accumulated loss penalises attempts at rapidly choosing all options without focusing on the image. We implemented this reward and penalty system in order to motivate players to play as quickly and as accurately as they could. To ease the task of learning the game rules, the game becomes more difficult over time. The easy mode at game start involves a high reveal rate, meaning that tiles disappear more quickly. Upon successful completion of the initial rounds, the reveal rate goes down. Moreover, a transformation is applied to the picture to make its content harder to guess, as exemplified in Figure 3. This transformation consists of flipping the image 180 degrees, changing the colours randomly, or converting the image to greyscale. The reduced reveal rate and the transformation are applied solely to make the game harder, and consequently more interesting for players who may want to play additional rounds.

Figure 3: Screenshot of a round from the difficult game mode, where less information is revealed when the tiles are lifted. This mode is introduced further into the game, after a player has successfully completed several rounds.
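The three transformations are simple pixel operations. As a rough sketch (using java.awt.image.BufferedImage for brevity, whereas the game itself renders through libGDX/OpenGL), they could look as follows; the channel rotation is only one plausible reading of "changing the colours randomly".

    import java.awt.image.BufferedImage;

    public class Transformations {

        /** Flip the image 180 degrees by mirroring both axes. */
        public static BufferedImage flip180(BufferedImage src) {
            int w = src.getWidth(), h = src.getHeight();
            BufferedImage dst = new BufferedImage(w, h, BufferedImage.TYPE_INT_RGB);
            for (int y = 0; y < h; y++)
                for (int x = 0; x < w; x++)
                    dst.setRGB(w - 1 - x, h - 1 - y, src.getRGB(x, y));
            return dst;
        }

        /** Convert to greyscale using the usual luminance weights. */
        public static BufferedImage toGreyscale(BufferedImage src) {
            int w = src.getWidth(), h = src.getHeight();
            BufferedImage dst = new BufferedImage(w, h, BufferedImage.TYPE_INT_RGB);
            for (int y = 0; y < h; y++)
                for (int x = 0; x < w; x++) {
                    int rgb = src.getRGB(x, y);
                    int r = (rgb >> 16) & 0xff, g = (rgb >> 8) & 0xff, b = rgb & 0xff;
                    int l = (int) (0.299 * r + 0.587 * g + 0.114 * b);
                    dst.setRGB(x, y, (l << 16) | (l << 8) | l);
                }
            return dst;
        }

        /** Assumed colour change: rotate the RGB channels, (r,g,b) -> (g,b,r). */
        public static BufferedImage rotateChannels(BufferedImage src) {
            int w = src.getWidth(), h = src.getHeight();
            BufferedImage dst = new BufferedImage(w, h, BufferedImage.TYPE_INT_RGB);
            for (int y = 0; y < h; y++)
                for (int x = 0; x < w; x++) {
                    int rgb = src.getRGB(x, y);
                    int r = (rgb >> 16) & 0xff, g = (rgb >> 8) & 0xff, b = rgb & 0xff;
                    dst.setRGB(x, y, (g << 16) | (b << 8) | r);
                }
            return dst;
        }
    }

Note that all three operations preserve shape and spatial layout; only colour cues and canonical orientation are disturbed, which is presumably why recognition becomes harder without becoming impossible.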
For variation and less monotony, we also implemented a mini-game. The mini-game is simple, but requires some dexterity from the player: the task is to reveal a concealed image by sliding a finger across the screen to remove the black squares, as portrayed in Figure 2. If the player succeeds in uncovering the entire image, they receive additional time for the main game. Thus, the mini-game serves two purposes. On the one hand, it introduces a new task to distract the player from the potential repetitiveness of the main game, hopefully improving the quality of experience. On the other hand, it works as an aid to improve performance on the original task. The mini-game is presented after five image-guessing rounds; if it is successfully completed, three bonus seconds are added to the playtime. Please note that data from the mini-game is not collected for the dataset. The game continues until time runs out, at which point the player is presented with their final score. The game then returns to the start screen, where the overall high score is listed and the player can choose whether to play another game or to end the session.

Figure 2: An example image from the mini-game in Mobile Picture Guess. The image on the left starts out as a black rectangle that is uncovered by sliding a finger across the tiles; on the right is the remaining time.

2.4 Human Intelligence Task

Because a new game in the app store is easily overlooked, we also made use of crowdsourcing in our recruitment of players. As crowdsourcing platform, we used Microworkers (https://microworkers.com/). In the HIT, we asked the workers to download the game and play it. We added a token system to make sure the workers dedicated both time and effort to the game, and each worker had to report two tokens per task. Initially, one token required 2000 game points, but because of the observed difficulty in reaching this mark, we reduced it to 1500 points. Feedback from workers suggests that the game was well-received and enjoyable to play. We ran the HIT for one week; based on recommendations by Microworkers, we paid workers 0.80 Euros per HIT. In total, we spent 100 Euros on the whole experiment, including the fee for the Microworkers platform. Additional information collected about the HIT and the games played is presented in Table 1. Sadly, our game did not run properly on some of the older Android devices; this required us to investigate our dataset manually and exclude scores collected from these devices. However, the workers were not affected by this issue.

3 Dataset Description

The publicly available dataset contains 200 images, in addition to the SQL database file with the collected player information. An overview of all dataset components is illustrated in Figure 4.

Figure 4: Overview of dataset components.

    Completed games       13,861
    Submitted HITs        111
    Unique workers        302
    Unique players        352
    Average games played  5
    Average game-time     10 minutes

Table 1: General statistics about the dataset.

Selected images. To create our image dataset, we first settled on a list of 124 animals, so that each image could be easily distinguished and described by a single label, such as albatross, alligator, and alpaca. Next, we used these terms to query images on Flickr, collecting images categorised as free to use or published under a Creative Commons attribution license (a license text was overlaid on the Creative Commons images). We made sure to select visually appealing scenes by ranking the queried images according to Flickr's interestingness score and then keeping the 25 highest-ranked images for each term. The resulting collection, with more than 3000 images, was further reduced to 200 by removing all manipulated photos and all images that did not clearly display the animal of interest. For each image presentation, we added three random terms to the correct label, yielding the four response options.

Statistics. By releasing the game on the Google Play Store, we could easily keep track of the application and the data collection. It also allowed us to derive statistics on the games played; these are summarised in Table 1. As noted before [RELS14], several workers continued to play without payment after completing their HITs. This provided us with additional data; more importantly, it further established the entertainment value of the game.

Database. All image metadata are stored in an SQL database, which we have made publicly available for download at http://goo.gl/CL24aV. Along with the database, we include the code to calculate region saliency. The database file consists of six fields; the players' responses are contained in the 'vote' field, which is further divided into five sub-fields. Details about the information stored in the respective fields are included in Table 2.

    ID              Unique ID for the played game
    Version         Game version
    Image           Image file name
    Time added      Time of data submission for the completed game
    IP address      Encrypted and secure version of the player's IP address
    Vote            Detailed information about the game played (in JSON format)
     - Picture         Picture name
     - Answer          Correct image label
     - Name            Unique device ID
     - Matrix          Number of tiles removed
     - Transformation  Applied image transformation

Table 2: Description of database fields.
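As a starting point for working with the database, the snippet below shows one way to unpack the 'vote' field using the org.json library. The sub-field names follow Table 2, but the lower-case key spelling is an assumption; the exact JSON layout should be checked against the released SQL dump.

    import org.json.JSONObject;

    public class VoteRecord {
        public final String picture, answer, deviceId, transformation;
        public final int tilesRemoved;

        /** Parses one 'vote' field; sub-field names follow Table 2. */
        public VoteRecord(String voteJson) {
            JSONObject v = new JSONObject(voteJson);
            picture        = v.getString("picture");
            answer         = v.getString("answer");
            deviceId       = v.getString("name");           // unique device ID
            tilesRemoved   = v.getInt("matrix");            // number of tiles removed
            transformation = v.getString("transformation"); // applied transformation
        }
    }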
4 Application of the Dataset

With the outlined dataset, we aim to provide information about the images and responses presented and collected in a puzzle game. In the game itself, players are tasked with guessing the content of an image that is gradually revealed. The saliency of each image region corresponds to its importance in identifying the content. Specifically, the saliency of an image segment is determined by the number of times the tile was uncovered prior to a correct response, aggregated across all players and divided by the number of times the image was presented in a game. The saliency scores can be mapped out across the images, yielding visual heatmaps. Examples of the aggregated heatmaps are presented in Figure 5, where the density of the red colour is inversely related to the saliency of the tile. Additionally, the saliency value is provided in the lower left corner of each tile.

Figure 5: Example images with the aggregated experiment results, with overlaid saliency heatmaps. The red colour density is inversely related to saliency and the exact value is provided in the bottom left-hand corner. The images depict a) a bison, b) a gerbil, c) a sparrow, d) another bison, e) a dolphin, and f) a lynx.
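Written out as a formula, and under our reading of the definition above, the saliency score of tile t in image i is

    s_{i,t} = \frac{1}{N_i} \sum_{p=1}^{N_i} \mathbf{1}\big[\text{tile } t \text{ uncovered before a correct response in presentation } p\big]

where N_i is the number of times image i was presented in a game. The aggregation sketch below follows this formula; it assumes each presentation can be reduced to the list of tile indices uncovered before the answer, which is an assumption about the released data rather than a documented property, so the saliency code distributed with the database should be treated as authoritative.

    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    public class SaliencyAggregator {
        private final Map<Integer, Integer> uncoveredBeforeCorrect = new HashMap<>();
        private int presentations = 0;

        /** Register one presentation of the image and the tiles revealed before the answer. */
        public void addPresentation(boolean answeredCorrectly, List<Integer> uncoveredTiles) {
            presentations++;                    // denominator: every presentation counts
            if (answeredCorrectly)              // numerator: only correct responses
                for (int tile : uncoveredTiles)
                    uncoveredBeforeCorrect.merge(tile, 1, Integer::sum);
        }

        /** Saliency of a tile: uncover count divided by the number of presentations. */
        public double saliency(int tile) {
            return presentations == 0 ? 0.0
                 : uncoveredBeforeCorrect.getOrDefault(tile, 0) / (double) presentations;
        }
    }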
While the dataset can provide insights directly from the salient regions, which we have explored in [RELS14], it can also be a useful addition to several related research areas. As mentioned, computer vision scientists could use the data to feed into learning algorithms for saliency detection or into image understanding systems, or the data can be applied in evaluations of existing computational methods. Moreover, the dataset can serve as a ground for comparison to eye-tracking paradigms. This is also the next step in our studies into human perception of image regions.

5 Conclusion

In this paper, we have presented a dataset that can be applied to (i) improve the understanding of human visual perception and attention for image scenes, and (ii) lead a new direction on how such information can be collected more efficiently, providing an alternative to expensive and time-consuming eye-tracking studies. Furthermore, we have described the game design and the data collection, and provided an overview of potential application areas.

We plan to extend this work by comparing the saliency scores from the game with saliency data collected using eye-tracking techniques. Through this endeavour, we will be able to explore whether our method yields comparable results and can be used as an alternative to traditional eye-tracking studies. Furthermore, future work should include images that depict different types of scenes and objects, hence extending the existing dataset.

6 Acknowledgements

This work is partly funded by the FRINATEK project "EONS" (#231687) and the iAD Centre for Research-based Innovation (#174867) of the Norwegian Research Council, as well as by Lakeside Labs GmbH, Klagenfurt, Austria, with funding from the European Regional Development Fund and the Carinthian Economic Promotion Fund (KWF) under grant KWF-20214/25557/37319.

References

[BB09] James R. Brockmole and Walter R. Boot. Should I stay or should I go? Attentional disengagement from visually unique and unexpected items at fixation. Journal of Experimental Psychology: Human Perception and Performance, 35(3):808–815, June 2009.

[BI13] Ali Borji and Laurent Itti. State-of-the-art in visual attention modeling. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(1):185–207, 2013.

[BJD+] Zoya Bylinskii, Tilke Judd, Frédo Durand, Aude Oliva, and Antonio Torralba. MIT saliency benchmark. http://saliency.mit.edu/.

[BMI14] Markus Brenner, Navid Mirza, and Ebroul Izquierdo. People recognition using gamified ambiguous feedback. In Proceedings of the First International Workshop on Gamification for Information Retrieval, pages 22–26, Amsterdam, 2014. ACM.

[BPM08] Julie N. Buchan, Martin Paré, and Kevin G. Munhall. The effect of varying talker identity and listening conditions on gaze behavior during audiovisual speech perception. Brain Research, 1242:162–171, November 2008.

[CMH09] Monica S. Castelhano, Michael L. Mack, and John M. Henderson. Viewing task influences eye movement control during active scene perception. Journal of Vision, 9(3):1–15, 2009.

[HRM11] Falk Huettig, Joost Rommers, and Antje S. Meyer. Using the visual world paradigm to study language processing: A review and critical evaluation. Acta Psychologica, 137(2):151–171, June 2011.

[JEDT09] Tilke Judd, Krista Ehinger, Frédo Durand, and Antonio Torralba. Learning to predict where humans look. In IEEE International Conference on Computer Vision (ICCV), pages 2106–2113, Kyoto, 2009.

[JKM+11] Craig Jordan, Matt Knapp, Dan Mitchell, Mark Claypool, and Kathi Fisler. Countermeasures: a game for teaching computer security. In Proceedings of the 10th Annual Workshop on Network and Systems Support for Games, page 7, Ottawa, 2011. IEEE Press.

[KCS08] Aniket Kittur, Ed H. Chi, and Bongwon Suh. Crowdsourcing user studies with Mechanical Turk. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pages 453–456, Florence, 2008.

[MBB10] Florian "Floyd" Mueller and Nadia Bianchi-Berthouze. Evaluating exertion games. In Evaluating User Experience in Games, pages 187–207, 2010.

[MCBT06] Olivier Le Meur, Patrick Le Callet, Dominique Barba, and Dominique Thoreau. A coherent computational approach to model bottom-up visual attention. IEEE Transactions on Pattern Analysis and Machine Intelligence, 28(5):802–817, 2006.

[McMM11] Vasile V. Moca, Ioana Ţincaş, Lucia Melloni, and Raul C. Mureşan. Visual exploration and object recognition by lattice deformation. PLoS One, 6(7):e22831, January 2011.

[NKRP10] Vidhya Navalpakkam, Christof Koch, Antonio Rangel, and Pietro Perona. Optimal reward harvesting in complex perceptual environments. Proceedings of the National Academy of Sciences, 107(11):5232–5237, 2010.

[NR10] Stefanie Nowak and Stefan Rüger. How reliable are annotations via crowdsourcing? A study about inter-annotator agreement for multi-label image annotation. In Proceedings of the International Conference on Multimedia Information Retrieval (MIR '10), pages 557–566, Philadelphia, 2010.

[PMRS14] Dinesh Pothineni, Pratik Mishra, Aadil Rasheed, and Deepak Sundararajan. Incentive design to mould online behavior: a game mechanics perspective. In Proceedings of the First International Workshop on Gamification for Information Retrieval, pages 27–32, Amsterdam, 2014. ACM.

[Ray09] Keith Rayner. Eye movements and attention in reading, scene perception, and visual search. Quarterly Journal of Experimental Psychology, 62(8):1457–1506, August 2009.

[RELS14] Michael Riegler, Ragnhild Eg, Mathias Lux, and Markus Schicho. Mobile Picture Guess: A crowdsourced serious game for simulating human perception. In Proceedings of the SoHuman Workshop 2014, Barcelona, 2014. Springer.

[RGSZM12] Dmitry Rudoy, Dan B. Goldman, Eli Shechtman, and Lihi Zelnik-Manor. Crowdsourcing gaze data collection. In Proceedings of the Conference on Collective Intelligence, Cambridge, MA, 2012.

[VAD08] Luis von Ahn and Laura Dabbish. Designing games with a purpose. Communications of the ACM, 51(8):58–67, 2008.

[VLB06] Luis von Ahn, Ruoran Liu, and Manuel Blum. Peekaboom: A game for locating objects in images. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pages 55–64, Montréal, 2006.