EyeGrab: A Gaze-based Game with a Purpose to Enrich Image Context Information

Tina Walber, Chantal Neuhaus, Ansgar Scherp
University of Koblenz-Landau, Institute for Web Science and Technologies, Koblenz, Germany
walber@uni-koblenz.de, cneuhaus@uni-koblenz.de, scherp@uni-koblenz.de

ABSTRACT
We present EyeGrab, a game for image classification that is controlled by the users' gaze. The players classify images according to their relevance for a given tag. Besides entertaining the players, the aim is to enrich the image context information in order to improve image search in the future. During the game, information about the shown images is collected: the classification with respect to the tag, a rating of the images by the user ("like" or "not like"), and the eye tracking data recorded while viewing the images. In this work, we present the design of the game and compare two design variants – one with and one without visual aid – concerning the suitability of the game for image annotation. The variants of the game are evaluated in a study with 24 participants. We measured the user satisfaction, efficiency, and effectiveness of the game. Overall, 83% of the users enjoyed playing the game. The results show that the visual aid does not help the users in our application; it even increases the error rate. The best classification precision we achieve is 92%, for the game variant without visual aid.

Categories and Subject Descriptors
H.5.2 [User Interfaces]: Input devices and strategies

General Terms
Human Factors

Keywords
Eye tracking, game with a purpose

Presented at EuroHCIR2012. Copyright © 2012 for the individual papers by the papers' authors. Copying permitted only for private and academic purposes. This volume is published and copyrighted by its editors.

1. INTRODUCTION
The search for digital images is still a challenging task. It is often performed based on context information, e.g., tags describing the image content. Tags assigned to specific image regions instead of the whole image can improve the search results [9]. The ratings of images can also be used to deliver good search results, as is done on image stock sites such as Photo.net.

The game EyeGrab has been developed as a game with a purpose (GWAP) to improve or collect this information: the description by tags, a personal rating, and information about image regions. The game is controlled by eye movements, which on the one hand enhances user satisfaction and on the other hand allows for collecting gaze information that can be analyzed to gain information about the image content. The overall goal of EyeGrab is to enrich images with contextual information in order to improve future search tasks.

The players look at images falling down the screen. The task is to classify the images as relevant or irrelevant to a given tag. Relevant images are additionally rated by the participants as "I like it" or "I do not like it". We compared two different interaction designs of the game in a study with 24 subjects and measured effectiveness, efficiency, and satisfaction. The first variant provides visual aids in the form of a highlighting of the interactive regions and a gaze cursor, which visualizes the subjects' fixations on the screen; the second variant does not. Overall, we can state that the vast majority of the participants enjoyed playing the game and that the gaze-based control was experienced as adding to the entertainment. The players who received the visual aids had the impression of being supported by them. However, the results show that the visual aids lead to significantly more incorrect and missing classifications. In fact, given a ground truth image data set, we achieved the best classification precision, 92%, for the game variant without visual aid. The satisfaction and the precision of the results gained in our experiment are very encouraging. Based on this outcome, we will continue with the evaluation and extract information from the gaze paths in a next step. Based on a prior experiment [8], we can use this gaze information to add region-based annotations to the images.

2. RELATED WORK
A large number of applications have been introduced in the past that use eye movements as an input medium, often for people with disabilities, e.g., a drawing system [1]. The use of gaze information as relevance feedback in image retrieval has also been investigated, with promising results, e.g., [6]. Walber et al. [8] showed that specific image regions can be identified using gaze information. The development of sensor hardware such as cameras in computers is continuously progressing. Already now, eye tracking can be performed with a commodity web camera. San Agustin et al. [7] compare a commercial eye tracker and a webcam system. The results for the webcam system are satisfactory and comparable to the commercial system, although still with limitations concerning the comfort of use. Based on this development, one can assume that eye tracking will become available to more users in the future and that it will also be possible to use the technology for playing games.

Data obtained from eye tracking is less accurate than, e.g., data from a computer mouse, due to the natural movements of the eyes. It can be difficult for users to focus the gaze on a specific region to select a button. One possibility for supporting users in controlling an application by gaze is to visualize the gaze as a cursor. Some related work points out the problem of distraction caused by this kind of visualization [2]; others see the chance of such a natural "pointer" [5].
Some years ago, a new class of applications appeared, the so-called games with a purpose (GWAPs) [3, 4]. The goal of GWAPs is to gain information from humans in an entertaining way. One example is the game Peekaboom [4], where two users play together to label image regions. Another is the ESP Game [3], in which two randomly assigned players each tag one image and try to provide the same tags as their team mate. Tobii recently introduced the game EyeAsteroids, claimed to be the first purely eye-controlled arcade game. It is entertaining, but does not have the goal of benefiting from the users' activities.

Eye tracking fascinates users as an unusual kind of input device. One can benefit from this curiosity by offering entertaining applications that also gain some information from the users. Despite the variety of eye tracking applications and games, EyeGrab combines – to the best of our knowledge – for the first time both aspects: leveraging user activities as in GWAPs and controlling the application by eye movements.

3. DESCRIPTION OF THE GAME
EyeGrab is a single-player game that takes place in a galactic dimension. The task is to clean up the aliens' universe by categorizing and rating images. Before the game starts, the user is asked to enter his or her nickname using the keyboard (Figure 1). The rest of the game is then played exclusively by means of the eye tracker. Every gaze-based selection takes place after a dwell time of 450 milliseconds to avoid random choices. For example, the selection of the gender is done by focusing on the male or female character as shown in Figure 1.

Figure 1. Start screen with gaze-selected "male".

Subsequently, the player is shown a small introduction to the game's rules (no screenshot). The game has three rounds, with three categories ("car", "house", and "mouse"). First, the category is shown (see Figure 2); then the round starts.

Figure 2. Presentation of the category.

30 images fall down the screen, as depicted in Figure 3. The player categorizes the falling images into one of three categories. He or she can select an image by fixating it for more than 450 milliseconds. When an image is selected, it is highlighted by a thin red frame. Next, the image is classified as "like it", "don't like it", or "not relevant", where the first two imply that the image is described by the given tag and the third specifies that the image does not belong to that tag. To classify an image, the user looks at the area of the intended classification on the screen, as shown in Figure 3 (same dwell time as above).

Figure 3. Gaze-based image classification.

The player receives points for each correctly categorized image, negative points for each false one, and no points for images that fall off the screen without being classified. To further challenge the user, the speed increases with higher levels. A high score list is presented to the user at the end of the game.

Two versions of the game have been implemented, one offering visual aids to the user (see Figure 4a) and the other without such help (see Figure 4b). The visual aids include a highlighting of the "action areas", i.e., areas which perform an action when being fixated, and the visualization of the gaze point on the screen (gaze cursor). Examples are the classification buttons shown in Figure 3 (details in Figures 4a and 4b).

For each category, the images were chosen from the 100 most relevant Flickr pictures. 20 of them were randomly selected and combined with 10 pictures of a different category. An inter-rater agreement with 3 neutral persons was used to confirm the categorization and to create the ground truth. In total, 1440 pictures were shown to the subjects within the game (30 pictures for each of the categories "house" and "mouse" times 24 subjects).
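The paper does not describe how the dwell-based selection is implemented; a minimal sketch of the mechanic could look as follows. The 450 ms threshold is taken from the text, while the `Region` helper type and the gaze-sample format (timestamp plus screen coordinates) are our own assumptions:

```python
from dataclasses import dataclass

DWELL_MS = 450  # dwell time used by EyeGrab for every gaze-based selection

@dataclass
class Region:
    """A rectangular 'action area' on the screen (hypothetical helper type)."""
    name: str
    x: float
    y: float
    w: float
    h: float

    def contains(self, gx: float, gy: float) -> bool:
        return self.x <= gx < self.x + self.w and self.y <= gy < self.y + self.h

class DwellSelector:
    """Fires a region's action once the gaze has rested on it for DWELL_MS."""

    def __init__(self, regions):
        self.regions = regions
        self.current = None       # region the gaze is currently inside, if any
        self.entered_at = 0.0     # timestamp (ms) when the gaze entered it

    def feed(self, t_ms: float, gx: float, gy: float):
        """Consume one gaze sample; return the selected region's name or None."""
        hit = next((r for r in self.regions if r.contains(gx, gy)), None)
        if hit is not self.current:
            # Gaze moved to another region (or off all regions): restart the timer.
            self.current = hit
            self.entered_at = t_ms
            return None
        if hit is not None and t_ms - self.entered_at >= DWELL_MS:
            self.current = None   # reset so the selection fires only once
            return hit.name
        return None
```

The timer restarts whenever the gaze leaves the current region, so only an uninterrupted 450 ms fixation triggers a selection, which is what protects against the random choices mentioned above.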
4. EVALUATION DESIGN
In order to evaluate EyeGrab, 24 subjects (7 female) played the game. The subjects' ages were between 15 and 32 years (mean: 24, SD: 3.9). 19 subjects were students, 2 were research assistants, one was a pupil, and 2 had other professions. Most of the players had experience in gaming (mean: 3.5, SD: 1.31). Only a few were familiar with eye tracking, as indicated by 19 subjects rating the question concerning their eye tracking experience with one (mean: 1.63, SD: 1.38). The subjects were randomly divided into two groups, A and B. Group A had no visual aids during the game, whereas group B did. 8 users were wearers of glasses or contact lenses (4 in each group); there were no problems using the eye tracker for those subjects.

To avoid distractions, the game was played in our eye-tracking lab, which provides a chair, a desk with an eye tracker, and a standard monitor. The first step was a calibration of the eye tracker. After this was done, the game was started without further instructions and was played with 30 images in each of the three rounds. The data from the first round is not used in the later analysis, because it only served to get the subjects acquainted with using the eye tracker as an input device. At the end of the experiment, every user filled out a questionnaire, including personal information and questions about the performance of the game. The answers were given on a 5-point Likert scale.

Figure 4a. Visual aids (rectangle: action area, here: "not like"; circle: gaze cursor). Figure 4b. No visual aids.

5. EVALUATION RESULTS
5.1 Satisfaction
The questionnaires show that the subjects enjoyed playing the game. On average, the statement "It was fun playing the game." was rated 3.46 (SD: 0.93) considering all 24 subjects; 20 of the 24 users agreed with this statement. One of the following questions was whether the participants felt that the interaction with the eye tracker increased the fun of the game. 14 subjects agreed or strongly agreed with this statement (mean: 3.5, SD: 1.25). Also, most of the subjects did not feel disturbed by the eye tracker (mean: 2.25, SD: 1.5).

5.2 Effectiveness and Efficiency
One round of the game comprises 30 images and takes about two minutes including the introduction and the input form. Each level has a different pace at which the images fall down the screen. Thus, the classification of an image takes between 2.6 and 4 seconds.

Only in 42 cases did an image pass without classification, resulting in a total of 1398 classified images. 1162 images were correctly classified (83%); thus, only 236 images were incorrectly classified. Overall, we had 897 true-positive, 128 false-negative, and 108 false-positive classifications, which leads to a precision of 89% and a recall of 88% over all users. For the group with the better results (the group without visual aid, see next section), we obtain a precision of 92%.

5.3 Visual aid
The subjective perception of the users in group B (the group that was provided with visual aids) was that the visual aids supported them in the classification tasks. The question regarding the visual highlighting of the active areas was rated as very helpful, with an average of 4.67 (SD: 0.49). The subjects also answered that displaying the gaze point was very helpful and scored this question on average with 4.5 (SD: 0.67). However, to our surprise, the following statistical analysis of the data shows that group B, with the visual aids, misclassified significantly more images than group A did.

Group A correctly classified 296 images for the category "house", whereas group B correctly assigned 264 images for this category. Regarding the category "mouse", in total 317 correct assignments were made by group A, whereas group B correctly assigned 287 images. Regarding the misclassified images, group A misclassified 59 images for the category "house", whereas group B wrongly assigned the image category in 81 cases. For the category "mouse", the number of incorrect assignments is 37 for group A and 57 for group B. We compared the values for correct and incorrect assignments of groups A and B in a 2x2 chi-square test for both categories. The differences are significant at a significance level of α = 0.05, with χ²(1, N = 700) = 5.14, p = 0.023 for the category "house" and χ²(1, N = 698) = 5.6, p = 0.018 for "mouse". In group B, 31 images passed without classification; in group A, only 11 images were not classified.

These results indicate that the visual support does not improve the classification. Despite the good impression of the visual support that group B expressed, the following question might be an indicator that this group felt less comfortable with the eye tracker-based interaction than group A did: we asked the subjects to state whether they would have preferred a mouse-based interaction instead of the eye tracker-based one. On average, subjects of group A scored this question with 2.17 (SD: 1.47) and group B with 3.25 (SD: 1.48). Using a two-tailed Mann-Whitney U test, a weakly significant difference was determined, indicating a stronger preference in group B than in group A for using the mouse to play EyeGrab (U = 43, z = -1.719, n1 = n2 = 12, p = .085).

6. FUTURE WORK
For the current version of our EyeGrab game, we have used pre-classified images in order to verify the classification performance of the subjects. We plan to use images without annotations in future extensions of the game.

The detailed analysis of the gaze information will also be performed in a next step. In a small sample of 5 images classified by one user, we received 231 gaze points on the images. An example of a gaze path visualization is shown in Figure 5.
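As a sanity check, the effectiveness and significance figures reported in Section 5 can be reproduced from the published counts. The sketch below uses our own helper functions (the paper does not describe its analysis tooling) to recompute precision, recall, and the two chi-square statistics:

```python
def precision_recall(tp: int, fp: int, fn: int):
    """Precision and recall from true-positive, false-positive, and false-negative counts."""
    return tp / (tp + fp), tp / (tp + fn)

def chi_square_2x2(a: int, b: int, c: int, d: int) -> float:
    """Pearson chi-square statistic (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    rows, cols = (a + b, c + d), (a + c, b + d)
    chi2 = 0.0
    for i, observed in enumerate((a, b, c, d)):
        expected = rows[i // 2] * cols[i % 2] / n
        chi2 += (observed - expected) ** 2 / expected
    return chi2

# Section 5.2: 897 true positives, 108 false positives, 128 false negatives
precision, recall = precision_recall(897, 108, 128)   # ~0.89 and ~0.88

# Section 5.3: correct/incorrect counts of groups A and B per category
chi2_house = chi_square_2x2(296, 59, 264, 81)  # ~5.14 (N = 700)
chi2_mouse = chi_square_2x2(317, 37, 287, 57)  # ~5.60 (N = 698)
```

The recomputed values match the reported ones, which suggests the chi-square tests were run without Yates' continuity correction.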
We expect a sufficient number of fixations and correct classifications to allow a detailed analysis.

Figure 5. Visualization of fixations on a classified image.

We received 897 ratings for the shown images; 556 of them were positive. The quality of these ratings has to be investigated in a future experiment, e.g., by repeating the ratings in another context with the same users or by using a ground truth set of images that are liked by a large number of other users. However, it has to be clear that a subjective rating can never be "correct" or "incorrect"; these investigations can only provide an indication of the value of the ratings. Overall, this detailed analysis will allow us to identify the regions that correspond to the category given in the EyeGrab game. Such region-based annotations will allow for a better retrieval of the images in the future.

7. SUMMARY
We have introduced EyeGrab, a gaze-based game with a purpose for classifying images using an eye tracker. We have shown that the game has the potential to entertain the players and that the classification results are good enough to proceed with the gaze analysis. This analysis is a first step towards extending image context information with information gained in an eye tracking game. The next step will be the analysis and evaluation of the gained information and its use for improving image search tasks.

8. REFERENCES
[1] Hornof, A.J. and Cavender, A. 2005. EyeDraw: Enabling children with severe motor impairments to draw with their eyes. Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, 170.
[2] Jacob, R.J.K. 1993. Eye movement-based human-computer interaction techniques: Toward non-command interfaces. Advances in Human-Computer Interaction, 151–190.
[3] Von Ahn, L. and Dabbish, L. 2004. Labeling images with a computer game. Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, 319–326.
[4] Von Ahn, L., Liu, R. and Blum, M. 2006. Peekaboom: A game for locating objects in images. Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, 55–64.
[5] Zhang, X., Ren, X. and Zha, H. 2008. Improving eye cursor's stability for eye pointing tasks. Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, 525–534.
[6] Kozma, L., Klami, A. and Kaski, S. 2009. GaZIR: Gaze-based zooming interface for image retrieval. Proceedings of the 2009 International Conference on Multimodal Interfaces.
[7] San Agustin, J., Skovsgaard, H., Hansen, J.P. and Hansen, D.W. 2009. Low-cost gaze interaction: Ready to deliver the promises. Proceedings of the 27th International Conference Extended Abstracts on Human Factors in Computing Systems, 4453–4458. ACM.
[8] Walber, T., Scherp, A. and Staab, S. 2012. Identifying objects in images from analyzing the users' gaze movements for provided tags. Advances in Multimedia Modeling, 138–148.
[9] Carson, C., Thomas, M., Belongie, S., Hellerstein, J. and Malik, J. 1999. Blobworld: A system for region-based image indexing and retrieval. Visual Information and Information Systems.