=Paper=
{{Paper
|id=None
|storemode=property
|title=EyeGrab: A Gaze-based Game with a Purpose to Enrich Image Context Information
|pdfUrl=https://ceur-ws.org/Vol-909/poster8.pdf
|volume=Vol-909
|dblpUrl=https://dblp.org/rec/conf/eurohcir/WalberNS12
}}
==EyeGrab: A Gaze-based Game with a Purpose to Enrich Image Context Information==
Tina Walber, Chantal Neuhaus, Ansgar Scherp
Institute for Web Science and Technologies, University of Koblenz-Landau, Koblenz, Germany
walber@uni-koblenz.de, cneuhaus@uni-koblenz.de, scherp@uni-koblenz.de
ABSTRACT
We present EyeGrab, a game for image classification that is controlled by the users' gaze. The players classify images according to their relevance for a given tag. Besides entertaining the players, the aim is to enrich the image context information in order to improve image search in the future. During the game, information about the shown images is collected: the classification with respect to the tag, a rating of the images by the user ("like" or "not like"), and the eye tracking data recorded while viewing the images. In this work, we present the design of the game and compare two design variants – one with and one without visual aid – concerning the suitability of the game for image annotation. The variants are evaluated in a study with 24 participants, in which we measured user satisfaction, efficiency, and effectiveness. Overall, 83% of the users enjoyed playing the game. The results show that the visual aid does not help the users in our application; it even increases the error rate. The best classification precision we achieve is 92%, for the game variant without visual aid.

Categories and Subject Descriptors
H.5.2 [User Interfaces]: Input devices and strategies

General Terms
Human Factors

Keywords
Eye tracking, Game with a purpose

Presented at EuroHCIR2012. Copyright © 2012 for the individual papers by the papers' authors. Copying permitted only for private and academic purposes. This volume is published and copyrighted by its editors.

1. INTRODUCTION
The search for digital images is still a challenging task. It is often performed based on context information, e.g., tags describing the image content. Tags assigned to specific image regions instead of the whole image can improve the search results [9]. Ratings of the images can also be used to deliver good search results, as is done on some stock photo sites such as Photo.net.

The game EyeGrab has been developed as a game with a purpose (GWAP) to improve or collect this information: the description by tags, a personal rating, and information about image regions. The game is controlled by eye movements, which on the one hand enhances user satisfaction and on the other hand allows for collecting gaze information that will be analyzed to gain information about the image content. The overall goal of EyeGrab is to enrich the images with contextual information in order to improve future search tasks.

The players look at images falling down the screen. The task is to classify the images as relevant or irrelevant to a given tag. Relevant images are further rated by the participants as "I like it" or "I do not like it". We have compared two different interaction designs of the game in a study with 24 subjects and measured effectiveness, efficiency, and satisfaction. The first variant provides visual aids in the form of highlighting of the interactive regions and a gaze cursor that visualizes the subjects' fixations on the screen; the second variant does not. Overall, we can state that the vast majority of the participants enjoyed playing the game and that the gaze-based control of the game was experienced as adding to the entertainment. The players who received the visual aids had the impression of being supported by them. However, the results show that the visual aids lead to significantly more incorrect and missing classifications. In fact, given a ground-truth image data set, we achieved the best classification precision of 92% for the game variant without visual aid.

The satisfaction and the precision of the results gained in our experiment are very satisfactory. Based on this outcome, we will continue with the evaluation and conduct information extraction from the gaze paths in a next step. Building on a prior experiment [8], we can use this gaze information to add region-based annotations to the images.

2. RELATED WORK
A large number of applications that use eye movements as an input medium have been introduced in the past, often for people with disabilities, e.g., a drawing system [1]. The use of gaze information as relevance feedback in image retrieval has also been investigated with promising results, e.g., [6]. Walber et al. [8] showed that specific image regions can be identified using gaze information. The development of sensor hardware such as cameras in computers is continuously progressing. Already now, eye tracking can be performed with a commodity web camera. San Agustin et al. [7] compare a commercial eye tracker and a webcam system. The results for the webcam system are satisfactory and comparable to the commercial system, although still with limitations concerning the comfort of use. Based on this development, one can assume that eye tracking will become available to more users in the future and that it will be possible to use the technology in games as well.

Data obtained from eye tracking is less accurate than, e.g., data from a computer mouse, due to natural movements of the eyes. It can be difficult for users to focus the gaze on a specific region to select a button. One possibility for supporting users in controlling an application by gaze is to visualize the gaze as a cursor. Some related work points out the problem of distraction caused by this kind of visualization [2]; others see the chance of such a natural "pointer" [5].

Some years ago, a new class of applications appeared, the so-called games with a purpose (GWAPs) [3, 4]. The goal of GWAPs is to gain information from humans in an entertaining way. One example is the game Peekaboom [4], where two users play together to label image regions. Another is the ESP Game [3], in which two randomly assigned players each tag one image and try to provide the same tags as their team mate. Tobii recently introduced the game EyeAsteroids, claimed to be the first purely eye-controlled arcade game. It is entertaining, but does not have the goal of benefiting from the users' activities. Eye tracking fascinates users as an unusual kind of input device. One can benefit from this curiosity by offering entertaining applications that also gain some information from the users. Despite the variety of eye tracking applications and games, EyeGrab combines – to the best of our knowledge – for the first time both aspects: leveraging user activities as in GWAPs and controlling the application by eye movements.

Figure 1. Start screen with gaze-selected "male".

Figure 2. Presentation of the category.
3. DESCRIPTION OF THE GAME
EyeGrab is a single-player game that takes place in a galactic dimension. The task is to clean up the aliens' universe by categorizing and rating images. Before the game starts, the user is asked to enter his or her nickname using the keyboard (Figure 1). The rest of the game is then played exclusively by means of the eye tracker. Every gaze-based selection takes place after a dwell time of 450 milliseconds, to avoid random choices. For example, the selection of the gender is done by focusing on a male or female character as shown in Figure 1. Subsequently, the player is shown a small introduction to the game's rules (no screenshot).

The game has three rounds, with three categories ("car", "house", and "mouse"). First, the category is shown (see Figure 2), then the round starts. 30 images fall down the screen as depicted in Figure 3. The player categorizes the falling images into one of three classes. He or she can select an image by fixating it for more than 450 milliseconds. When an image is selected, it is highlighted by a thin, red frame. Next, the image is classified as "like it", "don't like it", or "not relevant", where the first two imply that the image is described by the named tag and the third specifies that the image does not belong to that tag. To classify an image, the user looks at the area of the intended classification on the screen as shown in Figure 3 (same dwell time as above). The player receives points for each correctly categorized image, negative points for each false one, and no points for images that fell off the screen without being classified. To further challenge the user, the speed is increased with higher levels. A high score list is presented to the user at the end of the game.

Figure 3. Gaze-based image classification.

For each category, the images were chosen from the 100 most relevant Flickr pictures. 20 of them were randomly selected and combined with 10 pictures of a different category. An inter-rater agreement with 3 neutral persons was used to confirm the categorization and to create the ground truth.

Two versions of the game have been implemented, one offering visual aids (see Figure 4a) to the user and the other one without such help (see Figure 4b). The visual aids include a highlighting of the "action areas", i.e., areas which perform an action when being fixated, and the visualization of the gaze point on the screen (gaze cursor). Examples are the classification buttons as shown in Figure 3 (details in Figures 4a and 4b).
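The dwell-based selection described above can be sketched in a few lines. This is a minimal illustration of the 450 ms dwell mechanism only, not the authors' implementation; the class name `DwellSelector` and its sample-feeding interface are our own assumptions.

```python
DWELL_TIME_MS = 450  # dwell threshold reported in the paper

class DwellSelector:
    """Fires a selection once the gaze has rested on one target
    region for at least DWELL_TIME_MS milliseconds."""

    def __init__(self, dwell_ms=DWELL_TIME_MS):
        self.dwell_ms = dwell_ms
        self.current_target = None  # region id currently under the gaze
        self.dwell_start = None     # timestamp when the gaze entered it

    def update(self, target, now_ms):
        """Feed one gaze sample. `target` is the id of the action area
        under the gaze point (or None if the gaze hits no area).
        Returns the target id when a selection fires, else None."""
        if target != self.current_target:
            # Gaze moved to a different region: restart the dwell clock.
            self.current_target = target
            self.dwell_start = now_ms
            return None
        if target is not None and now_ms - self.dwell_start >= self.dwell_ms:
            self.dwell_start = now_ms  # reset so the selection does not re-fire immediately
            return target
        return None

selector = DwellSelector()
selector.update("like", 0)            # gaze enters the "like" area
selector.update("like", 300)          # 300 ms < 450 ms: no selection yet
fired = selector.update("like", 450)  # threshold reached: fires "like"
```

Resetting the clock whenever the gaze leaves the region is what prevents the "random choices" the paper mentions: a brief glance never accumulates enough dwell time to trigger an action.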
Figure 4a. Visual aids (rectangle: action area, here: "not like"; circle: gaze cursor). Figure 4b. No visual aids.

4. EVALUATION DESIGN
In order to evaluate EyeGrab, 24 subjects (7 female) played the game. The subjects' age was between 15 and 32 years (mean: 24, SD: 3.9). 19 subjects were students, 2 were research assistants, one was a pupil, and 2 had other professions. Most of the players had experience in gaming (mean: 3.5, SD: 1.31). Only a few were familiar with eye tracking; this is indicated by 19 subjects rating the question concerning their eye tracking experience with one (mean: 1.63, SD: 1.38). The subjects were randomly divided into two groups A and B. Group A had no visual aids during the game, whereas group B did. 8 users were wearers of glasses or contact lenses (4 in each group). There were no problems using the eye tracker for those subjects.

To avoid distractions, the game was played in our eye-tracking lab, providing a chair, a desk with an eye tracker, and a standard monitor. The first step was a calibration of the eye tracker. After this was done, the game was started without further instructions and was played with 30 images in each of the three rounds. The data from the first round is not used in the later analysis, because it only served to get the subjects acquainted with the usage of the eye tracker as an input device. At the end of the experiment, every user filled out a questionnaire, including personal information and questions about the performance of the game. The answers were given on a 5-point Likert scale.

5. EVALUATION RESULTS
5.1 Satisfaction
The questionnaires show that the subjects enjoyed playing the game. On average, the statement "It was fun playing the game." is rated 3.46 (SD: 0.93) over all 24 subjects; 20 of the 24 users agreed with this statement. One of the following questions was whether the participants felt that the interaction with the eye tracker increases the fun of the game. 14 subjects agreed or strongly agreed with this statement (mean: 3.5, SD: 1.25). Also, most of the subjects did not feel disturbed by the eye tracker (mean: 2.25, SD: 1.5).

5.2 Effectiveness and Efficiency
One round of the game comprises 30 images and takes about two minutes including the introduction and the input form. Each level has a different pace at which the images fall down the screen. Thus, the classification per image takes between 2.6 and 4 seconds.

In total, 1440 pictures were shown to the subjects within the game (30 pictures per category "house" and "mouse" times 24 subjects). Only in 42 cases did an image pass without classification, resulting in a total of 1398 classified images. 1162 images were correctly classified (83%); thus, only 236 images were incorrectly classified. Overall, we had 897 true-positive, 128 false-negative, and 108 false-positive classifications, which leads to a precision of 89% and a recall of 88% over all users. For the group with the better results (the group without visual aid, see next section) we obtain a precision of 92%.

5.3 Visual aid
The subjective perception of the users in group B (the group that was provided with visual aids) was that the visual aids supported them in the classification tasks. The question regarding the visual highlighting of the active areas was rated as very helpful, with an average of 4.67 (SD: 0.49). The subjects also answered that displaying the gaze point was very helpful and scored this question on average with 4.5 (SD: 0.67). However, to our surprise, the following statistical analysis of the data shows that group B, with the visual aids, misclassified significantly more images than group A did.

Group A correctly classified 296 images for category "house", whereas group B correctly assigned 264 images for this category. Regarding the category "mouse", in total 317 correct assignments were made by group A, whereas group B correctly assigned 287 images. Regarding the misclassified images, group A misclassified 59 images for category "house", whereas group B wrongly assigned the image category in 81 cases. For the category "mouse", the number of incorrect assignments is 37 for group A and 57 for group B. We compared the values for correct and incorrect assignments for groups A and B in a 2x2 Chi-square test for both categories. The differences are significant at a significance level of α = 0.05, with χ²(1, N = 700) = 5.14, p = 0.023 for the category "house" and χ²(1, N = 698) = 5.60, p = 0.018 for "mouse". In group B, 31 images passed without classification; in group A, only 11 images were not classified.

These results indicate that the visual support does not improve the classification. Despite the good impression of the visual support that group B expressed, the following question might be an indicator that this group felt less comfortable with the eye tracker-based interaction than group A did: we asked the subjects to state whether they preferred a mouse-based interaction over the eye tracker-based one. On average, subjects of group A scored this question with 2.17 (SD: 1.47) and group B with 3.25 (SD: 1.48). Using a two-tailed Mann-Whitney U-test, a weakly significant difference was determined, indicating a stronger preference of group B over group A to use the mouse to play EyeGrab (U = 43, z = -1.719, n1 = n2 = 12, p = .085).

6. FUTURE WORK
For the current version of our EyeGrab game, we have used pre-classified images in order to verify the classification performance of the subjects. We plan to use images without annotations in future extensions of the game.

Also, the detailed analysis of the gaze information will be performed in a next step. In a small sample of 5 images classified by one user, we received 231 gaze points on the images. An example of a gaze path visualization is shown in Figure 5. We expect a sufficient number of fixations and correct classifications to allow a detailed analysis.
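The headline figures of the evaluation can be re-derived from the published counts. The following pure-Python check recomputes precision and recall from the reported true-positive/false-negative/false-positive counts, and the Pearson chi-square statistics (without continuity correction) from the per-group classification counts; the helper `chi_square_2x2` is our own sketch, not part of the study's tooling, and the p-values are not recomputed here.

```python
# Classification counts over all users (Section 5.2)
tp, fn, fp = 897, 128, 108
precision = tp / (tp + fp)  # 897 / 1005
recall = tp / (tp + fn)     # 897 / 1025

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic for the 2x2 table [[a, b], [c, d]],
    computed from the row/column marginals, without continuity correction."""
    n = a + b + c + d
    expected = [(a + b) * (a + c) / n, (a + b) * (b + d) / n,
                (c + d) * (a + c) / n, (c + d) * (b + d) / n]
    observed = [a, b, c, d]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Correct vs. incorrect assignments, group A vs. group B (Section 5.3)
chi_house = chi_square_2x2(296, 59, 264, 81)  # N = 700, reported as 5.14
chi_mouse = chi_square_2x2(317, 37, 287, 57)  # N = 698, reported as 5.60
```

The recomputed values match the paper: precision rounds to 89%, recall to 88%, and the chi-square statistics come out at about 5.14 and 5.60 for "house" and "mouse" respectively.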
We received 897 ratings for the shown images; 556 of them were positive. The quality of these ratings has to be investigated in a future experiment, e.g., by repeating the ratings in another context with the same users or by using a ground-truth set of images that are often liked by a large number of other users. However, it has to be clear that a subjective rating can never be "correct" or "incorrect"; these investigations can only provide an indication of the worth of the ratings. Overall, this detailed analysis will allow us to identify the regions that correspond to the category given in the EyeGrab game. Such region-based annotations will allow for a better retrieval of the images in the future.

Figure 5. Visualization of fixations on a classified image.

7. SUMMARY
We have introduced the gaze-based game with a purpose EyeGrab to classify images using an eye tracker. We have shown that the game has the potential to entertain the players and that the classification results are good enough to advance beyond the gaze analysis. This analysis is the first step in the direction of extending image context information with information gained in an eye tracking game. The next step will be the analysis and evaluation of the gained information and its use for improving image search tasks.

8. REFERENCES
[1] Hornof, A.J. and Cavender, A. 2005. EyeDraw: enabling children with severe motor impairments to draw with their eyes. Proceedings of the SIGCHI conference on Human factors in computing systems, 170.
[2] Jacob, R.J.K. 1993. Eye movement-based human-computer interaction techniques: Toward non-command interfaces. Advances in Human-Computer Interaction, 151–190.
[3] Von Ahn, L. and Dabbish, L. 2004. Labeling images with a computer game. Proceedings of the SIGCHI conference on Human factors in computing systems, 319–326.
[4] Von Ahn, L., Liu, R. and Blum, M. 2006. Peekaboom: A game for locating objects in images. Proceedings of the SIGCHI conference on Human Factors in computing systems, 55–64.
[5] Zhang, X., Ren, X. and Zha, H. 2008. Improving eye cursor's stability for eye pointing tasks. Proceedings of the SIGCHI conference on Human factors in computing systems, 525–534.
[6] Kozma, L., Klami, A. and Kaski, S. 2009. GaZIR: gaze-based zooming interface for image retrieval. Proceedings of the 2009 international conference on Multimodal interfaces.
[7] San Agustin, J., Skovsgaard, H., Hansen, J.P. and Hansen, D.W. 2009. Low-cost gaze interaction: ready to deliver the promises. Proceedings of the 27th international conference extended abstracts on Human factors in computing systems, 4453–4458.
[8] Walber, T., Scherp, A. and Staab, S. 2012. Identifying objects in images from analyzing the users' gaze movements for provided tags. Advances in Multimedia Modeling, 138–148.
[9] Carson, C., Thomas, M., Belongie, S., Hellerstein, J. and Malik, J. 1999. Blobworld: A system for region-based image indexing and retrieval. Visual Information and Information Systems.