<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>EyeGrab: A Gaze-based Game with a Purpose to Enrich Image Context Information</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Tina Walber</string-name>
          <email>walber@uni-koblenz.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Chantal Neuhaus</string-name>
          <email>cneuhaus@uni-koblenz.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ansgar Scherp</string-name>
          <email>scherp@uni-koblenz.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>University of Koblenz-Landau, Institute for Web Science and Technologies</institution>
          ,
          <addr-line>Koblenz</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>We present EyeGrab, a game for image classification that is controlled by the users' gaze. The players classify images according to their relevance for a given tag. Besides entertaining the players, the aim is to enrich the image context information in order to improve image search in the future. During the game, information about the shown images is collected: the classification concerning the tag, a rating of the shown images by the user (“like” or “not like”), and the eye tracking information recorded while viewing the images. EyeGrab has been developed as a game with a purpose (GWAP) to improve or collect this information: descriptions by tags, personal ratings, and information about image regions. In this work, we present the design of the game and compare two design variants, one with and one without visual aid, concerning the suitability of the game for image annotation. The variants of the game are evaluated in a study with 24 participants. We measured the user satisfaction, efficiency, and effectiveness of the game. Overall, 83% of the users enjoyed playing the game. The results show that the visual aid does not help the users in our application; it even increases the error rate. The best classification precision we achieve is 92%, for the game variant without visual aid.</p>
      </abstract>
      <kwd-group>
        <kwd>Eye tracking</kwd>
        <kwd>Game with a purpose</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. INTRODUCTION</title>
      <p>Presented at EuroHCIR2012. Copyright © 2012 for the individual
papers by the papers' authors. Copying permitted only for private and
academic purposes. This volume is published and copyrighted by its
editors.</p>
      <p>The game is controlled by eye movements, which on the one hand
enhances the user satisfaction and on the other hand allows for
collecting gaze information that will be analyzed to gain
information about the image content. The overall goal of EyeGrab
is to enrich the images with contextual information in order to
improve future search tasks.</p>
      <p>The players look at images falling down the screen. The task is to
classify the images as relevant or irrelevant to a given tag.
Relevant images are additionally rated by the participants as “I
like it” or “I do not like it”. We have compared two different
interaction designs of the game in a study with 24 subjects and
measured effectiveness, efficiency, and satisfaction. While the
first variant provides visual aids in the form of highlighting the
interactive regions and a gaze-cursor, which visualizes the
subjects’ fixations on the screen, the second variant does not.
Overall, we can state that the vast majority of the participants
enjoyed playing the game and that the gaze-based control of the
game was experienced as enhancing the entertainment. The players
who received the visual aids had the impression of being supported
by them. However, the results show that the visual aids led to
significantly more incorrect and missing classifications. In fact,
given a ground truth image data set, we have achieved the best
classification precision of 92% for the game variant without
visual aid.</p>
      <p>
        Both the satisfaction of the players and the precision of the results
gained in our experiment are very encouraging. Based on this
outcome, we will continue the evaluation and conduct information
extraction from the gaze paths as a next step. Based on a prior
experiment [
        <xref ref-type="bibr" rid="ref7">8</xref>
        ], we can use this gaze information to add
region-based annotations to the images.
      </p>
    </sec>
    <sec id="sec-2">
      <title>2. RELATED WORK</title>
      <p>
        A large number of applications that use eye movements as an
input medium were introduced in the past, often for people with
disabilities, e.g., a drawing system [1]. The use of gaze
information as relevance feedback in image retrieval has also been
investigated, with promising results, e.g. [
        <xref ref-type="bibr" rid="ref5">6</xref>
        ]. Walber et al. [
        <xref ref-type="bibr" rid="ref7">8</xref>
        ]
showed that specific image regions can be identified using gaze
information. The development of sensor hardware like cameras in
computers is continuously progressing. Even now, eye tracking
can be performed with a commodity web camera. San Agustin et
al. [
        <xref ref-type="bibr" rid="ref6">7</xref>
        ] compare a commercial eye tracker and a webcam system.
The results for the webcam system are satisfactory and
comparable to the commercial system, although still with
limitations concerning the comfort of use. Based on this
development, one can assume that eye tracking could be
performed for more users in the future and that it will be
possible to use the technology for playing games as well.
      </p>
      <p>
        Data obtained from eye tracking is less accurate than, e.g., data
from a computer mouse, due to natural movements of the eyes. It can
be difficult for users to focus the gaze on a specific region to
select a button. One possibility for supporting users in
controlling an application by gaze is to visualize the gaze as a
cursor. Some related work points out the problem of distraction
caused by this kind of visualization [
        <xref ref-type="bibr" rid="ref1">2</xref>
        ]; others see the potential of such a
natural “pointer” [
        <xref ref-type="bibr" rid="ref4">5</xref>
        ].
      </p>
      <p>
        Some years ago, a new class of applications appeared, the
so-called games with a purpose (GWAPs) [
        <xref ref-type="bibr" rid="ref2 ref3">3, 4</xref>
        ]. The goal of
GWAPs is to gain information from humans in an entertaining
way. One example is the game Peekaboom [
        <xref ref-type="bibr" rid="ref3">4</xref>
        ], where two users
play together to label image regions. Another is the
ESP Game [
        <xref ref-type="bibr" rid="ref2">3</xref>
        ], in which two randomly assigned players tag the same
image and try to provide the same tags as their team mate. Tobii
recently introduced the game EyeAsteroids and claims it to be the
first purely eye-controlled arcade game. It is entertaining, but
does not have the goal of benefiting from the users' activities. Eye
tracking fascinates users as an unusual kind of input device. One
can benefit from this curiosity by offering entertaining
applications that also gain some information from the users.
Despite the variety of eye tracking applications and games,
EyeGrab combines, to the best of our knowledge, for the first
time the aspects of benefiting from user activities as in
GWAPs and controlling the application by the use of eye
movements.
      </p>
    </sec>
    <sec id="sec-3">
      <title>3. DESCRIPTION OF THE GAME</title>
      <p>EyeGrab is a single-player game that takes place in a galactic
dimension. The task is to clean up the aliens' universe by
categorizing and rating images. Before the game starts, the user is
asked to enter his or her nickname using the keyboard (Figure 1).
The rest of the game is then played exclusively by the use of the
eye tracker. Every gaze-based selection takes place after a dwell
time of 450 milliseconds to avoid random choices. For example,
the selection of the gender is done by focusing on a male or female
character as shown in Figure 1. Subsequently, the player is shown
a small introduction to the game's rules (no screenshot).
The game has three rounds, with three categories (“car”, “house”,
and “mouse”). First, the category is shown (see Figure 2); next, the
round starts. 30 images fall down the screen as depicted in
Figure 3. The player classifies each falling image into one of
three classes. He or she can select an image by fixating it for
more than 450 milliseconds. When an image is selected, it is
highlighted by a thin, red frame. Next, the image is classified as
“like it”, “don't like it”, or “not relevant”, where the first two
imply that the image is described by the named tag and the third
specifies that the image does not belong to that tag. To classify an
image, the user looks at the area of the intended classification on
the screen as shown in Figure 3 (same dwell time as above). The
player receives points for each correctly categorized image,
negative points for each false one, and no points for images that
fell off the screen without being classified. To further challenge
the user, the speed is increased with higher levels. A high score
list is presented to the user at the end of the game.</p>
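      <p>The dwell-time selection described above can be sketched as follows. This is a minimal illustration in Python, not the actual EyeGrab implementation; only the 450 ms threshold is taken from the text, while the class and method names are ours.</p>

```python
# Minimal sketch of dwell-time selection (not the actual EyeGrab code).
# A selection fires once the gaze stays on the same target for >= 450 ms.

DWELL_MS = 450  # dwell time used in EyeGrab to avoid random choices

class DwellSelector:
    def __init__(self, dwell_ms=DWELL_MS):
        self.dwell_ms = dwell_ms
        self.current_target = None   # target currently looked at
        self.start_ms = None         # when the gaze entered that target

    def feed(self, timestamp_ms, target):
        """Feed one gaze sample; return the selected target or None.

        `target` is the id of the on-screen element hit by the gaze
        point (None if the gaze is on empty space)."""
        if target != self.current_target:
            # Gaze moved to a different element: restart the dwell timer.
            self.current_target = target
            self.start_ms = timestamp_ms
            return None
        if target is not None and timestamp_ms - self.start_ms >= self.dwell_ms:
            self.current_target = None  # reset so the same target can fire again
            return target
        return None

if __name__ == "__main__":
    sel = DwellSelector()
    # 60 Hz samples: a short glance at "like", then a long fixation on "not relevant".
    events = [sel.feed(t, "like") for t in range(0, 200, 16)]
    events += [sel.feed(t, "not relevant") for t in range(200, 800, 16)]
    print([e for e in events if e is not None])  # only the long fixation fires
```

      <p>Short glances restart the timer, so only deliberate fixations trigger an action, which is the usual way to avoid the “Midas touch” problem in gaze interfaces.</p>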
      <p>For each category, the images were chosen from the 100 most
relevant Flickr pictures. 20 of them were randomly selected and
combined with 10 pictures of a different category. Inter-rater
agreement among 3 neutral persons was used to confirm the
categorization and to create the ground truth.</p>
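      <p>The assembly of one round's image set can be sketched as below. This is an illustrative sketch, assuming lists of candidate image identifiers; the function and parameter names are ours.</p>

```python
import random

def build_round(relevant_top100, distractors, n_relevant=20, n_distractors=10, seed=None):
    """Sketch of assembling one EyeGrab round: 20 images sampled from the
    100 most relevant Flickr pictures for the category, mixed with 10
    pictures from a different category, shown in random order."""
    rng = random.Random(seed)
    images = rng.sample(relevant_top100, n_relevant) + rng.sample(distractors, n_distractors)
    rng.shuffle(images)  # interleave relevant and distractor images
    return images

if __name__ == "__main__":
    top100 = [f"house_{i}" for i in range(100)]
    others = [f"car_{i}" for i in range(100)]
    round_images = build_round(top100, others, seed=42)
    print(len(round_images))  # 30 images per round
```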
      <p>Two versions of the game have been implemented, one offering
visual aids (see Figure 4a) to the user and the other one without
such help (see Figure 4b). The visual aids include a highlighting
of the “action areas”, i.e., areas which perform an action when
being fixated, and the visualization of the gaze point on the screen
(gaze-cursor). Examples are the classification buttons as shown in
Figure 3 (details in Figures 4a and 4b).</p>
      <p>Figure 4a. Visual aids
(rectangle: action area, here:
“not like”, circle: gaze cursor).</p>
    </sec>
    <sec id="sec-4">
      <title>4. EVALUATION DESIGN</title>
      <p>In order to evaluate EyeGrab, 24 subjects (7 female) played the
game. The subjects’ age was between 15 and 32 years (mean: 24,
SD: 3.9). 19 subjects were students, 2 were research assistants, one
was a pupil, and 2 had other professions. Most of the players had
experience in gaming (mean: 3.5, SD: 1.31). Only a few were familiar
with eye tracking, as indicated by 19 subjects rating the question
concerning their eye tracking experience with one (mean: 1.63,
SD: 1.38). The subjects were randomly divided into two groups,
A and B. Group A had no visual aids during the game, whereas
group B did. 8 users were wearers of glasses or contact lenses (4 in
each group). There were no problems using the eye tracker for
those subjects.</p>
      <p>To avoid distractions, the game was played in our eye-tracking
lab, providing a chair, a desk with an eye tracker, and a standard
monitor. The first step was a calibration of the eye tracker. After
this was done, the game was started without further instructions
and was played with 30 images in each of the three rounds. The
data from the first round is not used in the later analysis, because
it only served to get the subjects acquainted with using the eye
tracker as an input device. At the end of the experiment, every
user filled out a questionnaire, including personal information
and questions about the performance of the game. The answers
were given on a 5-point Likert scale.</p>
    </sec>
    <sec id="sec-5">
      <title>5. EVALUATION RESULTS</title>
    </sec>
    <sec id="sec-6">
      <title>5.1 Satisfaction</title>
      <p>The questionnaires show that the subjects enjoyed playing the
game. On average, the statement “It was fun playing the game.” is
rated 3.46 (SD: 0.93) considering all 24 subjects. 20 of the 24
users agreed with this statement. One of the following questions
asked whether the participants felt that the interaction with the
eye tracker increased the fun of the game. 14 subjects agreed or
strongly agreed with this statement (mean: 3.5, SD: 1.25). Also,
most of the subjects did not feel disturbed by the eye tracker
(mean: 2.25, SD: 1.5).</p>
    </sec>
    <sec id="sec-7">
      <title>5.2 Effectiveness and Efficiency</title>
      <p>One round of the game comprises 30 images and it takes about
two minutes including the introduction and the input form. Each
level has a different pace at which the images fall down the
screen. Thus, the classification per image takes between 2.6 and 4
seconds.</p>
      <p>In total, 1440 pictures were shown to the subjects within the game
(30 pictures each for the categories “house” and “mouse” times 24
subjects). In only 42 cases did an image pass without classification,
resulting in a total of 1398 classified images. 1162 images were
correctly classified (83%). Thus, only 236 images were
incorrectly classified. Overall, we had 897 true-positive
classifications, 128 false-negative and 108 false-positive
classifications, which leads to a precision of 89% and a recall of
88% over all users. For the group with the better results (the
group without visual aid, see next section) we obtain a precision
of 92%.</p>
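      <p>The reported precision and recall follow directly from the true-positive, false-positive, and false-negative counts; a quick check in plain Python, using the counts from the text:</p>

```python
def precision_recall(tp, fp, fn):
    """Standard precision and recall from classification counts."""
    precision = tp / (tp + fp)  # correct positives among all positive classifications
    recall = tp / (tp + fn)     # correct positives among all actually relevant images
    return precision, recall

if __name__ == "__main__":
    # Counts over all users, as reported above.
    p, r = precision_recall(tp=897, fp=108, fn=128)
    print(round(p, 2), round(r, 2))  # 0.89 0.88
```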
    </sec>
    <sec id="sec-8">
      <title>5.3 Visual aid</title>
      <p>The subjective perception of the users in group B (the group that
was provided with visual aids) was that the visual aids supported
them in the classification tasks. The question regarding the visual
highlighting of the active areas was rated as very helpful with an
average of 4.67 (SD: 0.49). The subjects also answered that
displaying the gaze point was very helpful and scored this
question on average with 4.5 (SD: 0.67). However, to our
surprise, the following statistical analysis of the data shows that
group B with the visual aids misclassified significantly more
images than group A did.</p>
      <p>Group A correctly classified 296 images for category “house”
whereas group B correctly assigned 264 images for this category.
Regarding the category “mouse”, in total 317 correct assignments
were made by group A whereas group B correctly assigned 287
images. Regarding the misclassified images, group A
misclassified 59 images for category “house”, whereas group B
wrongly assigned the image category in 81 cases. For the
category “mouse”, the number of incorrect assignments is 37 for
group A and 57 for group B. We compared the values for correct
and incorrect assignments for group A and B in a 2x2 Chi-square
test for both categories. The differences are significant regarding
a significance level of α= 0.05 with χ2 (1, N = 700) = 5.14, p =
0.023 for the category “house” and χ2 (1, N = 698) = 5.6, p =
0.018 for “mouse”. In group B, 31 images passed without
classification; in group A, only 11 images were not classified.
These results indicate that the visual support does not improve the
classification. Despite the good impression of the visual support
that group B expressed, the following question might be an
indicator that this group felt less comfortable with the eye
tracker-based interaction than group A did: we asked the subjects to
state whether they preferred a mouse-based interaction instead of
the eye tracker-based one. On average, subjects of group A scored
this question with 2.17 (SD: 1.47) and group B with 3.25 (SD: 1.48).
Using a two-tailed Mann-Whitney U-Test, a weakly significant
difference was determined, indicating that group B preferred using
the mouse to play EyeGrab more than group A did (U = 43, z = -1.719,
n1 = n2 = 12, p = .085).</p>
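      <p>The 2x2 chi-square tests above can be reproduced from the reported counts. The sketch below computes the Pearson chi-square statistic in plain Python (the function name is ours; a library routine such as scipy's chi2_contingency with the continuity correction disabled should give the same statistic):</p>

```python
def chi_square_2x2(table):
    """Pearson chi-square statistic for a 2x2 contingency table
    [[a, b], [c, d]] (rows: groups A and B, columns: correct/incorrect)."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    chi2 = 0.0
    for i, obs_row in enumerate(table):
        for j, obs in enumerate(obs_row):
            exp = row_totals[i] * col_totals[j] / n  # expected count under independence
            chi2 += (obs - exp) ** 2 / exp
    return chi2

if __name__ == "__main__":
    house = [[296, 59], [264, 81]]  # group A vs. group B, category "house"
    mouse = [[317, 37], [287, 57]]  # group A vs. group B, category "mouse"
    print(round(chi_square_2x2(house), 2))  # 5.14, matching the reported value
    print(round(chi_square_2x2(mouse), 1))  # 5.6, matching the reported value
```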
    </sec>
    <sec id="sec-9">
      <title>6. FUTURE WORK</title>
      <p>For the current version of our EyeGrab game, we have used
pre-classified images in order to verify the classification
performance of the subjects. We plan to use images without
annotations in future extensions of the game.</p>
      <p>The detailed analysis of the gaze information will also be
performed in a next step. In a small sample of 5 images classified
by one user, we received 231 gaze points on the images. An
example of a gaze path visualization is shown in Figure 5. We
expect a sufficient number of fixations and correct classifications
to allow a detailed analysis.</p>
      <p>We received 897 ratings for the shown images. 556 of them were
positive. The quality of these ratings has to be investigated in a
future experiment, e.g., by repeating the ratings in another context
with the same users or by using a ground truth set with images that
are often liked by a large number of other users. However, it has to
be clear that a subjective rating can never be “correct” or
incorrect. These investigations can only provide an indication of
the value of the ratings. Overall, this detailed analysis will allow
us to identify the regions that correspond to the category given in
the EyeGrab game. Such region-based annotations will allow for a
better retrieval of the images in the future.</p>
    </sec>
    <sec id="sec-10">
      <title>7. SUMMARY</title>
      <p>We have introduced EyeGrab, a gaze-based game with a purpose
to classify images using an eye tracker. We have shown that the
game has the potential to entertain the players and that the
classification results are good enough to proceed to the gaze
analysis. This analysis is the first step in the direction of
extending image context information with information gained in
an eye tracking game. The next step will be the analysis and
evaluation of the gained information and its use for improving
image search tasks.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref0">
        <mixed-citation>
          [1]
          <string-name>
            <surname>Hornof</surname>
            ,
            <given-names>A.J.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Cavender</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <year>2005</year>
          .
          <article-title>EyeDraw: enabling children with severe motor impairments to draw with their eyes</article-title>
          .
          <source>Proceedings of the SIGCHI conference on Human factors in computing systems</source>
          ,
          <fpage>170</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref1">
        <mixed-citation>
          [2]
          <string-name>
            <surname>Jacob</surname>
            ,
            <given-names>R.J.K.</given-names>
          </string-name>
          <year>1993</year>
          .
          <article-title>Eye movement-based human-computer interaction techniques: Toward non-command interfaces</article-title>
          .
          <source>Advances in human-computer interaction</source>
          ,
          <fpage>151</fpage>
          -
          <lpage>190</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [3]
          <string-name>
            <surname>von Ahn</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Dabbish</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          <year>2004</year>
          .
          <article-title>Labeling images with a computer game</article-title>
          .
          <source>In Proceedings of the SIGCHI conference on Human factors in computing systems</source>
          ,
          <fpage>319</fpage>
          -
          <lpage>326</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [4]
          <string-name>
            <surname>von Ahn</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Blum</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          <year>2006</year>
          .
          <article-title>Peekaboom: A Game for Locating Objects in Images</article-title>
          .
          <source>SIGCHI conference on Human Factors in computing systems</source>
          ,
          <fpage>55</fpage>
          -
          <lpage>64</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [5]
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Ren</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Zha</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          <year>2008</year>
          .
          <article-title>Improving eye cursor's stability for eye pointing tasks</article-title>
          .
          <source>SIGCHI conference on Human factors in computing systems</source>
          ,
          <fpage>525</fpage>
          -
          <lpage>534</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [6]
          <string-name>
            <surname>Kozma</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Klami</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Kaski</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          <year>2009</year>
          .
          <article-title>GaZIR: gaze-based zooming interface for image retrieval</article-title>
          .
          <source>In Proceedings of the 2009 international conference on Multimodal interfaces.</source>
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>J.</given-names>
            <surname>San Agustin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Skovsgaard</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.P.</given-names>
            <surname>Hansen</surname>
          </string-name>
          , and
          <string-name>
            <given-names>D.W.</given-names>
            <surname>Hansen</surname>
          </string-name>
          .
          <article-title>Low-cost gaze interaction: ready to deliver the promises</article-title>
          .
          <source>In Proceedings of the 27th international conference extended abstracts on Human factors in computing systems</source>
          , pages
          <fpage>4453</fpage>
          -
          <lpage>4458</lpage>
          . ACM,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [8]
          <string-name>
            <surname>Walber</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Scherp</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Staab</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          <year>2012</year>
          .
          <article-title>Identifying Objects in Images from Analyzing the Users' Gaze Movements for Provided Tags</article-title>
          .
          <source>Advances in Multimedia Modeling</source>
          ,
          <fpage>138</fpage>
          -
          <lpage>148</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [9]
          <string-name>
            <surname>Carson</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Thomas</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Belongie</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hellerstein</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Malik</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <year>1999</year>
          .
          <article-title>Blobworld: A system for region-based image indexing and retrieval</article-title>
          .
          <source>Visual Information and Information Systems.</source>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>