A Saliency Model Predicts Fixations in Web Interfaces

Jeremiah D. Still
Missouri Western State University
Department of Psychology
jstill2@missouriwestern.edu

Christopher M. Masciocchi
Iowa State University
Department of Psychology
cmascioc@iastate.edu

Pre-proceedings of the 5th International Workshop on Model Driven Development of Advanced User Interfaces (MDDAUI 2010): Bridging between User Experience and UI Engineering, organized at the 28th ACM Conference on Human Factors in Computing Systems (CHI 2010), Atlanta, Georgia, USA, April 10, 2010. Copyright © 2010 for the individual papers by the papers' authors. Copying permitted for private and academic purposes. Re-publication of material from this volume requires permission by the copyright owners. This volume is published by its editors.

ABSTRACT
User interfaces are visually rich and complex. Consequently, it is difficult for designers to predict which locations will be attended to first within a display. Designers currently depend on eye tracking data to determine fixated locations, which are naturally associated with the allocation of attention. A computational saliency model can make predictions about where individuals are likely to fixate. Thus, we propose that the saliency model may facilitate successful interface development during the iterative design process by providing information about an interface’s stimulus-driven properties. To test its predictive power, the saliency model was run on 50 web page screenshots; eye tracking data were gathered from participants on the same images. We found that the saliency model predicted fixated locations within web page interfaces. Thus, using computational models to determine regions high in visual saliency during web page development may be a cost-effective alternative to eye tracking.

Author Keywords
Saliency, Interface Development, Design, Model

ACM Classification Keywords
H.1.2 User/Machine Systems; I.2.10 Vision and Scene Understanding

INTRODUCTION

Saliency, Search and Design
Some visual designs guide users to the locations of important information, while others mislead users. Visual saliency, inherent in a complex interface, cues users to certain spatial regions over others. If employed correctly by designers, salient cues may reduce information search times and facilitate task completion [cf. 18] by implicitly communicating to users where they ought to start their visual search [16]. In order to be considered salient, a feature must be visually unique relative to its surroundings. For example, text that is underlined amongst non-underlined text “pulls” the reader’s attention to it. However, many interfaces, like web pages, are rich with visual media, such as text, pictures, logos and bullets, making the determination of salient features a complicated task. Given this complexity, designers are often left making best guesses about which spatial regions are salient within an interface. Previous research on visual search in web pages defines entry points as regions within a page where users typically begin their visual search. In this article, we will argue that these entry points are heavily influenced by visual saliency; that is, users will often begin searching web pages at the location of highest saliency. In related research examining cognitive processing, these implicit and low-level cues that guide a viewer’s visual search are referred to as stimulus-driven properties: certain characteristics of the stimulus quickly “drive”, or direct, attention to certain locations over others. Currently, no consensus has been reached as to which visual characteristics, or stimulus-driven properties, make for effective entry points.

Measuring Overt Attention through Eye Tracking
Given the overabundance of visual information in our environments and our working memory limitations, attention must be selective, only allowing a limited amount of information into consciousness, for our cognitive system to function properly [8]. It has been suggested that the programming of eye movements has a direct and natural relationship with visual attention in that attention is often directed to whichever item is fixated [10]. Only information that falls directly on the fovea during a fixation is encoded with high resolution, and only a limited amount of this high-resolution information is processed, while the rest decays rapidly [see 4]. Thus, it is critical that users fixate on relevant visual information; otherwise, that content will not reach users' awareness.

It is no surprise, then, that designers often monitor eye movements to evaluate a web page’s saliency, or entry points. Eye tracking systems allow designers to test whether their web pages actually guide users' fixations to important locations. However, eye tracking has a number of recognized costs. Eye tracking systems are often expensive,
not easily accessible, and time consuming to employ, and they gradually lose calibration [1, 2, 7, 15].

Stimulus and Goal Driven Searches
In this article we investigate the influence of stimulus-driven saliency on attention within the context of a web page. Stimulus-driven saliency guides attention quickly and without explicit intention; thus, some might question its role during a purposeful search on a web page. There is ample evidence to suggest that goals do influence the guidance of attention. For example, web page eye tracking research has shown that changing the task (or goal) during a search, or seeking navigational or informational indicators, changes observers’ fixation patterns [3]. Additional research has shown that, given enough time, expectations can cause a consistent pattern of fixations, such as an F-shaped pattern or reading patterns (e.g., left-right/top-bottom) [14]. However, these goal-driven effects interact with stimulus-driven effects, making the stimulus-driven influences more difficult to examine [cf. 11]. Also, it is often the case that only a few seconds are spent on a web page (even with a goal in mind), which makes it critical to understand stimulus-driven processing, since it is believed to influence attention very rapidly. For instance, when searching for information, observers often skim through only approximately 18 words, and spend 4 to 9 seconds, per web page [2, 12]. One way to investigate the pure influence of stimulus-driven guidance is to use a computational saliency model designed to make predictions about what properties or features of a web page attention ought to select within complex media, or scenes.

Predicting Fixations through a Saliency Model
Visually salient items often draw observers' attention. To better understand the influences of saliency, or stimulus-driven selection, on attention, Koch and Ullman (1985) developed a model to compute an image's visual saliency without any semantic input (i.e., meaning of objects). Their model is based on the assumption that eye movement programming is driven by local image contrast, leading to logical serial searches through complex spatial environments. These serial searches are guided by low-level primitives extracted from a scene. The saliency model was developed under the premise that low-level visual features (i.e., color, light intensity, orientation) are processed pre-attentively in humans and, in turn, rapidly influence overt attention. Thus, the underlying assumption is that visual saliency is used to guide the fovea to unique areas within a scene that might provide the most efficient processing [5].

The computational model is implemented on a computer using digital pictures as stimuli to produce a pre-attentional or “saliency” map [9]. To create a saliency map, the model receives input from pixels within a digital picture. Then, it extracts three feature channels (color, intensity, and orientation) at eight different spatial scales. These three channels are normalized, and center-surround differences are calculated for each separate channel. The separate channels are additively combined to form a single saliency map. An image's saliency map provides predictions of where spatial attention should be deployed [for detailed explanations refer to 6, 13]. In essence, the model makes predictions about which regions in an image are most and least likely to be attended based purely on stimulus-driven properties. The saliency model is available for download from <SaliencyToolbox.net> as a collection of Matlab functions and scripts [17].

Testing a Saliency Model within Web Pages
Designers recognize the need to predict and identify where users’ attention will be guided on a web page. For example, it is well known that one should avoid designs that increase the likelihood of users missing important interface features such as branding, navigational or informational symbols. But using an eye tracking system to monitor guidance of attention, as is traditional, can be expensive, difficult to employ and time consuming within the context of a practical iterative design process. Thus, we investigated the utility of a computational saliency model in predicting the guidance of attention in web page screenshots. This new method is benchmarked against a set of data in which participants’ eye movements were tracked while they viewed the same web page screenshots.

METHOD

Participants
The data from eight undergraduate participants were examined. All participants reported extensive web site experience.

Stimuli and Equipment
The images were 50 screenshots of various web pages. Each participant saw each screenshot only once.

Participants' eye movements were recorded by an ASL eye tracker with a sampling rate of 120 Hz. Screenshots were shown on a Samsung LCD monitor, which had a viewing area of approximately 38.0 cm × 30.0 cm. A chin rest maintained a viewing distance of approximately 80 cm. Images subtended approximately 26.7° × 21.2° of visual angle.

Procedure
Participants first read and signed an informed consent document, and were then seated in front of the monitor with their chin in the chin rest. The experiment began and concluded with a 9-point calibration sequence to calibrate the eye tracker and estimate the amount of tracking error.

Participants were told that they would view a series of web page screenshots, and that they should, "look around the image like you normally would if you were surfing the internet." A fixation cross was presented at the center of the screen to signal the beginning of a trial. After a delay of approximately 1 second, a randomly selected web page screenshot was presented for 5 seconds. The fixation cross then reappeared to signal the beginning of the next trial.
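The visual angle reported under Stimuli and Equipment can be checked from the display size and viewing distance. The following is a minimal sketch, assuming that each screenshot filled the 38.0 cm × 30.0 cm viewing area and was centered on the observer's line of sight; the function name `visual_angle_deg` is illustrative and not part of the original study.

```python
import math


def visual_angle_deg(size_cm: float, distance_cm: float) -> float:
    """Full visual angle, in degrees, of an extent centered on the line of sight."""
    return math.degrees(2 * math.atan((size_cm / 2) / distance_cm))


# Values taken from the Stimuli and Equipment description.
VIEWING_DISTANCE_CM = 80.0
width_deg = visual_angle_deg(38.0, VIEWING_DISTANCE_CM)
height_deg = visual_angle_deg(30.0, VIEWING_DISTANCE_CM)

print(f"{width_deg:.1f} x {height_deg:.1f} degrees of visual angle")
```

Under these assumptions the result rounds to 26.7° horizontally and 21.2° vertically, in agreement with the reported image size.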
The experiment took approximately 15 minutes to complete.

Creation of saliency maps
Saliency maps were created using the algorithms developed by Itti, Koch, and Niebur (1998). The model was run on each image individually and the output was normalized by dividing all values by the maximum value for that map, and multiplying all values by 100. To simplify data analysis, the size of the saliency maps was increased to be identical to the size of the screenshots (1024 x 768 pixels). As described in the Introduction, these saliency maps are 2-D representations of areas in the screenshot that show the relative saliency of locations in the image. Figure 1 shows an example of two web page screenshots and their corresponding saliency maps. Low values (dark areas in the image) indicate regions of the image that are low in saliency, while high values (light areas in the image) indicate regions high in saliency.

Figure 1. Two examples of web page screenshots and their corresponding saliency maps.

RESULTS
We used a technique similar to that of Parkhurst, Law, and Niebur (2002) to determine whether salient regions in web pages were fixated more often than would be expected by chance. Specifically, the values of the saliency map at the location of each participant's first ten fixations were extracted. For example, the x, y coordinates of the first fixation for each participant were determined for every screenshot and the value at the same location in the corresponding saliency map was extracted. This process was repeated for fixations two through ten. These values formed the Observed Distribution of participant responses (Figure 2).

To determine the likelihood that salient regions would be fixated by chance, we repeated the process used to find the Observed Distribution after rearranging the fixations and saliency maps for all screenshots. For example, the values from the saliency map for screenshots 2 to 50 were extracted at the fixated locations from screenshot 1. The saliency values of all other screenshots were extracted at the location of the first ten fixations for all subjects for each screenshot. These values formed the Shuffled Distribution. The method used to create this distribution controls for spatial biases that may inflate correlations between fixations and salient regions. If the values of the Shuffled Distribution are larger than those of the Observed Distribution, it would indicate that participants fixated on regions that are lower in saliency than what is expected by chance. If, however, the values of the Observed Distribution are larger than those in the Shuffled Distribution, it would indicate that participants fixated regions that are higher in saliency than what is expected by chance.

Figure 2 shows the means for the Observed and Shuffled Distributions of the first ten fixations for each screenshot. An analysis of variance was conducted with fixation number (1-10) as a within-subjects variable and distribution (observed, shuffled) as a between-subjects variable, to determine whether any differences between the distributions varied as a function of fixation number. The main effect of fixation number was reliable, F(9, 882) = 6.39, MSE = 19.03, p < .001. Pairwise comparisons revealed that the values for the first fixation were higher than all other values, and that the values of the tenth fixation were lower than all other values. This indicates that early fixations tend to occur at regions of higher salience than those of later fixations. More importantly, the main effect of distribution was also reliable, F(1, 98) = 4.86, MSE = 397.95, p < .05, indicating that the values of the Observed Distribution were larger than those of the Shuffled Distribution. This difference confirms that participants fixated regions higher in saliency than would be expected by chance, showing that the saliency model is effective at predicting fixations. The Distribution x Fixation number interaction was not significant, F < 1.

Figure 2. Mean saliency values for the observed ('X') and shuffled ('o') distributions for the first ten fixations.

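The post-processing described under "Creation of saliency maps" (scaling each map so that its maximum is 100, then enlarging it to the screenshot size) could be sketched as follows. This is an illustration under stated assumptions rather than the procedure actually used: the 64 x 48 size of the raw map, the random placeholder values, and the nearest-neighbor enlargement are all hypothetical, since the text does not specify the raw map resolution or the interpolation method.

```python
import random


def normalize_to_100(saliency_map):
    """Divide every value by the map's maximum, then multiply by 100."""
    peak = max(max(row) for row in saliency_map)
    if peak <= 0:
        return [[0.0 for _ in row] for row in saliency_map]
    return [[value / peak * 100.0 for value in row] for row in saliency_map]


def upscale_nearest(saliency_map, out_width, out_height):
    """Enlarge a map to out_width x out_height by nearest-neighbor lookup.

    Nearest-neighbor is an assumption made for this sketch.
    """
    in_height, in_width = len(saliency_map), len(saliency_map[0])
    source_cols = [x * in_width // out_width for x in range(out_width)]
    return [
        [saliency_map[y * in_height // out_height][col] for col in source_cols]
        for y in range(out_height)
    ]


# Placeholder for a raw model output (hypothetical 64 x 48 map of random values).
random.seed(0)
raw_map = [[random.random() for _ in range(64)] for _ in range(48)]

# Maps are stored as rows, so a location is read as full_map[y][x].
full_map = upscale_nearest(normalize_to_100(raw_map), 1024, 768)
print(len(full_map[0]), "x", len(full_map), "pixels, peak =", max(map(max, full_map)))
```

Because each map is divided by its own maximum, values are comparable within a screenshot (relative saliency) but the peak of every map is 100 regardless of how visually striking the page is overall.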

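The construction of the Observed and Shuffled Distributions described in the Results can be illustrated with a small sketch. It assumes saliency maps stored as nested lists indexed `[y][x]` and fixations stored as `(x, y)` pixel coordinates per screenshot; the function names and the toy 4 x 4 maps are hypothetical, and the sketch pools all fixations rather than separating them by fixation number or running the analysis of variance.

```python
from statistics import mean


def sample(saliency_map, fixations):
    """Read the saliency value under each (x, y) fixation; maps are indexed [y][x]."""
    return [saliency_map[y][x] for x, y in fixations]


def observed_values(maps, fixations):
    """Saliency at each screenshot's own fixated locations."""
    return [v for name in maps for v in sample(maps[name], fixations[name])]


def shuffled_values(maps, fixations):
    """Saliency at locations fixated on one screenshot, read from every OTHER map."""
    return [
        v
        for fix_name in fixations
        for map_name in maps
        if map_name != fix_name
        for v in sample(maps[map_name], fixations[fix_name])
    ]


def toy_map(bright_x, bright_y):
    """Hypothetical 4 x 4 map: one salient pixel (100) on a dim background (10)."""
    return [
        [100.0 if (x, y) == (bright_x, bright_y) else 10.0 for x in range(4)]
        for y in range(4)
    ]


# Toy data: each page has one fixation on its salient pixel and one elsewhere.
maps = {"page_a": toy_map(0, 0), "page_b": toy_map(3, 1), "page_c": toy_map(2, 3)}
fixations = {
    "page_a": [(0, 0), (1, 1)],
    "page_b": [(3, 1), (0, 2)],
    "page_c": [(2, 3), (2, 2)],
}

observed_mean = mean(observed_values(maps, fixations))
shuffled_mean = mean(shuffled_values(maps, fixations))
print("observed:", observed_mean, "shuffled:", shuffled_mean)
```

Because the shuffled values reuse the real fixation coordinates, any general tendency to look at particular screen positions (for example, the center) affects both distributions equally, which is why the comparison controls for spatial bias.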
DISCUSSION
Eye tracking is a commonly employed method for examining the guidance of overt attention within interfaces (e.g., web pages). However, it has several drawbacks. We propose that a web page’s saliency, that is, its stimulus-driven properties, may be revealed through the use of a computational saliency model. Therefore, we compared the performance of the model to eye tracking data collected from human observers. We were able to demonstrate that, indeed, the saliency model predicts the deployment of overt attention within a web page interface.

Previous research has shown a modest correlation between saliency and eye fixations in natural and artificial scenes [13]. We have extended this research by showing that even in web pages, which may contain more semantic information (e.g., meaningful text or images) than natural scenes, fixations are correlated with saliency. Specifically, participants were more likely to fixate on regions in the web pages with a higher saliency value than predicted by chance.

Our data suggest that saliency maps alone can provide reasonable predictions of overt attention. In addition, saliency maps can be generated quickly, and require no additional equipment or participants. Even with these positive attributes, one may be hesitant to abandon eye tracking altogether. Our recommendation to designers is to choose the method most appropriate for your project given your constraints and needs. It is often the case that developing effective interfaces requires many levels of analysis. For example, during the early formative testing process it would be appropriate to begin by using the saliency model to ensure that regions identified as being important are also visually salient. Then, during the ‘final’ prototype development stage, employ the eye tracking method to verify that your participants are actually looking at the critical elements in the design.

REFERENCES
1. Arroyo, E., Selker, T. & Wei, W. Usability tool for analysis of web designs using mouse tracks. Computer-Human Interaction extended abstracts on human factors in computing systems (2006), 484-489.
2. Chen, M., Anderson, J. R. & Sohn, M. What can a mouse cursor tell us more?: Correlation of eye/mouse movements on web browsing. Computer-Human Interaction extended abstracts on human factors in computing systems (2001), 281-282.
3. Cutrell, E. & Guan, Z. What are you looking for? An eye-tracking study of information usage in web search. Proceedings of the SIGCHI conference on human factors in computing systems (2007), 407-416.
4. Egeth, H. E. & Yantis, S. Visual attention: Control representation and time course. Annual Review of Psychology (1997), 48, 269-297.
5. Itti, L. & Koch, C. A saliency-based search mechanism for overt and covert shifts of visual attention. Vision Research (2000), 40(10-12), 1489-1506.
6. Itti, L., Koch, C. & Niebur, E. A model of saliency-based fast visual attention for rapid scene analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence (1998), 20(11), 1254-1259.
7. Johansen, S. A. & Hansen, J. P. Do we need eye trackers to tell where people look? Proceedings of Computer-Human Interaction extended abstracts on human factors in computing systems (2006), 923-928.
8. Johnston, W. A. & Dark, V. J. Selective attention. Annual Review of Psychology (1986), 37, 43-75.
9. Koch, C. & Ullman, S. Shifts in selective visual attention: Towards the underlying neural circuitry. Human Neurobiology (1985), 4, 219-227.
10. Kowler, E., Anderson, E., Dosher, B. & Blaser, E. The role of attention in the programming of saccades. Vision Research (1995), 35, 1897-1916.
11. McCarthy, J. D., Sasse, M. A. & Riegelsberger, J. Can I have the menu please? An eyetracking study of design conventions. Proceedings of Human-Computer Interaction (2003), 401-414.
12. Nielsen, J. How little do users read? (2008, May). Retrieved May 12, 2009 from http://www.useit.com/alertbox/percent-text-read.html.
13. Parkhurst, D., Law, K. & Niebur, E. Modeling the role of salience in the allocation of overt visual attention. Vision Research (2002), 42, 107-123.
14. Rayner, K. Eye movements in reading and information processing: 20 years of research. Psychological Bulletin (1998), 124(3), 372-422.
15. Tarasewich, P., Pomplun, M., Fillion, S. & Broberg, D. The enhanced restricted focus viewer. International Journal of Human-Computer Interaction (2005), 19(1), 35-54.
16. Treisman, A. M. Perceptual grouping and attention in visual search for features and for objects. Journal of Experimental Psychology: Human Perception and Performance (1982), 8, 194-214.
17. Walther, D. & Koch, C. Modeling attention to salient proto-objects. Neural Networks (2006), 19, 1395-1407.
18. Wolfe, J. M. Guided Search 4.0: Current Progress with a model of visual search. In W. Gray (Ed.), Integrated Models of Cognitive Systems (pp. 99-119). New York: Oxford, 2007.