=Paper= {{Paper |id=None |storemode=property |title=A Saliency Model Predicts Fixations in Web Interfaces |pdfUrl=https://ceur-ws.org/Vol-617/MDDAUI2010_Paper07.pdf |volume=Vol-617 }} ==A Saliency Model Predicts Fixations in Web Interfaces== https://ceur-ws.org/Vol-617/MDDAUI2010_Paper07.pdf

A Saliency Model Predicts Fixations in Web Interfaces
Jeremiah D. Still, Missouri Western State University, Department of Psychology, jstill2@missouriwestern.edu
Christopher M. Masciocchi, Iowa State University, Department of Psychology, cmascioc@iastate.edu

ABSTRACT
User interfaces are visually rich and complex. Consequently, it is difficult for designers to predict which locations will be attended to first within a display. Designers currently depend on eye tracking data to determine fixated locations, which are naturally associated with the allocation of attention. A computational saliency model can make predictions about where individuals are likely to fixate. Thus, we propose that the saliency model may facilitate successful interface development during the iterative design process by providing information about an interface’s stimulus-driven properties. To test its predictive power, the saliency model was run on 50 web page screenshots; eye tracking data were gathered from participants on the same images. We found that the saliency model predicted fixated locations within web page interfaces. Thus, using computational models to determine regions high in visual saliency during web page development may be a cost-effective alternative to eye tracking.

Author Keywords
Saliency, Interface Development, Design, Model

ACM Classification Keywords
H.1.2 User/Machine Systems; I.2.10 Vision and Scene Understanding

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
MDDAUI 2010, April 10, 2010, Atlanta, Georgia, USA.
Copyright 2009 ACM 978-1-60558-246-7/09/04...$5.00.

Pre-proceedings of the 5th International Workshop on Model Driven Development of Advanced User Interfaces (MDDAUI 2010): Bridging between User Experience and UI Engineering, organized at the 28th ACM Conference on Human Factors in Computing Systems (CHI 2010), Atlanta, Georgia, USA, April 10, 2010.
Copyright © 2010 for the individual papers by the papers' authors. Copying permitted for private and academic purposes. Re-publication of material from this volume requires permission by the copyright owners. This volume is published by its editors.

INTRODUCTION

Saliency, Search and Design
Some visual designs guide users to the locations of important information, while others mislead users. Visual saliency, inherent in a complex interface, cues users to certain spatial regions over others. If employed correctly by designers, salient cues may reduce information search times and facilitate task completion [cf. 18] by implicitly communicating to users where they ought to start their visual search [16]. In order to be considered salient, a feature must be visually unique relative to its surroundings. For example, text that is underlined amongst non-underlined text “pulls” the reader’s attention to it. However, many interfaces, like web pages, are rich with visual media, such as text, pictures, logos and bullets, making the determination of salient features a complicated task. Given this complexity, designers are often left making best guesses about which spatial regions are salient within an interface. Previous research on visual search in web pages defines entry points as regions within a page where users typically begin their visual search. In this article, we will argue that these entry points are heavily influenced by visual saliency; that is, users will often begin searching web pages at the location of highest saliency. In related research examining cognitive processing, these implicit and low-level cues that guide a viewer’s visual search are referred to as stimulus-driven properties: certain characteristics of the stimulus quickly “drive”, or direct, attention to certain locations over others. Currently, no consensus has been reached as to which visual characteristics, or stimulus-driven properties, make for effective entry points.

Measuring Overt Attention through Eye Tracking
Given the overabundance of visual information in our environments and our working memory limitations, attention must be selective, only allowing a limited amount of information into consciousness, for our cognitive system to function properly [8]. It has been suggested that the programming of eye movements has a direct and natural relationship with visual attention in that attention is often directed to whichever item is fixated [10]. Only information that falls directly on the fovea during a fixation is encoded with high resolution and only a limited amount of this high resolution information is processed, while the rest falls into rapid decay [see 4]. Thus, it is critical that users fixate on relevant visual information or that content will not reach users' awareness.

It is no surprise then, that designers often monitor eye movements to evaluate a web page’s saliency, or entry points. Eye tracking systems allow designers to test whether their web pages actually guide users' fixations to important locations. However, eye tracking has a number of recognized costs. Eye tracking systems are often expensive,
not easily accessible, time-consuming to employ, and they gradually lose calibration [1, 2, 7, 15].

Stimulus and Goal Driven Searches
In this article we investigate the influence of stimulus-driven saliency on attention within the context of a web page. Stimulus-driven saliency guides attention quickly and without explicit intention, thus some might question its role during a purposeful search on a web page. There is ample evidence to suggest that goals do influence the guidance of attention. For example, web page eye tracking research has shown that changing the task (or goal) during a search, or seeking navigational or informational indicators, changes observers’ fixation patterns [3]. Additional research has shown that, given enough time, expectations can cause a consistent pattern of fixations, such as an F-shaped pattern or reading patterns (e.g., left-right/top-bottom) [14]. However, these goal-driven effects interact with stimulus-driven effects, making the stimulus-driven influences more difficult to examine [cf. 11]. Also, it is often the case that only a few seconds are spent on a web page (even with a goal in mind), making the understanding of stimulus-driven processing, which is believed to influence attention very rapidly, critical. For instance, when searching for information observers often only skim through approximately 18 words, and spend 4 to 9 seconds, per web page [2, 12]. One way to investigate the pure influence of stimulus-driven guidance is to use a computational saliency model designed to make predictions about what properties or features of a web page attention ought to select within complex media, or scenes.

Predicting Fixations through a Saliency Model
Visually salient items often draw observers' attention. To better understand the influences of saliency, or stimulus-driven selection, on attention, Koch and Ullman (1985) developed a model to compute an image's visual saliency without any semantic input (i.e., meaning of objects). Their model is based on the assumption that eye movement programming is driven by local image contrast leading to logical serial searches through complex spatial environments. These serial searches are guided by low-level primitives extracted from a scene. The saliency model was developed under the premise that low-level visual features (i.e., color, light intensity, orientation) are processed pre-attentively in humans and, in turn, rapidly influence overt attention. Thus, the underlying assumption is that visual saliency is used to guide the fovea to unique areas within a scene that might provide the most efficient processing [5].

The computational model is implemented on a computer using digital pictures as stimuli to produce a pre-attentional or “saliency” map [9]. To create a saliency map, the model receives input from pixels within a digital picture. Then, it extracts three feature channels (color, intensity, and orientation) at eight different spatial scales. These three channels are normalized and center-surround differences are calculated for each separate channel. The separate channels are additively combined to form a single saliency map. An image's saliency map provides predictions of where spatial attention should be deployed [for detailed explanations refer to 6, 13]. In essence, the model makes predictions about which regions in an image are most and least likely to be attended based purely on stimulus-driven properties. The saliency model is available for download as a collection of Matlab functions and scripts [17].

Testing a Saliency Model within Web Pages
Designers recognize the need to predict and identify where users’ attention will be guided on a web page. For example, it is well known that one should avoid using poor designs that increase the likelihood of users missing important interface features such as branding, navigational or informational symbols. But using an eye tracking system to monitor guidance of attention, as is traditional, can be expensive, difficult to employ and time-consuming within the context of a practical iterative design process. Thus, we investigated the utility of a computational saliency model in predicting the guidance of attention in web page screenshots. This new method is benchmarked against another set of data in which participants’ eye movements were tracked while they viewed the same web page screenshots.

METHOD

Participants
The data from eight undergraduate participants are examined. All participants reported extensive web site experience.

Stimuli and Equipment
The images were 50 screenshots of various web pages. Each participant saw each screenshot only once.

Participants' eye movements were recorded by an ASL eye tracker with a sampling rate of 120 Hz. Screenshots were shown on a Samsung LCD monitor, which had a viewing area of approximately 38.0 cm × 30.0 cm. A chin rest maintained a viewing distance of approximately 80 cm. Images subtended approximately 26.7° × 21.2° of visual angle.

Procedure
Participants first read and signed an informed consent document, and were then seated in front of the monitor with their chin in the chin rest. The experiment began and concluded with a 9-point calibration sequence to calibrate the eye tracker and estimate the amount of tracking error.

Participants were told that they would view a series of web page screenshots, and that they should, "look around the image like you normally would if you were surfing the internet." A fixation cross was presented at the center of the screen to signal the beginning of a trial. After a delay of approximately 1 second, a randomly selected web page screenshot was presented for 5 seconds. The fixation cross then reappeared to signal the beginning of the next trial.
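As a check on the display geometry reported under Stimuli and Equipment, the visual angle follows from the viewing area and the viewing distance. The sketch below is illustrative only: it assumes the screenshots filled the full 38.0 cm × 30.0 cm viewing area and were centered on the line of sight, and the function name `visual_angle_deg` is a hypothetical helper, not part of any eye tracking toolkit.

```python
import math

def visual_angle_deg(size_cm: float, distance_cm: float) -> float:
    """Full visual angle, in degrees, of an extent centered on the line of sight."""
    return math.degrees(2.0 * math.atan(size_cm / (2.0 * distance_cm)))

# Assumption: the image spans the whole viewing area reported in the text.
width_deg = visual_angle_deg(38.0, 80.0)
height_deg = visual_angle_deg(30.0, 80.0)
print(f"{width_deg:.1f} x {height_deg:.1f} degrees")  # prints "26.7 x 21.2 degrees"
```

Under these assumptions the result agrees with the reported image size of roughly 26.7 by 21.2 degrees.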

The experiment took approximately 15 minutes to complete.

Creation of saliency maps
Saliency maps were created using the algorithms developed by Itti, Koch, and Niebur (1998). The model was run on each image individually and the output was normalized by dividing all values by the maximum value for that map, and multiplying all values by 100. To simplify data analysis, the size of the saliency maps was increased to be identical to the size of the screenshots (1024 x 768 pixels). As described in the Introduction, these saliency maps are 2-D representations of areas in the screenshot that show the relative saliency of locations in the image. Figure 1 shows an example of two web page screenshots and their corresponding saliency maps. Low values (dark areas in the image) indicate regions of the image that are low in saliency, while high values (light areas in the image) indicate regions high in saliency.

Figure 1. Two examples of web page screenshots and their corresponding saliency maps.

RESULTS
We used a similar technique to Parkhurst, Law, and Niebur (2002) to determine whether salient regions in web pages were fixated more often than would be expected by chance. Specifically, the values of the saliency map at the location of each participant's first ten fixations were extracted. For example, the x, y coordinates of the first fixation for each participant were determined for every screenshot and the value at the same location in the corresponding saliency map was extracted. This process was repeated for fixations two through ten. These values formed the Observed Distribution of participant responses (Figure 2).

To determine the likelihood that salient regions would be fixated by chance, we repeated the process used to find the Observed Distribution after rearranging the fixations and saliency maps for all screenshots. For example, the values from the saliency map for screenshots 2 to 50 were extracted at the fixated locations from screenshot 1. The saliency values of all other screenshots were extracted at the location of the first ten fixations for all subjects for each screenshot. These values formed the Shuffled Distribution. The method used to create this distribution controls for spatial biases that may inflate correlations between fixations and salient regions. If the values of the Shuffled Distribution are larger than those of the Observed Distribution, it would indicate that participants fixated on regions that are lower in saliency than what is expected by chance. If, however, the values of the Observed Distribution are larger than those in the Shuffled Distribution, it would indicate that participants fixated regions that are higher in saliency than what is expected by chance.

Figure 2 shows the means for the Observed and Shuffled Distributions of the first ten fixations for each screenshot. An analysis of variance was conducted with fixation number (1-10) as a within-subjects variable and distribution (observed, shuffled) as a between-subjects variable, to determine whether any differences between the distributions varied as a function of fixation number. The main effect of fixation number was reliable, F(9, 882) = 6.39, MSE = 19.03, p < .001. Pairwise comparisons revealed that the values for the first fixation were higher than all other values, and that the values of the tenth fixation were lower than all other values. This indicates that early fixations tend to occur at regions of higher salience than those of later fixations. More importantly, the main effect of distribution was also reliable, F(1, 98) = 4.86, MSE = 397.95, p < .05, indicating that the values of the Observed Distribution were larger than those of the Shuffled Distribution. This difference confirms that participants fixated regions higher in saliency than would be expected by chance, showing that the saliency model is effective at predicting fixations. The Distribution x Fixation number interaction was not significant, F < 1.

Figure 2. Mean saliency values for the observed ('X') and shuffled ('o') distributions for the first ten fixations.

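The Observed and Shuffled Distributions described in the Results can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' analysis code: the saliency maps and fixation coordinates are toy data, the function names are hypothetical, and values are pooled across fixation order rather than kept separate for fixations one through ten.

```python
from statistics import mean

def sample(sal_map, fixations):
    """Saliency values at (x, y) fixation coordinates; maps are indexed [y][x]."""
    return [sal_map[y][x] for (x, y) in fixations]

def observed_and_shuffled(sal_maps, fixations_by_image):
    """Pair each image's fixations with its own map (observed) and with every other map (shuffled)."""
    observed, shuffled = [], []
    for i, fixations in enumerate(fixations_by_image):
        observed.extend(sample(sal_maps[i], fixations))
        for j, other_map in enumerate(sal_maps):
            if j != i:
                shuffled.extend(sample(other_map, fixations))
    return observed, shuffled

def toy_map(hot_x, hot_y, size=4):
    """Toy 4 x 4 saliency map: background value 10 with one hot spot of 100."""
    grid = [[10] * size for _ in range(size)]
    grid[hot_y][hot_x] = 100
    return grid

# Toy data (assumption): three images, two fixations each, one landing on the hot spot.
sal_maps = [toy_map(1, 1), toy_map(3, 2), toy_map(2, 0)]
fixations_by_image = [[(1, 1), (0, 0)], [(3, 2), (1, 1)], [(2, 0), (3, 3)]]

observed, shuffled = observed_and_shuffled(sal_maps, fixations_by_image)
print(mean(observed), mean(shuffled))  # observed mean 55 exceeds shuffled mean 17.5
```

Because the shuffled values reuse the same fixation coordinates, any general spatial bias (for example, a tendency to look near the center) affects both distributions equally, so a higher observed mean reflects saliency at the fixated locations rather than screen position alone.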
DISCUSSION
Eye tracking is a commonly employed method for examining the guidance of overt attention within interfaces (e.g., web pages). However, it has several drawbacks. We propose that a web page’s saliency, that is, its stimulus-driven properties, may be revealed through the use of a computational saliency model. Therefore, we compared the performance of the model to eye tracking data collected from human observers. We were able to demonstrate that, indeed, the saliency model predicts the deployment of overt attention within a web page interface.

Previous research has shown a modest correlation between saliency and eye fixations in natural and artificial scenes [13]. We have extended this research by showing that even in web pages, which may contain more semantic information (e.g., meaningful text or images) than natural scenes, fixations are correlated with saliency. Specifically, participants were more likely to fixate on regions in the web pages with a higher saliency value than predicted by chance.

Our data suggest that saliency maps alone can provide reasonable predictions of overt attention. In addition, saliency maps can be generated quickly, and require no additional equipment or participants. Even with these positive attributes, one may be hesitant to abandon eye tracking altogether. Our recommendation to designers is to choose the method most appropriate for your project given your constraints and needs. It is often the case that developing effective interfaces requires many levels of analysis. For example, during the early formative testing process it would be appropriate to begin by using the saliency model to ensure that regions identified as being important are also visually salient. Then, during the ‘final’ prototype development stage, employ the eye tracking method to verify that your participants are actually looking at the critical elements in the design.

REFERENCES
1. Arroyo, E., Selker, T. & Wei, W. Usability tool for analysis of web designs using mouse tracks. Computer-Human Interaction extended abstracts on human factors in computing systems (2006), 484-489.
2. Chen, M., Anderson, J. R. & Sohn, M. What can a mouse cursor tell us more?: Correlation of eye/mouse movements on web browsing. Computer-Human Interactions extended abstracts on human factors in computing systems (2001), 281-282.
3. Cutrell, E. & Guan, Z. What are you looking for? An eye-tracking study of information usage in web search. Proceedings of the SIGCHI conference on human factors in computing systems (2007), 407-416.
4. Egeth, H. E. & Yantis, S. Visual attention: Control, representation, and time course. Annual Review of Psychology (1997), 48, 269-297.
5. Itti, L. & Koch, C. A saliency-based search mechanism for overt and covert shifts of visual attention. Vision Research (2000), 40(10-12), 1489-1506.
6. Itti, L., Koch, C. & Niebur, E. A model of saliency-based fast visual attention for rapid scene analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence (1998), 20(11), 1254-1259.
7. Johansen, S. A. & Hansen, J. P. Do we need eye trackers to tell where people look? Proceedings of Computer-Human Interaction extended abstracts on human factors in computing systems (2006), 923-928.
8. Johnston, W. A. & Dark, V. J. Selective attention. Annual Review of Psychology (1986), 37, 43-75.
9. Koch, C. & Ullman, S. Shifts in selective visual attention: Towards the underlying neural circuitry. Human Neurobiology (1985), 4, 219-227.
10. Kowler, E., Anderson, E., Dosher, B. & Blaser, E. The role of attention in the programming of saccades. Vision Research (1995), 35, 1897-1916.
11. McCarthy, J. D., Sasse, M. A. & Riegelsberger, J. Can I have the menu please? An eyetracking study of design conventions. Proceedings of Human-Computer Interaction (2003), 401-414.
12. Nielsen, J. How little do users read? (2008, May). Retrieved May 12, 2009 from http://www.useit.com/alertbox/percent-text-read.html.
13. Parkhurst, D., Law, K. & Niebur, E. Modeling the role of salience in the allocation of overt visual attention. Vision Research (2002), 42, 107-123.
14. Rayner, K. Eye movements in reading and information processing: 20 years of research. Psychological Bulletin (1998), 124(3), 372-422.
15. Tarasewich, P., Pomplun, M., Fillion, S. & Broberg, D. The enhanced restricted focus viewer. International Journal of Human-Computer Interaction (2005), 19(1), 35-54.
16. Treisman, A. M. Perceptual grouping and attention in visual search for features and for objects. Journal of Experimental Psychology: Human Perception and Performance (1982), 8, 194-214.
17. Walther, D. & Koch, C. Modeling attention to salient proto-objects. Neural Networks (2006), 19, 1395-1407.
18. Wolfe, J. M. Guided Search 4.0: Current progress with a model of visual search. In W. Gray (Ed.), Integrated Models of Cognitive Systems (pp. 99-119). New York: Oxford, 2007.