A Saliency Model Predicts Fixations in Web Interfaces

Jeremiah D. Still
Missouri Western State University
Department of Psychology
jstill2@missouriwestern.edu

Christopher M. Masciocchi
Iowa State University
Department of Psychology
cmascioc@iastate.edu

Pre-proceedings of the 5th International Workshop on Model Driven Development of Advanced User Interfaces (MDDAUI 2010): Bridging between User Experience and UI Engineering, organized at the 28th ACM Conference on Human Factors in Computing Systems (CHI 2010), Atlanta, Georgia, USA, April 10, 2010. Copyright © 2010 for the individual papers by the papers' authors. Copying permitted for private and academic purposes. Re-publication of material from this volume requires permission by the copyright owners. This volume is published by its editors.

ABSTRACT
User interfaces are visually rich and complex. Consequently, it is difficult for designers to predict which locations will be attended to first within a display. Designers currently depend on eye tracking data to determine fixated locations, which are naturally associated with the allocation of attention. A computational saliency model can make predictions about where individuals are likely to fixate. Thus, we propose that the saliency model may facilitate successful interface development during the iterative design process by providing information about an interface’s stimulus-driven properties. To test its predictive power, the saliency model was used to render 50 web page screenshots; eye tracking data were gathered from participants on the same images. We found that the saliency model predicted fixated locations within web page interfaces. Thus, using computational models to determine regions high in visual saliency during web page development may be a cost-effective alternative to eye tracking.

Author Keywords
Saliency, Interface Development, Design, Model

ACM Classification Keywords
H.1.2 User/Machine Systems; I.2.10 Vision and Scene Understanding

INTRODUCTION

Saliency, Search and Design
Some visual designs guide users to the locations of important information, while others mislead users. Visual saliency, inherent in a complex interface, cues users to certain spatial regions over others. If employed correctly by designers, salient cues may reduce information search times and facilitate task completion [cf. 18] by implicitly communicating to users where they ought to start their visual search [16]. In order to be considered salient, a feature must be visually unique relative to its surroundings. For example, text that is underlined amongst non-underlined text “pulls” the reader’s attention to it. However, many interfaces, like web pages, are rich with visual media, such as text, pictures, logos and bullets, making the determination of salient features a complicated task. Given this complexity, designers are often left making best guesses about which spatial regions are salient within an interface. Previous research on visual search in web pages defines entry points as regions within a page where users typically begin their visual search. In this article, we will argue that these entry points are heavily influenced by visual saliency; that is, users will often begin searching web pages at the location of highest saliency. In related research examining cognitive processing, these implicit and low level cues that guide a viewer’s visual search are referred to as stimulus-driven properties: certain characteristics of the stimulus quickly “drive”, or direct, attention to certain locations over others. Currently, no consensus has been reached as to which visual characteristics, or stimulus-driven properties, make for effective entry points.

Measuring Overt Attention through Eye Tracking
Given the overabundance of visual information in our environments and our working memory limitations, attention must be selective, only allowing a limited amount of information into consciousness, for our cognitive system to function properly [8]. It has been suggested that the programming of eye movements has a direct and natural relationship with visual attention in that attention is often directed to whichever item is fixated [10]. Only information that falls directly on the fovea during a fixation is encoded with high resolution and only a limited amount of this high resolution information is processed, while the rest falls into rapid decay [see 4]. Thus, it is critical that users fixate on relevant visual information or that content will not reach users' awareness.

It is no surprise, then, that designers often monitor eye movements to evaluate a web page’s saliency, or entry points. Eye tracking systems allow designers to test whether their web pages actually guide users' fixations to important locations. However, eye tracking has a number of recognized costs. Eye tracking systems are often expensive, not easily accessible, time consuming to employ and they gradually lose calibration [1, 2, 7, 15].

Stimulus and Goal Driven Searches
In this article we investigate the influence of stimulus-driven saliency on attention within the context of a web page. Stimulus-driven saliency guides attention quickly and without explicit intention, thus some might question its role during a purposeful search on a web page. There is ample evidence to suggest that goals do influence the guidance of attention. For example, web page eye tracking research has shown that changing the task (or goal) during a search, or seeking navigational or informational indicators, changes observers’ fixation patterns [3]. Additional research has shown that, given enough time, expectations can cause a consistent pattern of fixations, such as an F-shaped pattern or reading patterns (e.g., left-right/top-bottom) [14]. However, these goal-driven effects interact with stimulus-driven effects, making the stimulus-driven influences more difficult to examine [cf. 11]. Also, it is often the case that only a few seconds are spent on a web page (even with a goal in mind), making the understanding of stimulus-driven processing, which is believed to influence attention very rapidly, critical. For instance, when searching for information observers often only skim through approximately 18 words, and spend 4 to 9 seconds, per web page [2, 12]. One way to investigate the pure influence of stimulus-driven guidance is to use a computational saliency model designed to make predictions about what properties or features of a web page attention ought to select within complex media, or scenes.

Predicting Fixations through a Saliency Model
Visually salient items often draw observers' attention. To better understand the influences of saliency, or stimulus-driven selection, on attention, Koch and Ullman (1985) developed a model to compute an image's visual saliency without any semantic input (i.e., meaning of objects). Their model is based on the assumption that eye movement programming is driven by local image contrast leading to logical serial searches through complex spatial environments. These serial searches are guided by low level primitives extracted from a scene. The saliency model was developed under the premise that low level visual features (i.e., color, light intensity, orientation) are processed pre-attentively in humans and, in turn, rapidly influence overt attention. Thus, the underlying assumption is that visual saliency is used to guide the fovea to unique areas within a scene that might provide the most efficient processing [5].

The computational model is implemented on a computer using digital pictures as stimuli to produce a pre-attentional or “saliency” map [9]. To create a saliency map, the model receives input from pixels within a digital picture. Then, it extracts three feature channels (color, intensity, orientation) at eight different spatial scales. These three channels are normalized and center-surround differences are calculated for each separate channel. The separate channels are additively combined to form a single saliency map. An image's saliency map provides predictions of where spatial attention should be deployed [for detailed explanations refer to 6, 13]. In essence, the model makes predictions about which regions in an image are most and least likely to be attended based purely on stimulus-driven properties. The saliency model is available for download as a collection of Matlab functions and scripts [17].

Testing a Saliency Model within Web Pages
Designers recognize the need to predict and identify where users’ attention will be guided on a web page. For example, it is well known that one should avoid using poor designs that increase the likelihood of users missing important interface features such as branding, navigational or informational symbols. But using an eye tracking system to monitor guidance of attention, as is traditional, can be expensive, difficult to employ and time consuming within the context of a practical iterative design process. Thus, we investigated the utility of a computational saliency model in predicting the guidance of attention in web page screenshots. This new method is benchmarked against another set of data in which participants’ eye movements were tracked while they viewed the same web page screenshots.

METHOD

Participants
The data from eight undergraduate participants are examined. All participants reported extensive web site experience.

Stimuli and Equipment
The images were 50 screenshots of various web pages. Each participant saw each screenshot only once.

Participants' eye movements were recorded by an ASL eye tracker with a sampling rate of 120 Hz. Screenshots were shown on a Samsung LCD monitor, which had a viewing area of approximately 38.0 cm × 30.0 cm. A chin rest maintained a viewing distance of approximately 80 cm. Images subtended approximately 26.7° × 21.2° of visual angle.

Procedure
Participants first read and signed an informed consent document, and were then seated in front of the monitor with their chin in the chin rest. The experiment began and concluded with a 9-point calibration sequence to calibrate the eye tracker and estimate the amount of tracking error.

Participants were told that they would view a series of web page screenshots, and that they should "look around the image like you normally would if you were surfing the internet." A fixation cross was presented at the center of the screen to signal the beginning of a trial. After a delay of approximately 1 second, a randomly selected web page screenshot was presented for 5 seconds. The fixation cross then reappeared to signal the beginning of the next trial. The experiment took approximately 15 minutes to complete.
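As a quick check on the viewing geometry given under Stimuli and Equipment, the visual angle subtended by the display can be recomputed from the monitor size and the viewing distance. The sketch below is only illustrative: it assumes the screenshots filled the whole 38.0 cm × 30.0 cm viewing area and that the eye was centered on the screen, and the function name `visual_angle_deg` is hypothetical rather than part of the study's materials.

```python
import math


def visual_angle_deg(size_cm: float, distance_cm: float) -> float:
    """Visual angle (in degrees) of an extent of size_cm viewed head-on
    from distance_cm, using angle = 2 * atan(size / (2 * distance))."""
    return math.degrees(2 * math.atan(size_cm / (2 * distance_cm)))


# Values taken from the Stimuli and Equipment section.
# Assumption: the image filled the full viewing area of the monitor.
width_deg = visual_angle_deg(38.0, 80.0)
height_deg = visual_angle_deg(30.0, 80.0)

print(f"{width_deg:.1f} x {height_deg:.1f} degrees")  # prints "26.7 x 21.2 degrees"
```

Under these assumptions the result agrees with the visual angle reported in Stimuli and Equipment.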
Creation of saliency maps
Saliency maps were created using the algorithms developed by Itti, Koch, and Niebur (1998). The model was run on each image individually and the output was normalized by dividing all values by the maximum value for that map, and multiplying all values by 100. To simplify data analysis, the size of the saliency maps was increased to be identical to the size of the screenshots (1024 × 768 pixels). As described in the Introduction, these saliency maps are 2-D representations of areas in the screenshot that show the relative saliency of locations in the image. Figure 1 shows an example of two web page screenshots and their corresponding saliency maps. Low values (dark areas in the image) indicate regions of the image that are low in saliency, while high values (light areas in the image) indicate regions high in saliency.

Figure 1. Two examples of web page screenshots and their corresponding saliency maps.

RESULTS
We used a similar technique to Parkhurst, Law, and Niebur (2002) to determine whether salient regions in web pages were fixated more often than would be expected by chance. Specifically, the values of the saliency map at the location of each participant's first ten fixations were extracted. For example, the x, y coordinates of the first fixation for each participant were determined for every screenshot and the value at the same location in the corresponding saliency map was extracted. This process was repeated for fixations two through ten. These values formed the Observed Distribution of participant responses (Figure 2).

To determine the likelihood that salient regions would be fixated by chance, we repeated the process used to find the Observed Distribution after rearranging the fixations and saliency maps for all screenshots. For example, the values from the saliency map for screenshots 2 to 50 were extracted at the fixated locations from screenshot 1. The saliency values of all other screenshots were extracted at the location of the first ten fixations for all subjects for each screenshot. These values formed the Shuffled Distribution. The method used to create this distribution controls for spatial biases that may inflate correlations between fixations and salient regions. If the values of the Shuffled Distribution are larger than those of the Observed Distribution, it would indicate that participants fixated on regions that are lower in saliency than what is expected by chance. If, however, the values of the Observed Distribution are larger than those in the Shuffled Distribution, it would indicate that participants fixated regions that are higher in saliency than what is expected by chance.

Figure 2 shows the means for the Observed and Shuffled Distributions of the first ten fixations for each screenshot. An analysis of variance was conducted with fixation number (1-10) as a within-subjects variable and distribution (observed, shuffled) as a between-subjects variable, to determine whether any differences between the distributions varied as a function of fixation number. The main effect of fixation number was reliable, F(9, 882) = 6.39, MSE = 19.03, p < .001. Pairwise comparisons revealed that the values for the first fixation were higher than all other values, and that the values of the tenth fixation were lower than all other values. This indicates that early fixations tend to occur at regions of higher salience than those of later fixations. More importantly, the main effect of distribution was also reliable, F(1, 98) = 4.86, MSE = 397.95, p < .05, indicating that the values of the Observed Distribution were larger than those of the Shuffled Distribution. This difference confirms that participants fixated regions higher in saliency than would be expected by chance, showing that the saliency model is effective at predicting fixations. The Distribution × Fixation number interaction was not significant, F < 1.

Figure 2. Mean saliency values for the observed ('X') and shuffled ('o') distributions for the first ten fixations.
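The map normalization and the comparison between the Observed and Shuffled Distributions can be illustrated with a minimal Python sketch. This is not the authors' analysis code: the function names, the tiny 2 × 2 maps, and the fixation coordinates are all hypothetical, and the sketch pools fixations together rather than separating them by fixation number or running the analysis of variance.

```python
from statistics import mean


def normalize_map(saliency_map):
    """Rescale a 2-D map (list of rows) so that its maximum value is 100."""
    peak = max(max(row) for row in saliency_map)
    if peak == 0:
        return [[0.0 for _ in row] for row in saliency_map]
    return [[100.0 * value / peak for value in row] for row in saliency_map]


def observed_values(maps, fixations):
    """Saliency at each fixation, read from the map of the screenshot viewed."""
    return [maps[i][y][x] for i, points in enumerate(fixations) for (x, y) in points]


def shuffled_values(maps, fixations):
    """Saliency at the same coordinates, read from every OTHER screenshot's map."""
    return [
        maps[j][y][x]
        for i, points in enumerate(fixations)
        for (x, y) in points
        for j in range(len(maps))
        if j != i
    ]


# Hypothetical toy data: two "screenshots", each with a 2 x 2 saliency map.
raw_maps = [
    [[1.0, 4.0], [2.0, 0.0]],   # screenshot 0: peak at top-right
    [[5.0, 0.0], [0.0, 10.0]],  # screenshot 1: peak at bottom-right
]
maps = [normalize_map(m) for m in raw_maps]

# Hypothetical fixations as (x, y) pixel coordinates, one list per screenshot.
fixations = [
    [(1, 0), (0, 1)],  # fixations recorded on screenshot 0
    [(1, 1)],          # fixation recorded on screenshot 1
]

observed = observed_values(maps, fixations)
shuffled = shuffled_values(maps, fixations)
print(mean(observed), mean(shuffled))
```

In this toy case the fixations land on salient regions of the screenshot actually viewed, so the observed mean exceeds the shuffled mean, which is the pattern the Results describe. Because the shuffled values are read at the same screen coordinates, any general tendency to look at particular screen locations affects both distributions equally.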
DISCUSSION
Eye tracking is a commonly employed method for examining the guidance of overt attention within interfaces (e.g., web pages). However, it has several drawbacks. We propose that a web page’s saliency, that is, its stimulus-driven properties, may be revealed through the use of a computational saliency model. Therefore, we compared the performance of the model to eye tracking data collected from human observers. We were able to demonstrate that, indeed, the saliency model predicts the deployment of overt attention within a web page interface.

Previous research has shown a modest correlation between saliency and eye fixations in natural and artificial scenes [13]. We have extended this research by showing that even in web pages, which may contain more semantic information (e.g., meaningful text or images) than natural scenes, fixations are correlated with saliency. Specifically, participants were more likely to fixate on regions in the web pages with a higher saliency value than predicted by chance.

Our data suggest that saliency maps alone can provide reasonable predictions of overt attention. In addition, saliency maps can be generated quickly, and require no additional equipment or participants. Even with these positive attributes, one may be hesitant to abandon eye tracking altogether. Our recommendation to designers is to choose the method most appropriate for your project given your constraints and needs. It is often the case that developing effective interfaces requires many levels of analysis. For example, during the early formative testing process it would be appropriate to begin by using the saliency model to ensure that regions identified as being important are also visually salient. Then, during the ‘final’ prototype development stage, employ the eye tracking method to verify that your participants are actually looking at the critical elements in the design.

REFERENCES
1. Arroyo, E., Selker, T. & Wei, W. Usability tool for analysis of web designs using mouse tracks. Computer-Human Interaction extended abstracts on human factors in computing systems (2006), 484-489.
2. Chen, M., Anderson, J. R. & Sohn, M. What can a mouse cursor tell us more?: Correlation of eye/mouse movements on web browsing. Computer-Human Interactions extended abstracts on human factors in computing systems (2001), 281-282.
3. Cutrell, E. & Guan, Z. What are you looking for? An eye-tracking study of information usage in web search. Proceedings of the SIGCHI conference on human factors in computing systems (2007), 407-416.
4. Egeth, H. E. & Yantis, S. Visual attention: Control representation and time course. Annual Review of Psychology (1997), 48, 269-297.
5. Itti, L. & Koch, C. A saliency-based search mechanism for overt and covert shifts of visual attention. Vision Research (2000), 40(10-12), 1489-1506.
6. Itti, L., Koch, C. & Niebur, E. A model of saliency-based fast visual attention for rapid scene analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence (1998), 20(11), 1254-1259.
7. Johansen, S. A. & Hansen, J. P. Do we need eye trackers to tell where people look? Proceedings of Computer-Human Interaction extended abstracts on human factors in computing systems (2006), 923-928.
8. Johnston, W. A. & Dark, V. J. Selective attention. Annual Review of Psychology (1986), 37, 43-75.
9. Koch, C. & Ullman, S. Shifts in selective visual attention: Towards the underlying neural circuitry. Human Neurobiology (1985), 4, 219-227.
10. Kowler, E., Anderson, E., Dosher, B. & Blaser, E. The role of attention in the programming of saccades. Vision Research (1995), 35, 1897-1916.
11. McCarthy, J. D., Sasse, M. A. & Riegelsberger, J. Can I have the menu please? An eyetracking study of design conventions. Proceedings of Human-Computer Interaction (2003), 401-414.
12. Nielsen, J. How little do users read? (2008, May). Retrieved May 12, 2009 from http://www.useit.com/alertbox/percent-text-read.html.
13. Parkhurst, D., Law, K. & Niebur, E. Modeling the role of salience in the allocation of overt visual attention. Vision Research (2002), 42, 107-123.
14. Rayner, K. Eye movements in reading and information processing: 20 years of research. Psychological Bulletin (1998), 124(3), 372-422.
15. Tarasewich, P., Pomplun, M., Fillion, S. & Broberg, D. The enhanced restricted focus viewer. International Journal of Human-Computer Interaction (2005), 19(1), 35-54.
16. Treisman, A. M. Perceptual grouping and attention in visual search for features and for objects. Journal of Experimental Psychology: Human Perception and Performance (1982), 8, 194-214.
17. Walther, D. & Koch, C. Modeling attention to salient proto-objects. Neural Networks (2006), 19, 1395-1407.
18. Wolfe, J. M. Guided Search 4.0: Current Progress with a model of visual search. In W. Gray (Ed.), Integrated Models of Cognitive Systems (pp. 99-119). New York: Oxford, 2007.