Paving the Path to Automatic User Task Identification

Anne Gutschmidt (Graduate School 466∗), Clemens H. Cap (Computer Science Institute), Friedemann W. Nerdinger (Business Science Dept.)
University of Rostock
{anne.gutschmidt, clemens.cap, friedemann.nerdinger}@uni-rostock.de

ABSTRACT
Web site personalization could be immensely improved if the user's current intentions could be recognized from the surfing behavior. The latter can be captured in the form of events occurring in the browser, like mouse moves or opening Web pages. But which aspects of the user's behavior best contribute to the recognition of the task a user is performing? Is it the number of mouse clicks, the amount of time spent on each page, the use of the back button or anything else? First results of an exploratory study indicate that even simple attributes, such as the average page view duration, the number of page views per minute and the number of different URLs requested, may be usable for automatic user task identification. Twenty participants solved exemplary exercises which corresponded to the user tasks Fact Finding, Information Gathering and Just Browsing. Due to the event logging, true display times were identified; even cached pages and the use of browser tabs were recorded.

Author Keywords
user tasks, user behavior, interaction, task identification, exploratory study, personalization

ACM Classification Keywords
J.4 Computer Applications: Social and Behavioral Sciences - Psychology; H.5.2 Information Interfaces and Presentations: User Interfaces - Interaction Styles

INTRODUCTION
What if we could recognize the task a Web user is currently performing just from the surfing behavior? The automatic identification of user tasks would improve existing personalization methods by adding a semantic component without explicitly asking the users to give away personal information. Furthermore, users with handicaps, like visually impaired people, would profit from this approach: as soon as the objective is recognized, the user can be supported by a wizard-like program which leads through the next steps to the intended target.

This paper describes a work in progress. It deals with the question of which aspects of the behavior are influenced by the task, so that eventually an automatic identification of the user task will be possible. This work differs from many existing approaches in the field of interaction tracking, such as [3, 6], in three ways:

1. An exploratory study was conducted in which the user tasks were given in the form of concrete exercises, so that the resulting behavior can be analyzed knowing the real task. Thus, ideal conditions are created: the setting of goals prevents task switching and distractions and makes noise in the data unlikely. It is of particular interest to deduce which task triggers which behavioral pattern.

2. All events which might possibly be of interest are captured, including mouse and scroll moves. Moreover, the selection of browser tabs and the appearance of pop-up windows were considered. Cached pages are also included in the log. This allows a better insight into which Web pages a user really viewed and not only loaded, as is usually the case with Web server logs.

3. It was decided to conduct this study as a pilot study before an extensive field study. That way, a preselection of expedient hypotheses about the relationship between task and surfing behavior can be made under controlled conditions. In comparison with a field study, this pilot study produces a manageable amount of "clean" data allowing a detailed analysis. When starting with a field study right away, the identification of the user task may fail without the reason being clear: is it the identification method which does not work, or is the input data not applicable or insufficient?

The experiment was conducted using one of Germany's most popular on-line newspapers, Spiegel Online.† The exercises the participants of the study had to solve represented the user tasks Fact Finding, Information Gathering and Just Browsing, following existing publications like [3].

First investigations showed that there are differences regarding the page view attributes average page view duration, number of page views per minute, number of unique URLs in proportion to the total number of page views, and the time proportion spent on the start page of the newspaper.

The total output of the study will be used to formulate hypotheses describing task-dependent behavioral patterns, like "Fact Finding exhibits significantly shorter dwell times than Information Gathering and Just Browsing". Only once enough hypotheses have been derived from the study's findings can a field study be conducted in which the surfing behavior of the users is recorded in a natural setting during everyday activities. Thus, the exploratory study described in this paper is an indispensable prerequisite on the way to automatic user task recognition.

The subsequent content is structured as follows: in the next section, user tasks in general and in the context of on-line newspapers are defined based on existing user task taxonomies. Moreover, the state of the art concerning the automatic identification of user tasks is briefly described. The third section is dedicated to the design of the exploratory study. Afterwards, first results of the study are presented, followed by a section about related work. The last section summarizes the findings and gives an outlook on future research.

∗ Supported by the German Research Council (DFG).
† http://www.spiegel-online.de

© 2008 for the individual papers by the papers' authors. Copying permitted for private and academic purposes. Re-publication of material from this volume requires permission by the copyright owners.

DEFINITION OF USER TASKS
According to Paternò, tasks are "activities that have to be performed to reach a goal." A goal is described as "either a desired modification of the state of an application or an attempt to retrieve some information from an application." Tasks can be divided into subtasks of lower complexity, and the relationship between the tasks can be modeled in various ways [7].

In the past, several authors tried to systemize high level user tasks on the Internet [1, 5, 10], but they did not seek a connection to task modeling as suggested by Paternò. Kellar et al. merged these taxonomies of high level tasks with the results of their own study and arrived at the following taxonomy [3]:

Fact Finding: The users are looking for a fact in the form of a keyword or a sentence, like checking the date of birth of Johann Sebastian Bach; i.e. their target is clearly defined. Such activities are usually of short duration.

Information Gathering: The users are collecting information about a certain topic; thus, their target is more open. An exemplary goal might be learning something about baroque music. Due to its research-like character, Information Gathering may take longer and even last for more than one session.

Just Browsing: This category describes surfing the Internet with no certain target in mind. It is often of long duration.

Transaction: Activities like on-line banking or checking e-mails on-line are summarized in this category.

Other: The last category comprises all activities which cannot be assigned to any of the other four categories.

The recognition of these high level tasks would already be a significant breakthrough. Kellar et al. logged the surfing behavior of a group of users who were asked to document their activities during that time period. Events like the usage of the back button, hyperlinks, bookmarks and the history were captured and gathered in log files [3]. One part of the data was used to build a classification rule, the other part was dedicated to testing the rule. The rule was supposed to assign a user to one of the above-mentioned categories (Fact Finding etc.). However, the rule identified only 53.8% of the activities correctly. The authors claim that this result is caused by individual differences with regard to the surfing behavior [4]. Another reason the authors did not take into consideration is that the differences between Web sites concerning content and structure also have an influence on the surfing behavior.

THE EXPLORATORY STUDY

Participants
Twenty students and employees from various institutes of the University of Rostock took part in the test. Their average age was 26.6 years.

Setting
The participants had to perform exercises on one version of a German on-line newspaper called Spiegel Online. Access to external Web pages was blocked. Thus, the influence different kinds of Web sites can have on the behavior was eliminated and test conditions were as similar as possible for everyone. Each participant underwent the experiment separately, but on the same computer. The Mozilla Firefox browser was used with a software extension for recording all events occurring during the surfing.

Procedure
In the experiment, the user task was the only parameter which was changed, in order to check in which way the behavior changes depending on the task. The participants had to perform exercises which correspond to the user tasks Fact Finding, Information Gathering and Just Browsing, following the example of [3]. The category Transaction was not adopted as this kind of activity occurs rather infrequently on the newspaper Web site we used. Transactions usually concern article purchases or a newspaper subscription, which seemed inappropriate for the test.

At the beginning, the participants were asked to get familiar with the pages of the Web site. They were allowed to surf the Web site as they liked. The maximal duration of this warm-up phase was ten minutes, but the participants were free to decide whether they wanted to finish earlier. This already represented the first exercise and the user task Just Browsing.

The two following exercises both corresponded to Fact Finding. The first exercise was to look up a certain weather forecast and the second one to find a football result. After the participants had read what they were expected to do, the search was started from the start page of Spiegel Online. When the information was found, the participants told the investigator the answer aloud and returned to the start page for the next exercise.

The last exercise was to collect information on the G8 summit and thus represented Information Gathering. The participants were informed that after 10 minutes a few questions pertaining to the topic would have to be answered. This was supposed to act as motivation.

During the experiment, the users' behavior was recorded by capturing every event occurring in the browser. A Mozilla Firefox extension was developed to log the following events:

• mouse events (clicks, moves and touching page elements)
• scrolling
• keystrokes
• tab events (open, select and close)
• browser events (reload, stop etc.)
• page events (show and hide)

Besides the time of occurrence of the event, further details were saved; e.g. for each mouse click event, information about the element that was clicked was stored, i.e. whether it was a hyperlink and to which page it leads, or whether it was a simple text paragraph, a headline, an image, a browser button etc.

At the end, the participants were presented with a questionnaire which ascertained demographic information as well as the level of experience concerning computer and Internet usage, the familiarity with on-line newspapers and their browser preferences.

DATA ANALYSIS
The log files gained from the experiment represent lists of events from which behavioral attributes have to be extracted. Significance tests are done to find differences between the user tasks regarding these attributes. When such a significant difference is found, the attribute may be suitable for the automatic identification of user tasks.

The first investigations concentrated on page views, which represent the time a user was looking at a Web page. Page views are derived from the log by considering the page events, i.e. when a page is shown and hidden, but also tab events. Sometimes, users like to load a page in a tab in the background, but this does not actually start a page view as the user cannot yet see the page. The page view only starts when the user selects the corresponding tab.

The page views were examined concerning their average duration as well as the number of page views per task and per minute. Furthermore, the time proportion of start page visits during the task and the number of unique URLs (page views without repetition) in proportion to the total number of page views were of interest. A paired-samples t-test was done to measure the significance of the difference between the tasks. Usually, t-tests require a normal distribution, which cannot be guaranteed here as the sample size is too small. However, due to the exploratory character of the study this uncertainty is accepted.

RESULTS

The Average Duration of a Page View
For each user and each task an average page view duration was calculated. Figure 1 shows the results for the three tasks Fact Finding (FF), Information Gathering (IG) and Just Browsing (JB), including the average value and the standard deviation depicted as error bar. The t-test reveals that the difference between each pair of tasks is significant, as the significance value is always less than the level of p = 0.05 (see Table 1). The degree of freedom is always df = 19.

Figure 1. A comparison of the average page view duration. (σ_FF = 0.1, σ_IG = 0.7, σ_JB = 0.5)

        FF & IG    FF & JB    IG & JB
T       -3.729     -3.143      2.414
p        0.001      0.005      0.026

Table 1. Results of the t-test for the average page view duration.

This allows the assumption that Fact Finding takes less time than the other tasks. The high values of the standard deviations suggest that a differentiation between Information Gathering and Just Browsing might turn out to be difficult. To draw generally valid conclusions and to see if this attribute is reliable for the identification of the tasks, it is necessary to collect data from more participants in a natural setting with arbitrary activities. However, these results show that a low average duration of a page view is a good clue for Fact Finding. The investigations will be extended in order to reveal possible patterns, e.g. whether there is a development during the session like page views becoming gradually shorter or longer depending on the task.

Figure 2 illustrates the frequency distributions for the three tasks. Their shape allows the presumption of a normal distribution. However, the size of the sample is too small to allow generally valid statements.

Figure 2. The frequency distributions for the average page view duration.

The Number of Page Views
With regard to the number of Web pages viewed during one task, a significant difference is evident between Fact Finding and the other two tasks. However, a significant difference between Information Gathering and Just Browsing could not be found, as Table 2 shows (see also Figure 3).

Figure 3. A comparison of the number of page views per task. (σ_FF = 0.7, σ_IG = 6.7, σ_JB = 9.9)

        FF & IG    FF & JB    IG & JB
T       -5.565     -5.384     -1.400
p       < 0.001    < 0.001     0.178

Table 2. Results of the t-test for the number of page views per task.

A similar result turns out when normalizing this attribute by referring it to a time period of one minute, as depicted in Figure 4 and Table 3. This seems quite evident as Fact Finding is characterized as a quick search for a well-defined piece of information. Again, this attribute seems to be very suitable for the recognition of Fact Finding, whereas Information Gathering and Just Browsing appear similar. Figure 5 indicates a normal distribution for the number of page views per minute.

Figure 4. A comparison of the number of page views per minute. (σ_FF = 1.1, σ_IG = 0.9, σ_JB = 1.2)

        FF & IG    FF & JB    IG & JB
T        7.000      5.293     -1.673
p       < 0.001    < 0.001     0.111

Table 3. Results of the t-test for the number of page views per minute.

Figure 5. The frequency distributions for the number of page views per minute.

Unique URLs and Start Page Visits
Two further behavioral aspects were investigated: the number of unique URLs in proportion to the total number of page views and the time proportion of start page visits. The URL proportion reflects whether pages have been visited several times. If the value is high, most of the pages have been visited only once. A lower value indicates more repetitions. Again, Fact Finding turns out to be significantly different from the other two user tasks, with a high average value. In contrast, Information Gathering and Just Browsing are less easy to differentiate (see Figure 6 and Table 4). The investigations have to be extended in order to find out what causes these repetitions. One reason may be that users navigate deep into the structure and navigate back on the same path to their starting point several times.

Figure 6. A comparison of the number of unique URLs in proportion to the total number of page views. (σ_FF = 5%, σ_IG = 14%, σ_JB = 14%)

        FF & IG    FF & JB    IG & JB
T        8.518      7.765     -0.637
p       < 0.001    < 0.001     0.531

Table 4. Results of the t-test for the number of unique URLs in proportion to the total number of page views.

The time proportion of start page visits was expected to bring a difference between Just Browsing and Information Gathering. The t-test confirms this assumption (see Figure 7 and Table 5). The start page is the starting point for every task, but for Just Browsing it was expected to be very important as the page contains various headlines from different news categories. Users with different topic interests may spend more time here. In contrast to this, Information Gathering is more specialized. It is probable that users performing Information Gathering use the start page less. This time, it is Information Gathering which is easiest to differentiate from the other two tasks.

Figure 7. A comparison of the time proportion of start page visits. (σ_FF = 14%, σ_IG = 19%, σ_JB = 24%)

        FF & IG    FF & JB    IG & JB
T        2.711      0.255      3.051
p        0.014      0.802      0.007

Table 5. Results of the t-test for the time proportion of start page visits.

Conclusion
With regard to the behavioral attributes described above, differences between the three user tasks occurred. They allow the following hypotheses:

• Fact Finding shows a smaller average page view duration than Just Browsing and Information Gathering.
• Users performing Fact Finding look at more pages during one minute than with Information Gathering and Just Browsing.
• Fact Finders tend not to revisit pages.
• Users performing Information Gathering spend little time on the start page of the newspaper compared to Fact Finding and Just Browsing.

This list is not complete, as the similarities between the tasks also have to be included. Furthermore, more detailed investigations have to be done, e.g. on the page view duration as has already been mentioned.

An issue of concern is the high standard deviation of some attributes. Until now, one can only guess that its origin lies in the different motivation of the participants. Some, for example, read very carefully whereas others seemed to be keen on finishing the test. If, however, these differences between the participants are caused by individual surfing habits, this might cause problems with the task identification.

Limitations
The study collected data from only a few users under very controlled conditions. Clearly, this causes some limitations, but these were not accepted without reason. As explained at the beginning, we wanted to guarantee that the users are performing exactly the tasks we were expecting in order to have an unambiguous picture of the task-dependent behavior. If the participants had been allowed to do what they wanted, they could have become distracted or constantly switched between tasks. That way, we gained ideal data in which the behavioral patterns could be searched.

It was decided to conduct the test on only one Web site and exactly one version of it to keep external influences as small as possible. The participants were supposed to see the same and to do the same. However, further tests must include more Web sites. This will show whether the differences found so far also occur on other newspaper sites and eventually also on totally different Web sites. We will also include the category Transaction as there will certainly be some on-line newspapers requiring this kind of activity.

The list of evaluated attributes is still limited to very simple attributes. More evaluations will be done on the scroll behavior and, connected to this, on whether a page was viewed completely or only its beginning. Furthermore, the use of browser buttons like the back button will be analyzed as well as mouse moves; e.g. do Just Browsers click on images rather than on text? This leads to a very important point: until now, the content has not yet been taken into consideration. This information will highly enrich the behavior analysis.

Moreover, the range of evaluation methods will have to be extended. The attributes we have so far derived from the event log will also be used for testing machine learning techniques such as classification trees, clustering or Bayesian models to examine the expressiveness of the attributes.

The subsequent step will be a field study in which the hypotheses gained from the pilot study will be tested under realistic conditions. The event recording tool will be installed on the participants' computers and log their behavior for a period of four weeks. The users will have to document their activities to maintain a connection between the log and the user task. Until now, it was not possible to draw generally valid conclusions due to the small sample size in the pilot study. The field study will bring the amount of data necessary to create and test a method for identifying the user task.

RELATED WORK
In the second section some state of the art concerning user tasks and user task taxonomies has already been presented. There are, however, further areas besides the Web where the identification of user tasks is of interest.

In the area of workflow management, for example, processes are to be identified which can be compared to user tasks as we have described them here. In [11] it was explained how event logs can be used to create a process model in the form of a Petri net. One event data set shows which kind of workflow the event refers to, the workflow instance, and the task, which is in this context one step in the workflow. An algorithm is proposed which automatically creates process models from such event logs. That way the process structure is revealed, enabling its analysis and evaluation as well as appropriate user support through a workflow management system.

[8] also deals with workflows and processes, but focuses on the special character of knowledge intensive work. Knowledge workers cannot profit from the support of workflow management systems as the processes they perform do not fit into rigid process models. Much more flexibility and room for creativity is required. However, similar tasks may reoccur in the future, and the information about past workflow instances should thus be maintained and made available for all members of an organization. As the process is not predictable and a fixed model cannot be created, the identification of process patterns is suggested. This is done by performing process mining using data about the actions that have been performed and the data structures which were involved. This information is obtained by considering the personal knowledge management. The users either have to input the information about their processes on their own, or the information is unobtrusively gained by observing the user's actions, as is suggested in [9] with the EPOS project. Using different plugins, the behavior within several applications such as e-mail clients, web browsers or text editors is observed to create individual context models. According to [9], these models can be used to infer a user's needs and goals. In contrast to [9], where events from different applications are integrated, the Microsoft Office Assistant refers to user needs within only one application. In Lumière, the Office Assistant's predecessor, every action concerning mouse, scrolling or keyboard is captured [2]. Furthermore, the selection of objects within the application, the browsing of menus etc. are observed. A Bayesian model is used to estimate the probability of a user having a certain need.

Most of these approaches do not only deal with the question of what the user is doing; they are, above all, interested in what the user needs, which is closely connected with the user task and the user's goal. However, a final satisfying solution for gaining this information has yet to be found.

SUMMARY AND OUTLOOK
An exploratory study was conducted to examine the connection between the task a user is currently performing and the resulting behavior. The investigations referred to on-line newspapers to avoid the influence different kinds of Web sites could have on the surfing behavior. Exercises were set that each corresponded to a user task: Fact Finding, Information Gathering or Just Browsing. During the test, every event was recorded in a log file. The first evaluations referred to attributes of page views and already led to promising results. Fact Finding was distinguishable from the other two user tasks in four of five attributes, whereas Information Gathering and Just Browsing showed a difference for only two attributes. The next step will be to examine whether the attributes are suitable for the prediction of the task by applying different machine learning techniques such as classification trees or Bayesian models.

The study was meant to gain hints on which aspects of the behavior might be useful to enable automatic task identification. These hints have to be wrapped in clear hypotheses like "Fact Finders usually view Web pages only briefly, scroll fast, and often do not scroll down to view a page in total." These hypotheses have to be tested in another study which must be more extensive concerning participant number and time. Furthermore, the data should be collected when the users are involved in their everyday activities and not in a laboratory situation.

As soon as a feasible task identification can be implemented, a variety of user support mechanisms can be realized. A recommender system for an on-line newspaper could, for example, use different algorithms for the determination of link suggestions depending on the user task. When dealing with Fact Finding, text mining could be preferred: the user is looking for a small piece of information like a keyword, so we could look for documents containing the same keywords as those Web pages the user has just visited. When dealing with Information Gathering, it would be advisable to search for recommendations within the same topic frame, whereas Just Browsing seems eligible for a method like association mining where articles are suggested that were often read together with those articles the current user has just seen. Depending on the task, different services could also be offered, e.g. a Fact Finder would probably welcome a search functionality provided with the next Web page opened. Furthermore, one could examine whether certain task types regularly occur at certain times of day for a specific user, e.g. in the morning the person often performs Information Gathering, at noon Just Browsing and during the afternoon Fact Finding. Thus, the time of day could already give a hint on the task.

The utility of identifying user tasks is, however, not restricted to recommender systems. A live assistant can be developed that recognizes the user task and, connected with this, the user's needs, and gives corresponding hints.

The list of utilities that automatic user task recognition could make possible is by far not complete. This underlines even more how important this topic is and will still be in the future.

REFERENCES
1. Chun Wei Choo, Brian Detlor, and Don Turnbull. Information seeking on the web: An integrated model of browsing and searching. First Monday, 5(2), 2000.
2. Eric Horvitz, Jack Breese, David Heckerman, David Hovel, and Koos Rommelse. The Lumière project: Bayesian user modeling for inferring the goals and needs of software users. In Proceedings of the Fourteenth Conference on Uncertainty in Artificial Intelligence, pages 256–265, Madison, WI, 1998. Morgan Kaufmann: San Francisco.
3. M. Kellar, C. Watters, and M. Shepherd. The impact of task on the usage of web browser navigation mechanisms. In Proceedings of Graphics Interface (GI 2006), pages 235–242, Quebec City, Canada, 2006. Canadian Information Processing Society.
4. Melanie Kellar and Carolyn Watters. Using web browser interactions to predict task. In WWW '06: Proceedings of the 15th international conference on World Wide Web, pages 843–844, New York, NY, USA, 2006. ACM Press.
5. Julie B. Morrison, Peter Pirolli, and Stuart K. Card. A taxonomic analysis of what world wide web activities significantly impact people's decisions and actions. In CHI '01: CHI '01 extended abstracts on Human factors in computing systems, pages 163–164, New York, NY, USA, 2001. ACM Press.
6. Laila Paganelli and Fabio Paternò. Intelligent analysis of user interactions with web applications. In IUI '02: Proceedings of the 7th international conference on Intelligent user interfaces, pages 111–118, New York, NY, USA, 2002. ACM Press.
7. Fabio Paternò. Task models in interactive software systems. In: Chang, S.K. (Ed.), Handbook of Software Engineering and Knowledge Engineering. World Scientific Publishing Co, 2001.
8. Uwe V. Riss, Alan Rickayzen, Heiko Maus, and Wil M. P. van der Aalst. Challenges for business process and task management. Journal of Universal Knowledge Management, 0(2):77–100, 2005.
9. Sven Schwarz. A context model for personal knowledge management. Lecture Notes in Computer Science, 3946/2006:18–33, April 2006.
10. Abigail J. Sellen, Rachel Murphy, and Kate L. Shaw. How knowledge workers use the web. In CHI '02: Proceedings of the SIGCHI conference on Human factors in computing systems, pages 227–234, New York, NY, USA, 2002. ACM Press.
11. W.M.P. van der Aalst, A.J.M.M. Weijters, and L. Maruster. Workflow mining: Discovering process models from event logs. IEEE Transactions on Knowledge and Data Engineering, 16(9):1128–1142, 2004.
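
The derivation of page views from page and tab events, and the page view attributes evaluated in the study, could be sketched roughly as follows. This is only an illustration under stated assumptions, not the Firefox extension or the analysis scripts used in the study: the event names ("page_show", "page_hide", "tab_select"), the Event fields and the example log are hypothetical, and the number of page views per minute is computed here relative to the total viewing time rather than the task duration.

```python
from dataclasses import dataclass

@dataclass
class Event:
    time: float      # seconds since the start of the task
    kind: str        # hypothetical names: "page_show", "page_hide", "tab_select"
    tab: int         # identifier of the browser tab the event refers to
    url: str = ""

@dataclass
class PageView:
    url: str
    start: float
    end: float

    @property
    def duration(self):
        return self.end - self.start

def page_views(events, task_end):
    """Turn page and tab events into page views the user could actually see."""
    loaded = {}         # tab -> URL currently loaded in that tab
    active_tab = None
    current = None      # (url, start time) of the page view in progress
    views = []

    def close(t):
        nonlocal current
        if current is not None:
            views.append(PageView(current[0], current[1], t))
            current = None

    for ev in sorted(events, key=lambda e: e.time):
        if ev.kind == "page_show":
            loaded[ev.tab] = ev.url
            if ev.tab == active_tab:       # background tabs do not start a view
                close(ev.time)
                current = (ev.url, ev.time)
        elif ev.kind == "page_hide":
            loaded.pop(ev.tab, None)
            if ev.tab == active_tab:
                close(ev.time)
        elif ev.kind == "tab_select":      # a view starts when the tab is selected
            close(ev.time)
            active_tab = ev.tab
            if ev.tab in loaded:
                current = (loaded[ev.tab], ev.time)
    close(task_end)
    return views

def attributes(views, start_url):
    """Page view attributes for one participant and one task."""
    if not views:
        raise ValueError("no page views")
    total = sum(v.duration for v in views)
    n = len(views)
    return {
        "avg_duration": total / n,
        "views_per_minute": n / (total / 60),
        "unique_url_ratio": len({v.url for v in views}) / n,
        "start_page_time_ratio":
            sum(v.duration for v in views if v.url == start_url) / total,
    }

# Made-up example log: page "a" is loaded in a background tab at t=10
# but only counts as viewed from t=30, when its tab is selected.
example_events = [
    Event(0, "tab_select", 1),
    Event(0, "page_show", 1, "start"),
    Event(10, "page_show", 2, "a"),
    Event(30, "tab_select", 2),
    Event(50, "page_hide", 2, "a"),
    Event(50, "page_show", 2, "start"),
]
views = page_views(example_events, task_end=60)
attrs = attributes(views, start_url="start")
print([(v.url, v.duration) for v in views])
print(attrs)
```

In the example, the background load at t=10 contributes no viewing time until the tab is selected, which is the distinction between loaded and actually viewed pages that Web server logs cannot capture.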
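
The paired-samples t-test used for Tables 1 to 5 can be illustrated with a small sketch using only the Python standard library. The function name and the example values are made up for illustration and are not the study's data. With 20 participants the test has df = 19, for which the two-sided critical value at the 0.05 level is about 2.093, so an absolute T above that value corresponds to p < 0.05.

```python
import math
from statistics import mean, stdev

def paired_t(a, b):
    """Return (t statistic, degrees of freedom) for paired samples a and b.

    a[i] and b[i] must come from the same participant, e.g. the average
    page view duration under two different tasks.
    """
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two samples of equal length >= 2")
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    # t = mean difference divided by its standard error
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1

# Made-up values for three hypothetical participants (not study data).
t, df = paired_t([2.0, 4.0, 6.0], [1.0, 2.0, 3.0])
print(t, df)
```

The sketch returns only the statistic and the degrees of freedom; obtaining an exact p value additionally requires the cumulative t distribution, which a statistics package would supply.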