Analysis of Web Usage Patterns in Consideration of Various Contextual Factors

Jinhyuk Choi
Korea Advanced Institute of Science and Technology (KAIST)
119, Munjiro, Yuseong-gu, Daejeon, 305-732, Republic of Korea
demon@kaist.ac.kr

Jeongseok Seo
Information and Communications University (ICU)
119, Munjiro, Yuseong-gu, Daejeon, 305-732, Republic of Korea
chaoticblue1@icu.ac.kr

Geehyuk Lee
Korea Advanced Institute of Science and Technology (KAIST)
119, Munjiro, Yuseong-gu, Daejeon, 305-732, Republic of Korea
geehyuk@kaist.ac.kr

Copyright © 2009, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.

Abstract

It is important to analyze users' Web usage logs when developing personalized Web services. However, there are several inherent difficulties in analyzing usage logs because the kinds of available logs are very limited and the logs show uncertain patterns due to the influence of various contextual factors. Therefore, on the premise that it is necessary to find which contextual factors influence the usage logs before designing personalized services, we conducted several experiments in series, not only in situations where users performed designed tasks during short time periods but also in users' natural Web environments over a period of several days. From the results of our experiments, we found that interest levels, credibility levels, page types, task types, and languages are influential contextual factors in a natural Web environment. Moreover, some historical and experiential patterns that could not be observed in the short-term analysis were discovered in the results of the long-term analysis. These findings will be useful for other researchers, practitioners, and especially for developers of adaptive personalization services.

Introduction

The World Wide Web has a unique characteristic in that the amount of information it contains is continuously increasing and yet can still be reached easily by users through various Web services. Moreover, it provides various types of media so that users can use it for multiple purposes. Therefore, it is very important for researchers and practitioners to make the Web even more effective for finding necessary information.

One of the means by which we can make the Web more useful is to develop intelligent information delivery that allows users to find their target information more effectively. A core part of intelligent information delivery is to search through personalized contents without the user's explicit participation. For personalization, it is necessary to learn more about the user and to build a user model based on this knowledge. This personalization process is a main topic of research on Web usage mining (Mobasher et al. 2000; Gauch et al. 2007). However, it is not easy to learn more about users because we cannot explicitly ask a user about his/her characteristics, or what he/she is thinking, at any particular time we want to know. This means that we have to find another way to learn more about them. From this perspective, many researchers have looked for effective implicit methods to learn more about users, and many intelligent methods have been suggested (Kelly and Teevan 2003; Kelly 2004; Kelly and Belkin 2004; Kelly and Cool 2002; Choi et al. 2007; Hofgesang 2006; Seo and Zhang 2000; Badi et al. 2006; Al halabi et al. 2007; Kellar et al. 2005). In these studies, usage logs that are stored while users visit Web pages have been used to learn about particular user interests. For example, the URLs of visited Web pages, visit period, dwelling time, mouse clicks, mouse movement, keyboard typing, and visit frequencies on each Web page have been applied as implicit interest indicators.

Although many successful results have been reported so far, there are several inherent difficulties in analyzing usage logs and extracting necessary information from them. The first difficulty comes from the fact that the kinds of available usage logs are very limited, and there are no standard ways to interpret the meaning of usage patterns. This means that we have to carefully investigate usage patterns before using the logs as effective indicators. Secondly, Web users are under the influence of various contextual factors while they use the Web, as it has multiple aspects: a simple information tool, a social communication mediator, an entertainment source, and so on. Therefore, usage logs will show very uncertain patterns because various contextual factors exert their influence on the usage patterns concurrently (Kelly and Belkin 2004). The third difficulty is related to the historical aspect, in that a user's experiences also influence the variation of usage patterns. Therefore, a Web usage pattern analysis should be a long-term process because it cannot be adequately performed by studying only short-time usage. In addition, to analyze a user's various characteristics, the usage data should be collected at the browser side in the user's real Web environment for a long period without being constrained to a specific Web server.

This paper details the results of our experiments, in which we made an initial attempt to explore the possibilities of overcoming the above difficulties. For our experiments, various usage logs were collected at the browser side and carefully analyzed, not only in situations where users performed designed tasks during short time periods but also in users' natural Web environments over a period of several days. We obtained several interesting findings from the results. We think that these findings will be useful for other researchers, practitioners, and especially for developers of personalization services.

This paper is organized as follows. In section 2, we review some of the related work. In section 3, we describe our experimental procedure, and the results that have been obtained so far are given in section 4. In section 5, a summary and future work are presented.

Related Work

Human Information Behavior

There have been many studies that have focused on human information behavior analyses in various research fields. In those studies, the researchers have focused on several contextual factors that affect a user's behavior, conceptualizing the relationships between information-seeking behavior and contextual factors. In (Sonnenwald 1999), the authors proposed an evolving framework in which cognitive, social, and system perspectives are incorporated. The framework covers human information behavior including information exploration, seeking, filtering, use, and communication. Based on the framework, various influential factors (physical, cognitive, affective, economic, social, and political) and their implications were investigated. In (Johnson 2003), the need for an information-seeking behavior analysis in a multi-contextual environment was presented and a theoretical framework was suggested. The authors of (Kari and Savolainen 2007) asserted that users are also improving along with the change of the information environment, and they found 11 relationships between individual developmental objectives and information searching via the Internet. In (Byström and Järvelin 1995; Borlund and Ingwersen 1997; Bystrom 2002; Vakkari 1999; Vakkari 2001), the influence of task complexity on information-seeking behaviors was investigated. An overview of the nature of trust, and a framework of trust-inducing interface design features, were given in (Wang and Emurian 2005). In particular, in (Wang et al. 2000), the authors introduced a multidimensional model of user-Web interaction, in which three dimensions (user, interface, and the Web) were considered. In the model, the user dimension is considered to be influenced by the particular task, information need, knowledge state, cognitive style, affective state, and so on. They measured users' cognitive styles and affective states before a user study, applied a process-tracing technique while users were conducting information-seeking tasks, and found various types of relationships among the elements of the dimensions. In (Fogg et al. 2003), based on the results of an online qualitative study, the credibility of Web contents was considered and important factors of credibility were suggested to formulate Web design guidance. In (Wathen and Burkell 2002), the authors asserted that users filter out most of the gathered information and retain only useful information. In addition, they concluded that the credibility or believability of information is one of the most important criteria for the filtering. In (Rieh 2002), the authors found that users judge cognitive authority and information quality by two types of judgment, predictive judgment and evaluative judgment, and they also identified the main facets and keywords of the judgments through a user study.

From these studies, we found that human information behavior cannot be studied without considering the influences of various types of contextual factors. However, because the purpose of these studies was not to develop an intelligent system but to construct theoretical models, they did not study quantitatively how Web usage patterns reflect the influences of various contextual factors.

Web Usage Mining

There has been a lot of effort in the field of Web usage mining to quantitatively measure the influences of contextual factors on Web user behaviors based on various usage logs. Among the various factors, user interest toward content has been the main focus of researchers. The various implicit indicators of user interest can be found in (Kelly and Teevan 2003). In (Kelly 2004), the familiarity of a topic was discussed, and the authors concluded that as one's familiarity with a topic increases, his/her searching efficacy increases and reading time decreases. For user characteristics, cognitive and problem-solving styles were studied in (Kim and Allen 2002). In their study, the authors observed various user activities (average time spent, average number of Websites viewed, average number of bookmarks made, and average number of times a search/navigational tool was used for completing a search task) while the users performed two types of given tasks in an experimental environment, and the authors found that there are significant differences among user activities according to the type of task and the user's problem-solving style. Among usage logs, the display time has been discussed most actively. In (Kelly 2004; Kelly and Belkin 2004), based on data gathered from 7 subjects for 14 weeks, the relationships between display time and various factors (task, topic, usefulness, endurance, frequency, stage, persistence, familiarity, and retention) were investigated, and the authors concluded that the display time is not suitable for inferring a user's interest because the relationship between display time and interest varies greatly from user to user; large differences according to the task at hand also appear. On the contrary, in (Choi et al. 2007), the viewing time was used as a good implicit indicator, and in (Hofgesang 2006), the authors asserted that time spent on a Web page is more important than visit frequency in inferring a user's interest. In (Seo and Zhang 2000), bookmarking, time for reading, following up the HTML document, and scrolling were used as relevant activities, and a machine learning algorithm was applied to learn the user's characteristics. In (Badi et al. 2006), various parameters of document attributes, document reading activities, and document organizing activities were investigated to recognize user interest and document values. In (Kellar et al. 2005), the authors found that the time spent is more useful for more complex Web searching tasks. In (Nakamichi et al. 2006), the authors also used several quantitative data of user behavior (browsing time and the moving distance, moving speed, and wheel rolling of the mouse) to detect Web pages with low usability.

Most of these studies have analyzed usage logs with the intention of developing an intelligent system that learns user characteristics and builds a user model. However, most of them did not fully consider the influences of various contextual factors, or they focused only on a user's interest without considering other types of subjective feedback together. Moreover, most studies except (Kelly, 2004; Kelly and Belkin 2004) did not consider the historical aspects of usage data that can only be gathered by a long-term analysis in a user's natural Web environment.

Our Approach

First of all, we carefully reviewed previous related work and collected contextual factors for consideration and usage logs that can be obtained at the browser side. The contextual factors and usage logs that we considered are given in Table 1.

Ex | Task | Contextual factors | Usage logs | Period
Ex1 | Visit collected pages (text only) | Interest | Viewing time, mouse movement, mouse wheel, mouse clicks, WM_PAINT | 2 hrs
Ex2 | Visit collected pages | Interest, complexity, difficulty, credibility | Viewing time, mouse movement, mouse wheel, mouse clicks, WM_PAINT | 2 hrs
Ex3 | Free visits / given tasks | Interest, complexity, difficulty, credibility, task type | Viewing time, mouse movement, mouse wheel, mouse clicks, WM_PAINT | 2 hrs
Ex4 | Free visit / free tasks | Interest, credibility, task type | Viewing time, mouse movement, mouse wheel, mouse clicks, keyboard typing, visit frequency, day frequency | 2 wks

Table 1. The environment and gathered data of experiments

We carried out not only a qualitative analysis but also a quantitative one. For ecological validity, we also observed users in their own personal places. Because some of the contextual factors are inherently subjective and cannot be measured with usage logs alone, we collected various types of feedback regarding the current context directly from users. However, to minimize the burden on the users in this study, we kept the number of feedback questions as small as possible. We developed software that runs on each user's PC in order to collect their behavior logs and feedback in their Web browsing environments.

Contextual Factor

Contextual factors include subjective assessments about contents, situational factors, a user's individual characteristics, and so on. Because these factors cannot be measured systematically, we designed a process in which we can obtain the users' subjective feedback directly. First, we considered the users' attitudes toward the current task as one of the contextual factors. The types of user task can be classified into detailed categories: information seeking, fact-finding, transaction, and browsing (Kellar et al. 2007). However, we classified user tasks into only two categories, careful searching and casual searching, according to the users' attitudes toward the current task. A detailed description of the task categorization appears in section 3.5. There are more contextual factors that cause users to interact with Web pages. For example, a user may stay for a relatively long time at a specific Web page because there are interesting contents there, or because the user feels that the contents are more useful than others. Sometimes, the user may roll the mouse wheel more frequently on one Web page than on others because he/she wants to read the entire content of the page carefully. In this regard, we selected some further factors that may influence user interactions with Web pages. The factors are interest, credibility, complexity, and difficulty. The complexity factor tells us how users feel about the layout structure of a Web page, and hence it may include a user's subjective viewpoint of usability and familiarity. We also included the difficulty factor because we thought that user behavior is subject to variation according to a subjective assessment of the difficulty of the contents displayed.

Web Usage Log

Implicit user interest analysis has shown good performance at the server side, especially for commercial Websites. However, although it is easier to analyze user interest at the server side, many researchers currently focus on browser-side analyses because user interest can be analyzed from various Websites, and a user model can be constructed using a wealth of information through a browser-side analysis. In order to analyze users' implicit interest at the browser side, we have to monitor several usage logs, for example, the viewing time, scroll movement, sequences of visited URLs, keyboard typing, and so on. In our research, we have chosen several usage logs to record while users view different Web pages. The viewing time, which has been the main subject of investigation in the related work so far, is the time during which users remain on a particular Web page. The mouse wheel log counts the number of WM_MOUSEWHEEL messages (Choi et al. 2007). For mouse and scrollbar movement, we measured the distance between two consecutive positions of the mouse cursor and scrollbar at regular intervals and summed the distances. We also counted the number of processed WM_PAINT messages, as WM_PAINT messages are processed when users change the size of their browser window, scroll within the window, move their mouse cursor, and so on. The number of mouse clicks and the amount of keyboard typing were also considered. We believe that these activities are good indicators of user interest regarding the contents of Web pages. We have chosen these logs because they can be measured without much effort. However, for scroll movement, we were unable to obtain the position of the scrollbar on some of the Web pages, and the WM_PAINT messages can be affected by the dynamic content of certain Web pages. This means that we have to be careful when using these data as logs for measuring user activities.

We did not record some of the behaviors that have been considered in other studies (bookmarking, saving, printing, and copying and pasting) because users do not always show those behaviors on every valuable Web page, and hence their records do not suit our purpose.

We collected some physical data for each visited Web page: the scroll height, file size, and URL information (top-level URL and depth of URL). Moreover, in the course of the experiments, some additional factors were included when they were required for analysis. The additional factors were the number of out-links on a Web page, the Web page type, and the language presented (e.g., Korean or English). We also carefully considered some historical factors that can be analyzed only through relatively long periods of monitoring. The historical factors include visit frequency and day frequency. Of these, day frequency is a new concept that has not been introduced before. A detailed description of day frequency will be given in a later section.

Data Collection Software

In some previous studies, custom-built browsers have been used (Kellar et al. 2007), as has specialized logging software that works "in stealth mode" (Kelly and Belkin, 2004). Although custom-built browsers have several merits, because various data can be collected easily, we developed a browser-monitoring module (BMM) that runs behind Internet Explorer without any modification to the browser, as we wanted to preserve the natural state of the Web browsing environment as much as possible.

BMM is a type of monitoring software that was developed to detect Windows GUI messages while users read Web pages, and thus it is possible to measure user activities in real time without any interruption to the users. BMM uses a global hooker library, written in C++, which runs in the background and hooks all Windows operating system events. In addition, using the Windows Shell API, BMM can access all instances of currently running Internet Explorers through the COM object, from which the necessary properties of Web pages can also be obtained. BMM is written in C#, running under a Windows platform with .NET Framework 2.0.

BMM consists of four components: hooker, data recorder, data aggregator, and feedback window. The data to hook are the number of keys pressed, events of program focus changes, number of WM_PAINT events, mouse click and mouse wheel messages, and so on. Basically, the hooker catches every message passed within the operating system, so we have to filter out irrelevant messages to record only the data necessary for our studies. For instance, because a WM_PAINT message is invoked whenever the O/S needs to re-draw some parts of a window, we have to ignore the messages from unfocused windows and count only the messages that are invoked for the currently focused browser window. The aggregator can acquire several properties of a Web page by using the Document Object Model (DOM). The acquired properties are the viewing size of a document (in pixels), file size (in bytes), current location of the scrollbar, and character set of the page. The location of the scrollbar is periodically updated so that the total displacement of the scrollbar can be estimated. However, a critical issue arises at several 'fancy' Web pages that have different structures from standard Web documents, which eventually yield no data when the DOM property is accessed. The data aggregator also aggregates all data from these multiple components, and the data recorder stores the aggregated data in a human-readable XML format for future analysis. After Web searching, using the feedback window, users can review the visited Web pages and choose radio buttons that ask for several types of assessments of the contents of each Web page. If the users do not want to answer questions regarding some of the Web pages, they can simply remove the records. The structure of the feedback window is shown in figure 1.

Figure 1. The feedback window consists of a browser control to view the contents of visited web pages, a list window to choose a visited URL, radio buttons to choose the answer of some questions, and so on

Subject

We conducted 4 experiments, each with its own purpose. The detailed concept of the experiments will be described in the next section. For each experiment, we recruited graduate students majoring in computer science as our subjects. Twenty-five students participated in the first experiment, 23 in the second, 19 in the third, and 12 in the fourth. Among the students, 11 took part in all of the second, third, and fourth experiments, and one new subject volunteered for the fourth experiment. All of the students have a high level of knowledge and experience about the Internet and the Web. We chose these students as subjects because all of them use the Web not only for their work but also for entertainment or distraction. Most of all, they use the Web for a relatively long time each day, so we could gather plenty of data from their activities. It also means that we could observe their Web usage patterns under various contexts. We paid each subject about 20 dollars for participating in each of the first, second, and third experiments. For the fourth experiment, we paid 60 to 160 dollars to each subject according to the rate of completed feedback.

Experimental Concept and Procedure

There are three main strategies for studying information-seeking behavior: laboratory experiments, sample surveys, and field studies (Kellar et al. 2007). Considering these strategies, we designed four experiments and conducted them in series. In the first and second experiments, the subjects came to our laboratory and browsed some pre-collected Web pages. In the third experiment, the subjects performed given information-seeking tasks in our laboratory. As a final step of each experiment, the subjects carried out feedback tasks in order to record their own subjective assessments of each of the Web pages they had browsed. The fourth experiment was carried out at the subjects' own residences. The subjects installed BMM on their PCs to collect their Web usage logs for a period of about two weeks. For the feedback process of the fourth experiment, we had the subjects carry out the feedback tasks at least once a day. The first and second experiments were carried out in a blind mode in which the subjects could not see any information about the contents of each Web page before viewing them. In other words, no proximal cues (Chi et al. 2001) were provided.

Experiment 1. The first experiment was a kind of preliminary study. We collected 120 Web pages that contain only text and offer information on various topics (politics, economics, education, engineering, entertainment, science, health, and sports) with varying content size. The twenty-five subjects read each page in their own desired manner from the list of collected Web pages. Because we wanted to exclude any effect of information clues, we simply provided numbers on the list without showing any information about the contents of the Web pages in advance. Thus, the subjects had to click the numbers in order to view the contents. To obtain appropriate data, the subjects were not told that some activities would be measured while they were viewing the Web pages. During the experiments, the subjects' activities while reading the Web pages, and some measurable data, were recorded in a log file for future analysis. In addition, whenever a subject finished reading a Web page, a small window appeared wherein the subject recorded his/her interest level for the contents of the page on a scale of 5 levels. Due to some malfunctions of the BMM in the users' browsing environment and a failure to properly obtain user feedback, the log files of 5 users were excluded. Therefore, we analyzed 20 users' log files. For the first experiment, we formulated the following simple hypotheses.

1. The number of processed log data is relatively higher on Web pages that contain interesting contents.
2. The amount of information in a Web page affects the amount of processed log data.

Experiment 2. The procedure of the second experiment was the same as that of the first experiment except that we collected ordinary Web pages that contain images, tables, videos, and frames. It was intended to see whether there would be differences in usage patterns according to the form of the Web page. When a subject finished reading all of the Web pages, he/she activated a feedback window wherein the subject could review all of the pages and answer some questions about each one visited. Unlike the first experiment, which collected only the interest levels for the contents, in this experiment we also wanted to verify the influence of other subjective assessments of Web pages (difficulty, complexity, and credibility, along with interest) on a 5-point scale. If a subject clicked one of the URLs on the visited page list in the feedback window, the contents of the Web page appeared again, and the subject could then choose his/her points for the questions regarding the subjective feedback.

Experiment 3. Several different categorizations of Web user behaviors can be found in previous work. Most recently, 4 task categories were provided in (Kellar et al. 2007): fact finding, information gathering, just browsing, and transactions. In (White and Drucker 2007), Web users are grouped into navigators and explorers according to the level of visit variances. In consideration of these previous works, we also classify a user's Web tasks into two groups.
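As a rough illustration of the kind of per-page record that the procedure above produces (usage-log counts plus optional 5-point feedback and one of the two task groups, careful or casual searching), a minimal Python sketch is given here. It is only a sketch under stated assumptions: the names PageVisit, TaskType, and mean_viewing_time_by_task, as well as the field layout, are hypothetical and do not describe BMM's actual XML log format.

```python
from dataclasses import dataclass
from enum import Enum
from statistics import mean
from typing import Dict, Iterable, List, Optional


class TaskType(Enum):
    """The two task groups used in this study (labels are illustrative)."""
    CAREFUL = "careful searching"
    CASUAL = "casual searching"


@dataclass
class PageVisit:
    """One visited page: raw usage-log amounts plus optional subjective feedback.

    Hypothetical record layout; the real BMM log schema is not given in the text.
    """
    subject_id: str
    url: str
    viewing_time_s: float
    mouse_move_px: float
    mouse_wheel: int
    mouse_clicks: int
    wm_paint: int
    interest: Optional[int] = None      # 5-point scale, 1..5
    credibility: Optional[int] = None   # 5-point scale, 1..5
    task: Optional[TaskType] = None

    def __post_init__(self) -> None:
        # Reject feedback values that fall outside the 5-point scale.
        for name in ("interest", "credibility"):
            level = getattr(self, name)
            if level is not None and not 1 <= level <= 5:
                raise ValueError(f"{name} must be between 1 and 5, got {level}")


def mean_viewing_time_by_task(visits: Iterable[PageVisit]) -> Dict[TaskType, float]:
    """Average viewing time per task group, skipping visits with no task label."""
    grouped: Dict[TaskType, List[float]] = {}
    for visit in visits:
        if visit.task is not None:
            grouped.setdefault(visit.task, []).append(visit.viewing_time_s)
    return {task: mean(times) for task, times in grouped.items()}
```

Grouping records this way is one simple route to comparing usage-log amounts between the two task groups; any real analysis would also need the per-subject handling described in the results.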
0.28 0.24 0.14 Ex Feedback VT MM MW MC WP Ex1 interest 0.695 0.572 0.563 0.475 0.663 0.26 0.22 (**) (**) (**) (**) (**) 0.12 0.24 0.2 scroll -0.006 0.006 0.261 0.008 0.059 height 0.22 0.18 Means of Vewing Time Means of Vewing Time Means of Vewing Time 0.1 Ex2 interest 0.771 0.545 0.686 0.559 0.507 0.2 0.16 (**) (**) (**) (**) (*) complexity -0.391 -0.148 -0.599 0.18 0.14 0.08 -0.178 -0.196 (**) (**) (*) 0.16 0.12 difficulty -0.057 -0.476 -0.340 -0.532 -0.418 0.06 (**) 0.14 0.1 credibility 0.411 0.507 0.203 0.289 0.241 0.12 0.08 (*) 0.04 scroll 0.074 0.016 0.167 0.001 -0.059 0.1 0.06 height Ex3 interest 0.396 0.301 0.08 0.04 0.02 0.119 0.245 0.229 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5 (*) (*) Interest Levels Credibility Levels Complexity Levels complexity -0.315 -0.129 -0.533 -0.162 0.040 (**) (**) (**) (**) difficulty 0.307 0.307 -0.124 0.182 -0.330 Figure 2. The viewing time according to feedback levels in the credibility 0.609 0.414 0.288 0.412 third experiment 0.389 (**) (**) (*) (**) scroll 0.011 -0.025 0.120 -0.036 -0.022 height details were the same as in the second experiment. Ex4 interest 0.442 0.315 0.258 0.306 0.282 Differently with the first and second experiments that (**) (KT) controlled the subjects’ activities in that the subjects could credibility 0.434 0.138 -0.010 0.222 0.124 (*) (KT) only visit the collected Web pages without any pre- scroll 0.056 0.001 0.117 0.017 0.001 information clues, in the third experiment, the subjects height (KT) could visit any Web page that they wanted and use any VT: Viewing time / MM: Mouse move / MW: Mouse wheel / search engine or portal site they wanted to use. Therefore, MC: Mouse click / WP : WM_PAINT / KT: Keyboard typing we observed a lot of re-visitation patterns. Thus, during the *: p-value of ANOVA test < 0.05 feedback phase, we let the subjects delete the logs of Web **: p-value of ANOVA test < 0.01 pages that they just used to find other Web pages to visit. In this way, we excluded the navigational Web pages. 
Table 2. The values of correlation between feedback level and the amount of usage logs

Task 1: careful searching
This task is a type of information gathering that requires accuracy, trust, efficiency, and responsibility for the search results. In our experiment, the given task was to find information about the subjects' research topics. For example, they had to find Web pages of laboratories in universities or companies that are related to their research topics and read the pages carefully to judge the relevance of the information. We encouraged the subjects to perform this task as normally as possible.

Task 2: casual searching
This task is a type of information gathering and browsing that can be performed without any burden or responsibility regarding the search results. For example, the subjects could search for information about their hobbies, favorite products to buy, famous tourist spots, favorite sports or movie stars, and so on. We also encouraged the subjects to perform these tasks as normally as possible.

The subjects performed the two tasks with their own topics for about 2 hours. The logging data and feedback

The concepts of the navigational Web pages will be given in section 4.5.

Experiment 4. For the fourth experiment, 12 graduate students participated (4 females and 8 males). They installed the BMM on their PCs and collected various logs for about 2 weeks. Some of the subjects participated in our experiment for 16 days. For their feedback, we encouraged them to give a feedback level for each visited Web page on a 5-point scale and to choose one of the task types. If a URL was not a content page from the subject's viewpoint, the URL could be deleted easily, and BMM recorded a special number for the URL for future analysis. In this experiment, we collected only three types of feedback (interest, credibility, and task type) because we wanted to minimize the subjects' burden in answering many questions for all of the visited Web pages.

Result

In the series of experiments, we measured the numbers of several processed messages on each visited Web page and normalized the values using min-max normalization for each subject. We included this normalization procedure because the amount of usage logs varies with the subjects' individual differences.

[Figure 3 (plots): viewing time under high interest and under high credibility, by (a) task (careful/casual) and (b) language (KOR/ENG)]

Figure 3. (a) The differences of viewing time according to task types (b) the differences of viewing time according to languages

Experiment 1

From the results of the first experiment, we found some interesting patterns. As we can see in table 2, there were positive correlations between the amounts of all usage logs and interest levels. Furthermore, a one-way ANOVA test showed that the amounts of the logs differ significantly among the interest levels. Based on this result, we tentatively concluded that users tend to interact more on Web pages of high interest, and hence all the logs can be used as implicit interest indicators. One more interesting point is that there was a low correlation between the amount of usage logs and the size of the Web pages, except for the amount of the mouse wheel log.

Experiment 2

We expected some differences between the result patterns of the first experiment and those of the second experiment because the forms of the Web pages were quite different. However, there were no big differences between the results. Table 2 shows that there were also positive correlations between the amounts of all usage logs and interest levels, similar to the results of the first experiment. In addition, we also found significant differences in the amount of usage logs among the interest levels. This means that the form of the Web pages is not an important factor. Unlike the interest level, the difficulty and complexity levels showed a negative correlation with the amount of usage logs. The credibility levels showed positive correlations with the amount of usage logs, but the differences of the amounts among the levels were not statistically significant. From the results, we concluded that the interest level exerts the most significant influence on the amount of usage logs, and that users are inclined to quickly leave Web pages that have difficult contents or complex structures without many interactions. Finally, we found that there were low correlations between the amount of usage logs and the sizes of Web pages, except for the amount of the mouse wheel log. This did not differ from the results of the first experiment.

Experiment 3

In figure 2 and table 2, we can see that the viewing time and the amount of mouse movement have positive correlations with the interest levels, and that the differences of the amounts among the interest levels are also statistically significant. The amount of mouse wheel use, mouse clicks, and processed WM_PAINT messages also showed positive correlations with interest levels, but the differences were not statistically significant. The amount of usage logs increased with the complexity levels, but dropped steeply at level 5. The difficulty levels showed no large correlation with the amount of usage logs. The most interesting pattern that we found in the results of the third experiment was that the amount of usage logs showed a positive correlation with the credibility levels, and that the differences of the amounts of usage logs among the credibility levels were statistically significant. This result was not found in the second experiment, in which users browsed pre-collected Web pages without proximal cues. Therefore, we concluded that the usage logs are under the influence of credibility levels as well as interest levels in ordinary Web browsing environments.

In the third experiment, we also checked whether there are differences in the amount of usage logs according to the task types and the written languages used. From figure 3, we found that there was a general trend of more interaction logs recorded during a careful task than during a casual one, especially on pages of the highest interest and credibility levels. For written languages, there was a general trend of more interaction logs on English pages than on Korean pages, especially on the pages with the highest interest levels, but there was no large difference according to credibility levels. These results showed us that the type of task and the written language should also be considered as important factors that influence the amount of usage logs created.

Experiment 4

In the fourth experiment, some logs contained an excessively long viewing time because the experiment was conducted in the users' personal environments.

[Figures 4 and 5 (plots). Figure 4 axes: no. of pages vs. view time (sec). Figure 5 axes: correlation coefficient and p-value (ANOVA) vs. maximum cutline (min.)]

Figure 4. The distribution of viewing time

From figure 4, we can see that the users stayed on 99% of all visited Web pages for at most 346 seconds.
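The per-subject min-max normalization described at the start of the Result section can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function name, the (subject, value) record layout, and the choice to map a subject whose values are all equal to 0.0 are assumptions made here.

```python
def min_max_normalize_per_subject(records):
    """Scale each value into [0, 1] using the min and max of its own subject.

    `records` is a list of (subject_id, value) pairs; this layout is an
    assumption for illustration, not the format used by the logging tool.
    """
    lows, highs = {}, {}
    for subject, value in records:
        lows[subject] = min(value, lows.get(subject, value))
        highs[subject] = max(value, highs.get(subject, value))

    normalized = []
    for subject, value in records:
        span = highs[subject] - lows[subject]
        # Assumption: a subject whose values are all equal is mapped to 0.0.
        normalized.append(0.0 if span == 0 else (value - lows[subject]) / span)
    return normalized


# Hypothetical viewing times (seconds) for two subjects with different scales.
records = [("s1", 10), ("s1", 20), ("s1", 30), ("s2", 100), ("s2", 300)]
print(min_max_normalize_per_subject(records))  # [0.0, 0.5, 1.0, 0.0, 1.0]
```

After this step, a subject who generates many messages and one who generates few are placed on the same 0-to-1 scale, which is what allows the logs of different subjects to be pooled.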
We also found some visit logs with a viewing time of over 30 minutes. This means that we should find a maximum cutline in order to filter out such logs as noise. We tried various values for the cutline, from 3 to 25 minutes. For each cutline value, we excluded logs in which the viewing time was above the cutline, and then normalized each user's viewing time to his/her own scale. Finally, we checked whether the magnitude of the cutline had an impact on the applicability of the viewing time as an indicator. From figure 5, we can see that a reasonable cutline should be set somewhere between 5 and 18 minutes in order to observe a high positive correlation between the viewing time and interest level, and a statistically significant difference among the viewing times at each interest level. For example, if we set the maximum cutline to 14 minutes, the viewing time shows a positive correlation with the interest level (r = 0.5522), and according to a one-way ANOVA test, the viewing times differ significantly among the levels (p = 0.0092). This means that we can use viewing time to identify Web pages of interest, based on the fact that users stay relatively longer on them than on pages of no interest. In addition, we found that careful noise filtering is absolutely required when we want to infer users' interest from the viewing time. Therefore, we excluded logs that contained over 15 minutes of viewing time in the fourth experiment. In figure 6 and table 2, we can see that only the viewing time showed positive correlations and statistically significant differences among the levels of interest and credibility. It is very interesting that we could not find significant differences between the other usage logs and the feedback levels. The differences in the amount of usage logs according to the task types were similar to the results of the third experiment.

Figure 5. The correlation coefficients (top) and p-values of the significance test (bottom) according to the maximum cutline

[Figure 6 (plots): means of viewing time by interest level (1 to 5) and by credibility level (1 to 5)]

Figure 6. The viewing time according to feedback levels in the fourth experiment

Additional Findings from Experiment 4

Because the fourth experiment was conducted over a period of about 2 weeks, we could observe some historical patterns that could not be observed in the previous experiments. In this section, we introduce some additional findings.

Day frequency and feedback pages vs. non-feedback pages. Because most of the target pages that users want to access can be reached via portal sites, news sites, and search engines, we thought that the front pages of these sites and the hub pages within them may appear in the visited URL history more frequently than others. For example, when a user wants to read a newspaper, he/she visits the home page of a news site and clicks on some links that seem to contain interesting news. In a similar manner, whenever a user wants to find some information, he/she may visit the front page of a search engine first and then click on one of the links that the search engine retrieves. Similarly, if the user wants to log onto a commercial site or his/her own Web mail account, he/she should first visit the front page of the service and input his/her username and password in order to proceed. Therefore, if we look over the users' visited URL histories, the navigational pages (the front pages of portal sites, news sites and search engines, and any type of hub page) will appear more frequently than others. Moreover, if the users visit Websites according to their daily routine, they will visit some of the Websites every day in regular patterns. In this respect, we thought that the URLs of navigational pages might be found in the logs from each day. In contrast, the content pages appear relatively rarely because users do not usually view the same contents again and again.

Based on the considerations mentioned so far, we formulated a very simple hypothesis: URLs that are visited every day have a strong chance of being navigational pages. To test the hypothesis, we created a variable named Day Frequency (DF). The concept of DF is very similar to document frequency, which is often used in information retrieval and text mining (Salton and McGill 1986), and the DF value of each visited URL can be calculated using equation (1).

$$DF_i = \frac{|\{d_j : Url_i \in d_j\}|}{|D|} \qquad (1)$$

In this equation, $|D|$ is the total number of days in the experiment, $d_j$ is the URL collection of the j-th day, and $|\{d_j : Url_i \in d_j\}|$ is the number of days on which the i-th URL appears. If a URL exhibits a high DF value, the URL is thought to be inappropriate for content extraction and should be regarded as a navigational page.

In the fourth experiment, the selection of content pages was left entirely to each subject's own judgment. Even though we did not explain the concept of navigational pages in detail, the subjects found by themselves that there are naturally several Web pages that are not suitable for expressing their feedback levels. As we can see in figure 7, the number of non-feedback pages was much greater than our expectation. The 12 subjects mainly deleted the home pages of search engines, retrieved lists of search engines, the first pages of portal sites, news lists, home pages of community sites, online banking sites, intranet front pages and so on as non-feedback pages. In previous research, there were several attempts to discriminate content pages from navigational pages using the number of outlinks contained in the pages (Cooley et al. 1999; Fu et al. 2001; Domenech and Lorenzo 2007). The main idea is that there will be a larger number of outlinks on navigational pages than on content pages. We also thought that this idea was acceptable, and so we counted the average numbers of outlinks contained in both feedback pages and non-feedback pages. However, as we can see in figure 7, the number of outlinks on feedback pages was higher than on non-feedback pages in our results. Therefore, we examined carefully whether the DF values of feedback pages and non-feedback pages are significantly different. As we can see in figure 8, the average DF value of non-feedback pages is higher than that of feedback pages, and the difference is statistically significant (p = 0.0003). We found that the amounts of some usage logs were also different between feedback and non-feedback pages. From table 3, we can see that viewing time, the amount of mouse wheel use, and the amount of keyboard typing were significantly different.

[Figure 7 (plots): panels (a) and (b), each over categories 1 and 2]

Figure 7. (a) The number of feedback pages and non-feedback pages and (b) the average number of outlinks contained: 1 - feedback page / 2 - non-feedback page

logs               p-value
URL Depth          0.0623
Day Frequency      0.0003 (*)
Viewing Time       0.0206 (*)
Mouse Move         0.5314
Mouse Click        0.5258
Mouse Wheel        0.0181 (*)
Keyboard typing    0.0349 (*)

Table 3. The results of the significance test for the difference in the values of each interaction log between feedback pages and non-feedback pages: (*) means significant

Task Identification by Visited URLs. We believed that users have their own URL lists that are specific to their current tasks because they may use the Web based on their individual previous experiences on the Web. In this respect, we analyzed the top-level URLs that users visited during the period of the experiment. As we can see in table 4, over 90% of the visited URLs were separable by task for 10 of the 12 subjects (the two exceptions were 75% and 89.29%).

[Figure 8 (plots): panels (a) and (b) over categories 1 and 2; interaction-log panels labeled "1 - viewTime / 2 - mouseMove / 3 - mouseClick" and "1 - mouseWheel / 2 - keyPress"]

Figure 8. The URL depth of feedback pages and non-feedback pages (left - a) and the DF values (left - b): 1 - feedback page / 2 - non-feedback page; and the mean values of interaction logs on feedback pages (right - left bars) and on non-feedback pages (right - right bars)

user No.   task separable (%)
1          92.68
2          92.59
3          93.17
4          95.77
5          75
6          93.86
7          100
8          90.57
9          89.29
10         97.40
11         91.07
12         96.21

Table 4. The proportion of task separable URLs

[Figure 9 (plots): no. of top-level URLs by task (careful/casual), and by day for Task1 and Task2]

Figure 9. The average number of URLs in each task (left), the increasing rate of the average number of URLs in the careful task (middle), and in the casual task (right)

In other words, about 90% of visited URLs belong to one specific task only, and hence we can easily infer the type of the current task by checking the top-level URLs. Moreover, as we can see in figure 9, the number of URLs that users visited in the casual tasks is much higher than in the careful tasks. The most interesting patterns are the increasing rates of the number of visited URLs as time goes on. The number of visited URLs in the casual searching tasks increased more drastically than in the careful searching tasks. This means that the subjects showed the navigator's patterns in careful searching tasks but showed the explorer's patterns in casual searching tasks (White and Drucker 2007). We believe that this pattern is meaningful in developing personalization schemes that are adaptive to current task types.

Discussion and Future Work

Review of the Result and Summary

We analyzed the results of 4 experiments and recognized that there are noticeable differences in usage patterns according to the experimental environment. In this section, we briefly summarize the interesting differences.

Experiment 1 vs. Experiment 2. The forms of the Web pages that the subjects visited in the first and second experiments were different, but we could not see large differences between the results of the two experiments. Moreover, the amount of usage logs was not influenced by the amount of contents or the size of the Web pages. We believe that this pattern came from the fact that Web users read Web pages in a nonlinear pattern, and that there are some unique characteristics in reading digital documents (Liu 2005).

Experiment 2 vs. Experiment 3. In the third experiment, in which the subjects freely selected the Web pages to visit, we observed that the credibility levels of the contents exert a noticeable influence on the amount of usage logs, but the same pattern was not observed in the second experiment, in which the subjects visited pre-collected Web pages without any prior clue about the contents. In addition, there were significant differences among the amounts of all usage logs according to interest levels in the second experiment, but only the viewing time and the amount of mouse movement were affected by the interest levels in the third experiment.

Experiment 3 vs. Experiment 4. Unlike in the third experiment, only the viewing time showed a significant relation with the interest and credibility levels in the fourth experiment. This means that the more natural the environment is, the more unknown factors exert their influences on the usage patterns. We also observed in the third experiment that there are some differences in usage patterns according to the task types, such that the amount of usage logs on Web pages of interest is higher in careful tasks than in casual tasks. The same result was observed in the fourth experiment. Finally, from historical data analyses, we found that Day Frequency and some usage logs are significantly different according to the page types.

Summary. The observed patterns can be briefly summarized as follows.

1) Generally, the amount of usage logs is not under the influence of the size and form of the Web page.

2) Information scents exert a noticeable influence on usage patterns, such that Web users choose links to visit based on information scents, and the scents also cause the users to show some uncertain usage patterns while they are viewing Web pages.

3) The viewing time is the best log to be used as an implicit feedback indicator if it is pre-processed carefully. This means that we have to analyze the viewing time more carefully than other logs to develop personalization services that are adaptive to user interest.

4) The viewing time is under the influence of interest and credibility levels. In other words, interest and credibility levels are the most influential contextual factors in a natural Web environment. The difficulty and complexity levels do not create noticeable variations in the amount of usage logs.

5) The viewing time is also under the influence of current tasks, written languages, and page types. In addition, page types also influence the variations of other usage logs, such that the amount of mouse wheel use, the number of visits in a day, and the amount of keyboard typing were significantly different based on the page types.

6) Web users visit different Websites when they are performing different tasks, and they show different navigational patterns according to the task types.

7) Some historical and experiential aspects that cannot be observed in short-term analysis can only be found in long-term analysis.

Limitations of the experiments

Although many interesting patterns were observed in our experiments, we also acknowledge their limitations. We cannot expect the observed patterns to generalize to a general population because, for experimental convenience, we recruited a small number of subjects from the same population. However, the results show us valuable usage patterns of experienced Web users and consequently provide good insight for further research.

Future Work

As we already discussed in previous sections, the viewing time is under the influence of various factors. We cannot decide what service applications are to be activated based solely on the fact that the viewing time increases on the current Web page, because the viewing time is affected by various factors: interest levels, credibility levels, page types, tasks, and written languages. Therefore, to find a user's characteristics and select the applications accordingly, it is necessary to intelligently detect what factors are currently influencing the usage patterns. We think that it will be very challenging to find the current contextual factors intelligently, but we also think that the current factors can be identified through some careful statistical analyses of various historical usage patterns. For example, as we already discussed in section 4.5, the URLs of the Web pages that users are currently viewing will give us information about the current task types. In addition, because Web users have a tendency to choose Websites to visit according to their own previous experiences with the sites, the URLs are also useful for inferring the users' subjective feedback levels on the contents of Web pages if we monitor user activities for a long period. In fact, in the post interviews of the third experiment, the subjects told us that they use different search engines according to their current tasks. For example, they use Google for careful tasks and Naver (a Korean portal site) for casual tasks. Therefore, we assume that URL information can be used very effectively for the purpose of inferring the user's contexts. The similarity between the contents of the current Web pages and the contents of previously visited Web pages of high interest can also be used to infer the interest levels on the current Web pages. Furthermore, the Day Frequency can be used to infer the types of Web pages viewed.

If our system can infer the current contextual factors intelligently, some proactive services can be provided. In figure 10, we present the concept of a data preparation service that we are developing, in which unnecessary visit logs and contents of no interest can be filtered out. In addition, if the system can identify a user's current task type correctly, the threshold of the viewing time used to find Web pages of high interest can be applied accordingly.

Finally, we should consider individual differences because there may be variances according to user preference, cognitive styles, temperament, and so on.

Figure 10. A possible practical solution: the arrows on the left show the influential relationships, and the arrows on the right mean that the logs can be used for data preparation tasks

References

Al halabi W. S.; Kubat M.; and Tapia M. 2007. Time spent on a web page is sufficient to infer a user's interest. In Proceedings of the IASTED European Conference: internet and multimedia systems and applications, 41-46, Chamonix, France: ACTA Press.
Badi R.; Bae S.; Moore J. M.; Meintanis K.; Zacchi A.; Hsieh H.; Shipman F.; and Marshall C. C. 2006. Recognizing user interest and document value from reading and organizing activities in document triage. In Proceedings of the 11th international conference on Intelligent user interfaces, 218-225, Sydney, Australia: ACM.
Borlund P. and Ingwersen P. 1997. The Development of a Method for the Evaluation of Interactive Information Retrieval Systems. Journal of Documentation, 53(3):225-250.
Byström K. and Järvelin K. 1995. Task complexity affects information seeking and use. Information Processing and Management, 31(2):191-213.
Chi E. H.; Pirolli P.; Chen K.; and Pitkow J. 2001. Using information scent to model user information needs and actions on the Web. In Proceedings of the SIGCHI conference on Human factors in computing systems, 490-497, Seattle, Washington, United States: ACM.
Choi J.; Lee G.; and Um Y. 2007. Analysis of Internet Users' Interests Based on Windows GUI Messages. In Proceedings of the 12th International Conference on Human-Computer Interaction, Lecture Notes in Computer Science, 4553:881-888: Springer Berlin / Heidelberg.
Cooley R.; Mobasher B.; and Srivastava J. 1999. Data Preparation for Mining World Wide Web Browsing Patterns. Knowledge and Information Systems, 1(1):5-32.
Domenech J. M. and Lorenzo J. 2007. A Tool for Web Usage Mining. In Proceedings of the 8th International Conference on Intelligent Data Engineering and Automated Learning, Lecture Notes in Computer Science, 4881:695-704: Springer Berlin / Heidelberg.
Kim K. and Allen B. 2002. Cognitive and task influences on Web searching behavior. Journal of the American Society for Information Science and Technology, 53(2):109-119: John Wiley & Sons.
Fogg B. J.; Soohoo C.; Danielson D. R.; Marable L.; Stanford J.; and Tauber E. R. 2003. How do users evaluate
the credibility of Web sites?: a study with over 2,500 participants. In Proceedings of the 2003 conference on Designing for user experiences, 1-15, San Francisco, California: ACM.
Fu Y.; Shih M.; Creado M.; and Ju C. 2001. Reorganizing web sites based on user access patterns. In Proceedings of the tenth international conference on Information and knowledge management, 583-585, Atlanta, Georgia, USA: ACM.
Gauch S.; Speretta M.; Chandramouli A.; and Micarelli A. 2007. User Profiles for Personalized Information Access. The Adaptive Web, Lecture Notes in Computer Science, 4321:54-89: Springer Berlin / Heidelberg.
Hofgesang P. I. 2006. Relevance of Time Spent on Web Pages. In Proceedings of KDD Workshop on Web Mining and Web Usage Analysis, in conjunction with the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Philadelphia, PA.
Johnson J. D. 2003. On contexts of information seeking. Information Processing & Management, 39(5):735-760: Elsevier.
Kari J. and Savolainen R. 2007. Relationships between information seeking and context: A qualitative study of Internet searching and the goals of personal development. Library & Information Science Research, 29(1):47-69: Elsevier.
Kellar M.; Watters C.; Duffy J.; and Shepherd M. 2005. Effect of Task on Time Spent Reading as an Implicit Measure of Interest. In Proceedings of the American Society for Information Science and Technology, 41(1):168-175.
Kellar M.; Watters C.; and Shepherd M. 2007. A Field Study Characterizing Web-based Information Seeking Tasks. Journal of the American Society for Information Science and Technology, 58(7):999-1018: John Wiley & Sons.
Kelly D. and Belkin N. J. 2004. Display time as implicit feedback: understanding task effects. In Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval, 377-384, Sheffield, United Kingdom: ACM.
Kelly D. and Cool C. 2002. The effects of topic familiarity on information search behavior. In Proceedings of the 2nd ACM/IEEE-CS joint conference on Digital libraries, 74-75, Portland, Oregon, USA.
Kelly D. and Teevan J. 2003. Implicit feedback for inferring user preference: a bibliography. ACM SIGIR Forum, 37(2):18-28: ACM.
Kelly D. 2004. Understanding implicit feedback and document preference: a naturalistic user study. Ph.D. Dissertation, Rutgers University.
Liu Z. 2005. Reading behavior in the digital environment. Journal of Documentation, 61(6):700-712: Emerald Group Publishing Limited.
Mobasher B.; Cooley R.; and Srivastava J. 2000. Automatic personalization based on Web usage mining. Communications of the ACM, 43(8):142-151: ACM.
Nakamichi N.; Shima K.; Sakai M.; and Matsumoto K. 2006. Detecting low usability web pages using quantitative data of users' behavior. In Proceedings of the 28th international conference on Software engineering, 569-576, Shanghai, China: ACM.
Rieh S. Y. 2002. Judgement of information quality and cognitive authority in the Web. Journal of the American Society for Information Science and Technology, 53(2):145-161: John Wiley & Sons.
Salton G. and McGill M. J. 1986. Introduction to Modern Information Retrieval: McGraw-Hill.
Seo Y. W. and Zhang B. T. 2000. Learning user's preferences by analyzing Web-browsing behaviors. In Proceedings of the fourth international conference on Autonomous agents, 381-387, Barcelona, Spain: ACM.
Sonnenwald D. H. 1999. Evolving Perspectives of Human Information Behavior: Contexts, Situations, Social Networks and Information Horizons. Exploring the contexts of information behaviour, 176-190: Taylor Graham Publishing.
Vakkari P. 1999. Task complexity, problem structure and information actions: integrating studies on information seeking and retrieval. Information Processing & Management, 35(6):819-837: Elsevier.
Vakkari P. 2001. A theory of the task-based information retrieval process: a summary and generalisation of a longitudinal study. Journal of Documentation, 57(1):44-60: Emerald Group Publishing Limited.
Wang Y. D. and Emurian H. H. 2005. An overview of online trust: Concepts, elements, and implications. Computers in Human Behavior, 21(1):105-125: Elsevier.
Wang P.; Hawk W. B.; and Tenopir C. 2000. Users' interaction with World Wide Web resources: an exploratory study using a holistic approach. Information Processing & Management, 36(2):229-251: Elsevier.
Wathen C. N. and Burkell J. 2001. Believe it or not: Factors influencing credibility on the Web. Journal of the American Society for Information Science and Technology, 53(2):134-144: John Wiley & Sons.
White R. W. and Drucker S. M. 2007. Investigating behavioral variability in web search. In Proceedings of the 16th international conference on World Wide Web, 21-30, Banff, Alberta, Canada: ACM.
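Equation (1), the Day Frequency of a URL, can be computed directly from per-day URL collections. The sketch below is illustrative only: the function names, the example URLs, and in particular the 0.8 threshold used to flag a page as navigational are assumptions, since the paper states only that a high DF value suggests a navigational page and gives no threshold.

```python
def day_frequency(daily_urls):
    """Return {url: DF} where DF = (days on which url appears) / (total days).

    `daily_urls` is a list with one collection of visited URLs per day.
    """
    total_days = len(daily_urls)
    if total_days == 0:
        return {}
    day_counts = {}
    for urls in daily_urls:
        for url in set(urls):  # count each URL at most once per day
            day_counts[url] = day_counts.get(url, 0) + 1
    return {url: count / total_days for url, count in day_counts.items()}


def likely_navigational(df_values, threshold=0.8):
    """Flag URLs with a high DF; the threshold is a hypothetical choice."""
    return sorted(url for url, df in df_values.items() if df >= threshold)


# Hypothetical four-day history: the portal front page is visited every day.
history = [
    ["http://portal.example/", "http://news.example/story-1"],
    ["http://portal.example/", "http://news.example/story-2"],
    ["http://portal.example/", "http://news.example/story-1"],
    ["http://portal.example/", "http://portal.example/"],
]
df = day_frequency(history)
print(df["http://portal.example/"])       # 1.0
print(df["http://news.example/story-1"])  # 0.5
print(likely_navigational(df))            # ['http://portal.example/']
```

Repeated visits within one day do not raise the value, which matches the definition in terms of the number of days on which a URL appears rather than the number of visits.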
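The maximum-cutline analysis of the fourth experiment (exclude logs whose viewing time exceeds the cutline, normalize each user's viewing time to his/her own scale, then correlate it with the interest level) can be sketched as follows. The record layout, function names, and sample values are hypothetical. Pearson's r is assumed as the correlation coefficient because the paper does not name one, and the one-way ANOVA step is omitted.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant data")
    return cov / math.sqrt(var_x * var_y)


def correlation_under_cutline(logs, cutline_minutes):
    """Correlate normalized viewing time with interest level under a cutline.

    `logs` holds (subject_id, viewing_time_sec, interest_level) tuples,
    a hypothetical layout chosen for this sketch.
    """
    limit = cutline_minutes * 60
    kept = [log for log in logs if log[1] <= limit]  # drop logs above the cutline

    # Per-subject min and max of the remaining viewing times.
    bounds = {}
    for subject, seconds, _ in kept:
        low, high = bounds.get(subject, (seconds, seconds))
        bounds[subject] = (min(low, seconds), max(high, seconds))

    times, levels = [], []
    for subject, seconds, level in kept:
        low, high = bounds[subject]
        times.append(0.0 if high == low else (seconds - low) / (high - low))
        levels.append(level)
    return pearson_r(times, levels)


# Made-up logs: one idle page (50 minutes) rated at the lowest interest level.
logs = [("s1", 10, 1), ("s1", 20, 2), ("s1", 30, 3), ("s1", 3000, 1)]
r_14 = correlation_under_cutline(logs, 14)  # idle page filtered out
r_60 = correlation_under_cutline(logs, 60)  # idle page kept
print(r_14 > r_60)  # True
```

Calling `correlation_under_cutline` for each cutline value from 3 to 25 minutes would give one coefficient per cutline, in the spirit of the top panel of figure 5. In the made-up data a single idle page is enough to turn the correlation negative, which illustrates why the noise filtering matters.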