Analysis of Web Usage Patterns in Consideration of Various Contextual Factors

            Jinhyuk Choi                                           Jeongseok Seo                                 Geehyuk Lee
 Korea Advanced Institute of Science and                Information and Communications                 Korea Advanced Institute of Science and
         Technology (KAIST)                                      University (ICU)                              Technology (KAIST)
       119, Munjiro, Yuseong-gu                             119, Munjiro, Yuseong-gu                         119, Munjiro, Yuseong-gu
  Daejeon, 305-732, Republic of Korea                  Daejeon, 305-732, Republic of Korea              Daejeon, 305-732, Republic of Korea
            demon@kaist.ac.kr                                     chaoticblue1@icu.ac.kr                        geehyuk@kaist.ac.kr


Abstract

Analyzing users’ Web usage logs is important for developing personalized Web services. However, usage logs are inherently difficult to analyze because the kinds of available logs are very limited and the logs show uncertain patterns under the influence of various contextual factors. Reasoning that the contextual factors influencing usage logs must be identified before personalized services are designed, we conducted a series of experiments, both in situations where users performed designed tasks during short time periods and in users’ natural Web environments over a period of several days. From the results, we found that interest levels, credibility levels, page types, task types, and languages are influential contextual factors in a natural Web environment. Moreover, the long-term analysis revealed some historical and experiential patterns that could not be observed in the short-term analysis. These findings will be useful for other researchers, practitioners, and especially developers of adaptive personalization services.

Copyright © 2009, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.

Introduction

The World Wide Web is unique in that the amount of information it contains is continuously increasing and yet can still be reached easily by users through various Web services. Moreover, it provides various types of media, so users can use it for multiple purposes. It is therefore very important for researchers and practitioners to make the Web even more effective for finding necessary information.

One way to make the Web more useful is to develop intelligent information delivery that allows users to find their target information more effectively. A core part of intelligent information delivery is to search through personalized contents without the user’s explicit participation. For personalization, it is necessary to learn more about the user and to build a user model based on this knowledge. This personalization process is a main topic of research on Web usage mining (Mobasher et al. 2000; Gauch et al. 2007). However, it is not easy to learn about users because we cannot explicitly ask a user about his/her characteristics, or what he/she is thinking, at every moment we want to know. We therefore have to find another way to learn about them. From this perspective, many researchers have looked for effective implicit methods to learn about users, and many intelligent methods have been suggested (Kelly and Teevan 2003; Kelly 2004; Kelly and Belkin 2004; Kelly and Cool 2002; Choi et al. 2007; Hofgesang 2006; Seo and Zhang 2000; Badi et al. 2006; Al halabi et al. 2007; Kellar et al. 2005). In these studies, usage logs stored while users visit Web pages have been used to learn about particular user interests. For example, the URLs of visited Web pages, visit period, dwelling time, mouse clicks, mouse movement, keyboard typing, and visit frequencies on each Web page have been applied as implicit interest indicators.

Although many successful results have been reported so far, there are several inherent difficulties in analyzing usage logs and extracting necessary information from them. First, the kinds of available usage logs are very limited, and there are no standard ways to interpret the meaning of usage patterns. This means that we have to investigate usage patterns carefully before using the logs as indicators. Second, Web users are under the influence of various contextual factors while they use the Web, because the Web serves at once as a simple information tool, a social communication mediator, an entertainment source, and so on. Usage logs will therefore show very uncertain patterns, because various contextual factors influence the usage patterns concurrently (Kelly and Belkin 2004). The third difficulty is related to the historical aspect, in that a user’s experiences also influence the variation of usage patterns. Therefore, a Web usage pattern analysis should be a long-term process
because it cannot be adequately performed by studying only short-term usage. In addition, to analyze a user’s various characteristics, the usage data should be collected at the browser side in the user’s real Web environment for a long period, without being restricted to a specific Web server.

This paper details the results of experiments in which we made an initial attempt to overcome the above difficulties. For our experiments, various usage logs were collected at the browser side and carefully analyzed, both in situations where users performed designed tasks during short time periods and in users’ natural Web environments over a period of several days. We obtained several interesting findings from the results, which we think will be useful for other researchers, practitioners, and especially developers of personalization services.

This paper is organized as follows. In section 2, we review related work. In section 3, we describe our experimental procedure, and the results obtained so far are given in section 4. In section 5, a summary and future work are presented.

Related Work

Human Information Behavior

Many studies in various research fields have analyzed human information behavior. These studies have focused on several contextual factors that affect a user’s behavior, conceptualizing the relationships between information-seeking behavior and contextual factors. In (Sonnenwald 1999), the authors proposed an evolving framework that incorporates cognitive, social, and system perspectives. The framework covers human information behavior including information exploration, seeking, filtering, use, and communication. Based on the framework, various influential factors – physical, cognitive, affective, economic, social, and political – and their implications were investigated. In (Johnson 2003), the need for an information-seeking behavior analysis in a multi-contextual environment was presented and a theoretical framework was suggested. The authors of (Kari and Savolainen 2007) asserted that users also develop as the information environment changes, and they found 11 relationships between individual developmental objectives and information searching via the Internet. In (Byström and Järvelin 1995; Borlund and Ingwersen 1997; Bystrom 2002; Vakkari 1999; Vakkari 2001), the influence of task complexity on information-seeking behaviors was investigated. An overview of the nature of trust, and a framework of trust-inducing interface design features, were given in (Wang and Emurian 2005). In particular, in (Wang et al. 2000), the authors introduced a multidimensional model of user-Web interaction that considers three dimensions – user, interface, and the Web. In the model, the user dimension is considered to be influenced by the particular task, information need, knowledge state, cognitive style, affective state, and so on. They measured users’ cognitive styles and affective states before a user study, applied a process-tracing technique while users were conducting information-seeking tasks, and found various types of relationships among the elements of the dimensions. In (Fogg et al. 2003), based on the results of an online qualitative study, the credibility of Web contents was considered, and important factors of credibility were suggested to formulate Web design guidance. In (Wathen and Burkell 2002), the authors asserted that users filter out most of the gathered information and retain only useful information. In addition, they concluded that the credibility or believability of information is one of the most important criteria for the filtering. In (Rieh 2002), the authors found that users judge cognitive authority and information quality by two types of judgment – predictive judgment and evaluative judgment – and they also identified the main facets and keywords of the judgments through a user study.

From these studies, we found that human information behavior cannot be studied without considering the influences of various types of contextual factors. However, because the purpose of these studies was not to develop an intelligent system but to construct theoretical models, they did not study quantitatively how Web usage patterns reflect the influences of various contextual factors.

Web Usage Mining

Much effort has been made in the field of Web usage mining to quantitatively measure the influences of contextual factors on Web user behavior based on various usage logs. Among the various factors, user interest in content has been the main focus of researchers. Various implicit indicators of user interest can be found in (Kelly and Teevan 2003). In (Kelly 2004), familiarity with a topic was discussed, and the authors concluded that as one’s familiarity with a topic increases, his/her searching efficacy increases and reading time decreases. For user characteristics, cognitive and problem-solving styles were studied in (Kim and Allen 2002). In their study, the authors observed various user activities – average time spent, average number of Websites viewed, average number of bookmarks made, and average number of times a search/navigational tool was used for completing a search task – while the users performed two types of given tasks in an experimental environment, and found significant differences among user activities according to the type of task and the user’s problem-solving style. Among usage logs, the display time has been discussed most actively. In (Kelly 2004; Kelly and Belkin 2004), based on data gathered from 7 subjects for 14 weeks, the relationships between display time and various factors – task, topic, usefulness, endurance, frequency, stage, persistence, familiarity, and retention – were investigated, and the authors concluded that the display time is not suitable for inferring a user’s interest because there is great
variation between display time and interest according to the user; large differences according to the task at hand also appear. In contrast, in (Choi et al. 2007), the viewing time was used as a good implicit indicator, and in (Hofgesang 2006), the authors asserted that time spent on a Web page is more important than visit frequency in inferring a user’s interest. In (Seo and Zhang 2000), bookmarking, time for reading, following up the HTML document, and scrolling were used as relevant activities, and a machine learning algorithm was applied to learn the user’s characteristics. In (Badi et al. 2006), various parameters of document attributes, document reading activities, and document organizing activities were investigated to recognize user interest and document values. In (Kellar et al. 2005), the authors found that the time spent is more useful for more complex Web searching tasks. In (Nakamichi et al. 2006), the authors also used several quantitative data of user behavior – browsing time and the moving distance, moving speed, and wheel rolling of the mouse – to detect Web pages with low usability.

Most of these studies analyzed usage logs with the intention of developing an intelligent system that learns user characteristics and builds a user model. However, most of them did not fully consider the influences of various contextual factors, or they focused only on a user’s interest without also considering other types of subjective feedback. Moreover, most studies except (Kelly 2004; Kelly and Belkin 2004) did not consider the historical aspects of usage data that can only be gathered by a long-term analysis in a user’s natural Web environment.

Our Approach

First of all, we carefully reviewed previous related studies and collected the contextual factors to consider and the usage logs that can be obtained at the browser side. The contextual factors and usage logs that we considered are given in Table 1.

       Task                 Contextual factors   Usage logs         Period
Ex1    Visit collected      Interest             Viewing time       2 hrs
       pages (text only)                         Mouse movement
                                                 Mouse wheel
                                                 Mouse clicks
                                                 WM_PAINT
Ex2    Visit collected      Interest             Viewing time       2 hrs
       pages                Complexity           Mouse movement
                            Difficulty           Mouse wheel
                            Credibility          Mouse clicks
                                                 WM_PAINT
Ex3    Free visits          Interest             Viewing time       2 hrs
       / given tasks        Complexity           Mouse movement
                            Difficulty           Mouse wheel
                            Credibility          Mouse clicks
                            Task type            WM_PAINT
Ex4    Free visit           Interest             Viewing time       2 wks
       / free tasks         Credibility          Mouse movement
                            Task type            Mouse wheel
                                                 Mouse clicks
                                                 Keyboard typing
                                                 Visit frequency
                                                 Day frequency

Table 1. The environment and gathered data of experiments

We carried out not only a qualitative analysis but also a quantitative one. For ecological validity, we also observed users in their own personal places. Because some of the contextual factors are inherently subjective and cannot be measured with usage logs alone, we collected various types of feedback regarding the current context directly from users. However, to minimize the burden on the users in this study, we kept the number of feedback questions as small as possible. We developed software that runs on each user’s PC in order to collect their behavior logs and feedback in their Web browsing environments.

Contextual Factor

Contextual factors include subjective assessments about contents, situational factors, a user’s individual characteristics, and so on. Because these factors cannot be measured systemically, we designed a process in which we can obtain the users’ subjective feedback directly. First, we considered the users’ attitudes toward the current task as one of the contextual factors. The types of user task can be classified into detailed categories – information seeking, fact-finding, transaction, and browsing (Kellar et al. 2007). However, we classified user tasks into only two categories – careful searching and casual searching – according to the users’ attitudes toward the current task. A detailed description of the task categorization appears in section 3.5. There are more contextual factors that cause users to interact with Web pages. For example, a user may stay for a relatively long time at a specific Web page because there are interesting contents there, or because the user feels that the contents are more useful than others. Sometimes, the user may roll the mouse wheel more frequently on one Web page than on others because he/she wants to read the entire content of the page carefully. In this regard, we selected some further factors that may influence user interactions with Web pages. The factors are interest, credibility, complexity, and difficulty. The complexity factor tells us how users feel about the layout structure of a Web page, and hence it may include a user’s subjective viewpoint of usability and familiarity. We also included the difficulty factor because we thought that user behavior varies according to a subjective assessment of the difficulty of the contents displayed.

Web Usage Log

Implicit user interest analysis has shown good performance at the server side, especially for commercial Websites. However, although it is easier to analyze user interest at the server side, many researchers currently focus on browser-side analyses because user interest can be analyzed from various Websites, and a user model can
be constructed using a wealth of information through a browser-side analysis. In order to analyze users’ implicit interest at the browser side, we have to monitor several usage logs, for example, the viewing time, scroll movement, sequences of visited URLs, keyboard typing, and so on. In our research, we chose several usage logs to record while users view different Web pages. The viewing time, which has been the main subject of investigation in related studies so far, is the time during which users remain on a particular Web page. The mouse wheel log counts the number of WM_MOUSEWHEEL messages (Choi et al. 2007). For mouse and scrollbar movement, we measured the distance between two consecutive positions of the mouse cursor and scrollbar at regular intervals and summed the distances. We also counted the number of processed WM_PAINT messages, as WM_PAINT messages are processed when users change the size of their browser window, scroll within the window, move their mouse cursor, and so on. The numbers of mouse clicks and keyboard typing events were also considered. We believe that these activities are good indicators of user interest regarding the contents of Web pages. We chose these logs because they can be measured without much effort. However, for scroll movement, we were unable to obtain the position of the scrollbar on some of the Web pages, and the WM_PAINT messages can be affected by the dynamic content of certain Web pages. This means that we have to be careful when using these data as logs for measuring user activities.

We did not record some of the behaviors that have been considered in other studies – bookmarking, saving, printing, and copying and pasting – because users do not always show those behaviors on every valuable Web page, and hence their records do not suit our purpose.

We collected some physical data of each visited Web page – the scroll height, file size, and URL information (top-level URL and depth of URL). Moreover, in the course of the experiments, some additional factors were included when they were required for analysis. The additional factors were the number of out-links on a Web page, the Web page type, and the language presented (e.g., Korean or English).

We also carefully considered some historical factors that can be analyzed only through relatively long periods of monitoring. The historical factors include visit frequencies and day frequency. Among those factors, day frequency is a new concept that has not been introduced before. A detailed description of day frequency will be given in a later section.

Data Collection Software

In some previous studies, custom-built browsers have been used (Kellar et al. 2007), as has specialized logging software that works “in stealth mode” (Kelly and Belkin 2004). Although custom-built browsers have the merit that various data can be collected easily, we developed a browser-monitoring module (BMM) that runs behind Internet Explorer without any modification to the browser, because we wanted to preserve the natural state of the Web browsing environment as much as possible.

BMM is a type of monitoring software that was developed to detect Windows GUI messages while users read Web pages, and thus it can measure user activities in real time without any interruption to the users. BMM uses a global hooker library, written in C++, which runs in the background and hooks all Windows operating system events. In addition, using the Windows Shell API, BMM can access all currently running instances of Internet Explorer through the COM object, from which the necessary properties of Web pages can also be obtained. BMM is written in C# and runs on a Windows platform with .NET Framework 2.0.

   Figure 1. The feedback window consists of a browser control to view the contents of visited web pages, a list window to choose a visited URL, radio buttons to choose the answer of some questions, and so on

BMM consists of four components – hooker, data recorder, data aggregator, and feedback window. The data to hook are the number of keys pressed, events of program focus changes, number of WM_PAINT events, mouse click and mouse wheel messages, and so on. Basically, the hooker catches every message passed within the operating system, so we have to filter out irrelevant messages and record only the data necessary for our studies. For instance, because a WM_PAINT message is invoked whenever the O/S needs to re-draw some part of a window, we have to ignore the messages from unfocused windows and count only the messages that are invoked for the currently focused browser window. The aggregator can acquire several properties of a Web page by using the Document Object Model (DOM). The acquired properties are the viewing size of a document (in pixels), file size (in bytes), current location of the scrollbar, and character set of the page. The location of the scrollbar is periodically updated so that the total displacement of the scrollbar can be estimated. However, a critical issue arises on several 'fancy' Web pages that have different structures from standard Web documents, which eventually yield no data when the DOM property is accessed. The data aggregator also aggregates all data from these multiple components, and the data recorder stores the aggregated data in a
human-readable XML format for future analysis. After              politics, economics, education, engineering, entertainment,
Web searching, using the feedback window, users can               science, health, and sports – with varying content size. The
review the visited Web pages and choose radio buttons that        twenty-five subjects read each page in their own desired
ask about several types of assessments about the contents         manner from the list of collected Web pages. Because we
of each Web page. If the users do not want to answer              wanted to exclude any effect of information clues, we
questions regarding some of the Web pages, they can even          simply provided numbers on the list without showing any
remove the records easily. Figure 1 shows the structure of the    information about the contents of the Web pages in
feedback window.                                                  advance. Thus, the subjects were supposed to click the
                                                                  numbers in order to view the contents. To obtain the
Subject                                                           appropriate data, the subjects were not told that some
                                                                  activities would be measured while they were viewing the
We conducted 4 experiments, each with its own purpose.            Web pages. During the experiments, the subjects’ activities
The detailed concept of the experiments will be described
                                                                  while reading the Web pages, and some measurable data,
in the next section. For each experiment, we recruited
                                                                  were recorded in a log file for future analysis. In addition,
some graduate students who are majoring in computer               whenever a subject finished reading a Web page, a small
science as our subjects. Twenty-five students participated
                                                                  window appeared wherein the subject recorded his/her
in the first experiment, 23 in the second, 19 in the third,
                                                                  interest level for the contents of the page. There were 5
and 12 students in the fourth. Among the students, 11 took        levels of interest, and the subjects recorded their interest
part in the second, third, and fourth experiments, and one
                                                                  for the contents of a Web page accordingly. Due to some
new subject volunteered for the fourth experiment. All of
                                                                  malfunctions of the BMM in the users’ browsing
the students have a high level of knowledge and experience        environment and a failure to properly obtain user feedback,
about the Internet and the Web. We chose these students as
                                                                  the log files of 5 users were excluded. Therefore, we
subjects because all of them use the Web not only for their
                                                                  analyzed 20 users’ log files. For the first experiment, we
work but also for entertainment or distraction. Most of all,      formulated the following simple hypotheses.
they use the Web for a relatively long time each day so that
we could gather plenty of data from their activities. It also
means that we could observe their Web usage patterns              1. The number of processed log data is relatively higher on
under various contexts. We paid about 20 dollars to each          Web pages that contain interesting contents.
subject for their participation in the first, second, and third   2. The amount of information in a Web page affects the
experiments, respectively. For the fourth experiment, we          amount of processed log data.
paid 60 to 160 dollars to each subject according to the rate
of the completed feedback.
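
As an illustration of how the two hypotheses of the first experiment might be checked, the following is a minimal sketch that computes a rank correlation between a logged quantity and either the reported interest level (hypothesis 1) or the file size of the page (hypothesis 2). The choice of Spearman's coefficient is an assumption made for this sketch, not a method stated in the text, and the function names and the sample numbers are hypothetical.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1
        for position in range(start, end + 1):
            ranks[order[position]] = shared_rank
        start = end + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation of two equally long numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for constant data")
    return covariance / (spread_x * spread_y) ** 0.5


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the rank vectors."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    return pearson(average_ranks(xs), average_ranks(ys))


# Hypothetical per-page records for one subject (made-up numbers).
interest_levels = [1, 2, 3, 4, 5, 3]          # 5-level interest feedback
mouse_wheel_counts = [2, 5, 9, 14, 30, 8]     # logged wheel messages per page
file_sizes_bytes = [1200, 8000, 4300, 15000, 22000, 5100]

# Hypothesis 1: more logged activity on more interesting pages.
print("interest vs. wheel count:", spearman(interest_levels, mouse_wheel_counts))
# Hypothesis 2: more logged activity on pages with more information.
print("file size vs. wheel count:", spearman(file_sizes_bytes, mouse_wheel_counts))
```

A rank-based coefficient is used here only because interest is reported on an ordinal 5-level scale; any other suitable statistic could be substituted.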
                                                                  Experiment 2. The procedure of the second
Experimental Concept and Procedure                                experiment was the same as the first experiment except
                                                                  that we collected ordinary Web pages that contain images,
There are three main strategies for studying information-         tables, videos, and frames. It was intended to see whether
seeking behavior – laboratory experiments, sample surveys,        there will be differences in usage patterns according to
and field studies (Kellar et al. 2007). Considering these         form of the Web page. When a subject finished reading all
strategies, we designed four experiments and conducted            of the Web pages, he/she activated a feedback window
them in-series. In the first and second experiments, the          wherein the subject could review all of the pages and
subjects came to our laboratory and browsed some pre-             answer some questions about each one visited. In this
collected Web pages. In the third experiment, the subjects        experiment, differently from the first experiment that
performed given information-seeking tasks in our                  collected only the interest levels for the contents, we also
laboratory. As a final step of each experiment, the subjects      wanted to verify the influence of other subjective
carried out feedback tasks in order to record their own           assessments of Web pages - difficulty, complexity, and
subjective assessments about each of the Web pages they           credibility along with interest – on a 5-point scale. If a
had browsed. The fourth experiment was carried out at the         subject clicked one of the URLs on a visited page list in the
subjects’ own residences. The subjects installed BMM on           feedback window, the contents of the Web page appeared
their PCs to collect their Web usage logs for a period of         again, and the subject could then choose his/her points for
about two weeks. For the feedback process of the fourth           the questions regarding the subjective feedback.
experiment, we let the subjects carry out the feedback tasks
                                                                  Experiment 3. We can find several different
at least once a day. The first and second experiments were
carried out in a blind mode in which the subjects could not       categorizations of Web user behaviors in previous
                                                                  researches. Most recently, 4 task categories were provided
see any information about the contents of each Web page
                                                                  in (Kellar et al. 2007) - fact finding, information gathering,
before viewing them. In other words, no proximal cues
(Chi et al. 2001) were provided.                                  just browsing, and transactions. In (White and Drucker
                                                                  2007), Web users are grouped into navigators and
Experiment 1. The first experiment was a kind of                  explorers according to the level of visit variances. In
preliminary study. We collected 120 Web pages that                consideration of these previous works, we also classify a
contain only text and offer information on various topics –       user’s Web tasks into two groups.
Task 1: careful searching
This task is a type of information gathering that requires accuracy, trust,
efficiency, and responsibility for the search results. In our experiment,
the given task was to find some information about the subjects’ research
topics. For example, they had to find Web pages of laboratories in
universities or companies that are related to their research topics and
read the pages carefully to judge the relevance of the information. We
encouraged the subjects to perform this task as normally as possible.

Task 2: casual searching
This task is a type of information gathering and browsing that can be
performed without any burden or responsibility regarding the search
results. For example, the subjects could search for some information about
their hobbies, favorite products to buy, famous tourist spots, favorite
sports or movie stars, and so on. We also encouraged the subjects to
perform these tasks as normally as possible.

The subjects performed the two tasks with their own topics for about 2
hours. The logging data and feedback details were the same as in the second
experiment. Unlike the first and second experiments, in which the subjects
could only visit the collected Web pages without any prior information
cues, the third experiment let the subjects visit any Web page and use any
search engine or portal site they wanted. As a result, we observed many
re-visitation patterns. During the feedback phase, we therefore let the
subjects delete the logs of Web pages that they had used only to find other
Web pages to visit. In this way, we excluded the navigational Web pages.
The concept of navigational Web pages is described in section 4.5.

Experiment 4. For the fourth experiment, 12 graduate students participated:
4 females and 8 males. They installed the BMM on their PCs and collected
various logs for about 2 weeks. Some of the subjects participated in our
experiment for 16 days. For their feedback, we encouraged them to rate each
visited Web page on a 5-point scale and to choose one of the task types. If
a URL was not a content page from the subject’s viewpoint, the URL could be
deleted easily, and BMM recorded a special number for the URL for future
analysis. In this experiment, we collected only three types of feedback
(interest, credibility, and task type) because we wanted to minimize the
subjects’ burden of answering many questions for all of the visited Web
pages.

Ex   Feedback       VT            MM            MW            MC            WP
Ex1  interest       0.695 (**)    0.572 (**)    0.563 (**)    0.475 (**)    0.663 (**)
     scroll height  -0.006        0.006         0.261         0.008         0.059
Ex2  interest       0.771 (**)    0.545 (**)    0.686 (**)    0.559 (**)    0.507 (*)
     complexity     -0.391 (**)   -0.178        -0.148 (**)   -0.196        -0.599 (*)
     difficulty     -0.476        -0.340        -0.532        -0.418        -0.057 (**)
     credibility    0.507         0.203         0.411 (*)     0.289         0.241
     scroll height  0.074         0.016         0.167         0.001         -0.059
Ex3  interest       0.396 (*)     0.301 (*)     0.119         0.245         0.229
     complexity     -0.315 (**)   -0.129 (**)   -0.533 (**)   -0.162 (**)   0.040
     difficulty     0.307         0.307         -0.124        0.182         -0.330
     credibility    0.609 (**)    0.414 (**)    0.288 (*)     0.412 (**)    0.389
     scroll height  0.011         -0.025        0.120         -0.036        -0.022
Ex4  interest       0.442 (**)    0.315         0.258         0.306         0.282 (KT)
     credibility    0.434 (*)     0.138         -0.010        0.222         0.124 (KT)
     scroll height  0.056         0.001         0.117         0.017         0.001 (KT)

VT: Viewing time / MM: Mouse move / MW: Mouse wheel /
MC: Mouse click / WP: WM_PAINT / KT: Keyboard typing
*: p-value of ANOVA test < 0.05
**: p-value of ANOVA test < 0.01

Table 2. The values of correlation between feedback level and the amount
of usage logs

Figure 2. The viewing time according to feedback levels in the third
experiment (three panels: means of viewing time against interest,
credibility, and complexity levels 1 to 5)

Result

In the series of experiments, we measured the numbers of several processed
messages on each visited Web page and normalized the values using min-max
normalization for each subject. We included this normalization procedure
because the amount of usage logs would vary with the subjects’ individual
differences.
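The per-subject min-max normalization mentioned in the Result section can be illustrated with a minimal sketch. The record layout (pairs of subject identifier and raw log amount), the function name, and the sample values are assumptions made for illustration only; they are not taken from the BMM logs.

```python
from collections import defaultdict


def min_max_normalize_per_subject(records):
    """Scale each subject's values to [0, 1] using that subject's own min and max.

    `records` is a list of (subject_id, value) pairs (hypothetical layout).
    """
    by_subject = defaultdict(list)
    for subject, value in records:
        by_subject[subject].append(value)

    bounds = {s: (min(v), max(v)) for s, v in by_subject.items()}

    normalized = []
    for subject, value in records:
        lo, hi = bounds[subject]
        # A subject whose values are all equal has no spread; map them to 0.0.
        scaled = 0.0 if hi == lo else (value - lo) / (hi - lo)
        normalized.append((subject, scaled))
    return normalized


# Made-up viewing times (seconds) for two subjects with very different ranges.
logs = [("s1", 10.0), ("s1", 30.0), ("s1", 20.0), ("s2", 100.0), ("s2", 500.0)]
result = min_max_normalize_per_subject(logs)
print(result)
```

After normalization, a heavy user and a light user both span the same 0 to 1 range, so their log amounts can be pooled before comparing feedback levels.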
Experiment 1
The results of the first experiment showed some interesting patterns. As we
can see in table 2, there were positive correlations between the amounts of
all usage logs and the interest levels. Furthermore, a one-way ANOVA test
showed that the amount of the logs differs significantly among the interest
levels. Based on this result, we tentatively concluded that users tend to
interact more with Web pages of high interest, and hence that all the logs
can be used as implicit interest indicators. Another interesting finding is
that there was a low correlation between the amount of usage logs and the
size of the Web pages, except for the amount of the mouse wheel log.

Experiment 2
We expected some differences between the result patterns of the first
experiment and those of the second experiment because the forms of the Web
pages were quite different. However, there were no big differences between
the results. Table 2 shows that there were again positive correlations
between the amounts of all usage logs and the interest levels, as in the
first experiment. In addition, we also found significant differences in the
amount of usage logs among the interest levels. This means that the form of
the Web pages is not an important factor. In contrast to the results for
the interest level, the difficulty and complexity levels showed a negative
correlation with the amount of usage logs. The credibility levels showed
positive correlations with the amount of usage logs, but the differences of
the amounts among the levels are not statistically significant. From the
results, we concluded that the interest level exerts the most significant
influence on the amount of usage logs, and that users are inclined to
quickly leave Web pages that have difficult contents or complex structures
without many interactions. Finally, we found that there were low
correlations between the amount of usage logs and the sizes of Web pages,
except for the amount of the mouse wheel log. This was consistent with the
results of the first experiment.

Experiment 3
In figure 2 and table 2, we can see that the viewing time and the amount of
mouse movement have positive correlations with the interest levels, and
that the differences of the amounts among the interest levels are also
statistically significant. The amount of mouse wheel use, mouse clicks, and
processed WM_PAINT messages also showed positive correlations with interest
levels, but the differences were not statistically significant. The amount
of usage logs increased according to the complexity levels, but dropped
steeply at level 5. The difficulty levels showed no large correlation with
the amount of usage logs. The most interesting pattern that we found in the
results of the third experiment was that the amount of usage logs showed a
positive correlation with the credibility levels, and that the differences
of the amounts of usage logs among the credibility levels were
statistically significant. This result was not found in the second
experiment, in which users browsed pre-collected Web pages without proximal
cues. Therefore, we concluded that the usage logs are under the influence
of credibility levels as well as interest levels in ordinary Web browsing
environments.
   In the third experiment, we also checked whether there are differences
in the amount of usage logs according to the task types and written
languages used. From figure 3, we found a general trend of more interaction
logs recorded during a careful task than during a casual one, especially on
pages of the highest interest and credibility levels. For written
languages, there was a general trend of more interaction logs on English
pages than on Korean pages, especially on the pages with the highest
interest levels, but there was no large difference according to credibility
levels. These results showed us that the type of task and the written
language used should also be considered as important factors that
influence the amount of usage logs created.

Figure 3. (a) The differences of viewing time according to task types
(careful/casual) (b) the differences of viewing time according to languages
(KOR/ENG); each part shows viewing time under high interest and under high
credibility

Experiment 4
In the fourth experiment, there were some logs that contain an excessively
long viewing time because the experiment was conducted in the users’
personal environments.
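The two checks reported for the preceding experiments, a correlation between feedback level and log amount and a one-way ANOVA across feedback levels, can be sketched as follows. This is an illustration under assumptions rather than the analysis code used in the study: the text does not state which correlation coefficient was computed, so Pearson's r is assumed here, and the function names and sample data are hypothetical. The sketch stops at the F statistic; turning it into a p-value requires the F distribution, which a statistics library would provide.

```python
from collections import defaultdict
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def one_way_anova_f(levels, values):
    """Return (F, df_between, df_within) for values grouped by feedback level."""
    groups = defaultdict(list)
    for level, value in zip(levels, values):
        groups[level].append(value)

    n = len(values)
    k = len(groups)
    grand_mean = sum(values) / n
    group_means = {level: sum(g) / len(g) for level, g in groups.items()}

    # Variation of the group means around the grand mean.
    ss_between = sum(
        len(g) * (group_means[level] - grand_mean) ** 2
        for level, g in groups.items()
    )
    # Variation of individual values around their own group mean.
    ss_within = sum(
        (v - group_means[level]) ** 2
        for level, g in groups.items()
        for v in g
    )
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Made-up data: interest levels and normalized viewing times for six pages.
levels = [1, 1, 2, 2, 3, 3]
amounts = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
r = pearson_r(levels, amounts)
f_stat, df_between, df_within = one_way_anova_f(levels, amounts)
print(round(r, 4), round(f_stat, 4), df_between, df_within)
```

Reporting both numbers matters because a positive coefficient alone does not show that the mean log amounts of the individual feedback levels actually differ.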
From figure 4, we can see that the users stayed on 99% of all the visited
Web pages for at most 346 seconds. We also found some logs that showed a
viewing time of over 30 minutes. This means that we should find a maximum
cutline in order to filter out such logs as noise. We set the cutline to
various values, from 3 to 25 minutes. For each cutline value, we excluded
logs in which the viewing time was above the cutline, and then normalized
each user’s viewing time to his/her own scale. Finally, we checked whether
the magnitude of the cutline affected the applicability of the viewing time
as an indicator. From figure 5, we can see that a reasonable cutline should
be set somewhere between 5 and 18 minutes in order to observe a high
positive correlation between the viewing time and interest level, and a
statistical difference among the viewing times in each interest level. For
example, if we set the maximum cutline to 14 minutes, the viewing time
shows a positive correlation with the interest level (r = 0.5522), and
according to the result of a one-way ANOVA test, the viewing times of each
level are significantly different (p = 0.0092). This means that we can use
viewing time to identify Web pages of interest, based on the fact that
users will stay for a relatively longer time on them than on Web pages of
no interest. In addition, we found that careful noise filtering is
absolutely required when we want to infer users’ interest from the viewing
time. Therefore, we excluded logs that contained over 15 minutes of viewing
time in the fourth experiment. In figure 6 and table 2, we can see that
only the viewing time showed positive correlations and statistically
significant differences among the levels of interest and credibility. It is
very interesting that we could not find significant differences between
other usage logs and feedback levels. The differences in the amount of
usage logs according to the task types were similar to the result of the
third experiment.

Day frequency and feedback pages vs. non-feedback pages. Because most of
the target pages that users want to access can be reached via portal sites,
news sites, and search engines, we thought that the front pages of these
sites and hub pages within the sites may appear in the visited URL history
more frequently than others. For example, when a user wants to read a
newspaper, he/she visits the home page of a news site and clicks on some
links that seem to contain interesting news. In a similar manner, whenever
a user wants to find some information, he/she may visit the front page of a
search engine first and then click on one of the links that the search
engine retrieves. Similarly, if the user wants to log onto some commercial
sites or even his/her own Web mail accounts, he/she should first visit the
front page of the service and input his/her username and password in order
to proceed. Therefore, if we look over the users’ visited URL histories,
the navigational pages (the front pages of portal sites, news sites and
search engines, and any type of hub page) will appear more frequently than
others. Moreover, if the users visit Websites according to their daily
routine, they will visit some of the Websites every day in their regular
patterns. In this respect, we thought that the URLs of navigational pages
might be found in logs from each day. On the contrary, the content pages
were shown relatively

Figure 4. The distribution of viewing time (x-axis: view time in seconds;
y-axis: number of pages)

Figure 5. The correlation coefficients (top) and p-values of significance
test (bottom) according to maximum cutline (x-axis: maximum cutline in
minutes)
                                                                                                                                       0.025                                                          0.025

Additional Findings from Experiment 4
Because the fourth experiment was conducted during a                                                                                       0.02                                                        0.02


period of about 2 weeks, we can observe some more
historical patterns that could not be observed in previous                                                                             0.015
                                                                                                                                                  1   2       3        4   5
                                                                                                                                                                                                      0.015
                                                                                                                                                                                                              1    2        3       4   5
                                                                                                                                                       Interest Levels                                             Credibility Levels
experiments. In this section, we introduce some additional
findings.
                                                                           Figure 6. The viewing time according to feedback levels in the
                                                                                                fourth experiment
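
The cutline-based noise filtering and the correlation check described for the viewing time can be sketched as follows. This is only a minimal illustration under stated assumptions: the function names, the (viewing time in seconds, interest level) record layout, and the sample values are hypothetical and are not taken from the experiments; only the idea of a 14-minute maximum cutline comes from the text.

```python
import math

# Maximum cutline from the text (14 minutes); expressing it in seconds is an assumption.
CUTLINE_SECONDS = 14 * 60


def filter_by_cutline(logs, cutline=CUTLINE_SECONDS):
    """Keep only (viewing_time, interest_level) records at or below the cutline."""
    return [(t, level) for t, level in logs if t <= cutline]


def pearson_r(xs, ys):
    """Pearson correlation coefficient; assumes both inputs have non-zero variance."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def viewing_time_correlation(logs, cutline=CUTLINE_SECONDS):
    """Correlation between viewing time and interest level after noise filtering."""
    kept = filter_by_cutline(logs, cutline)
    times = [t for t, _ in kept]
    levels = [level for _, level in kept]
    return pearson_r(times, levels)


# Hypothetical records: the last one mimics a page left open while the user was away.
sample_logs = [(30, 1), (45, 2), (120, 3), (300, 4), (600, 5), (3600, 1)]
r_filtered = viewing_time_correlation(sample_logs)
```

In this made-up sample, the single idle record is enough to turn the unfiltered correlation negative, which illustrates why the filtering step matters before viewing time is used as an interest indicator.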
                                                                                                                   logs                  p-value
                                                                                                                   URL Depth             0.0623
                                                                                                                   Day Frequency         0.0003 (*)
                                                                                                                   Viewing Time          0.0206 (*)
                                                                                                                   Mouse Move            0.5314
                                                                                                                   Mouse Click           0.5258
                                                                                                                   Mouse Wheel           0.0181 (*)
                                                                                                                   Keyboard typing       0.0349 (*)

                                                                                                                   Table 3. The results of significance test for difference of the
                                                                                                                   values of each interaction log between feedback pages and
                                                                                                                   non-feedback pages: (*) means significant

  Figure 7. (a) The number of feedback pages and non-feedback                                                      our expectation. The 12 subjects mainly deleted the
    pages and (b) the average number of outlinks contained: 1 –                                                    home pages of search engines, retrieved lists of search
               feedback page / 2 – non-feedback page                                                               engines, the first pages of portal sites, news lists, home
                                                                                                                   pages of community sites, online banking sites, intranet
rarely because the users don’t usually view the same                                                               front pages and so on as non-feedback pages. In some
content again and again.                                                                                           previous studies, we found that there were several
   Based on the considerations that we have mentioned so                                                           attempts to discriminate content pages from navigational
far, we formulated a very simple hypothesis: URLs visited                                                          pages using the number of outlinks that are contained in
every day have a strong chance of being navigational                                                               the pages (Cooley et al. 1999; Fu et al. 2001; Domenech
pages. To test this hypothesis, we created a variable named                                                        and Lorenzo 2007). The main idea is that there will be a
Day Frequency (DF). The concept of DF is very similar to                                                           larger number of outlinks on navigational pages than on
document frequency, which is often used in information                                                             content pages. We also thought that this idea was acceptable,
retrieval and text mining (Salton and McGill 1986), and                                                            so we counted the average numbers of contained outlinks
the DF value of each visited URL can be calculated using                                                           in both feedback pages and non-feedback pages. However,
equation (1).                                                                                                      as we can see in figure 7, the number of outlinks on
                                                                                                                   feedback pages was higher than on non-feedback pages in
                      $DF_i = \frac{|\{d_j : Url_i \in d_j\}|}{|D|}$                                         (1)   our results. Therefore, we examined carefully whether the
                                                                                                                   DF values in feedback pages and non-feedback pages are
                                                                                                                   significantly different. As we can see in figure 8, the
In this equation, $|D|$ is total number of days in                                                                 average DF value of non-feedback pages is higher than the
experiment, $d_j$ is the URL collection of the j-th day and                                                        values of feedback pages, and the difference is statistically
$|\{d_j : Url_i \in d_j\}|$ means the number of days where i-th                                                    significant (p = 0.0003). We found that the amount of some
URL appears. If a URL exhibits a high value of DF, the                                                             usage logs was also different between feedback and non-
URL is thought to be inappropriate for content extraction                                                          feedback pages. From table 3, we can see that viewing time,
and should be regarded as a navigational page.                                                                     the amount of mouse wheel use, and the amount of
   In the fourth experiment, the selection of a content page                                                       keyboard typing were significantly different.
was fully up to each subject’s own decision. Even                                                                  Task Identification by Visited URLs. We believed that
though we did not explain the concept of navigational                                                              users have their own URL lists that are specific to their
pages in detail, the subjects found by themselves that there are                                                   current tasks because they may use the Web based on their
naturally several Web pages that may not be suitable for                                                           individual previous experiences on the Web. In this
expressing their feedback levels. As we can see in figure 7,                                                       respect, we analyzed the top-level URLs that users visited
the number of non-feedback pages was much greater than                                                             during the period of the experiment. As we can see in table
                                                                                                                   4, over 90% of visited URLs were separable by the tasks.
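
As a minimal sketch of how the Day Frequency in equation (1) could be computed, assume that the visit log has already been grouped into one set of URLs per day. The function names, the example URLs, and the 0.8 threshold used to flag likely navigational pages are hypothetical and are not values from the experiments.

```python
def day_frequency(daily_urls):
    """Return {url: DF} where DF = (days on which the URL appears) / (total days).

    daily_urls is a list with one collection of visited URLs per day (d_j).
    """
    total_days = len(daily_urls)
    if total_days == 0:
        return {}
    day_counts = {}
    for day in daily_urls:
        # set() ensures a URL visited several times in one day counts once.
        for url in set(day):
            day_counts[url] = day_counts.get(url, 0) + 1
    return {url: count / total_days for url, count in day_counts.items()}


def likely_navigational(df_values, threshold=0.8):
    """URLs whose DF reaches the (hypothetical) threshold, sorted for stable output."""
    return sorted(url for url, df in df_values.items() if df >= threshold)


# Hypothetical four-day history.
history = [
    {"http://portal.example/", "http://news.example/article-a"},
    {"http://portal.example/", "http://blog.example/post-x"},
    {"http://portal.example/"},
    {"http://portal.example/", "http://news.example/article-a"},
]
df_values = day_frequency(history)
candidates = likely_navigational(df_values)
```

A URL that appears on every day of the history receives a DF of 1, while a page seen on a single day receives 1/|D|; any threshold between the two would need to be chosen from real data.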
  Figure 8. The URL depth of feedback pages and non-feedback
  pages (left - a) and the DF values (left - b): 1 – feedback page /
  2 – non-feedback page, and the mean values of interaction logs on
  feedback pages (right - left bars) and on non-feedback pages
  (right - right bars); interaction-log panels: 1 - viewTime /
  2 - mouseMove / 3 - mouseClick, and 1 - mouseWheel / 2 - keyPress

                                                                                                                   user No.         task separable (%)
                                                                                                                       1                  92.68
                                                                                                                       2                  92.59
                                                                                                                       3                  93.17
                                                                                                                       4                  95.77
                                                                                                                       5                  75
                                                                                                                       6                  93.86
                                                                                                                       7                  100
                                                                                                                       8                  90.57
                                                                                                                       9                  89.29
                                                                                                                      10                  97.40
                                                                                                                      11                  91.07
                                                                                                                      12                  96.21

                                                                                                                   Table 4. The proportion of task separable URLs
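
One possible way to compute a task-separable proportion such as the one reported in table 4 is sketched below. It assumes that a top-level URL is the host part of a visited URL and that a URL is task separable when it was visited in exactly one task; both readings, the function names, and the example visits are assumptions made for illustration only.

```python
from urllib.parse import urlsplit


def top_level(url):
    """Host part of a URL; treating this as the 'top-level URL' is an assumption."""
    return urlsplit(url).netloc


def task_separable_percentage(visits):
    """Percentage of top-level URLs that were visited in exactly one task.

    visits is an iterable of (task_label, url) pairs.
    """
    tasks_by_site = {}
    for task, url in visits:
        tasks_by_site.setdefault(top_level(url), set()).add(task)
    if not tasks_by_site:
        return 0.0
    separable = sum(1 for tasks in tasks_by_site.values() if len(tasks) == 1)
    return 100.0 * separable / len(tasks_by_site)


# Hypothetical visits of one user across the two task types.
visits = [
    ("careful", "http://scholar.example/paper-1"),
    ("careful", "http://scholar.example/paper-2"),
    ("casual", "http://news.example/today"),
    ("casual", "http://portal.example/"),
    ("careful", "http://portal.example/search"),
]
percentage = task_separable_percentage(visits)
```

In this made-up example two of the three sites are used in a single task only, so the remaining site (the portal) is the only one that does not help to identify the current task.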
                                                                                                                   observed in the results of the second experiment in which
                                                                                                                   the subjects visited pre-collected Web pages even without
                                                                                                                   any prior clue about the contents. In addition, there were
                                                                                                                   significant differences among the amounts of all usage logs
                                                                                                                   according to interest levels in the results of the second
                                                                                                                   experiment, but only the amount of viewing time and
                                                                                                                   mouse movements were affected by the interest levels in
                                                                                                                   the results of the third experiment.
                                                                                                                   Experiment 3 vs. Experiment 4. Unlike in the
                                                                                                                   results of the third experiment, we observed that only the
                                                                                                                   viewing time showed a significant relation with the
                                                                                                                   interest and credibility levels in the results of the fourth
                                                                                                                   experiment. This means that the more natural the
                                                                                                                   environment is, the more unknown factors will exert their
   (Figure 9 panels, left to right: No. of top-level URLs per task, 1 - careful / 2 - casual; Task 1 by day; Task 2 by day)
   Figure 9. The average number of URLs in each task (left), the                                                   influences on the usage patterns. We also observed in the
     increasing rate of average number of URLs in careful task                                                     results of the third experiment that there are some
                   (middle), in casual task (right)                                                                differences in usage patterns according to the task types,
                                                                                                                   such that the amount of usage logs on interested Web
In other words, 90% of visited URLs belong to a specific                                                           pages in careful tasks is higher than in casual tasks. The
task only, and hence we can infer the types of current task                                                        same result was observed in the fourth experiment. Finally,
easily by checking the top-level URLs. Moreover, as we                                                             from historical data analyses, we found that Day
can see in figure 9, the number of URLs that users visited                                                         Frequency and some usage logs are significantly different
in the casual tasks is much higher than in the careful tasks.                                                      according to the page types.
The most interesting patterns are the increasing rates of the                                                      Summary. We briefly summarize all of the observed
number of visited URLs as time goes on. The number of                                                              patterns as follows.
visited URLs in the tasks of casual searching increased                                                            1) Generally, the amount of usage logs is not under the
more drastically than in careful searching. This means that                                                        influence of the size and form of the Web page.
the subjects showed the navigator’s patterns in careful
searching tasks but showed the explorer’s patterns in                                                              2) Information scents exert noticeable influence on usage
casual searching tasks (White and Drucker 2007). We                                                                patterns such that Web users choose links to visit based on
believe that this pattern is meaningful in developing                                                              information scents, and the scents also cause the users to
personalization schemes that are adaptive to current task                                                          show some uncertain usage patterns while they are viewing
types.                                                                                                             Web pages.
                                                                                                                   3) The viewing time is the best log to be used as an
                                                                                                                   implicit feedback indicator if it is pre-processed carefully.
                                 Discussion and Future Work                                                        It means that we have to analyze the viewing time more
                                                                                                                   carefully than other logs to develop personalization
Review of the Result and Summary                                                                                   services that are adaptive to user interest.
We analyzed the results of 4 experiments and recognized                                                            4) The viewing time is under the influence of interest and
that there are noticeable differences in usage patterns                                                            credibility levels. In other words, interest and credibility
according to the experimental environment. In this section,                                                        levels are the most influential contextual factors in a
we briefly summarize the interesting differences.                                                                  natural Web environment. The difficulty and complexity
                                                                                                                   levels do not create noticeable variations on the amount of
Experiment 1 vs. Experiment 2. The forms of the Web                                                                usage logs.
pages that the subjects visited in the first and second
experiments were different, but we could not see large                                                             5) The viewing time is also under the influences of current
differences between the results of the two experiments.                                                            tasks, written languages, and page types. In addition, page
Moreover, the amount of usage logs was not influenced by                                                           types are also influential on the variations of other usage
the amount of contents or size of the Web pages. We                                                                logs such that the amount of mouse wheel use, number of
believe that this pattern came from the fact that Web users                                                        visits in a day, and the amount of keyboard typing were
read Web pages in a nonlinear pattern, and that there are                                                          significantly different based on the page types.
some unique characteristics in reading digital documents                                                           6) Web users visit different Websites when they are
(Liu 2005).                                                                                                        performing different tasks and they show different
Experiment 2 vs. Experiment 3. In the results of the third                                                         navigational patterns according to the task types.
experiment in which the subjects freely selected the Web                                                           7) We recognized that some historical and experiential
pages to visit, we observed that the credibility levels                                                            aspects that may not be observed in short time analysis can
regarding the contents exert a noticeable influence on the                                                         only be found in long time analysis.
amount of usage logs, but the same pattern has not been
Limitations of the experiments
Although many interesting patterns were observed from
our experiments, we also acknowledge the limitations of
our experiments. We cannot expect that the observed
patterns will generalize to a general population because we
recruited a small number of people from the same population
as our subjects for experimental convenience.
However, the results show us valuable usage patterns of
experienced Web users and consequently provide us with
good insight for further research.

Future Work
As we already discussed in previous sections, the viewing
time is under the influence of various factors. We cannot          Figure 10. A possible practical solution: the arrows on the left
decide what service applications are to be activated based        show their influential relationships and the arrows on the right
solely on the fact that viewing time increases on a current          mean that the logs can be used for data preparation tasks
Web page, because the viewing time will be affected by
various factors - interest levels, credibility levels, page
types, tasks, and written languages. Therefore, to find a
user’s characteristics and select the applications                                        References
accordingly, it is necessary to intelligently detect what
factors are currently influencing the usage patterns. We         Al halabi W. S.; Kubat M.; and Tapia M. 2007. Time
think that it will be very challenging to find current           spent on a web page is sufficient to infer a user's interest.
contextual factors intelligently, but we also think that the     In Proceedings of the IASTED European Conference:
current factors can be identified through some careful           internet and multimedia systems and applications, 41-46,
statistical analyses on various historical usage patterns. For   Chamonix, France: ACTA Press.
example, as we already discussed in section 4.5, the URLs        Badi R.; Bae S.; Moore J. M.; Meintanis K.; Zacchi A.;
of the Web pages that users are currently viewing will give      Hsieh H.; Shipman F.; and Marshall C. C. 2006.
us information of the current task types. In addition,           Recognizing user interest and document value from
because Web users have a tendency to choose Websites to          reading and organizing activities in document triage. In
visit according to their own previous experiences about the      Proceedings of the 11th international conference on
sites, the URLs are also useful for inferring the users’         Intelligent user interfaces, 218-225, Sydney, Australia:
subjective feedback levels on the contents of Web pages if       ACM.
we monitor user activities for a long period. Actually, in       Borlund, P. and Ingwersen, P. 1997. The Development of a
the post interviews of the third experiment, the subjects        Method for the Evaluation of Interactive Information
told us that they use different search engines according to      Retrieval Systems. Journal of Documentation, 53(3):225-
their current tasks. For example, they use Google for            250.
careful tasks and Naver – a Korean portal site - for casual
tasks. Therefore, we assume that URL information can be          Byström K. and Järvelin K. 1995. Task complexity affects
used very effectively for the purpose of inferring the user’s    information seeking and use. Information Processing and
contexts. The similarity between the contents of current         Management, 31(2):191-213.
Web pages and the contents of previous high-interest Web         Chi E. H.; Pirolli P.; Chen K.; and Pitkow J. 2001. Using
pages can also be used to infer the interest levels on the       information scent to model user information needs and
current Web pages. Furthermore, the Day Frequency can            actions on the Web. In Proceedings of the SIGCHI
be used to infer the types of Web pages viewed.                  conference on Human factors in computing systems, 490-
   If our system can infer the current contextual factors        497, Seattle, Washington, United States: ACM.
intelligently, some proactive services can be provided. In       Choi J.; Lee G.; and Um Y. 2007. Analysis of Internet
figure 10, we present the concept of a data preparation          Users’ Interests Based on Windows GUI Messages. In
service that we are developing in which unnecessary visit        Proceedings of the 12th International Conference on
logs and uninterested contents can be filtered out. In           Human-Computer Interaction, Lecture Notes in Computer
addition, if the system can identify a user’s current task       Science, 4553:881-888: Springer Berlin / Heidelberg.
type correctly, the threshold of the viewing time to find        Cooley R.; Mobasher B.; and Srivastava J. 1999. Data
high-interested Web pages can be applied accordingly.
                                                                 Preparation for Mining World Wide Web Browsing
   Finally, we should consider individual differences
                                                                 Patterns. Knowledge and Information Systems, 1(1):5-32.
because there may be variances according to user
preference, cognitive styles, temperament, and so on.            Domenech J. M. and Lorenzo J. 2007. A Tool for Web
                                                                 Usage Mining. In Proceedings of the 8th International
                                                                 Conference on Intelligent Data Engineering and
Automated Learning, Lecture Notes in Computer Science,         Kim K. and Allen B. 2002. Cognitive and task influences
4881:695-704: Springer Berlin / Heidelberg.                    on Web searching behavior. Journal of the American
Fogg B. J.; Soohoo C.; Danielson D. R.; Marable L.;            Society for Information Science and Technology,
Stanford J.; and Tauber E. R. 2003. How do users evaluate      53(2):109-119: John Wiley & Sons.
the credibility of Web sites?: a study with over 2,500         Liu Z. 2005. Reading behavior in the digital environment.
participants. In Proceedings of the 2003 conference on         Journal of Documentation, 61(6):700-712. Emerald Group
Designing for user experiences, 1-15, San Francisco,           Publishing Limited.
California: ACM.                                               Mobasher B.; Cooley R.; and Srivastava J. 2000.
Fu Y.; Shih M.; Creado M.; and Ju C. 2001. Reorganizing web    Automatic personalization based on Web usage mining.
sites based on user access patterns. In Proceedings of         Communications of the ACM, 43(8):142-151: ACM.
the tenth international conference on Information and          Nakamichi N.; Shima K.; Sakai M.; and Matsumoto K.
knowledge management, 583-585, Atlanta, Georgia, USA:          2006. Detecting low usability web pages using quantitative
ACM.                                                           data of users' behavior. In Proceedings of the 28th
Gauch S.; Speretta M.; Chandramouli A.; and Micarelli A.       international conference on Software engineering, 569-576,
2007. User Profiles for Personalized Information Access.       Shanghai, China: ACM.
The Adaptive Web, Lecture Notes in Computer Science,           Rieh S. Y. 2002. Judgement of information quality and
4321:54-89,: Springer Berlin / Heidelberg.                     cognitive authority in the Web. Journal of the American
Hofgesang P. I. 2006. Relevance of Time Spent on Web           Society for Information Science and Technology,
Pages. In Proceedings of KDD Workshop on Web Mining            53(2):145-161: John Wiley & Sons.
and Web Usage Analysis, in conjunction with the 12th           Salton G. and McGill M. J. 1986. Introduction to Modern
ACM SIGKDD International Conference on Knowledge               Information Retrieval: McGraw-Hill.
Discovery and Data Mining, Philadelphia, PA.
                                                               Seo Y. W. and Zhang B. T. 2000. Learning user's
Johnson J. D. 2003. On contexts of information seeking.        preferences by analyzing Web-browsing behaviors. In
Information Processing & Management, 39(5):735-760:            Proceedings of the fourth international conference on
Elsevier                                                       Autonomous agents, 381-387, Barcelona, Spain: ACM.
Kari J. and Savolainen R. 2007. Relationships between          Sonnenwald D. H. 1999. Evolving Perspectives of Human
information seeking and context: A qualitative study of        Information Behavior: Contexts, Situations, Social
Internet searching and the goals of personal development.      Networks and Information Horizons. Exploring the
Library & Information Science Research, 29(1):47-69:           contexts of information behaviour, 176-190: Taylor
Elsevier                                                       Graham Publishing.
Kellar M.; Watters C.; Duffy J.; and Shepherd M. 2005.         Vakkari P. 1999. Task complexity, problem structure and
Effect of Task on Time Spent Reading as an Implicit            information actions: integrating studies on information
Measure of Interest. In Proceedings of the American            seeking and retrieval. Information processing &
Society for Information Science and Technology,                management, 35(6):819-837: Elsevier.
41(1):168-175.
                                                               Vakkari P. 2001. A theory of the task-based information
Kellar M.; Watters C.; and Shepherd M. 2007. A Field           retrieval process: a summary and generalisation of a
Study Characterizing Web-based Information Seeking             longitudinal study. Journal of Documentation, 57(1):44-
Tasks. Journal of the American Society for Information         60: Emerald Group Publishing Limited.
Science and Technology, 58(7):999-1018: John Wiley &
                                                               Wang Y. D. and Emurian H. H. 2005. An overview of
Sons.
                                                               online trust: Concepts, elements, and implications.
Kelly D. and Belkin N. J. 2004. Display time as implicit       Computers in Human Behavior, 21(1):105-125: Elsevier.
feedback: understanding task effects. In Proceedings of the
                                                               Wang P.; Hawk W. B.; and Tenopir C. 2000. Users'
27th annual international ACM SIGIR conference on
                                                               interaction with World Wide Web resources: an
Research and development in information retrieval, 377-
384, Sheffield, United Kingdom: ACM.                           exploratory study using a holistic approach. Information
                                                               processing & management, 36(2):229-251: Elsevier.
Kelly D. and Cool C. 2002. The effects of topic familiarity
                                                               Wathen C. N. and Burkell J. 2001. Believe it or not:
on information search behavior. In Proceedings of the 2nd
ACM/IEEE-CS joint conference on Digital libraries, 74-75,      Factors influencing credibility on the Web. Journal of the
                                                               American Society for Information Science and Technology,
Portland, Oregon, USA.
                                                               53(2):134-144: John Wiley & Sons.
Kelly D. and Teevan J. 2003. Implicit feedback for
inferring user preference: a biblio-graphy. ACM SIGIR          White R. W. and Drucker S. M. 2007. Investigating
                                                               behavioral variability in web search. In Proceedings of the
Forum 37(2):18-28. ACM.
                                                               16th international conference on World Wide Web, 21-30,
Kelly D. 2004. Understanding implicit feedback and             Banff, Alberta, Canada: ACM.
document preference: a naturalistic user study. Ph.D.
Dissertation, Rutgers University. 2004.