=Paper=
{{Paper
|id=None
|storemode=property
|title=A Context Ultra-Sensitive Approach to High Quality Web
Recommendations based on Web Usage Mining and Neural Network Committees
|pdfUrl=https://ceur-ws.org/Vol-703/paper4.pdf
|volume=Vol-703
|dblpUrl=https://dblp.org/rec/conf/www/NasraouiP04a
}}
==A Context Ultra-Sensitive Approach to High Quality Web Recommendations based on Web Usage Mining and Neural Network Committees==
Olfa Nasraoui and Mrudula Pavuluri
Dept. of Electrical and Computer Engineering
206 Engineering Science Bldg.
The University of Memphis
Memphis, TN 38152-3180
onasraou@memphis.edu
ABSTRACT
Personalization tailors a user's interaction with the Web information space based on information gathered about them. Declarative user information, such as manually entered profiles, continues to raise privacy concerns and is neither scalable nor flexible in the face of very active, dynamic Web sites and changing user trends and interests. One way to deal with this problem is through a completely automated Web personalization system. Such a system can be based on Web usage mining to discover Web usage profiles, followed by a recommendation system that can respond to the users' individual interests. We present several architectures that rely on pre-discovered user profiles: Context Sensitive Approaches based on single-step Recommender systems (CSA-1-step-Rec), and Context Ultra-Sensitive Approaches based on two-step Recommender systems (CUSA-2-step-Rec). In particular, the two-step recommendation strategy based on a committee of profile-specific URL-Predictor models is more accurate and faster to train, because only the URLs that are relevant to a specific profile are used to define the relevant attributes for this profile's specialized URL-Predictor model. Hence, the model complexity, such as the neural network architecture, can be significantly reduced compared to a single global model that could involve hundreds of thousands of URLs/items. The two-step approach can also be expected to handle overlap in user interests, and even to mend the effects of a profile dichotomy that is too coarse. Finally, we note that all the mass-profile based recommendation strategies investigated are intuitive, and are low in recommendation-time cost compared to collaborative filtering (there is no need to store or compare to a large number of instances). In our simulations on real Web activity data, the proposed context ultra-sensitive two-step recommendation strategy achieves unprecedentedly high coverage and precision compared to other approaches such as K-NN collaborative filtering and single-step recommenders such as the Nearest-Profile recommender.

Keywords
Web personalization, web recommendation, web usage mining, neural networks, collaborative filtering.

1. INTRODUCTION
The flow of information in a completely automated Web personalization system spans several stages, starting from the user's Web navigation patterns until the final recommendations, and including the intermediate stages of logging the users' activities, preprocessing and segmenting Web log data into Web user sessions, and learning a usage model from this data. The usage model can come in many forms: from the lazy type of modeling used in collaborative filtering [15,16], which simply stores all the users' information and then relies on K Nearest Neighbors to provide recommendations from the previous history of neighbors or similar users; to a set of frequent itemsets or associations; to a set of clusters of user sessions and the resulting Web user profiles or summaries. Pazzani and Billsus [7] presented a collaborative filtering approach to recommendation based on users' ratings of specific web pages, with Naive Bayes as the prediction tool. Mobasher et al. [9] use pre-discovered association rules and an efficient data structure to provide faster recommendations based on web navigation patterns. Collaborative models do not perform well in the face of very sparse data, and do not scale well to the huge number of users and URLs/items [15]. In association rule based methods, large support thresholds yield only a very small fraction of the total number of itemsets, while smaller support thresholds generally result in a staggering number of mostly spurious URL itemsets. Techniques that are based on clustering to discover profiles are also expected to face serious limitations in the presence of huge, very high-dimensional, and sparse usage data, unless the clustering technique is efficient, robust to noise, and can handle the sparseness. Unsupervised robust multi-resolution clustering techniques [6] can reliably discover most of the Web user profiles/interest groups, because they benefit from a multi-resolution mechanism to discover even the less pronounced user profiles, and from robustness to avoid the noisy/spurious profiles that are common with low support thresholds. Their unsupervised nature also avoids reliance on input parameters or prior knowledge about the number of profiles to be sought. Linden et al. [14] used item-to-item collaborative filtering as a recommendation strategy on Amazon.com. This approach matches each of the user's purchased and rated items to
similar items, then combines those similar items into a recommendation list. A similar-item table is built by finding items that customers tend to purchase together.

In this paper, we propose several Context Sensitive Approaches based on single-step Recommender systems (CSA-1-step-Rec), and Context Ultra-Sensitive Approaches based on two-step Recommender systems (CUSA-2-step-Rec). The single-step recommender systems (CSA-1-step-Rec) simply predict the URLs that are part of the nearest estimated profile as recommendations. For example, the Nearest-Profile prediction model simply bases its recommendations on the closest profile according to a similarity measure. While this is a very simple and fast approach, it makes the critical assumption that sessions in different profiles are linearly separable. While this may hold for certain web mining methods, it may not be true for others. In order to reliably map new unseen sessions to a set of mined profiles, without such assumptions about the profiles or how they separate the sessions, we can resort to classification methods that can handle non-separable profile boundaries. In this paper, we explore both decision trees and Multi-Layer Perceptron neural networks for this task. Once trained, using the decision tree or neural network model to classify a new session is fast, and it constitutes the single step of the recommendation process, since the classified profile is the recommendation set.

The Context Ultra-Sensitive two-step recommender system (CUSA-2-step-Rec) first maps a user session to one of the pre-discovered profiles, and then uses one of several profile-specific URL-predictor neural networks (such as Multilayer Perceptron or Hopfield autoassociative memory networks) in the second step to provide the final recommendations. Based on this classification, a different recommendation model is designed for each profile separately. A specific neural network is trained offline for each profile in order to provide a profile-specific recommendation strategy that predicts web pages of interest to the user depending on their profile and the current URLs. The two-step recommendation method not only handles overlap in user interests, but can also mend the effects of misclassification in the first, nearest-profile assignment step, and even the effect of a coarse profile dichotomy. The two-step recommendation method based on multilayer perceptron networks achieves unprecedentedly high coverage and precision compared to K-NN and Nearest-Profile recommendations. However, the approach based on Hopfield autoassociative memory networks seems to struggle against the well-known harsh constraints limiting the number of patterns that can be memorized relative to the number of units/URLs. Finally, we note that the proposed recommendations are intuitive, and low in cost compared to collaborative filtering: they are faster and require less main memory at recommendation time (there is no need to store or compare to a large number of instances).

The rest of the paper is organized as follows. In Section 2, we present an overview of profile discovery using Web usage mining. In Section 3, we present the single-step profile prediction based recommendation process, and the two-step recommender system based on a committee of profile-specific URL-predictor neural networks. In Section 4, we present an empirical evaluation of the recommendation strategies on real web usage data, and finally, in Section 5, we present our conclusions.

2. PROFILE DISCOVERY BASED ON WEB USAGE MINING
The first step in intelligent Web personalization is the automatic identification of user profiles. This constitutes the knowledge discovery engine. These profiles are later used to recommend relevant URLs to old and new anonymous users of a Web site. This constitutes the recommendation engine, which is the main focus of this paper.

The knowledge discovery part, based on Web usage mining [2,3,6,17,18,19,20], can be executed offline by periodically mining new contents of the user access log files. In this paper, Web usage mining is used to discover the profiles using the following steps:

(1) Preprocess the log file to extract user sessions,
(2) Categorize the sessions by Hierarchical Unsupervised Niche Clustering (H-UNC) [6],
(3) Summarize the session categories in terms of user profiles,
(4) Infer context-sensitive URL associations from the user profiles.

Step 1: Preprocessing the Web Log File to Extract User Sessions
The access log of a Web server is a record of all files (URLs) accessed by users on a Web site. Each log entry consists of the following information components: access time, IP address, URL viewed, etc. An example showing two entries is displayed below:

17:11:48 141.225.195.29 GET /graphics/horizon.jpg 200
17:11:48 141.225.195.29 GET /people/faculty/nasraoui/index.html 200

The first step in preprocessing [1,2,3] consists of mapping the NU URLs on a website to distinct indices. A user session consists of accesses originating from the same IP address within a predefined time period. Each URL in the site is assigned a unique number j ∈ {1, …, NU}, where NU is the total number of valid URLs. Thus, the ith user session is encoded as an NU-dimensional binary attribute vector s^(i) with the property

s_j^(i) = 1 if the user accessed the jth URL during the ith session, and 0 otherwise.   (1)

Step 2: Clustering Sessions into an Optimal Number of Categories
For this task, we use Hierarchical Unsupervised Niche Clustering (H-UNC) [6]. H-UNC is a hierarchical version of a robust genetic clustering approach (UNC) [5]. A Genetic Algorithm (GA) [14,15,16,17,18] evolves a population of candidate solutions through generations of competition and reproduction until convergence to one solution. Hence, the GA cannot maintain population diversity. Niching methods attempt to maintain a diverse population with members distributed among niches corresponding to multiple solutions. An initial population of randomly selected sessions is coded into binary chromosome strings that compete based on a density fitness measure that is
highest at the centers of good (dense) clusters. Different niches in the fitness landscape correspond to distinct clusters in the data set. The algorithm is detailed below. Note that the clusters and their number are determined automatically. More details on H-UNC can be found in [6].

Hierarchical Unsupervised Niche Clustering Algorithm (H-UNC) [6]:
- Encode the binary session vectors
- Set the current resolution level L = 1
- Start by applying UNC to the entire data set with a small population size
- Repeat recursively until the cluster cardinality or scale becomes too small:
  - Increment the resolution level: L = L + 1
  - For each parent cluster found at level (L - 1):
    - Reapply UNC [5] only on the data subset assigned to this parent cluster
    - Extract more child clusters at the higher resolution (L > 1)

Step 3: Summarizing Session Clusters into User Profiles
After automatically grouping the sessions into different clusters, we summarize the session categories in terms of user profile vectors [3,4], pi. The kth component/weight of this vector (pik) captures the relevance of URLk in the ith profile, as estimated by the conditional probability that URLk is accessed in a session belonging to the ith cluster (this is the frequency with which URLk was accessed in the sessions belonging to the ith cluster). The model is further extended to a robust profile [3,6] based on robust weights that assign only sessions with a high robust weight to a cluster's core. Unpolluted by noisy (irrelevant) sessions, these profiles give a cleaner description of the user interests.

3. DESCRIPTION OF THE SINGLE-STEP AND TWO-STEP RECOMMENDATION STRATEGY OPTIONS
Let U = {url1, url2, …, urlNU} be a set of NU URLs on a given web site visited in web user sessions sj, j = 1, …, Ns, as defined in (1). Let P = {p1, p2, …, pNP} be the set of NP Web user profiles computed by the profile discovery engine. Each profile consists of a set of URLs associated with their relevance weights in that profile, and can be viewed as a relevance vector of length NU, with pik = relevance of urlk in the ith profile. The problem of recommendation can be stated as follows. Given a current Web user session vector, sj = [sj1, sj2, …, sjNU], predict the set of URLs that are most relevant according to the user's interest, and recommend them to the user, usually as a set of hypertext links dynamically appended to the contents of the Web document returned in response to the most recent Web query. Because the degree of relevance of the URLs that are determined to be of interest to the user may vary, it may also be useful to associate the kth recommended URL with a corresponding URL relevance score, rjk. Hence it is practical to denote the recommendations for the current Web user session, sj, by a vector rj = [rj1, rj2, …, rjNU]. In this study, we limit the scores to be binary.

3.1 Context Sensitive Approach Based on Single-Step Profile Prediction Recommender System (CSA-1-step-Rec)

3.1.1 Single-Step Nearest-Profile Prediction Based Recommender System
The simplest and most rudimentary approach to profile-based Web recommendation is to simply determine the profile most similar to the current session, and to recommend the URLs in this profile, together with their URL relevance weights as URL recommendation scores.

Figure 1: Context-Sensitive Approach based on single-step profile prediction based Recommender System (CSA-1-step-Rec). The Profile Prediction Model can be a Nearest-Profile classifier or any of the models shown in Figs 2 or 3.

Figure 1 shows the structure of such a recommendation system, where the profile prediction model simply consists of a nearest-profile estimator that computes a session-to-profile similarity and selects the profile with the highest similarity as the predicted profile. The similarity score between an input session, s, and the ith profile, pi, can be computed using the cosine similarity as follows:

S_si^cosine = Σ_{k=1..NU} p_ik s_k / ( √(Σ_{k=1..NU} p_ik²) √(Σ_{k=1..NU} s_k²) )   (2)

If a hierarchical Web site structure should be taken into account, then a modification of the cosine similarity, introduced in [3,4], that can take the Website structure into account can be used to yield the following input membership:

S_si^web = max( Σ_{l=1..NU} Σ_{k=1..NU} p_il S_u(l,k) s_k / ( √(Σ_{k=1..NU} p_ik²) √(Σ_{k=1..NU} s_k²) ), S_si^cosine )   (3)

where S_u is a URL-to-URL similarity matrix computed based on the amount of overlap between the paths leading from the root of the website (main page) to any two URLs, and is given by

S_u(i,j) = min( 1, |p_i ∩ p_j| / max(1, max(|p_i|, |p_j|) − 1) )
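As an illustration, the path-overlap similarity S_u above can be sketched in a few lines. Reading the path overlap as the length of the common prefix of the two root-to-URL paths is our own interpretation, and the helper names (split_path, url_similarity) are illustrative assumptions, not taken from the paper:

```python
def split_path(url):
    """Split a URL path into its components, e.g. '/a/b/c.html' -> ['a', 'b', 'c.html']."""
    return [part for part in url.split("/") if part]

def url_similarity(url_i, url_j):
    """Sketch of S_u(i, j): overlap between the paths from the site root to two
    URLs, normalized by the longer path length minus one (the leaf), capped at 1."""
    pi, pj = split_path(url_i), split_path(url_j)
    # Count the length of the common prefix (the shared path from the root).
    overlap = 0
    for a, b in zip(pi, pj):
        if a != b:
            break
        overlap += 1
    denom = max(1, max(len(pi), len(pj)) - 1)
    return min(1.0, overlap / denom)

# Two course pages under the same directory are maximally similar,
# while pages in disjoint subtrees get similarity 0.
print(url_similarity("/courses/cecs345/java.html", "/courses/cecs345/refs.html"))  # 1.0
print(url_similarity("/people/faculty.html", "/degrees/bsce.html"))                # 0.0
```

Under this reading, two sibling leaves share their whole directory path, so the min(1, ·) cap fires exactly when the overlap reaches the longer path's directory depth.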
We refer to the special similarity in (3) as the Web Session Similarity.
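A minimal sketch of the single-step nearest-profile recommendation of Section 3.1.1: encode the session as a binary vector over the NU URLs as in (1), score it against each profile vector with the cosine similarity of (2), and recommend the high-weight URLs of the winning profile. The toy profile weights, the 0.5 relevance cutoff, and the function names are illustrative assumptions, not the paper's implementation:

```python
import math

def cosine(p, s):
    """Cosine similarity between a profile vector p and a binary session vector s (Eq. 2)."""
    dot = sum(pk * sk for pk, sk in zip(p, s))
    norm = math.sqrt(sum(pk * pk for pk in p)) * math.sqrt(sum(sk * sk for sk in s))
    return dot / norm if norm else 0.0

def nearest_profile_recommend(session, profiles, threshold=0.5):
    """Return (index of the most similar profile, URL indices to recommend)."""
    best = max(range(len(profiles)), key=lambda i: cosine(profiles[i], session))
    # Recommend URLs whose relevance weight in the winning profile is high
    # and which the user has not yet visited in the current session.
    recs = [k for k, w in enumerate(profiles[best])
            if w >= threshold and session[k] == 0]
    return best, recs

# Toy example: 5 URLs, 2 profiles (weights are made up for illustration).
profiles = [[0.9, 0.8, 0.1, 0.0, 0.0],   # profile 0 centers on URLs 0-1
            [0.0, 0.1, 0.2, 0.9, 0.7]]   # profile 1 centers on URLs 3-4
session = [1, 0, 0, 0, 0]                # the user visited URL 0 only
best, recs = nearest_profile_recommend(session, profiles)
# best == 0, and URL 1 is recommended
```

Note that this look-up-table flavor is exactly why the approach is cheap at recommendation time: once the profiles are mined offline, serving a session costs one similarity scan over NP profiles rather than over all stored user histories as in K-NN collaborative filtering.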
3.1.2 Single-Step Decision-Tree Based Profile Prediction Recommender System
The nearest-profile prediction model makes the critical assumption that sessions in different profiles are linearly separable. While this may be applicable for certain web mining methods, it may not be true for others. In order to reliably map new unseen sessions to a set of mined profiles, without such assumptions about the profiles or how they separate the sessions, we can resort to classification methods that are not based on distance or similarity computations. In this paper, we explore both decision trees and neural networks for this task. Once trained, using the decision tree or neural network model to classify a new session is fast, and it constitutes the single step of the recommendation process, since the classified profile is the recommendation set.

The decision tree profile prediction model is very similar to the nearest-profile prediction model. An input binary vector is presented as input to the decision tree [22], and a profile/class is predicted as the output. Each URL in the input vector is considered an attribute. During learning, the entire training data set is presented first. An attribute value is tested at each decision node, with two possible outcomes of the test: a branch and a sub-tree. The class node indicates the class to be predicted.

Building the tree [22] starts by selecting the URL with the maximum information gain and placing this URL, say url1, at the root node. The test at the branch is whether the session data includes a visit to url1. If so, those sessions are placed as one group. The next URL with the maximum gain is placed as the node of the sub-tree. For instance, let the next URL be url2. Again, all the remaining sessions are checked to see whether they traversed url2. If yes, they are sub-grouped. This procedure continues until all the attributes are checked or all the class attributes are identical (a pure sub-sample). This example is illustrated in Figure 2.

Figure 2: Example of a Profile Prediction Model based on a decision tree that can be used within CSA-1-step-Rec

3.1.3 Single-Step Neural Network Based Profile Prediction Recommender System
In the neural network [21] based approach to profile prediction, a feed-forward multilayer perceptron is used and trained with back-propagation. The inputs (session URLs) and output (class or profile) of the prediction model remain the same as those described above. The neural network replaces the classification model block in Figure 1. Hence, the input layer of the network consists of as many input nodes as the number of valid URLs (i.e., NU nodes), the output layer has one output node for each profile (i.e., Np nodes), and the hidden layer has (NU + Np)/2 nodes. Figure 3 shows the architecture of the neural network used to predict the most relevant profile. The index of the output node with the highest activation indicates the final class/profile.

During the learning process, the input is passed forward to the other layers and the output of each element is computed layer by layer. The difference between the actual output of the output layer and the desired output is back-propagated to the previous layers (modified by a derivative of the transfer function), and the weights are adjusted. Learning continues until all the weights are learned and no further change in the weights occurs.

Figure 3: Architecture of a Profile Prediction Model based on a Multi-Layer Perceptron that can be used within CSA-1-step-Rec

3.2 Context Ultra-Sensitive Approach Based on Two-Step Recommender System with a Committee of Profile-Specific URL-Predictor Neural Networks (CUSA-2-step-Rec)
The single-step profile prediction recommendation procedure is intuitively appealing and very simple. In particular, its implementation and deployment in a live setting is very efficient; essentially, it amounts to a look-up table. However, it has several flaws: (i) the degree of similarity between the current session and the nearest identified profile may not be taken into account, (ii) the above procedure does not take into account sessions that are similar to more than a single profile, (iii) it cannot handle sessions which are different from all known profiles, and (iv) the set of recommendations derives directly from the contents of a single (assigned) profile for all sessions assigned to this profile, without any further distinction between the specific access patterns. For this reason, we propose a two-step approach that, in addition to exploiting the profile information, is able to produce more highly personalized recommendations that depend not only on the assigned profile (people-to-people collaborative filtering), but also, explicitly, on the input session itself (item-to-item collaborative filtering).

In this method, the URLs reflecting the more specific user interests, which may differ even between users assigned to the
same profile, are predicted as recommendations instead of the same profile contents. Hence, the more individualized URL predictions are expected to perform better compared to the coarser single-step profile prediction model.

3.2.1 Description of the Multi-Layer Perceptron URL-Predictor Neural Network
A Multilayer Perceptron neural network [21] can be used for directly predicting the URLs to be given as recommendations. The architecture of the network is different from the network used in the profile prediction scenario of Figure 3, and is shown in Figure 4. This is because the number of output nodes is now equal to the number of input nodes. Each training input consists of a user sub-session (ss) derived from a ground-truth complete session S, while the output nodes should conform to the remainder of this session (S − ss). This means that there is one output node per URL. Hence, the architecture of the network can become extremely complex, as there would be NU input and NU output nodes. This will in turn increase the number of hidden nodes to roughly NU nodes. Training such a network may prove to be unrealistic on large websites that may consist of thousands of URLs. To overcome this problem, a separate network is learned for each profile independently, with an architecture of its own. Here, the number of input and output nodes depends only on the number of significant URLs in that profile, and possibly those related to its URLs by URL-level or conceptual similarity, and the number of hidden nodes is set to the average of the number of input and output nodes. Figure 4 shows the architecture of each URL-predictor neural network. There will be a committee of Np specialized networks of a similar kind used in developing this URL recommendation prediction model, as illustrated in Figure 5. Each of these networks is completely specialized to forming the recommendations for only one profile, hence offering a local, more refined model that enjoys the advantages of better accuracy, simplicity (fewer nodes and connections), and ease of training (as a result of simplicity).

Figure 4: Architecture of a Profile-Specific URL-Predictor Neural Network used in CUSA-2-step-Rec

3.2.2 Learning the Profile-Specific URL-Predictor Neural Network Models
The URL-Predictor network for each profile is learned independently with a separate set of training data. Learning each network involves presenting a sub-session, consisting of some of the URLs visited by a user belonging to that profile, as input, and adjusting the network weights by back-propagation to recommend, as output of the network, URLs that are not part of the sub-session given as input but which are a part of the ground-truth complete session. For each ground-truth complete session, we find all the sub-sessions for window sizes 1-10, and use them to generate independent training and testing sets. Cosine similarity is used to map each sub-session to the closest profile, and the URL-Predictor network specialized for that profile is invoked to obtain the recommendations. Hence, the input/output units of the URL-Predictor network specialized for a particular profile are limited to only the URLs that are present in the sessions assigned to that profile. For this reason, we re-index the URLs in the sub-sessions to a smaller index set, so that the number of input nodes in each network is reduced compared to the total number of URLs in the website. This reduces the complexity of the network's architecture. The output URLs of the invoked network are given as the URL recommendations to the user. The output URL recommendations (R) are in binary format and are decoded into URLs by applying a threshold of 0.5. That is, a URL is considered to be recommended if its activation value exceeds 0.5 at the corresponding output node of the network.

Figure 5: Context Ultra-Sensitive Approach based on Two-Step Recommendation Process (CUSA-2-step-Rec) using a Committee of Profile-Specific URL-Predictor Neural Networks (Any URL-Predictor model can be substituted for the Multi-Layer Perceptron, e.g. a Hopfield network)

3.3 Recommendations Based On Autoassociative Memory Hopfield Networks
Hopfield networks are a special kind of recurrent neural networks that can be used as associative memory [21]. A Hopfield network can retrieve a complete pattern stored through the training process from an imperfect or noisy version of it. In some sense, a recommender system performs a similar operation when it recommends certain URLs from an incomplete session. Given Nurl fully connected neurons (via symmetric weights wij between each two units i and j), each serving simultaneously as input and as output, and assuming that the activation values, xi, are bipolar (+1/−1), the optimal weights to memorize Np patterns can be determined by Hebbian learning as follows:

w_ij = Σ_{p=1..Np} x_i^p x_j^p for all i ≠ j (and w_ij = 0 otherwise)

During testing/recall, when a new noisy pattern xnew is presented as input, we set the activation at node i at iteration 0 to be xi0 =
xnew-i, then the units are adjusted by iteratively computing, at each
∑ r s
NU * *
jk jk (4)
iteration t prec = k
∑ (r )
j NU * 2
N url k jk
xit +1 = ∑ wij x tj
j =1
until the network converges to a stable state. However, the desired and coverage is given by
behavior of recall in a Hopfield network is expected to hold only
∑ r s
NU * *
if all the possible complete session prototypes can be stored in the jk jk (5)
cov j = k
∑ (s ) * 2
NU
Hopfield network’s connection weights, and if these complete k jk
sessions do not interact (or cross-talk) excessively. Severe
deterioration starts occurring when the number of patterns Np >
0.15Nurl, hence limiting Hopfield recommender system to sites
In most recommender systems, precision and coverage act in
with a large number of URLs and yet very little variety in the user
contradictory fashion. That is while one increases, the other
access patterns. This limitation is paradoxical in the context of
decreases, and vice versa. To take into account both of these
large websites or transactional database systems. Our preliminary
measures, a measure known as the “effectiveness” or F1-Measure
simulations with both a single global Hopfield network as well as
can be computed. This measure achieves higher value when both
several profile-specific Hopfield networks have resulted in low
the precision and coverage are simultaneously high. F1 is given
recall qualities since the network seemed to be able to memorize
by
only very few stable states. However several profile-specific
Hopfield networks perform better than one global network, but
2 * Pr ecision * Coverage (6)
only for some of the profiles. F1 =
Pr ecision + Coverage
4. EXPERIMENTAL RESULTS
4.1 Mining User profiles from Anonymous
Web Usage Data
Figure 6: Architecture of an Auto- Hierarchical Unsupervised Niche Clustering (H-UNC) [6] was
Associative Memory Hopfield URL- applied on a set of web sessions preprocessed from the 12 day
Predictor Neural Network that can be access log data of the Web site of the department of Computer
used in CUSA-2-step-Rec Engineering and Computer Sciences at the University of
Missouri, Columbia. After filtering out irrelevant entries, the data
3.4 Evaluating the Recommender Systems was segmented into 1703 sessions accessing 343 distinct URLs.
The maximum elapsed time between two consecutive accesses in
First a data set consisting of real web user sessions extracted from the same session was set to 45 minutes. We applied H-UNC to the
the log files of a Web server is used to generate training and Web sessions using a maximal number of levels, L = 5, in the
testing sets. For each complete session considered as the ground- hierarchy, and the following parameters that control the final
truth, all the possible sub-sessions of different sizes are generated resolution: Nsplit = 30, and σsplit = 0.1. H-UNC partitioned the Web
up to 9 URLs. If a user session consists of more than 10 URLs, users sessions into 20 clusters at level 5, and each cluster was
then we first randomly select 10 URLs and then use them as the characterized by one of the profile vectors, pi, i=0, ...,19. Some of
basis to form all sub-sessions. This is done in order to maintain these profiles are summarized in Table 1.
the nesting or inclusion relationship between sub-sessions of size
(k) and sub-sessions of size (k+1). This in turn will facilitate the
interpretation of trends in performance versus the sub-session
size.
Once the URL-prediction model is built, we need to evaluate its TABLE 1. SOME OF THE 20 DISCOVERED PROFILES
performance by presenting test sub-sessions to the trained No. size Relevant URLs/Profile Profile Description
network. The test dataset is generated similarly to the training 0 106 {0.29 - /staff.html} {0.92 - /} Main page, people,
dataset, but it forms an independent 20% of the complete set of sub-sessions generated.

Each actual completed session, sT, is treated as ground truth, a subset of this session is treated as the incomplete current sub-session, sj, and the computed recommendations are treated as the predicted complete session. This process is very similar to an information retrieval problem. Hence, the evaluation proceeds by computing the precision and the coverage of the recommendations.

Let r*j = rj - sj. This corresponds to the recommendations obtained after omitting all URLs that are part of the current sub-session. Also, let s*j = sT - sj be the set of ground-truth URLs, not including the ones in the current sub-session being processed for recommendations. The recommendation precision and coverage are then given by

Precision = |r*j ∩ s*j| / |r*j|,   Coverage = |r*j ∩ s*j| / |s*j|.

Table 1: Sample of the discovered usage profiles: profile number, number of sessions, significant URLs in the {weight - URL} notation, and an interpretation.

…: {0.98 - /people.html} {0.99 - /people_index.html} {0.97 - /faculty.html} {0.15 - /degrees.html} {0.20 - /research.html} {0.35 - /grad_people.html} {0.26 - /undergrad_people.html} — faculty, staff, degrees, research.

Profile 1 (104 sessions): {0.99 - /} {1.00 - /cecs_computer.class} — Main page only.

Profile 2: {0.81 - /} {0.75 - /cecs_computer.class} {0.87 - /courses.html} {0.90 - /courses_index.html} {0.88 - /courses100.html} {0.22 - /courses300.html} {0.16 - /courses_webpg.html} {0.16 - /~lan/cecs353} — Main page and course pages.

Profile 3 (61 sessions): {0.80 - /} {0.48 - /degrees.html} {0.23 - /degrees_grad.html} {0.23 - /degrees_grad_index.html} {0.23 - /deg_grad_genor.html} — Main page and graduate degrees.

Profile 4 (58 sessions): {0.72 - /} {0.97 - /degrees_undergrad.html} {0.95 - /degrees_index.html} {0.97 - /bsce.html} {0.52 - /bscs.html} {0.34 - /bacs.html} {0.34 - /courses.html} {0.34 - /courses100.html} {0.26 - /general.html} {0.22 - /courses300.html} {0.81 - /degrees.html} {0.19 - /courses200.html} — Main page, undergraduate courses, degrees, especially undergraduate degrees.

Profile 13 (38 sessions): {0.47 - /~shi} {0.82 - /~shi/cecs345} {0.21 - /~shi/cecs345/java_examples} {0.34 - /~shi/cecs345/references.html} — CECS345 Java examples/references.
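In code, the per-sub-session precision and coverage computation defined above reduces to a few set operations. The following is a minimal sketch; the function and variable names are ours, chosen to mirror the notation sT, sj, and rj:

```python
def evaluate_recommendation(full_session, sub_session, recommendations):
    """Precision and coverage for one sub-session, as defined above.

    full_session:    set of URLs in the complete ground-truth session (sT).
    sub_session:     set of URLs already visited by the user (sj).
    recommendations: set of recommended URLs (rj).
    """
    r_star = recommendations - sub_session   # r*j = rj - sj
    s_star = full_session - sub_session      # s*j = sT - sj
    hits = r_star & s_star                   # correctly predicted new URLs
    precision = len(hits) / len(r_star) if r_star else 0.0
    coverage = len(hits) / len(s_star) if s_star else 0.0
    return precision, coverage

# Example: ground truth {A, B, C, D}, user has seen {A}, we recommend {B, C, E}.
# hits = {B, C}, so precision = 2/3 and coverage = 2/3.
p, c = evaluate_recommendation({"A", "B", "C", "D"}, {"A"}, {"B", "C", "E"})
```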
4.2 Comparative Simulation Results for CSA-1-step-Rec, CUSA-2-step-Rec, and K-NN

We used the following parameters in training the multilayer perceptron URL-Predictor neural networks: maximum number of epochs = 2000, and learning rate = 0.7 (for the input to hidden layer) and 0.07 (for the hidden to output layer). The learning rate is important in training the network, as it controls the amount of change in the weights at each step. These optimal values were selected by trial and error: with a very small value, the algorithm may take a long time to converge, while a large value will make the algorithm diverge by bouncing around the error surface. The learning rate is set higher for the connections from the input to the hidden layer than for those from the hidden to the output layer. This makes the learning slower and more accurate for the output layer nodes. We also used a momentum factor of 0.5, to accelerate learning while reducing unstable behavior.

The collaborative filtering approach is based on using K Nearest Neighbors (K-NN) followed by top-N recommendations, for different values of K and N. First, the K complete sessions closest to the current session are found in the entire history of accesses. Then the URLs present in these top K sessions are sorted in decreasing order of their frequency, and the top N URLs are treated as the recommendation set. We show only the best results obtained for K-NN, at K = 50 neighbors and N = 10 URLs.
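The K-NN baseline just described can be sketched as follows. Sessions are treated as sets of URLs; the cosine similarity on binary URL vectors is an illustrative assumption, since the session similarity measure is not fixed here:

```python
from collections import Counter

def knn_recommend(current, history, k=50, n=10):
    """Top-N recommendation from the K nearest historic sessions.

    current: set of URLs in the (incomplete) current session.
    history: list of sets of URLs, one per previously logged complete session.
    """
    def cosine(a, b):
        # Cosine similarity between binary URL-membership vectors.
        return len(a & b) / ((len(a) * len(b)) ** 0.5) if a and b else 0.0

    # Step 1: find the K complete sessions closest to the current session.
    nearest = sorted(history, key=lambda s: cosine(current, s), reverse=True)[:k]

    # Step 2: rank the URLs occurring in those K sessions by frequency and
    # recommend the top N, excluding URLs the user has already visited.
    counts = Counter(url for s in nearest for url in s if url not in current)
    return [url for url, _ in counts.most_common(n)]
```

Note that every call scans the entire session history, which is exactly the recommendation-time cost discussed below.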
Figures 7 through 9, depicting the 20-profile averaged measures (precision, coverage, and F1), show that the two-step profile-specific URL-predictor multilayer perceptron neural network recommender system (CUSA-2-step-Rec) wins in terms of both better precision and better coverage, particularly above session size 2. This appeared unusual at first, since it is very rare that a recommendation strategy scores this high on both precision and coverage, and an increase in precision did not seem to compromise coverage in any way. However, by looking at the details of the design of the profile-specific URL-predictor neural network, we explain this relentless increase in precision by the fact that the neural network output is trained to predict only the URLs that the user has not seen before, i.e., 'S - ss', where S is the complete session and ss is the sub-session (URLs visited by the user). Clearly, as the sub-session size increases, more URLs are presented at the output of the neural network, making the prediction task easier, since fewer URLs need to be predicted compared to smaller input sub-sessions. Similarly, coverage increases, since with more input URLs, the neural network is able to predict more of the missing URLs to complete the puzzle. However, this does not happen at the expense of precision. On the contrary, in this special type of URL-prediction neural network, giving more hints about the user, in the form of more of the visited URLs, makes the prediction task easier, and hence will only result in more accurate predictions.

We notice that the single-step recommender systems (CSA-1-step-Rec) do not have this nice feature; i.e., their precision and coverage will generally have opposing trends. However, the single-step neural network profile-prediction recommender seems to have a rather increasing precision, unlike the other single-step methods (nearest profile and decision trees). Again, this may be attributed to the fact that the more sophisticated neural network succeeds in learning the true profile/class better regardless of the session size.

The performance of k-NN fares competitively with all the single-step recommender strategies, but only for longer session sizes. This is not surprising, considering that k-NN can yield very accurate predictions, because it too is based on local context-sensitive models. However, k-NN is notorious for its excessive computational and memory costs at recommendation time, in contrast to all the other investigated techniques. While lazy in the learning phase, involving nothing more than storing all the previously seen cases, k-NN takes its toll during the recommendation phase, when it needs to compare a new session with all the historic cases in order to produce recommendations.

Based on the F1 measure, shown in Figure 9, which combines precision and coverage equally, we can conclude that the single-step recommender systems vary in performance depending on the range of session sizes. The single-step Nearest Profile predictor recommender system works best for very small session sizes (less than 2). The single-step Decision Tree predictor recommender system works best for session sizes in the range 3-5, and then competes very closely with the single-step Neural Network predictor recommender system above session size 5. While k-NN outperforms the single-step recommender systems, the two-step profile-specific URL-predictor neural network recommender system wins in terms of both better precision and better coverage, particularly above session size 2. This suggests using a combination of different recommender modules that are alternated based on session size, even within the lifetime of a single user session.

Finally, Fig. 10 shows the averaged F1 measure values for the sessions belonging to each profile separately, when the CUSA-2-step-Rec strategy is used. The label for each profile is indicated close to its own curve. We notice that performance for most profiles is very good, except for profile 3. By looking at Table 1, profile 3 is seen to grab some URLs with low URL significance weight. This confirms the presence of some sessions that, while sharing the main profile URLs, also have additional URLs that altogether are accessed very infrequently. These additional URLs are very hard to model in the URL-prediction network, because the overwhelming majority of the sessions in this profile do not contain them. In fact, using the robust session identification approach described in [6], we were able to identify that only half of the sessions in profile 3 form a strong consensus of two URLs. The remaining sessions cannot be modeled effectively even with a highly specialized URL-prediction model. Since we did not screen these widely varying noise sessions from the recommendation process, their effect will be seen. Exploiting the robustness feature of HUNC [6] is an interesting and worthwhile step towards greatly enhancing the quality of the recommendations, and we leave this open for future investigation.

Fig. 11 shows the averaged F1 measure values for the sessions belonging to each profile separately, when the K-NN strategy is used. We can see a more sporadic behavior of the quality of the recommendations for different sizes, witnessing the inadequacy of a global (i.e., across all profiles) collaborative
neighborhood model for some profiles and session sizes, especially in such a non-metric, noisy, and sparse high-dimensional space. Note that when the measure curves dive to zero at a certain session size, this means that there were no sessions of longer size, and does not reflect the quality of the recommendations.

Most importantly, we notice that the widely varying performance for different profiles suggests that different recommendation strategies perform best for different profiles, and it may be beneficial to combine different strategies depending not only on the current session size, as we have noticed previously, but also on the identified profile, yielding a sophisticated hybrid meta-recommendation system.

Figure 7: Precision Values for all recommendation strategies (CSA-1-step-Rec, CUSA-2-step-Rec, and K-NN). [Plot of precision against sub-session size; curves: 2-step Profile-specific NN, K-NN (K=50, N=10), 1-step Decision Tree, 1-step Neural Network, 1-step Nearest Profile-Cosine, 1-step Nearest Profile-WebSession.]

Figure 8: Coverage Values for all recommendation strategies (CSA-1-step-Rec, CUSA-2-step-Rec, and K-NN). [Plot of coverage against sub-session size; same curves as Figure 7.]

Figure 9: F1-Measure Values for all recommendation strategies (CSA-1-step-Rec, CUSA-2-step-Rec, and K-NN). [Plot of F1-measure against sub-session size; same curves as Figure 7.]
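Such a hybrid meta-recommender could be dispatched very simply. The sketch below alternates strategies using the session-size bands observed in Figure 9; the three recommender callables are hypothetical placeholders, the thresholds would need tuning per site, and keying the choice additionally on the identified profile is left as the suggested extension:

```python
def meta_recommend(sub_session, profile_id, recommenders):
    """Pick a recommendation strategy from the observed best session-size ranges.

    sub_session:  set of URLs in the current (incomplete) session.
    profile_id:   identified profile; could further refine the choice.
    recommenders: dict of hypothetical callables, each mapping a session
                  (set of URLs) to a list of recommended URLs.
    """
    size = len(sub_session)
    if size < 2:           # nearest-profile worked best for very small sessions
        strategy = "nearest_profile"
    elif size <= 5:        # decision tree best in the 3-5 range
        strategy = "decision_tree"
    else:                  # two-step profile-specific NN dominates above that
        strategy = "two_step_nn"
    return recommenders[strategy](sub_session)
```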
4.3 Time and Memory Complexity
Considerations
From the point of view of cost, i.e., time and memory complexity, it is interesting to note that training may take longer with the profile-based approaches. However, this training can be done on a back-end computer rather than on the server, and is therefore an offline process that does not affect the operation of the Web server. On the other hand, all profile-based approaches are extremely fast at recommendation time and require a minimal amount of main memory to function, since they work with a mere summary of the previous usage history instead of the entire history as in k-NN
collaborative filtering [15]. In our simulations, the proposed
profile-based recommendations with non-optimized Java code
running on a 1.7 GHz Pentium IV PC took only a small fraction
of a second to provide one recommendation, resulting in the order
of 10-100 recommendations per second depending on the strategy
used1. The far more computationally complex K-Nearest
Neighbor collaborative filtering generated recommendations at a
leisurely 2 recommendations per second.
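To make this cost contrast concrete, the entire online portion of the two-step pipeline reduces to one match against a small set of profile summaries plus one pass through the matched profile's predictor, with no scan of the session history. A minimal sketch of the CUSA-2-step-Rec flow, assuming cosine profile matching and a generic predictor interface (both illustrative simplifications, not the exact implementation):

```python
def two_step_recommend(sub_session, profiles, predictors, threshold=0.5):
    """Two-step sketch: profile assignment, then a profile-specific predictor.

    sub_session: set of URLs in the current (incomplete) session.
    profiles:    dict profile_id -> set of significant profile URLs.
    predictors:  dict profile_id -> hypothetical trained URL predictor, a
                 callable mapping a session (set of URLs) to {url: score}.
    """
    def cosine(a, b):
        return len(a & b) / ((len(a) * len(b)) ** 0.5) if a and b else 0.0

    # Step 1: assign the session to the closest pre-discovered profile.
    best = max(profiles, key=lambda pid: cosine(sub_session, profiles[pid]))

    # Step 2: the matched profile's predictor scores candidate URLs; keep
    # those above threshold that the user has not already visited.
    scores = predictors[best](sub_session)
    return [u for u, s in scores.items()
            if s >= threshold and u not in sub_session]
```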
Figure 10: Individual averaged F1-Measure Values for each profile with the CUSA-2-step-Rec strategy. [Plot of F1-measure against sub-session size, one labeled curve per profile.]

Figure 11: Individual averaged F1-Measure Values for each profile with the K-NN (K = 50, N = 10) strategy. [Plot of F1-measure against sub-session size, one labeled curve per profile.]

1 Here, 1 recommendation per second is actually a full set of recommendations for a given session, and not individual URLs per second.

5. CONCLUSIONS

We have investigated several single-step and two-step recommender systems. The single-step recommender systems (CSA-1-step-Rec) simply predict the URLs that are part of the nearest estimated profile as recommendations. The nearest-profile prediction model simply based its recommendations on the closest profile according to a similarity measure. In order to be able to reliably map new unseen sessions to a set of mined profiles, without such assumptions about the profiles or how they separate the sessions, we can resort to more powerful classification methods. In this paper, we explored both decision trees and neural networks for this task. Once trained, using the decision tree or neural network model to classify a new session constitutes the single step of the recommendation process, since the classified profile is the recommendation set.

The two-step recommender system (CUSA-2-step-Rec) first maps a user session to one of the pre-discovered profiles, and then uses one of several profile-specific URL-predictor neural networks in the second step to provide the final recommendations. Based on this classification, a different recommendation model is designed for each profile separately. A specific back-propagation neural network was trained offline for each profile in order to provide a profile-specific recommendation strategy that predicts web pages of interest to the user depending on their profile. The two-step recommendation method achieves unprecedented high coverage and precision compared to K-NN and single-step nearest-profile recommendations. Finally, we note that the proposed recommendations are low in cost compared to collaborative filtering: they are faster and require less main memory at recommendation time (there is no need to store, or compare against, a large number of instances). Even though training neural networks to
build a recommendation system may take a long time offline, testing this built network on a new user may take just a small fraction of a second.

Unlike most previous work, the proposed two-step profile-specific URL-predictor neural network recommender system allows a more refined, context-sensitive recommendation process. The idea of using a separate network specialized to each profile seems to be novel, since it provides an even higher level of context-awareness in personalization than the level already offered through collaborative filtering based personalization. The proposed method simultaneously achieved precision and coverage values of unprecedented quality (higher than 0.6 for most session sizes). Most existing techniques tend to excel in one measure (such as precision) and fail in the other (for example, coverage).

It is important to note that the divide-and-conquer strategy adopted in the 2-step approach, to break what would be an unmanageable learning task (URL prediction for all sessions regardless of their profile) into several simpler learning tasks (one for each profile separately), was the key to both (i) enabling faster learning, and (ii) enabling better performance by reducing interactions between different sessions. It is reasonable to expect that this modular design could be extended by replacing the URL-Predictor neural network modules with different learning paradigms that are faster to train, while not compromising the accuracy of predictions. Such learning modules could be decision trees. The proposed model could also be made even faster to train and more accurate by encouraging the discovery of even more high-resolution profiles, which is specifically one of the desirable features of Hierarchical Unsupervised Niche Clustering.

We finally classify our recommendation approaches with respect to the two-dimensional taxonomy presented in [16]. First, because the user is anonymous at all times, our approaches are all ephemeral with respect to the persistence dimension. Second, with respect to the automation dimension, our approaches are fully automatic. Furthermore, with regard to the four different families of recommendation techniques identified in [16] (non-personalized, attribute-based, item-to-item correlation, and people-to-people correlation), the 1-step recommenders (CSA-1-step-Rec) can be considered as people-to-people collaborative filtering. However, they use a cluster/profile summarization model, hence providing better scalability. On the other hand, the CUSA-2-step-Rec model uses people-to-people collaborative filtering, summarized through a cluster model, in the first stage, to map a new user to a profile. Then it uses a specialized item-to-item recommendation model to produce the final recommendations. Hence, the CUSA-2-step-Rec approach can be considered a hybrid between people-to-people and item-to-item recommendations, which may explain its superior performance.

Acknowledgements

This work is supported by National Science Foundation CAREER Award IIS-0133948 to O. Nasraoui.

References

[1] M. Perkowitz and O. Etzioni, Adaptive Web Sites: Automatically Learning from User Access Patterns, Proc. 6th Int. WWW Conference, 1997.
[2] R. Cooley, B. Mobasher, and J. Srivastava, Web Mining: Information and Pattern Discovery on the World Wide Web, Proc. IEEE Int. Conf. on Tools with AI, Newport Beach, CA, pp. 558-567, 1997.
[3] O. Nasraoui, R. Krishnapuram, and A. Joshi, Mining Web Access Logs Using a Relational Clustering Algorithm Based on a Robust Estimator, 8th International World Wide Web Conference, Toronto, pp. 40-41, 1999.
[4] O. Nasraoui, R. Krishnapuram, H. Frigui, and A. Joshi, Extracting Web User Profiles Using Relational Competitive Fuzzy Clustering, International Journal on Artificial Intelligence Tools, Vol. 9, No. 4, pp. 509-526, 2000.
[5] O. Nasraoui and R. Krishnapuram, A Novel Approach to Unsupervised Robust Clustering Using Genetic Niching, Proc. 9th IEEE International Conf. on Fuzzy Systems, San Antonio, TX, pp. 170-175, May 2000.
[6] O. Nasraoui and R. Krishnapuram, A New Evolutionary Approach to Web Usage and Context Sensitive Associations Mining, International Journal on Computational Intelligence and Applications, Special Issue on Internet Intelligent Systems, Vol. 2, No. 3, pp. 339-348, Sep. 2002.
[7] M. Pazzani and D. Billsus, Learning and Revising User Profiles: The Identification of Interesting Web Sites, Machine Learning, Vol. 27, pp. 313-331, 1997.
[8] D. H. Kraft, J. Chen, M. J. Martin-Bautista, and M. A. Vila, Textual Information Retrieval with User Profiles Using Fuzzy Clustering and Inferencing, in P. S. Szczepaniak, J. Segovia, J. Kacprzyk, and L. A. Zadeh (eds.), Intelligent Exploration of the Web, Physica-Verlag, Heidelberg, Germany, 2002.
[9] B. Mobasher, H. Dai, T. Luo, and M. Nakagawa, Effective Personalization Based on Association Rule Discovery from Web Usage Data, ACM Workshop on Web Information and Data Management, Atlanta, GA, Nov. 2001.
[10] J. H. Holland, Adaptation in Natural and Artificial Systems, MIT Press, 1975.
[11] L. A. Zadeh, Fuzzy Sets, Information and Control, Vol. 8, pp. 338-353, 1965.
[12] G. J. Klir and B. Yuan, Fuzzy Sets and Fuzzy Logic, Prentice Hall, 1995, ISBN 0-13-101171-5.
[13] R. Agrawal and R. Srikant, Fast Algorithms for Mining Association Rules, Proc. 20th VLDB Conference, Santiago, Chile, pp. 487-499, 1994.
[14] G. Linden, B. Smith, and J. York, Amazon.com Recommendations: Item-to-Item Collaborative Filtering, IEEE Internet Computing, Vol. 7, No. 1, pp. 76-80, Jan. 2003.
[15] J. Breese, D. Heckerman, and C. Kadie, Empirical Analysis of Predictive Algorithms for Collaborative Filtering, Proc. 14th Conf. on Uncertainty in Artificial Intelligence, pp. 43-52, 1998.
[16] J. B. Schafer, J. Konstan, and J. Riedl, Recommender Systems in E-Commerce, Proc. ACM Conf. on Electronic Commerce, pp. 158-166, 1999.
[17] J. Srivastava, R. Cooley, M. Deshpande, and P.-N. Tan, Web Usage Mining: Discovery and Applications of Usage Patterns from Web Data, SIGKDD Explorations, Vol. 1, No. 2, pp. 1-12, Jan. 2000.
[18] O. Zaiane, M. Xin, and J. Han, Discovering Web Access Patterns and Trends by Applying OLAP and Data Mining Technology on Web Logs, Advances in Digital Libraries, Santa Barbara, CA, pp. 19-29, 1998.
[19] M. Spiliopoulou and L. C. Faulstich, WUM: A Web Utilization Miner, Proc. EDBT Workshop WebDB98, Valencia, Spain, 1999.
[20] J. Borges and M. Levene, Data Mining of User Navigation Patterns, in Web Usage Analysis and User Profiling, Lecture Notes in Computer Science, H. A. Abbass, R. A. Sarker, and C. S. Newton (eds.), Springer-Verlag, pp. 92-111, 1999.
[21] S. Haykin, Neural Networks: A Comprehensive Foundation, Macmillan, New York, 1994.
[22] J. R. Quinlan, Induction of Decision Trees, Machine Learning, Vol. 1, pp. 81-106, 1986.