=Paper=
{{Paper
|id=None
|storemode=property
|title=A Context Ultra-Sensitive Approach to High Quality Web
Recommendations based on Web Usage Mining and Neural Network Committees
|pdfUrl=https://ceur-ws.org/Vol-703/paper4.pdf
|volume=Vol-703
|dblpUrl=https://dblp.org/rec/conf/www/NasraouiP04a
}}
==A Context Ultra-Sensitive Approach to High Quality Web Recommendations based on Web Usage Mining and Neural Network Committees==
Olfa Nasraoui and Mrudula Pavuluri
Dept. of Electrical and Computer Engineering
206 Engineering Science Bldg.
The University of Memphis
Memphis, TN 38152-3180
onasraou@memphis.edu
ABSTRACT
Personalization tailors a user's interaction with the Web information space based on information gathered about them. Declarative user information, such as manually entered profiles, continues to raise privacy concerns and is neither scalable nor flexible in the face of very active, dynamic Web sites and changing user trends and interests. One way to deal with this problem is through a completely automated Web personalization system. Such a system can be based on Web usage mining to discover Web usage profiles, followed by a recommendation system that can respond to the users' individual interests. We present several architectures that rely on pre-discovered user profiles: Context Sensitive Approaches based on single-step Recommender systems (CSA-1-step-Rec), and Context Ultra-Sensitive Approaches based on two-step Recommender systems (CUSA-2-step-Rec). In particular, the two-step recommendation strategy based on a committee of profile-specific URL-Predictor models is more accurate and faster to train, because only the URLs that are relevant to a specific profile are used to define the relevant attributes for this profile's specialized URL-Predictor model. Hence, the model complexity, such as the neural network architecture, can be significantly reduced compared to a single global model that could involve hundreds of thousands of URLs/items. The two-step approach can also be expected to handle overlap in user interests, and even to mend the effects of a profile dichotomy that is too coarse. Finally, we note that all the mass-profile based recommendation strategies investigated are intuitive, and are low in recommendation-time cost compared to collaborative filtering (there is no need to store or compare to a large number of instances). In our simulations on real Web activity data, the proposed context ultra-sensitive two-step recommendation strategy achieves unprecedentedly high coverage and precision compared to other approaches such as K-NN collaborative filtering and single-step recommenders such as the Nearest-Profile recommender.

Keywords
Web personalization, web recommendation, web usage mining, neural networks, collaborative filtering.

1. INTRODUCTION
The flow of information in a completely automated Web personalization system spans several stages, starting from the user's Web navigation patterns until the final recommendations, and including the intermediate stages of logging the users' activities, preprocessing and segmenting Web log data into Web user sessions, and learning a usage model from this data. The usage model can come in many forms: from the lazy type of modeling used in collaborative filtering [15,16], which simply stores all the users' information and then relies on K Nearest Neighbors to provide recommendations from the previous history of neighbors or similar users; to a set of frequent itemsets or associations; to a set of clusters of user sessions and the resulting Web user profiles or summaries. Pazzani and Billsus [7] presented a collaborative filtering approach to recommendation based on users' ratings of specific web pages, with Naive Bayes as the prediction tool. Mobasher et al. [9] use pre-discovered association rules and an efficient data structure to provide faster recommendations based on web navigation patterns. Collaborative models do not perform well in the face of very sparse data, and do not scale well to the huge number of users and URLs/items [15]. In association rule based methods, large support thresholds yield only a very small fraction of the total number of itemsets, while smaller support thresholds generally result in a staggering number of mostly spurious URL itemsets. Techniques that are based on clustering to discover profiles are also expected to face serious limitations in the presence of huge, very high-dimensional, and sparse usage data, unless the clustering technique is efficient, robust to noise, and can handle the sparseness. Unsupervised robust multi-resolution clustering techniques [6] can reliably discover most of the Web user profiles/interest groups, because they benefit from a multi-resolution mechanism to discover even the less pronounced user profiles, and from robustness to avoid the noisy/spurious profiles that are common with low support thresholds. Their unsupervised nature also avoids reliance on input parameters or prior knowledge about the number of profiles to be sought. Linden et al. [14] used item-to-item collaborative filtering as a recommendation strategy on Amazon.com. This approach matches each of the user's purchased and rated items to
similar items, then combines those similar items into a recommendation list. A similar-item table is built by finding items that customers tend to purchase together.

In this paper, we propose several Context Sensitive Approaches based on single-step Recommender systems (CSA-1-step-Rec), and Context Ultra-Sensitive Approaches based on two-step Recommender systems (CUSA-2-step-Rec). The single-step recommender systems (CSA-1-step-Rec) simply predict the URLs that are part of the nearest estimated profile as recommendations. For example, the Nearest-Profile prediction model simply bases its recommendations on the closest profile according to a similarity measure. While this is a very simple and fast approach, it makes the critical assumption that sessions in different profiles are linearly separable. While this may hold for certain web mining methods, it may not be true for others. In order to reliably map new unseen sessions to a set of mined profiles, without such assumptions about the profiles or how they separate the sessions, we can resort to classification methods that can handle non-separable profile boundaries. In this paper, we explore both decision trees and Multi-Layer Perceptron neural networks for this task. Once trained, using the decision tree or neural network model to classify a new session is fast, and it constitutes the single step of the recommendation process, since the classified profile is the recommendation set.

The Context Ultra-Sensitive two-step recommender system (CUSA-2-step-Rec) first maps a user session to one of the pre-discovered profiles, and then uses one of several profile-specific URL-predictor neural networks (such as Multilayer Perceptron or Hopfield autoassociative memory networks) in the second step to provide the final recommendations. Based on this classification, a different recommendation model is designed for each profile separately. A specific neural network is trained offline for each profile in order to provide a profile-specific recommendation strategy that predicts web pages of interest to the user depending on their profile and the current URLs. The two-step recommendation method not only handles overlap in user interests, but can also mend the effects of misclassification in the first, nearest-profile assignment step, and even the effect of a coarse profile dichotomy. The two-step recommendation method based on multilayer perceptron networks achieves unprecedentedly high coverage and precision compared to K-NN and Nearest-Profile recommendations. However, the approach based on Hopfield autoassociative memory networks seems to struggle against the well-known harsh constraints limiting the number of patterns that can be memorized relative to the number of units/URLs. Finally, we note that the proposed recommendations are intuitive, and low in cost compared to collaborative filtering: they are faster and require less main memory at recommendation time (there is no need to store or compare to a large number of instances).

The rest of the paper is organized as follows. In Section 2, we present an overview of profile discovery using Web usage mining. In Section 3, we present the single-step profile prediction based recommendation process, and the two-step recommender system based on a committee of profile-specific URL-predictor neural networks. In Section 4, we present an empirical evaluation of the recommendation strategies on real web usage data, and finally, in Section 5, we present our conclusions.

2. PROFILE DISCOVERY BASED ON WEB USAGE MINING
The first step in intelligent Web personalization is the automatic identification of user profiles. This constitutes the knowledge discovery engine. These profiles are later used to recommend relevant URLs to old and new anonymous users of a Web site. This constitutes the recommendation engine, which is the main focus of this paper.

The knowledge discovery part, based on Web usage mining [2,3,6,17,18,19,20], can be executed offline by periodically mining new contents of the user access log files. In this paper, Web usage mining is used to discover the profiles using the following steps:

(1) Preprocess the log file to extract user sessions,
(2) Categorize the sessions by Hierarchical Unsupervised Niche Clustering (H-UNC) [6],
(3) Summarize the session categories in terms of user profiles,
(4) Infer context-sensitive URL associations from the user profiles.

Step 1: Preprocessing the Web Log File to Extract User Sessions
The access log of a Web server is a record of all files (URLs) accessed by users on a Web site. Each log entry consists of the following information components: access time, IP address, URL viewed, etc. An example showing two entries is displayed below:

17:11:48 141.225.195.29 GET /graphics/horizon.jpg 200
17:11:48 141.225.195.29 GET /people/faculty/nasraoui/index.html 200

The first step in preprocessing [1,2,3] consists of mapping the NU URLs on a website to distinct indices. A user session consists of accesses originating from the same IP address within a predefined time period. Each URL in the site is assigned a unique number j ∈ {1, …, NU}, where NU is the total number of valid URLs. Thus, the ith user session is encoded as an NU-dimensional binary attribute vector s^(i) with the property

s_j^(i) = 1 if the user accessed the jth URL during the ith session, and 0 otherwise.   (1)

Step 2: Clustering Sessions into an Optimal Number of Categories
For this task, we use Hierarchical Unsupervised Niche Clustering (H-UNC) [6]. H-UNC is a hierarchical version of a robust genetic clustering approach (UNC) [5]. A Genetic Algorithm (GA) [14,15,16,17,18] evolves a population of candidate solutions through generations of competition and reproduction until convergence to one solution. Hence, the GA cannot maintain population diversity. Niching methods attempt to maintain a diverse population with members distributed among niches corresponding to multiple solutions. An initial population of randomly selected sessions is coded into binary chromosome strings that compete based on a density fitness measure that is
highest at the centers of good (dense) clusters. Different niches in the fitness landscape correspond to distinct clusters in the data set. The algorithm is detailed below. Note that the clusters and their number are determined automatically. More details on H-UNC can be found in [6].

Hierarchical Unsupervised Niche Clustering Algorithm (H-UNC) [6]:
- Encode the binary session vectors
- Set the current resolution level L = 1
- Start by applying UNC to the entire data set with a small population size
- Repeat recursively until the cluster cardinality or scale becomes too small:
  - Increment the resolution level: L = L + 1
  - For each parent cluster found at level (L - 1):
    - Reapply UNC [5] only on the data subset assigned to this parent cluster
    - Extract more child clusters at the higher resolution (L > 1)

Step 3: Summarizing Session Clusters into User Profiles
After automatically grouping the sessions into different clusters, we summarize the session categories in terms of user profile vectors [3,4], pi. The kth component/weight of this vector (pik) captures the relevance of URLk in the ith profile, as estimated by the conditional probability that URLk is accessed in a session belonging to the ith cluster (this is the frequency with which URLk was accessed in the sessions belonging to the ith cluster). The model is further extended to a robust profile [3,6] based on robust weights that assign only sessions with a high robust weight to a cluster's core. Unpolluted by noisy (irrelevant) sessions, these profiles give a cleaner description of the user interests.

3. DESCRIPTION OF THE SINGLE-STEP AND TWO-STEP RECOMMENDATION STRATEGY OPTIONS
Let U = {url1, url2, …, urlNU} be a set of NU URLs on a given web site visited in web user sessions sj, j = 1, …, Ns, as defined in (1). Let P = {p1, p2, …, pNP} be the set of NP Web user profiles computed by the profile discovery engine. Each profile consists of a set of URLs associated with their relevance weights in that profile, and can be viewed as a relevance vector of length NU, with pik = relevance of urlk in the ith profile. The problem of recommendation can be stated as follows. Given a current Web user session vector, sj = [sj1, sj2, …, sjNU], predict the set of URLs that are most relevant according to the user's interest, and recommend them to the user, usually as a set of hypertext links dynamically appended to the contents of the Web document returned in response to the most recent Web query. Because the degree of relevance of the URLs that are determined to be of interest to the user may vary, it may also be useful to associate the kth recommended URL with a corresponding URL relevance score, rjk. Hence it is practical to denote the recommendations for the current Web user session, sj, by a vector rj = [rj1, rj2, …, rjNU]. In this study, we limit the scores to be binary.

3.1 Context Sensitive Approach Based on Single-Step Profile Prediction Recommender System (CSA-1-step-Rec)

3.1.1 Single-Step Nearest-Profile Prediction Based Recommender System
The simplest and most rudimentary approach to profile-based Web recommendation is to simply determine the profile most similar to the current session, and to recommend the URLs in this profile, together with their URL relevance weights as URL recommendation scores.

Figure 1: Context-Sensitive Approach based on single-step profile prediction based Recommender System (CSA-1-step-Rec). The Profile Prediction Model can be a Nearest-Profile classifier or any of the models shown in Figs 2 or 3.

Figure 1 shows the structure of such a recommendation system, where the profile prediction model simply consists of a nearest-profile estimator that computes a session-to-profile similarity and selects the profile with the highest similarity as the predicted profile. The similarity score between an input session, s, and the ith profile, pi, can be computed using the cosine similarity as follows:

S_si^cosine = Σ_{k=1..NU} p_ik s_k / ( √(Σ_{k=1..NU} p_ik²) √(Σ_{k=1..NU} s_k²) )   (2)

If a hierarchical Web site structure should be taken into account, then a modification of the cosine similarity, introduced in [3,4], that can take the Website structure into account can be used to yield the following input membership:

S_si^web = max( Σ_{l=1..NU} Σ_{k=1..NU} p_il S_u(l,k) s_k / ( √(Σ_{k=1..NU} p_ik²) √(Σ_{k=1..NU} s_k²) ), S_si^cosine )   (3)

where S_u is a URL-to-URL similarity matrix computed based on the amount of overlap between the paths leading from the root of the website (main page) to any two URLs, and is given by

S_u(i,j) = min( 1, |p_i ∩ p_j| / max(1, max(|p_i|, |p_j|) − 1) )
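As an illustration, the path-overlap similarity S_u above can be sketched in a few lines. Reading the path overlap as the length of the common prefix of the two root-to-URL paths is our own interpretation, and the helper names (split_path, url_similarity) are illustrative assumptions, not taken from the paper:

```python
def split_path(url):
    """Split a URL path into its components, e.g. '/a/b/c.html' -> ['a', 'b', 'c.html']."""
    return [part for part in url.split("/") if part]

def url_similarity(url_i, url_j):
    """Sketch of S_u(i, j): overlap between the paths from the site root to two
    URLs, normalized by the longer path length minus one (the leaf), capped at 1."""
    pi, pj = split_path(url_i), split_path(url_j)
    # Count the length of the common prefix (the shared path from the root).
    overlap = 0
    for a, b in zip(pi, pj):
        if a != b:
            break
        overlap += 1
    denom = max(1, max(len(pi), len(pj)) - 1)
    return min(1.0, overlap / denom)

# Two course pages under the same directory are maximally similar,
# while pages in disjoint subtrees get similarity 0.
print(url_similarity("/courses/cecs345/java.html", "/courses/cecs345/refs.html"))  # 1.0
print(url_similarity("/people/faculty.html", "/degrees/bsce.html"))                # 0.0
```

Under this reading, two sibling leaves share their whole directory path, so the min(1, ·) cap fires exactly when the overlap reaches the longer path's directory depth.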
We refer to the special similarity in (3) as the Web Session Similarity.
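A minimal sketch of the single-step nearest-profile recommendation of Section 3.1.1: encode the session as a binary vector over the NU URLs as in (1), score it against each profile vector with the cosine similarity of (2), and recommend the high-weight URLs of the winning profile. The toy profile weights, the 0.5 relevance cutoff, and the function names are illustrative assumptions, not the paper's implementation:

```python
import math

def cosine(p, s):
    """Cosine similarity between a profile vector p and a binary session vector s (Eq. 2)."""
    dot = sum(pk * sk for pk, sk in zip(p, s))
    norm = math.sqrt(sum(pk * pk for pk in p)) * math.sqrt(sum(sk * sk for sk in s))
    return dot / norm if norm else 0.0

def nearest_profile_recommend(session, profiles, threshold=0.5):
    """Return (index of the most similar profile, URL indices to recommend)."""
    best = max(range(len(profiles)), key=lambda i: cosine(profiles[i], session))
    # Recommend URLs whose relevance weight in the winning profile is high
    # and which the user has not yet visited in the current session.
    recs = [k for k, w in enumerate(profiles[best])
            if w >= threshold and session[k] == 0]
    return best, recs

# Toy example: 5 URLs, 2 profiles (weights are made up for illustration).
profiles = [[0.9, 0.8, 0.1, 0.0, 0.0],   # profile 0 centers on URLs 0-1
            [0.0, 0.1, 0.2, 0.9, 0.7]]   # profile 1 centers on URLs 3-4
session = [1, 0, 0, 0, 0]                # the user visited URL 0 only
best, recs = nearest_profile_recommend(session, profiles)
# best == 0, and URL 1 is recommended
```

Note that this look-up-table flavor is exactly why the approach is cheap at recommendation time: once the profiles are mined offline, serving a session costs one similarity scan over NP profiles rather than over all stored user histories as in K-NN collaborative filtering.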
3.1.2 Single-Step Decision-Tree Based Profile Prediction Recommender System
The nearest-profile prediction model makes the critical assumption that sessions in different profiles are linearly separable. While this may be applicable for certain web mining methods, it may not be true for others. In order to reliably map new unseen sessions to a set of mined profiles, without such assumptions about the profiles or how they separate the sessions, we can resort to classification methods that are not based on distance or similarity computations. In this paper, we explore both decision trees and neural networks for this task. Once trained, using the decision tree or neural network model to classify a new session is fast, and it constitutes the single step of the recommendation process, since the classified profile is the recommendation set.

The decision tree profile prediction model is very similar to the nearest-profile prediction model. An input binary vector is presented as input to the decision tree [22], and a profile/class is predicted as the output. Each URL in the input vector is considered an attribute. During learning, the entire training data set is presented first. An attribute value is tested at each decision node, with two possible outcomes of the test: a branch and a sub-tree. The class node indicates the class to be predicted.

Building the tree [22] starts by selecting the URL with the maximum information gain and placing this URL, say url1, at the root node. The test at the branch is whether the session data includes a visit to url1. If so, those sessions are placed as one group. The next URL with the maximum gain is placed as the node of the sub-tree. For instance, let the next URL be url2. Again, all the remaining sessions are checked to see whether they traversed url2. If yes, they are sub-grouped. This procedure continues until all the attributes are checked or all the class attributes are identical (a pure sub-sample). This example is illustrated in Figure 2.

Figure 2: Example of a Profile Prediction Model based on a decision tree that can be used within CSA-1-step-Rec

3.1.3 Single-Step Neural Network Based Profile Prediction Recommender System
In the neural network [21] based approach to profile prediction, a feed-forward multilayer perceptron is used and trained with back-propagation. The inputs (session URLs) and output (class or profile) of the prediction model remain the same as those described above. The neural network replaces the classification model block in Figure 1. Hence, the input layer of the network consists of as many input nodes as the number of valid URLs (i.e., NU nodes), the output layer has one output node for each profile (i.e., Np nodes), and the hidden layer has (NU + Np)/2 nodes. Figure 3 shows the architecture of the neural network used to predict the most relevant profile. The index of the output node with the highest activation indicates the final class/profile.

During the learning process, the input is passed forward to the other layers and the output of each element is computed layer by layer. The difference between the actual output of the output layer and the desired output is back-propagated to the previous layers (modified by a derivative of the transfer function), and the weights are adjusted. Learning continues until all the weights are learned and no further change in the weights occurs.

Figure 3: Architecture of a Profile Prediction Model based on a Multi-Layer Perceptron that can be used within CSA-1-step-Rec

3.2 Context Ultra-Sensitive Approach Based on Two-Step Recommender System with a Committee of Profile-Specific URL-Predictor Neural Networks (CUSA-2-step-Rec)
The single-step profile prediction recommendation procedure is intuitively appealing and very simple. In particular, its implementation and deployment in a live setting is very efficient; essentially, it amounts to a look-up table. However, it has several flaws: (i) the degree of similarity between the current session and the nearest identified profile may not be taken into account, (ii) the above procedure does not take into account sessions that are similar to more than a single profile, (iii) it cannot handle sessions which are different from all known profiles, and (iv) the set of recommendations derives directly from the contents of a single (assigned) profile for all sessions assigned to this profile, without any further distinction between the specific access patterns. For this reason, we propose a two-step approach that, in addition to exploiting the profile information, is able to produce more highly personalized recommendations that depend not only on the assigned profile (people-to-people collaborative filtering), but also, explicitly, on the input session itself (item-to-item collaborative filtering).

In this method, the URLs reflecting the more specific user interests, which may differ even between users assigned to the
same profile, are predicted as recommendations instead of the same profile contents. Hence, the more individualized URL predictions are expected to perform better compared to the coarser single-step profile prediction model.

3.2.1 Description of the Multi-Layer Perceptron URL-Predictor Neural Network
A Multilayer Perceptron neural network [21] can be used for directly predicting the URLs to be given as recommendations. The architecture of the network is different from the network used in the profile prediction scenario of Figure 3, and is shown in Figure 4. This is because the number of output nodes is now equal to the number of input nodes. Each training input consists of a user sub-session (ss) derived from a ground-truth complete session S, while the output nodes should conform to the remainder of this session (S − ss). This means that there is one output node per URL. Hence, the architecture of the network can become extremely complex, as there would be NU input and NU output nodes. This will in turn increase the number of hidden nodes to roughly NU nodes. Training such a network may prove to be unrealistic on large websites that may consist of thousands of URLs. To overcome this problem, a separate network is learned for each profile independently, with an architecture of its own. Here, the number of input and output nodes depends only on the number of significant URLs in that profile, and possibly those related to its URLs by URL-level or conceptual similarity, and the number of hidden nodes is set to the average of the number of input and output nodes. Figure 4 shows the architecture of each URL-predictor neural network. There will be a committee of Np specialized networks of a similar kind used in developing this URL recommendation prediction model, as illustrated in Figure 5. Each of these networks is completely specialized to forming the recommendations for only one profile, hence offering a local, more refined model that enjoys the advantages of better accuracy, simplicity (fewer nodes and connections), and ease of training (as a result of simplicity).

Figure 4: Architecture of a Profile-Specific URL-Predictor Neural Network used in CUSA-2-step-Rec

3.2.2 Learning the Profile-Specific URL-Predictor Neural Network Models
The URL-Predictor network for each profile is learned independently with a separate set of training data. Learning each network involves presenting a sub-session, consisting of some of the URLs visited by a user belonging to that profile, as input, and adjusting the network weights by back-propagation to recommend, as output of the network, URLs that are not part of the sub-session given as input but which are a part of the ground-truth complete session. For each ground-truth complete session, we find all the sub-sessions for window sizes 1-10, and use them to generate independent training and testing sets. Cosine similarity is used to map each sub-session to the closest profile, and the URL-Predictor network specialized for that profile is invoked to obtain the recommendations. Hence, the input/output units of the URL-Predictor network specialized for a particular profile are limited to only the URLs that are present in the sessions assigned to that profile. For this reason, we re-index the URLs in the sub-sessions to a smaller index set, so that the number of input nodes in each network is reduced compared to the total number of URLs in the website. This reduces the complexity of the network's architecture. The output URLs of the invoked network are given as the URL recommendations to the user. The output URL recommendations (R) are in binary format and are decoded into URLs by applying a threshold of 0.5. That is, a URL is considered to be recommended if its activation value exceeds 0.5 at the corresponding output node of the network.

Figure 5: Context Ultra-Sensitive Approach based on Two-Step Recommendation Process (CUSA-2-step-Rec) using a Committee of Profile-Specific URL-Predictor Neural Networks (Any URL-Predictor model can be substituted for the Multi-Layer Perceptron, e.g. a Hopfield network)

3.3 Recommendations Based On Autoassociative Memory Hopfield Networks
Hopfield networks are a special kind of recurrent neural networks that can be used as associative memory [21]. A Hopfield network can retrieve a complete pattern stored through the training process from an imperfect or noisy version of it. In some sense, a recommender system performs a similar operation when it recommends certain URLs from an incomplete session. Given Nurl fully connected neurons (via symmetric weights wij between each two units i and j), each serving simultaneously as input and as output, and assuming that the activation values, xi, are bipolar (+1/−1), the optimal weights to memorize Np patterns can be determined by Hebbian learning as follows:

w_ij = Σ_{p=1..Np} x_i^p x_j^p for all i ≠ j (and w_ij = 0 otherwise)

During testing/recall, when a new noisy pattern xnew is presented as input, we set the activation at node i at iteration 0 to be xi0 =
xnew-i, then the units are adjusted by iteratively computing, at each
∑ r s
NU * *
jk jk (4)
iteration t prec = k
∑ (r )
j NU * 2
N url k jk
xit +1 = ∑ wij x tj
j =1
until the network converges to a stable state. However, the desired and coverage is given by
behavior of recall in a Hopfield network is expected to hold only
∑ r s
NU * *
if all the possible complete session prototypes can be stored in the jk jk (5)
cov j = k
∑ (s ) * 2
NU
Hopfield network’s connection weights, and if these complete k jk
sessions do not interact (or cross-talk) excessively. Severe
deterioration starts occurring when the number of patterns Np >
0.15Nurl, hence limiting Hopfield recommender system to sites
In most recommender systems, precision and coverage act in
with a large number of URLs and yet very little variety in the user
contradictory fashion. That is while one increases, the other
access patterns. This limitation is paradoxical in the context of
decreases, and vice versa. To take into account both of these
large websites or transactional database systems. Our preliminary
measures, a measure known as the “effectiveness” or F1-Measure
simulations with both a single global Hopfield network as well as
can be computed. This measure achieves higher value when both
several profile-specific Hopfield networks have resulted in low
the precision and coverage are simultaneously high. F1 is given
recall qualities since the network seemed to be able to memorize
by
only very few stable states. However several profile-specific
Hopfield networks perform better than one global network, but
2 * Pr ecision * Coverage (6)
only for some of the profiles. F1 =
Pr ecision + Coverage
4. EXPERIMENTAL RESULTS
4.1 Mining User profiles from Anonymous
Web Usage Data
Figure 6: Architecture of an Auto- Hierarchical Unsupervised Niche Clustering (H-UNC) [6] was
Associative Memory Hopfield URL- applied on a set of web sessions preprocessed from the 12 day
Predictor Neural Network that can be access log data of the Web site of the department of Computer
used in CUSA-2-step-Rec Engineering and Computer Sciences at the University of
Missouri, Columbia. After filtering out irrelevant entries, the data
3.4 Evaluating the Recommender Systems was segmented into 1703 sessions accessing 343 distinct URLs.
The maximum elapsed time between two consecutive accesses in
First a data set consisting of real web user sessions extracted from the same session was set to 45 minutes. We applied H-UNC to the
the log files of a Web server is used to generate training and Web sessions using a maximal number of levels, L = 5, in the
testing sets. For each complete session considered as the ground- hierarchy, and the following parameters that control the final
truth, all the possible sub-sessions of different sizes are generated resolution: Nsplit = 30, and σsplit = 0.1. H-UNC partitioned the Web
up to 9 URLs. If a user session consists of more than 10 URLs, users sessions into 20 clusters at level 5, and each cluster was
then we first randomly select 10 URLs and then use them as the characterized by one of the profile vectors, pi, i=0, ...,19. Some of
basis to form all sub-sessions. This is done in order to maintain these profiles are summarized in Table 1.
the nesting or inclusion relationship between sub-sessions of size
(k) and sub-sessions of size (k+1). This in turn will facilitate the
interpretation of trends in performance versus the sub-session
size.
Once the URL-prediction model is built, we need to evaluate its TABLE 1. SOME OF THE 20 DISCOVERED PROFILES
performance by presenting test sub-sessions to the trained No. size Relevant URLs/Profile Profile Description
network. The test dataset is generated similarly to the training 0 106 {0.29 - /staff.html} {0.92 - /} Main page, people,
dataset, but it forms an independent 20% of the complete set of sub-sessions generated.

Each actual completed session, sT, is treated as ground truth, a subset of this session is treated as the incomplete current sub-session, sj, and the computed recommendations are treated as the predicted complete session. This process is very similar to an information retrieval problem. Hence, the evaluation proceeds by computing the precision and the coverage of the recommendations.

Let r*j = rj - sj. This corresponds to the recommendations obtained after omitting all URLs that are part of the current sub-session. Also, let s*j = sT - sj be the set of ground-truth URLs, not including the ones in the current sub-session being processed for recommendations. The recommendation precision and coverage are then given by

Precision = |r*j ∩ s*j| / |r*j|,   Coverage = |r*j ∩ s*j| / |s*j|.

Table 1: Sample of the discovered usage profiles: profile number, number of sessions, significant URLs in the {weight - URL} notation, and an interpretation.

…: {0.98 - /people.html} {0.99 - /people_index.html} {0.97 - /faculty.html} {0.15 - /degrees.html} {0.20 - /research.html} {0.35 - /grad_people.html} {0.26 - /undergrad_people.html} — faculty, staff, degrees, research.

Profile 1 (104 sessions): {0.99 - /} {1.00 - /cecs_computer.class} — Main page only.

Profile 2: {0.81 - /} {0.75 - /cecs_computer.class} {0.87 - /courses.html} {0.90 - /courses_index.html} {0.88 - /courses100.html} {0.22 - /courses300.html} {0.16 - /courses_webpg.html} {0.16 - /~lan/cecs353} — Main page and course pages.

Profile 3 (61 sessions): {0.80 - /} {0.48 - /degrees.html} {0.23 - /degrees_grad.html} {0.23 - /degrees_grad_index.html} {0.23 - /deg_grad_genor.html} — Main page and graduate degrees.

Profile 4 (58 sessions): {0.72 - /} {0.97 - /degrees_undergrad.html} {0.95 - /degrees_index.html} {0.97 - /bsce.html} {0.52 - /bscs.html} {0.34 - /bacs.html} {0.34 - /courses.html} {0.34 - /courses100.html} {0.26 - /general.html} {0.22 - /courses300.html} {0.81 - /degrees.html} {0.19 - /courses200.html} — Main page, undergraduate courses, degrees, especially undergraduate degrees.

Profile 13 (38 sessions): {0.47 - /~shi} {0.82 - /~shi/cecs345} {0.21 - /~shi/cecs345/java_examples} {0.34 - /~shi/cecs345/references.html} — CECS345 Java examples/references.
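In code, the per-sub-session precision and coverage computation defined above reduces to a few set operations. The following is a minimal sketch; the function and variable names are ours, chosen to mirror the notation sT, sj, and rj:

```python
def evaluate_recommendation(full_session, sub_session, recommendations):
    """Precision and coverage for one sub-session, as defined above.

    full_session:    set of URLs in the complete ground-truth session (sT).
    sub_session:     set of URLs already visited by the user (sj).
    recommendations: set of recommended URLs (rj).
    """
    r_star = recommendations - sub_session   # r*j = rj - sj
    s_star = full_session - sub_session      # s*j = sT - sj
    hits = r_star & s_star                   # correctly predicted new URLs
    precision = len(hits) / len(r_star) if r_star else 0.0
    coverage = len(hits) / len(s_star) if s_star else 0.0
    return precision, coverage

# Example: ground truth {A, B, C, D}, user has seen {A}, we recommend {B, C, E}.
# hits = {B, C}, so precision = 2/3 and coverage = 2/3.
p, c = evaluate_recommendation({"A", "B", "C", "D"}, {"A"}, {"B", "C", "E"})
```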
4.2 Comparative Simulation Results for CSA-1-step-Rec, CUSA-2-step-Rec, and K-NN

We used the following parameters in training the multilayer perceptron URL-Predictor neural networks: maximum number of epochs = 2000, and learning rate = 0.7 (for the input to hidden layer) and 0.07 (for the hidden to output layer). The learning rate is important in training the network, as it controls the amount of change in the weights at each step. These optimal values were selected by trial and error: with a very small value, the algorithm may take a long time to converge, while a large value will make the algorithm diverge by bouncing around the error surface. The learning rate is set higher for the connections from the input to the hidden layer than for those from the hidden to the output layer. This makes the learning slower and more accurate for the output layer nodes. We also used a momentum factor of 0.5, to accelerate learning while reducing unstable behavior.

The collaborative filtering approach is based on using K Nearest Neighbors (K-NN) followed by top-N recommendations, for different values of K and N. First, the K complete sessions closest to the current session are found in the entire history of accesses. Then the URLs present in these top K sessions are sorted in decreasing order of their frequency, and the top N URLs are treated as the recommendation set. We show only the best results obtained for K-NN, at K = 50 neighbors and N = 10 URLs.
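The K-NN baseline just described can be sketched as follows. Sessions are treated as sets of URLs; the cosine similarity on binary URL vectors is an illustrative assumption, since the session similarity measure is not fixed here:

```python
from collections import Counter

def knn_recommend(current, history, k=50, n=10):
    """Top-N recommendation from the K nearest historic sessions.

    current: set of URLs in the (incomplete) current session.
    history: list of sets of URLs, one per previously logged complete session.
    """
    def cosine(a, b):
        # Cosine similarity between binary URL-membership vectors.
        return len(a & b) / ((len(a) * len(b)) ** 0.5) if a and b else 0.0

    # Step 1: find the K complete sessions closest to the current session.
    nearest = sorted(history, key=lambda s: cosine(current, s), reverse=True)[:k]

    # Step 2: rank the URLs occurring in those K sessions by frequency and
    # recommend the top N, excluding URLs the user has already visited.
    counts = Counter(url for s in nearest for url in s if url not in current)
    return [url for url, _ in counts.most_common(n)]
```

Note that every call scans the entire session history, which is exactly the recommendation-time cost discussed below.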
Figures 7 through 9, depicting the 20-profile averaged measures (precision, coverage, and F1), show that the two-step profile-specific URL-predictor multilayer perceptron neural network recommender system (CUSA-2-step-Rec) wins in terms of both better precision and better coverage, particularly above session size 2. This appeared unusual at first, since it is very rare that a recommendation strategy scores this high on both precision and coverage, and an increase in precision did not seem to compromise coverage in any way. However, by looking at the details of the design of the profile-specific URL-predictor neural network, we explain this relentless increase in precision by the fact that the neural network output is trained to predict only the URLs that the user has not seen before, i.e., 'S - ss', where S is the complete session and ss is the sub-session (URLs visited by the user). Clearly, as the sub-session size increases, more URLs are presented at the output of the neural network, making the prediction task easier, since fewer URLs need to be predicted compared to smaller input sub-sessions. Similarly, coverage increases, since with more input URLs, the neural network is able to predict more of the missing URLs to complete the puzzle. However, this does not happen at the expense of precision. On the contrary, in this special type of URL-prediction neural network, giving more hints about the user, in the form of more of the visited URLs, makes the prediction task easier, and hence will only result in more accurate predictions.

We notice that the single-step recommender systems (CSA-1-step-Rec) do not have this nice feature; i.e., their precision and coverage will generally have opposing trends. However, the single-step neural network profile-prediction recommender seems to have a rather increasing precision, unlike the other single-step methods (nearest profile and decision trees). Again, this may be attributed to the fact that the more sophisticated neural network succeeds in learning the true profile/class better regardless of the session size.

The performance of k-NN fares competitively with all the single-step recommender strategies, but only for longer session sizes. This is not surprising, considering that k-NN can yield very accurate predictions, because it too is based on local context-sensitive models. However, k-NN is notorious for its excessive computational and memory costs at recommendation time, in contrast to all the other investigated techniques. While lazy in the learning phase, involving nothing more than storing all the previously seen cases, k-NN takes its toll during the recommendation phase, when it needs to compare a new session with all the historic cases in order to produce recommendations.

Based on the F1 measure, shown in Figure 9, which combines precision and coverage equally, we can conclude that the single-step recommender systems vary in performance depending on the range of session sizes. The single-step Nearest Profile predictor recommender system works best for very small session sizes (less than 2). The single-step Decision Tree predictor recommender system works best for session sizes in the range 3-5, and then competes very closely with the single-step Neural Network predictor recommender system above session size 5. While k-NN outperforms the single-step recommender systems, the two-step profile-specific URL-predictor neural network recommender system wins in terms of both better precision and better coverage, particularly above session size 2. This suggests using a combination of different recommender modules that are alternated based on session size, even within the lifetime of a single user session.

Finally, Fig. 10 shows the averaged F1 measure values for the sessions belonging to each profile separately, when the CUSA-2-step-Rec strategy is used. The label for each profile is indicated close to its own curve. We notice that performance for most profiles is very good, except for profile 3. By looking at Table 1, profile 3 is seen to grab some URLs with low URL significance weight. This confirms the presence of some sessions that, while sharing the main profile URLs, also have additional URLs that altogether are accessed very infrequently. These additional URLs are very hard to model in the URL-prediction network, because the overwhelming majority of the sessions in this profile do not contain them. In fact, using the robust session identification approach described in [6], we were able to identify that only half of the sessions in profile 3 form a strong consensus of two URLs. The remaining sessions cannot be modeled effectively even with a highly specialized URL-prediction model. Since we did not screen these widely varying noise sessions from the recommendation process, their effect will be seen. Exploiting the robustness feature of HUNC [6] is an interesting and worthwhile step towards greatly enhancing the quality of the recommendations, and we leave this open for future investigation.

Fig. 11 shows the averaged F1 measure values for the sessions belonging to each profile separately, when the K-NN strategy is used. We can see a more sporadic behavior of the quality of the recommendations for different sizes, witnessing the inadequacy of a global (i.e., across all profiles) collaborative
neighborhood model for some profiles and session sizes, especially in such a non-metric, noisy, and sparse high-dimensional space. Note that when the measure curves dive to zero at a certain session size, this means that there were no sessions of longer size, and does not reflect the quality of the recommendations.

Most importantly, we notice that the widely varying performance for different profiles suggests that different recommendation strategies perform best for different profiles, and it may be beneficial to combine different strategies depending not only on the current session size, as we have noticed previously, but also on the identified profile, yielding a sophisticated hybrid meta-recommendation system.

Figure 7: Precision Values for all recommendation strategies (CSA-1-step-Rec, CUSA-2-step-Rec, and K-NN). [Plot of precision against sub-session size; curves: 2-step Profile-specific NN, K-NN (K=50, N=10), 1-step Decision Tree, 1-step Neural Network, 1-step Nearest Profile-Cosine, 1-step Nearest Profile-WebSession.]

Figure 8: Coverage Values for all recommendation strategies (CSA-1-step-Rec, CUSA-2-step-Rec, and K-NN). [Plot of coverage against sub-session size; same curves as Figure 7.]

Figure 9: F1-Measure Values for all recommendation strategies (CSA-1-step-Rec, CUSA-2-step-Rec, and K-NN). [Plot of F1-measure against sub-session size; same curves as Figure 7.]
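Such a hybrid meta-recommender could be dispatched very simply. The sketch below alternates strategies using the session-size bands observed in Figure 9; the three recommender callables are hypothetical placeholders, the thresholds would need tuning per site, and keying the choice additionally on the identified profile is left as the suggested extension:

```python
def meta_recommend(sub_session, profile_id, recommenders):
    """Pick a recommendation strategy from the observed best session-size ranges.

    sub_session:  set of URLs in the current (incomplete) session.
    profile_id:   identified profile; could further refine the choice.
    recommenders: dict of hypothetical callables, each mapping a session
                  (set of URLs) to a list of recommended URLs.
    """
    size = len(sub_session)
    if size < 2:           # nearest-profile worked best for very small sessions
        strategy = "nearest_profile"
    elif size <= 5:        # decision tree best in the 3-5 range
        strategy = "decision_tree"
    else:                  # two-step profile-specific NN dominates above that
        strategy = "two_step_nn"
    return recommenders[strategy](sub_session)
```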
4.3 Time and Memory Complexity
Considerations
From the point of view of cost, i.e., time and memory complexity, it is interesting to note that training may take longer with the profile-based approaches. However, this training can be done on a back-end computer rather than on the server, and is therefore an offline process that does not affect the operation of the Web server. On the other hand, all profile-based approaches are extremely fast at recommendation time and require a minimal amount of main memory to function, since they work with a mere summary of the previous usage history instead of the entire history as in k-NN
collaborative filtering [15]. In our simulations, the proposed
profile-based recommendations with non-optimized Java code
running on a 1.7 GHz Pentium IV PC took only a small fraction
of a second to provide one recommendation, resulting in the order
of 10-100 recommendations per second depending on the strategy
used1. The far more computationally complex K-Nearest
Neighbor collaborative filtering generated recommendations at a
leisurely 2 recommendations per second.
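To make this cost contrast concrete, the entire online portion of the two-step pipeline reduces to one match against a small set of profile summaries plus one pass through the matched profile's predictor, with no scan of the session history. A minimal sketch of the CUSA-2-step-Rec flow, assuming cosine profile matching and a generic predictor interface (both illustrative simplifications, not the exact implementation):

```python
def two_step_recommend(sub_session, profiles, predictors, threshold=0.5):
    """Two-step sketch: profile assignment, then a profile-specific predictor.

    sub_session: set of URLs in the current (incomplete) session.
    profiles:    dict profile_id -> set of significant profile URLs.
    predictors:  dict profile_id -> hypothetical trained URL predictor, a
                 callable mapping a session (set of URLs) to {url: score}.
    """
    def cosine(a, b):
        return len(a & b) / ((len(a) * len(b)) ** 0.5) if a and b else 0.0

    # Step 1: assign the session to the closest pre-discovered profile.
    best = max(profiles, key=lambda pid: cosine(sub_session, profiles[pid]))

    # Step 2: the matched profile's predictor scores candidate URLs; keep
    # those above threshold that the user has not already visited.
    scores = predictors[best](sub_session)
    return [u for u, s in scores.items()
            if s >= threshold and u not in sub_session]
```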
Figure 10: Individual averaged F1-Measure Values for each profile with the CUSA-2-step-Rec strategy. [Plot of F1-measure against sub-session size, one labeled curve per profile.]

Figure 11: Individual averaged F1-Measure Values for each profile with the K-NN (K = 50, N = 10) strategy. [Plot of F1-measure against sub-session size, one labeled curve per profile.]

1 Here, 1 recommendation per second is actually a full set of recommendations for a given session, and not individual URLs per second.

5. CONCLUSIONS

We have investigated several single-step and two-step recommender systems. The single-step recommender systems (CSA-1-step-Rec) simply predict the URLs that are part of the nearest estimated profile as recommendations. The nearest-profile prediction model simply based its recommendations on the closest profile according to a similarity measure. In order to be able to reliably map new unseen sessions to a set of mined profiles, without such assumptions about the profiles or how they separate the sessions, we can resort to more powerful classification methods. In this paper, we explored both decision trees and neural networks for this task. Once trained, using the decision tree or neural network model to classify a new session constitutes the single step of the recommendation process, since the classified profile is the recommendation set.

The two-step recommender system (CUSA-2-step-Rec) first maps a user session to one of the pre-discovered profiles, and then uses one of several profile-specific URL-predictor neural networks in the second step to provide the final recommendations. Based on this classification, a different recommendation model is designed for each profile separately. A specific back-propagation neural network was trained offline for each profile in order to provide a profile-specific recommendation strategy that predicts web pages of interest to the user depending on their profile. The two-step recommendation method achieves unprecedented high coverage and precision compared to K-NN and single-step nearest-profile recommendations. Finally, we note that the proposed recommendations are low in cost compared to collaborative filtering: they are faster and require less main memory at recommendation time (there is no need to store, or compare against, a large number of instances). Even though training neural networks to
build a recommendation system may take a long time offline, testing this built network on a new user may take just a small fraction of a second.

Unlike most previous work, the proposed two-step profile-specific URL-predictor neural network recommender system allows a more refined, context-sensitive recommendation process. The idea of using a separate network specialized to each profile seems to be novel, since it provides an even higher level of context-awareness in personalization than the level already offered through collaborative filtering based personalization. The proposed method simultaneously achieved precision and coverage values of unprecedented quality (higher than 0.6 for most session sizes). Most existing techniques tend to excel in one measure (such as precision) and fail in the other (for example, coverage).

It is important to note that the divide-and-conquer strategy adopted in the 2-step approach, to break what would be an unmanageable learning task (URL prediction for all sessions regardless of their profile) into several simpler learning tasks (one for each profile separately), was the key to both (i) enabling faster learning, and (ii) enabling better performance by reducing interactions between different sessions. It is reasonable to expect that this modular design could be extended by replacing the URL-Predictor neural network modules with different learning paradigms that are faster to train, while not compromising the accuracy of predictions. Such learning modules could be decision trees. The proposed model could also be made even faster to train and more accurate by encouraging the discovery of even more high-resolution profiles, which is specifically one of the desirable features of Hierarchical Unsupervised Niche Clustering.

We finally classify our recommendation approaches with respect to the two-dimensional taxonomy presented in [16]. First, because the user is anonymous at all times, our approaches are all ephemeral with respect to the persistence dimension. Second, with respect to the automation dimension, our approaches are fully automatic. Furthermore, with regard to the four different families of recommendation techniques identified in [16] (non-personalized, attribute-based, item-to-item correlation, and people-to-people correlation), the 1-step recommenders (CSA-1-step-Rec) can be considered as people-to-people collaborative filtering. However, they use a cluster/profile summarization model, hence providing better scalability. On the other hand, the CUSA-2-step-Rec model uses people-to-people collaborative filtering, summarized through a cluster model, in the first stage, to map a new user to a profile. Then it uses a specialized item-to-item recommendation model to produce the final recommendations. Hence, the CUSA-2-step-Rec approach can be considered a hybrid between people-to-people and item-to-item recommendations, which may explain its superior performance.

Acknowledgements

This work is supported by National Science Foundation CAREER Award IIS-0133948 to O. Nasraoui.

References

[1] M. Perkowitz and O. Etzioni, Adaptive Web Sites: Automatically Learning from User Access Patterns, Proc. 6th Int. WWW Conference, 1997.
[2] R. Cooley, B. Mobasher, and J. Srivastava, Web Mining: Information and Pattern Discovery on the World Wide Web, Proc. IEEE Int. Conf. on Tools with AI, Newport Beach, CA, pp. 558-567, 1997.
[3] O. Nasraoui, R. Krishnapuram, and A. Joshi, Mining Web Access Logs Using a Relational Clustering Algorithm Based on a Robust Estimator, 8th International World Wide Web Conference, Toronto, pp. 40-41, 1999.
[4] O. Nasraoui, R. Krishnapuram, H. Frigui, and A. Joshi, Extracting Web User Profiles Using Relational Competitive Fuzzy Clustering, International Journal on Artificial Intelligence Tools, Vol. 9, No. 4, pp. 509-526, 2000.
[5] O. Nasraoui and R. Krishnapuram, A Novel Approach to Unsupervised Robust Clustering Using Genetic Niching, Proc. 9th IEEE International Conf. on Fuzzy Systems, San Antonio, TX, pp. 170-175, May 2000.
[6] O. Nasraoui and R. Krishnapuram, A New Evolutionary Approach to Web Usage and Context Sensitive Associations Mining, International Journal on Computational Intelligence and Applications, Special Issue on Internet Intelligent Systems, Vol. 2, No. 3, pp. 339-348, Sep. 2002.
[7] M. Pazzani and D. Billsus, Learning and Revising User Profiles: The Identification of Interesting Web Sites, Machine Learning, Vol. 27, pp. 313-331, 1997.
[8] D. H. Kraft, J. Chen, M. J. Martin-Bautista, and M. A. Vila, Textual Information Retrieval with User Profiles Using Fuzzy Clustering and Inferencing, in P. S. Szczepaniak, J. Segovia, J. Kacprzyk, and L. A. Zadeh (eds.), Intelligent Exploration of the Web, Physica-Verlag, Heidelberg, Germany, 2002.
[9] B. Mobasher, H. Dai, T. Luo, and M. Nakagawa, Effective Personalization Based on Association Rule Discovery from Web Usage Data, ACM Workshop on Web Information and Data Management, Atlanta, GA, Nov. 2001.
[10] J. H. Holland, Adaptation in Natural and Artificial Systems, MIT Press, 1975.
[11] L. A. Zadeh, Fuzzy Sets, Information and Control, Vol. 8, pp. 338-353, 1965.
[12] G. J. Klir and B. Yuan, Fuzzy Sets and Fuzzy Logic, Prentice Hall, 1995, ISBN 0-13-101171-5.
[13] R. Agrawal and R. Srikant, Fast Algorithms for Mining Association Rules, Proc. 20th VLDB Conference, Santiago, Chile, pp. 487-499, 1994.
[14] G. Linden, B. Smith, and J. York, Amazon.com Recommendations: Item-to-Item Collaborative Filtering, IEEE Internet Computing, Vol. 7, No. 1, pp. 76-80, Jan. 2003.
[15] J. Breese, D. Heckerman, and C. Kadie, Empirical Analysis of Predictive Algorithms for Collaborative Filtering, Proc. 14th Conf. on Uncertainty in Artificial Intelligence, pp. 43-52, 1998.
[16] J. B. Schafer, J. Konstan, and J. Riedl, Recommender Systems in E-Commerce, Proc. ACM Conf. on Electronic Commerce, pp. 158-166, 1999.
[17] J. Srivastava, R. Cooley, M. Deshpande, and P.-N. Tan, Web Usage Mining: Discovery and Applications of Usage Patterns from Web Data, SIGKDD Explorations, Vol. 1, No. 2, pp. 1-12, Jan. 2000.
[18] O. Zaiane, M. Xin, and J. Han, Discovering Web Access Patterns and Trends by Applying OLAP and Data Mining Technology on Web Logs, Advances in Digital Libraries, Santa Barbara, CA, pp. 19-29, 1998.
[19] M. Spiliopoulou and L. C. Faulstich, WUM: A Web Utilization Miner, Proc. EDBT Workshop WebDB98, Valencia, Spain, 1999.
[20] J. Borges and M. Levene, Data Mining of User Navigation Patterns, in Web Usage Analysis and User Profiling, Lecture Notes in Computer Science, H. A. Abbass, R. A. Sarker, and C. S. Newton (eds.), Springer-Verlag, pp. 92-111, 1999.
[21] S. Haykin, Neural Networks: A Comprehensive Foundation, Macmillan, New York, 1994.
[22] J. R. Quinlan, Induction of Decision Trees, Machine Learning, Vol. 1, pp. 81-106, 1986.