<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>Seo, Y. W. and Zhang, B. T. Personalized Web-Document
Filtering Using Reinforcement Learning, Applied Artificial
Intelligence</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Intelligent Agent for e-Tourism: Personalization Travel Support Agent using Reinforcement Learning</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Anongnart Srivihok</string-name>
          <email>anongnart.s@ku.ac.th</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Computer Science, Faculty of Science</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Kasetsart University</institution>
          ,
          <addr-line>Bangkok 10900</addr-line>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2001</year>
      </pub-date>
      <volume>665</volume>
      <abstract>
        <p>Web personalization and one-to-one marketing have been introduced as strategic and marketing tools. By using historical and current customer information, organizations can learn and predict customer behavior and develop products that fit potential customers. In this study, a Personalization Travel Support System is introduced to manage travel information for users and to provide information that matches their interests. The system applies reinforcement learning to analyze and learn customer behavior and to recommend products that meet customer interests. Two learning approaches are used in this study. First, the Personalization Learner by Group Properties learns from all users in one group to find the group's interests in travel information, using the given data on user age and gender. Second, the Personalization Learner by User Behavior analyzes the user profile, user behavior and trip features to find the unique interests of each web user. The results reveal that it is possible to develop a Personalization Travel Support System. Using weighted trip features improves the effectiveness and accuracy of the personalization engine: the precision, recall and harmonic mean of the learned system are higher than those of the original one. This study offers useful information on the personalization of web support systems.</p>
      </abstract>
      <kwd-group>
        <kwd>Personalization</kwd>
        <kwd>Reinforcement Learning</kwd>
        <kwd>intelligent agent</kwd>
        <kwd>recommendation algorithm</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. INTRODUCTION</title>
      <p>At present, information technology (IT) plays an important role
in working environments. Many organizations use IT as a tool
to make their business run more smoothly and to compete faster in
the market. In many industries, the Internet and the WWW have
significant roles in business processes. Online business is more
competitive than traditional business, since plenty of low-cost
online stores offer products and services on the Internet.
Further, customer loyalty in online business is low compared
with the traditional market, so it is challenging for a company to
attract new customers and keep existing ones in e-Commerce. Traditional
marketing is not always successful on the Internet, and thus
a more specific online approach such as one-to-one marketing
should be helpful. In order to be more competitive on the</p>
    </sec>
    <sec id="sec-2">
      <title>2. RELATED WORKS</title>
      <p>Joachims et al. (1997) developed the WebWatcher program, which
analyzed users’ interactions with specific websites. The
program adopted reinforcement learning. Its
purpose was to offer the most suitable information to the user by
showing links in HTML pages.</p>
      <p>The WAIR system [3] proposed information filtering
techniques based on reinforcement learning. The
system learned the user’s interests by observing his or her
behavior while interacting with the system, and then provided
personalized information to target users. Compared with
other techniques, reinforcement learning was found to be
the most efficient in information retrieval.</p>
      <p>Yuan introduced a comparison-shopping system [6] that
supports personalization. The comparison-shopping
feature keeps records of users, analyzes users’ behavior,
manages the records and gives rewards to products based
on those records. This method is called Temporal Difference
Reinforcement Learning, which is one of the effective
reinforcement learning methods.</p>
    </sec>
    <sec id="sec-3">
      <title>3. DESIGN OF PERSONALIZATION TRAVEL SUPPORT ENGINE</title>
      <p>Reinforcement learning [5] is characterized by
trial and error. A reward is given when the answer to a
question is correct, while a penalty is given when
there is an error. This goal-oriented approach explores
personal interests by maximizing the reward for items that the
user is interested in and assigning a penalty to items that the user
is not interested in.</p>
      <p>Environment (state): A trip list from which users can select.</p>
      <p>Agent: An agent records data on user behavior, such as
clicking and reading on the web site. It then analyzes users’
interests and gives rewards and/or penalties.</p>
      <p>Action: Filtering the travel list according to the
agent’s analysis.</p>
      <p>Reward: A value assigned to the state that a user selects.</p>
      <p>The engine then offers trip information to
determine the user’s interests and records the interactions and
behavior from the last browsing session, including click characteristics
while browsing travel information.</p>
    </sec>
    <sec id="sec-9">
      <title>Personalization Travel Support Engine Structure</title>
      <p>[Figure: structure of the Personalization Travel Support Engine. Labelled components: User, Interface Website, Trip Data, Database, User Behavior, Log Visit, Personalization Learner by User Behavior, Personalization Learner by Group Properties, Personalization Learner, User Profile Database, Personalization Ranking.]</p>
      <p>Interface Website. In this part, users can surf and view the website. The
Personalization Travel Support (PTS) engine records
the information that web users visit and analyzes the
user behavior from each visit. The system then offers trip
information that matches the user’s unique requirements.</p>
      <p>Personalization Learner. This is the process of learning and
analyzing website usage behavior to understand the
user’s interests.</p>
      <p>Personalization Ranking. Its function is to rank the
trip information for web users. The process
is based on the initial learning weights and the
user’s interest in each trip.</p>
      <p>User Profile Database. This is the database of web
users, which is used for travel management.
Depending on the user’s behavior, the database is
processed to map the trip list to the user’s
requirements. The profile database holds two
types of data: user properties and user behavior.</p>
    </sec>
    <sec id="sec-10">
      <title>Personalization Learner</title>
      <p>To perceive an individual user’s interests, the system
studies the user’s behavior using information from the
Interface Web Site, which records two categories of data:</p>
      <p>1. Web user profile, which includes user name, age, and
sex.</p>
      <p>2. Traveling information, which includes identification
number, duration, category, lowest trip price, highest trip price
and destination country.</p>
      <p>Two learning approaches are used in this study:
Personalization Learner by Group Properties and Personalization Learner by User
Behavior.</p>
      <p>Personalization Learner by Group Properties: The system learns
from all users in one group to find the group’s interests in travel
information, using the given data on user age and gender.</p>
      <p>Personalization Learner by User Behavior: Recorded data are
analyzed together with user behavior and the travel information in order
to find the unique interests of each web user. A reinforcement
learning algorithm called Q-learning is applied at this stage.
Q-learning is used to maximize the reward for an item on the list
that is clicked and to assign a penalty to an item that is not
clicked, as shown in Eq. (1):
        <disp-formula id="eq1">
          <label>(1)</label>
          <tex-math><![CDATA[\hat{Q}(s_t, a_t) \leftarrow \alpha \left[ r + \gamma \max_{a_{t+1}} \hat{Q}(s_{t+1}, a_{t+1}) \right] ]]></tex-math>
        </disp-formula>
where max Q is defined as:
        <disp-formula>
          <tex-math><![CDATA[\max \hat{Q} = \begin{cases} 1 & \text{if the user clicks the provided trip information} \\ -1/n & \text{if the user does not click the trip information on the web site} \\ 1/p & \text{for trips in the database that are not recommended by the system} \end{cases} ]]></tex-math>
        </disp-formula>
Here n is the total number of trips per page and p is the total
number of trips in the system. The learning rate α is set to 0.2, and γ is the
discount rate, set to 0.8.</p>
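      <p>The update in Eq. (1) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the three case values (1, -1/n, 1/p) are treated here as the reward r, and the update is applied exactly as printed, without a term that retains the previous Q value.</p>
      <preformat preformat-type="code"><![CDATA[
```python
ALPHA = 0.2  # learning rate given in the text
GAMMA = 0.8  # discount rate given in the text


def reward(clicked, recommended, n_per_page, p_total):
    """Case values listed after Eq. (1); using them as the reward r is an assumption."""
    if not recommended:
        # Trip in the database that the system did not recommend.
        return 1.0 / p_total
    if clicked:
        # Recommended trip that the user clicked.
        return 1.0
    # Recommended trip that the user did not click.
    return -1.0 / n_per_page


def q_update(r, next_q_values, alpha=ALPHA, gamma=GAMMA):
    """Eq. (1) as printed: Q(s_t, a_t) = alpha * (r + gamma * max Q(s_t+1, a_t+1))."""
    best_next = max(next_q_values) if next_q_values else 0.0
    return alpha * (r + gamma * best_next)


# Hypothetical example: 10 trips per page, 100 trips in the system,
# and two candidate next-step Q values.
q_clicked = q_update(reward(True, True, 10, 100), [0.0, 0.5])
q_skipped = q_update(reward(False, True, 10, 100), [0.0, 0.5])
print(q_clicked, q_skipped)
```
]]></preformat>
      <p>With these made-up inputs, a clicked trip receives a larger Q value than a trip that was shown but not clicked, which is the behavior the text describes.</p>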
    </sec>
    <sec id="sec-11">
      <title>Trip features</title>
      <p>Trip features are associated with user interests in tourist programs. They
are as follows: (1) Trip Duration (Qt) is the number of days
offered by each trip. (2) Trip Category (Qc) is the type of trip,
including shopping, eco tour, scuba diving and trekking. (3)
Trip Lowest Price (Qmp) is the lowest price for trip expenses.
(4) Trip Highest Price (Qxp) is the highest price for trip
expenses. (5) Trip Destination (Qd) is the country of
visitation.</p>
    </sec>
    <sec id="sec-12">
      <title>Personalization Ranking</title>
      <p>The display area for Personalization Ranking is divided into
two parts. Part one is the main box. When a user explores the
website to find travel information, the engine ranks the
trips by using reinforcement learning and the given data from group
properties, that is, the fundamental data that every user registers, such as
age and gender, and historical data from visits to the website.
Part two is the Recommend Box. When a user explores the
website to find travel information, the engine displays
trip information randomly at the first visit. After that, it
displays travel information that has been analysed and learned
from historical user transactions and the trip database. The
top five ranked trips are offered on the web
page.</p>
      <p>[Figure: numbered trip listing from the PTS web page. Recoverable entries: Thai Gulf-Koh Tao-Koh Nang Yuan-Chumphon; Rafting Kheg River-Kang Song Waterfall-Pitsanulok; Discovery Pattaya Package (3D2N); Wonderful Similan Island; Mae Sot Package 3 days 2 nights; Loei Package 3 days 2 nights; Kanchanaburi Night Safari Tour 2 days; Kanchanaburi Health 2 days; Rafting Hin Peang, Winery, Water fall; Mo Koh Surin.]</p>
      <p>The ranking score is evaluated from the equation:
        <disp-formula>
          <tex-math>Q_r = W_t Q_t + W_{xp} Q_{xp} + W_{mp} Q_{mp} + W_c Q_c + W_d Q_d</tex-math>
        </disp-formula>
The first approach is learning by user behavior. Qt, Qxp,
Qmp, Qc and Qd are calculated with the Q-learning equation, using input data from user
transactions on the PTS web site.
Wt, Wxp, Wmp, Wc, and Wd are the weights of each feature
obtained from learning. The total score (Qr) is the
sum of Qt, Qxp, Qmp, Qc and Qd, each multiplied by its
corresponding weight. Next, the Qr scores of all trips are ranked
in descending order. The five trips with the highest Qr scores are selected
and recommended to users on the PTS web site.</p>
      <p>The second approach is learning by group property, that is,
clustering users by age and sex. The ranking of trips provided to
users depends on the user profile and on user behavior, i.e. web
surfing transactions. In this approach users are clustered into
groups by age and gender. The value of interesting
trips in each group is then calculated from user behavior, i.e.
transactions on the PTS web site. The trip ranking process in this
approach is the same as in the paragraph above. The recommended
trips are shown in Figure 3. Area number 1, in the
middle of the web page, is the main box. Area number 2, on
the right-hand side, is the recommended box.</p>
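      <p>As a rough illustration of the ranking step, the sketch below computes Qr as the weighted sum of the five feature scores and keeps the highest-scoring trips. It assumes the feature weights reported in Section 4; the trip identifiers, the per-feature Q values and the function names are hypothetical and are not taken from the PTS prototype.</p>
      <preformat preformat-type="code"><![CDATA[
```python
# Feature weights as reported in the experimental results (Section 4).
WEIGHTS = {"t": 0.14, "xp": 0.19, "mp": 0.23, "c": 0.19, "d": 0.27}


def ranking_score(q, weights=WEIGHTS):
    """Qr = Wt*Qt + Wxp*Qxp + Wmp*Qmp + Wc*Qc + Wd*Qd."""
    return sum(weights[k] * q[k] for k in weights)


def top_trips(trips, k=5):
    """Return the k trip ids with the highest Qr, in descending order."""
    ranked = sorted(trips, key=lambda tid: ranking_score(trips[tid]), reverse=True)
    return ranked[:k]


# Hypothetical per-feature Q values for three made-up trips.
trips = {
    "trip_a": {"t": 0.1, "xp": 0.2, "mp": 0.3, "c": 0.0, "d": 0.5},
    "trip_b": {"t": 0.3, "xp": 0.1, "mp": 0.1, "c": 0.2, "d": 0.1},
    "trip_c": {"t": 0.0, "xp": 0.0, "mp": 0.4, "c": 0.4, "d": 0.4},
}
best = top_trips(trips, k=2)
print(best)
```
]]></preformat>
      <p>For the group-property approach, the same function could be applied to Q values aggregated per age and gender group instead of per user; that aggregation is not shown here.</p>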
    </sec>
    <sec id="sec-13">
      <title>4. EXPERIMENTAL RESULTS</title>
      <p>This experiment describes the prototype of the personalization
support engine, which is implemented to record and
analyse user interactions and behavior. The engine then
presents and recommends interesting trips to the user. The user profile
includes user name, age and gender. The trip list includes
Categories (art and culture, diving, shopping, ….and eco tour),
Country (Thailand, Nepal, China), Duration (3, 4, 5 days),
Minimal Price (400 baht), and Maximal Price (10000 baht).
The prototype of the PTS engine implemented in this study
includes approximately 100 trips. In each transaction, PTS
automatically provides five trips in the Recommend Box and 10
trips in the Main box. In this experiment, there were 115 participants,
including 73 males and 35 females. They were undergraduate
students at one Thai university.</p>
      <p>Users accessed PTS at least twice, with at least 24 hours
between the first and second access.
The weights of the five features were calculated from user
behavior and trip profiles on PTS. Results show that the trip
destination feature has the maximum weight (0.27). The second
largest is the trip minimum price weight (0.23). The third is the trip
maximum price weight (0.19). The fourth is the trip category
weight (0.19). Lastly, the trip duration weight is about 0.14.
All feature weights were then assembled in the following
equation:
        <disp-formula>
          <tex-math>Q_r = 0.14 Q_t + 0.19 Q_{xp} + 0.23 Q_{mp} + 0.19 Q_c + 0.27 Q_d</tex-math>
        </disp-formula>
      </p>
    </sec>
    <sec id="sec-14">
      <title>Evaluation of System Effectiveness</title>
      <p>The purpose of this evaluation is to test the performance of the
personalization support engine. In this study, we used precision,
recall and harmonic mean to estimate the system effectiveness.
Precision is the ratio of interesting trips to the total number of
recommended trips. It is calculated by dividing the
number of trips that users click on the personalization engine by
the number of recommended trips. Recall is the ratio of
recommended trips of interest to the total number of clicked trips.
It is calculated by dividing the number of recommended trips
by the number of clicked trips in the user’s transaction. Finally, F1 is
used to represent the combined effect of precision and
recall via the harmonic mean. F1 is calculated
as twice the product of precision and recall
divided by the sum of precision and recall. F1 takes a high
value only when precision and recall are both high.</p>
      <p>Accordingly, Table 3 shows the effectiveness of the engine by
comparing precision, recall and F1 values evaluated from user
click streams before and after learning. The precision is 0.34 for
the unlearned system (first access). After twenty-four hours, during which the
system learned by using Q-learning, users accessed
PTS for the second time. The precision for the second access
increased to 0.50 (a rise of about 47.06%). The pattern is the
same for recall (0.50 for the first access and 0.65 for the second access)
and for harmonic mean values (0.40 for the first access and 0.57 for the
second access). Thus, precision and
recall grew by about 47% and 30%, respectively.</p>
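      <p>The reported figures can be checked with a short calculation. The sketch below is illustrative only: the function names are hypothetical, and the per-transaction precision and recall use the standard set-based definitions (hits among recommended trips, hits among clicked trips), which is an assumption about how the counts were taken.</p>
      <preformat preformat-type="code"><![CDATA[
```python
def harmonic_mean(p, r):
    """F1: twice the product of precision and recall over their sum."""
    return 2 * p * r / (p + r) if (p + r) else 0.0


def precision_recall_f1(recommended, clicked):
    """Standard set-based definitions; an assumption about the paper's counting."""
    recommended, clicked = set(recommended), set(clicked)
    hits = len(recommended & clicked)
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(clicked) if clicked else 0.0
    return precision, recall, harmonic_mean(precision, recall)


def growth(before, after):
    """Relative change between the first and second access."""
    return (after - before) / before


# Values reported in the text for the first and second access.
f1_first = harmonic_mean(0.34, 0.50)
f1_second = harmonic_mean(0.50, 0.65)
precision_growth = growth(0.34, 0.50)
recall_growth = growth(0.50, 0.65)
print(round(f1_first, 2), round(f1_second, 2))
print(round(precision_growth * 100, 2), round(recall_growth * 100, 2))
```
]]></preformat>
      <p>Rounded to two decimals, the harmonic means computed from the reported precision and recall agree with the values quoted above, as do the growth rates.</p>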
      <p>Similarly, Srikumar (2004) studied personalized product
selection based on user behavior on the Internet. System performance
was evaluated by using recall, which was about 0.64. The
recall of Srikumar’s system is close to that of PTS, which is about
0.65. Unfortunately, the former study used only one
measure, recall, so it cannot be concluded which of the two
personalisation systems performs better in
terms of both precision and recall.</p>
    </sec>
    <sec id="sec-15">
      <title>5. CONCLUSIONS</title>
      <p>In this study, a personalized support system that recommends
trips for tourists based on user behavior and group properties
has been proposed. The system learns from the user profile, the
trip database and historical user transactions on the PTS
web site. The learning process uses a Q-learning equation,
which is based on reinforcement learning theory. The main concept
of the system is that users can surf the PTS web site to find
interesting trips. The top five trips are then suggested to
users after all candidate trips are ranked on multiple
criteria; these trips may change dynamically according to
user behavior on the PTS site. Results show that both the precision
and the recall of the system improved after the system had
learned from user transactions and databases. With
recommended trips based on significant data about user surfing
and profiles, the system has the potential to increase the success rate of
product promotion and user acceptance.</p>
      <p>Focusing on the user’s interests gives satisfactory results, since the
information offered to users is based on historical data and
statistical analysis. The advantages of the reinforcement learning
algorithm are its simplicity, speed and ease of
implementation: there is no need to find the best travel list; instead,
the system provides the most appropriate information at the current time.
By comparison, a traditional manual system takes longer
and needs a lot of user support.</p>
      <p>This prototype can be applied as a business intelligent agent for
e-Commerce. The agent can recommend interesting trips to
target users through personalized marketing for new trips or product
promotions. Enterprises can use this personalized, or one-to-one,
marketing to increase sales and service growth
through this channel.</p>
    </sec>
    <sec id="sec-16">
      <title>6. REFERENCES</title>
      <p>[6] Yuan, S. T. A personalized and integrative
comparison-shopping engine and its applications, Decision Support
Systems, 2003, 139-156.</p>
      <p>[7] Weng, S. and Liu, M. Feature-based recommendations for
one-to-one marketing, Expert Systems with Applications, 26,
2004, 493-508.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <surname>Changchien</surname>
            ,
            <given-names>S.W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chin-Feng</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Yu-Jung</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          <article-title>On-line personalized sales promotion in electronic commerce</article-title>
          ,
          <source>Expert Systems with Applications</source>
          ,
          <year>2004</year>
          ,
          <fpage>35</fpage>
          -
          <lpage>52</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>