Intelligent Agent for e-Tourism: Personalization Travel Support Agent using Reinforcement Learning

Anongnart Srivihok
Department of Computer Science, Faculty of Science,
Kasetsart University, Bangkok 10900
Phone 662 9428026-7
email: anongnart.s@ku.ac.th

Pisit Sukonmanee
Department of Computer Science, Faculty of Science,
Kasetsart University, Bangkok 10900
Phone 662 9428026-7
email: g4464007@ku.ac.th

WWW 2005, May 10-14, 2005, Chiba, Japan.

ABSTRACT

Web personalization and one-to-one marketing have been introduced as strategy and marketing tools. By using historical and present information about customers, organizations can learn and predict customers' behaviors and develop products to fit potential customers. In this study, a Personalization Travel Support System is introduced to manage traveling information for users. It provides information that matches the users' interests. The system applies Reinforcement Learning to analyze and learn customer behaviors and to recommend products that meet customer interests. Two learning approaches are used in this study. First, the Personalization Learner by Group Properties learns from all users in one group to find the group's interests in travel information by using given data on user ages and genders. Second, in the Personalization Learner by User Behavior, the user profile, user behaviors and trip features are analyzed to find the unique interest of each web user. The results of this study reveal that it is possible to develop a Personalization Travel Support System. Using weighted trip features improves the effectiveness and increases the accuracy of the personalized engine. Precision, Recall and Harmonic Mean of the learned system are higher than those of the original one. This study offers useful information regarding the personalization of web support systems.

Keywords: Personalization, Reinforcement Learning, intelligent agent, recommendation algorithm

1. INTRODUCTION

At present, information technology (IT) plays an important role in working environments; many organizations use IT as a tool to make their business run more smoothly and to compete faster in the market. In many industries, the Internet and WWW have significant roles in business processes. Online business is more competitive than traditional business since there are plenty of low-cost online stores offering products and services on the Internet. Further, customer loyalty in online business is low compared to the traditional market, so it is challenging for a company to attract new customers and keep existing ones in e-Commerce. Traditional marketing is not always successful on the Internet, and thus more specific online systems such as one-to-one marketing should be helpful. In order to be more competitive in Internet marketing, it is essential to offer customers products or services that match each customer [1]. During the past few years, online mass marketing using push technology and informative websites containing a great deal of information have been introduced to users. The existing search engines do not allow users to find the relevant information easily. Due to these challenges, web personalization and one-to-one marketing have been introduced to e-commerce businesses, including the tourist sector, retail, banking and finance, and entertainment [7].

In this study, a Personalization Travel Support System is introduced to arrange traveling information for users. This system applies Reinforcement Learning to analyze customer behaviors and study customer interests.

2. RELATED WORKS

Joachims et al. (1997) developed the WebWatcher program, which analyzed users' interactions with specific websites [2]. In this program, Reinforcement Learning theory was adopted. The purpose is to offer the most suitable information to the user by showing links in HTML.

The WAIR system [3] proposed information filtering techniques using reinforcement learning. The system learned the user's interests by observing his or her behaviors while interacting with the system. Personalized information was then provided to target users. Compared with the other techniques, the reinforcement learning technique was found to be the most efficient in information retrieval.

Yuan introduced a comparison shopping system [6] which supported personalization. The comparison shopping feature keeps records of users, analyzes users' behavior, manages the records and gives rewards to products based on those records. This method is called Temporal Difference Reinforcement Learning, which is one of the effective Reinforcement Learning processes.

3. DESIGN OF PERSONALIZATION TRAVEL SUPPORT ENGINE

The characteristic of reinforcement learning [5] is its trial-and-error nature. A reward is given when the answer to a question is correct, while a penalty is given when there is an error. This goal-oriented approach explores personal interests by maximizing the reward for items that interest the user and assigning a penalty to items that do not.
Environment (state): A trip list from which users can select.

Agent: An agent records data on users' clicking and reading behaviors on the web sites. Then it analyzes users' interests and gives rewards and/or penalties.

Action: Filtering the travel list according to the agent's analysis.

Reward: Assigning a value to the state that a user selects to perform.

Then, the engine offers trip information to determine the user's interest and records the interactions and behaviors from the last surfing session, including clicking characteristics in browsing travel information.

Personalization Travel Support Engine Structure

[Figure 1 diagram: User; Interface website; User behavior log visit; User Profile Database; Trip Data Database; Personalization Learner (by User Behavior and by Group Properties); Personalization Ranking]

Figure 1. Personalization Travel Support System Structure

In this part, users can surf and view any websites. PTS records the information that web users frequently visit and analyzes the user behaviors from each visit. Then the system offers trip information that matches the user's unique requirements.

Figure 2. Web site provides travel information

The Personalization Travel Support System Structure includes the following:

      1.     Personalization Learner is the process of learning and analyzing website usage behavior to understand the user's interest.
      2.     Personalization Ranking. Its function is to rank the trip information for the web users. The work process is based on the initial weight of learning and the user's interests in each trip.
      3.     User Profile Database. This is the database of web users, which is operated for travel management. Depending on the user's behaviors, the database is processed to map the trip list to the user's requirements. The profile database is categorized into two types: user's properties data and user's behavior.

Personalization Learner

To perceive an individual user's interests, one has to study the user's behaviors by means of the information from the Interface Web Site, which records two categories of data.

      1.     Web user profile includes user name, age, and sex.
      2.     Traveling information includes identification number, duration, categories, trip lowest price, trip highest price and destination country.

There are two learning approaches used in this study: personalization learner by group properties and by user behavior.

Personalization Learner by Group Properties: The system learns from all users in one group to find the group's interests in travel information by using given data on user ages and genders.

Personalization Learner by User Behavior: Recorded data is analyzed together with user behaviors and the travel information in order to find the unique interest of each web user. A reinforcement learning algorithm called Q Learning is applied at this stage.

Q Learning is used to maximize the reward for the item on the list which is clicked and to assign a penalty to the item that is not clicked, as shown in Eq. (1).

$$\hat{Q}(s_t, a_t) \leftarrow \alpha \left[ r + \gamma \max_{a_{t+1}} \hat{Q}(s_{t+1}, a_{t+1}) \right] \qquad (1)$$

where max Q is defined as:

1           if the user clicks the provided trip information
-1/n        if the user does not click the trip information on the web site, where n is the total number of trips per page
1/p         for trip information in the database that is not recommended by the system, where p is the total number of trips in the system

given that α is the learning rate, valued at 0.2, and γ is the discount rate, valued at 0.8.

Trip features

Trip features are associated with user interests in tourist programs. They are as follows: (1) Trip Duration (Qt) is the number of days offered by each trip. (2) Trip Categories (Qc) is the type of trip, including shopping, eco tour, scuba diving and trekking. (3) Trip Lowest Price (Qmp) is the lowest price for trip expenses. (4) Trip Highest Price (Qxp) is the highest price for trip expenses. (5) Trip Destination (Qd) is the country of visitation.

Personalization Ranking

The display area for Personalization Ranking is divided into two parts. Part one is the main box. When a user explores a website to find travel information, the engine ranks the trips by using reinforcement theory and given data from group properties, that is, fundamental data that all users register such as ages and genders, together with historical data from visiting the websites. Part two is the Recommend Box. When a user explores a website to find travel information, the engine displays trip information randomly at the first visit. After that it displays travel information which has been analysed and learned from historical user transactions and the trip database. The travel information ranked in the top five is offered on the web page.

The ranking score is evaluated from the equation:

Qr = WtQt + WxpQxp + WmpQmp + WcQc + WdQd

The first approach is learning by user behavior. Qt, Qxp, Qmp, Qc and Qd are calculated by using input data from user transactions on the PTS web site and the Q learning equation. Wt, Wxp, Wmp, Wc, and Wd are the weights of each feature obtained from learning. The total score (Qr) is then the sum of Qt, Qxp, Qmp, Qc and Qd, each multiplied by its corresponding weight. Next, the Qr scores of the trips are ranked in descending order. The five trips with the highest Qr scores are selected and recommended to the users on the PTS web site.

The second approach is learning by group property, that is, clustering users by age and sex. The ranking of trips provided to users depends on the user profile and on user behaviors or web surfing transactions. In this approach users are clustered into groups by using age and gender. Then, the interest value of each trip in each group is calculated by using user behavior or transactions on the PTS web site. The process of trip ranking in this approach is the same as in the first approach. The recommended trips are shown in Figure 3. Area number 1, in the middle of the web page, is the main box. Area number 2, on the right-hand side, is the recommended box.

Table 2. The ranking values of trips calculated by using user transactions as input data of the Q-learning equation.

Rank  Trip Name                                          Qt     Qmp    Qxp    Qc     Qd     Qr
1     Thai Gulf-Koh Tao-Koh Nang Yuan-Chumphon           0.410  0.100  0.522  0.001  0.410  1.421
2     Rafting Kheg River-Kang Song Waterfall-Pitsanulok  0.001  0.410  0.522  0.100  0.410  1.398
3     Mo Koh Surin                                       0.190  0.100  0.522  0.100  0.410  1.300
4     Discovery Pattaya Package (3D2N)                   0.001  0.410  0.522  0.001  0.410  1.299
5     Wonderful Thai: Similan Island                     0.190  0.100  0.522  0.001  0.410  1.201
6     Mae Sot Package 3 days 2 nights                    0.001  0.100  0.522  0.001  0.410  1.001
7     Loei Package 3 days 2 nights                       0.001  0.100  0.522  0.001  0.410  1.001
8     Kanchanaburi Night Safari Tour 2 days              0.001  0.100  0.522  0.001  0.410  1.001
9     Kanchanaburi Good Health 2days                     0.001  0.100  0.522  0.001  0.410  1.001
10    Rafting Hin Peang, Winery, Water fall              0.001  0.001  0.522  0.100  0.410  0.990
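The update in Eq. (1) and the ranking score Qr can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names, the dictionary layout, the equal weights and the trip values are all hypothetical. The values 1, -1/n and 1/p listed after Eq. (1) are read here as the reward r, which is an assumption, and the update follows Eq. (1) as printed, without the running-average term found in the textbook form of Q-learning.

```python
# Illustrative sketch only: names and data layout are assumptions,
# not the PTS implementation.
ALPHA = 0.2  # learning rate given in the text
GAMMA = 0.8  # discount rate given in the text


def trip_reward(clicked, recommended, n_per_page, p_total):
    """Values listed after Eq. (1), read here as the reward r (assumption)."""
    if clicked:
        return 1.0
    if recommended:
        return -1.0 / n_per_page  # shown on the page but not clicked
    return 1.0 / p_total          # in the database but not recommended


def q_update(reward, max_next_q, alpha=ALPHA, gamma=GAMMA):
    """Eq. (1) as printed: Q <- alpha * (r + gamma * max Q_next)."""
    return alpha * (reward + gamma * max_next_q)


def ranking_score(q, w):
    """Qr = Wt*Qt + Wxp*Qxp + Wmp*Qmp + Wc*Qc + Wd*Qd."""
    return sum(w[k] * q[k] for k in ("t", "xp", "mp", "c", "d"))


def top_trips(trips, weights, k=5):
    """Names of the k trips with the highest Qr, in descending order."""
    ranked = sorted(trips.items(),
                    key=lambda item: ranking_score(item[1], weights),
                    reverse=True)
    return [name for name, _ in ranked[:k]]


# Hypothetical feature values and equal weights, for illustration only.
weights = {"t": 0.2, "xp": 0.2, "mp": 0.2, "c": 0.2, "d": 0.2}
trips = {
    "trip A": {"t": 0.4, "xp": 0.5, "mp": 0.1, "c": 0.0, "d": 0.4},
    "trip B": {"t": 0.0, "xp": 0.5, "mp": 0.1, "c": 0.0, "d": 0.4},
    "trip C": {"t": 0.2, "xp": 0.5, "mp": 0.1, "c": 0.1, "d": 0.4},
}
print(top_trips(trips, weights, k=2))  # ['trip A', 'trip C']
```

With n = 10 trips per page and p = 100 trips in the system, as in the prototype described in the experimental results, an unclicked recommended trip receives -0.1 and a trip that was not recommended receives 0.01 under this reading.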
Figure 3. Travel information provided after learning.

4. EXPERIMENTAL RESULTS

This experiment describes the prototype of the personalization support engine, which is implemented for recording and analysing user interactions and behaviors. The engine then presents and recommends interesting trips to the user. The user profile includes user name, age and gender. The trip list includes Categories (art and culture, diving, shopping, … and eco tour), Country (Thailand, Nepal, China), Duration (3, 4, 5 days), Minimal Price (400 bahts), and Maximal Price (10000 bahts).

The prototype of the PTS engine implemented in this study includes approximately 100 trips. In each transaction, PTS automatically provides five trips in the Recommend Box and 10 trips in the Main Box. In this experiment, there are 115 participants, including 73 males and 35 females. They are undergraduate students at one Thai university.

Table 2 shows the PTS analysis for one user. After learning from user transactions by using Q learning, the values of the trip features are as follows. The first-ranked trip is ID 43: Thai Gulf-Koh Tao-Koh Nang Yuan-Chumphon, for which Duration (4 days) is 0.410, Minimal Price (4,500 bahts) is 0.100, Maximal Price (4,500 bahts) is 0.522, Categories (Beach Holiday) is 0.001 and Country (Thailand) is 0.410. The total value is 1.421. This trip will be recommended to the user first.

Users accessed PTS at least two times, with at least 24 hours between the first and second access. The weights of the five features were calculated from user behaviors and trip profiles on PTS. Results show that the trip destination feature has the maximum weight (0.27). The second largest is the trip minimum price weight (0.23). The third is the trip maximum price weight (0.19). The fourth is the trip category weight (0.17). Lastly, the trip duration weight is about 0.14. All feature weights were then assembled in the following equation.

Qr = 0.14Qt + 0.19Qxp + 0.23Qmp + 0.17Qc + 0.27Qd

Evaluation of System Effectiveness

The purpose of this evaluation is to test the performance of the personalization support engine. In this study, we used precision, recall and harmonic mean to estimate the system effectiveness. Precision is the ratio of interesting trips to the total number of recommended trips. It is calculated by dividing the number of trips that users click on in the personalization engine by the number of recommended trips. Recall is the ratio of trips that interest users to the total number of clicked trips. It is calculated by dividing the number of recommended trips by the number of clicked trips in the user's transaction. Finally, F1 is used to represent the combined effect of precision and recall via the harmonic mean (F1) function. F1 is calculated as two times the product of precision and recall, divided by the sum of precision and recall, that is, F1 = 2 × Precision × Recall / (Precision + Recall). F1 assumes a high value only when precision and recall are both high.

Table 3. Average precision and recall of clicks on recommended trips by users before and after system learning

                Unlearned    After learning
Precision         0.34           0.50
Recall            0.50           0.65
F1                0.40           0.57

Accordingly, Table 3 depicts the effectiveness of the engine by comparing precision, recall and F1 values evaluated from the user click stream before and after learning. The precision is 0.34 for the unlearned system (first access). After twenty-four hours, the system has learned by using Q learning, and users then access PTS for the second time. The precision for the second access increased to 0.50 (an increase of about 47.06%). The pattern is the same for recall (0.50 for the first access and 0.65 for the second access) and harmonic mean values (0.40 for the first access and 0.57 for the second access). Thus, precision and recall grow by about 47% and 30%, respectively.

Likewise, Srikumar (2004) [4] studied personalized product selection based on user behaviors on the Internet. System performance was evaluated by using recall, which was about 0.64. The recall of Srikumar's system is close to that of PTS, which is about 0.65. Unfortunately, the former study used only a one-dimensional measurement, recall. It therefore cannot be concluded which of the two personalisation systems performs better in terms of both precision and recall.

5. CONCLUSIONS

In this study, a personalized support system that recommends trips for tourists based on user behaviors and group properties has been proposed. The system starts learning from the user profile, the trip database and users' historical transactions in accessing the PTS web site. The learning process uses a Q-learning equation, which is based on reinforcement theory. The main concept of the system is that users can surf the PTS web site to find interesting trips. The top five trips are then suggested to users after all candidate trips are ranked in terms of multiple criteria; these trips may change dynamically according to user behavior on the PTS site. Results show that both precision and recall of the system improved after the system had learned from user transactions and databases. With recommended trips based on significant data on user surfing and profiles, the system has the potential to increase the success rate of product promotion and user acceptance.

Focusing on the user's interest gives satisfactory results since the information offered to the users is based on historical data and statistical analysis. The advantages of the Reinforcement Learning algorithm are its simplicity, speed and ease of implementation. There is no need to find the best travel list; instead, the algorithm provides the most appropriate information at the current time. By comparison, a traditional manual system takes longer and needs a lot of user support.

This prototype can be applied as a business intelligent agent for e-Commerce. This agent can recommend interesting trips to target users through personalized marketing for new trip or product promotions. Enterprises can use this personalized or one-to-one marketing to increase sales and service growth through this channel.

6. REFERENCES

[1] Changchien, S.W., Chin-Feng, L. and Yu-Jung, H. On-line personalized sales promotion in electronic commerce, Expert Systems with Applications, 2004, 35-52.

[2] Joachims, T., Freitag, D. and Mitchell, T. M. WebWatcher: A tour guide for the World Wide Web, Proceedings of International Joint Conference on Artificial Intelligence, 1997, 770-775.

[3] Seo, Y. W. and Zhang, B. T. Personalized Web-Document Filtering Using Reinforcement Learning, Applied Artificial Intelligence, 2001, 665-685.

[4] Srikumar, K. and Bhasker, B. Personalized Product Selection in Internet Business, Journal of Electronic Commerce Research, (5), 2004, 216-227.

[5] Sutton, R.S. and Barto, A.G. Reinforcement Learning: An Introduction, MIT Press, Cambridge, 1998.

[6] Yuan, S. T. A personalized and integrative comparison-shopping engine and its applications, Decision Support Systems, 2003, 139-156.

[7] Weng, S. and Liu, M. Feature-based Recommendations for one-to-one marketing, Expert Systems with Applications, 26, 2004, 493-508.
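As a check on the evaluation arithmetic in Section 4, the harmonic mean and the growth rates can be recomputed from the precision and recall values reported in Table 3. The short Python sketch below is only illustrative; the function names `f1_score` and `growth_percent` are hypothetical and do not come from the PTS prototype.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall: 2PR / (P + R)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def growth_percent(before, after):
    """Relative change from the first access to the second, in percent."""
    return (after - before) / before * 100


# Precision and recall as reported in Table 3.
unlearned = {"precision": 0.34, "recall": 0.50}
learned = {"precision": 0.50, "recall": 0.65}

print(round(f1_score(**unlearned), 2))  # 0.4
print(round(f1_score(**learned), 2))    # 0.57
print(round(growth_percent(unlearned["precision"], learned["precision"]), 2))  # 47.06
print(round(growth_percent(unlearned["recall"], learned["recall"]), 2))        # 30.0
```

The recomputed values agree with the F1 row of Table 3 and with the reported growth of about 47% for precision and 30% for recall.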