Inverse Reinforcement Learning and Point of Interest
Recommendations⋆
Discussion Paper

David Massimo, Francesco Ricci
Free University of Bozen-Bolzano, Italy


Abstract
We focus here on Points of Interest (POIs) Recommender Systems (RSs), aimed at helping users visiting a city to discover new and relevant POIs. RSs are often assessed in offline settings, hence measuring the system's precision in predicting previously observed user behaviour. However, when deployed, the recommendations produced by the system are often of limited use because they lack novelty. We conjecture that this phenomenon is primarily due to the limited capability of RSs to extract, from the observed behaviour, general characteristics of POIs that are relevant for different classes of users (tourist types). We compare an Inverse Reinforcement Learning (IRL) based RS algorithm with more traditional nearest-neighbour and popularity-based ones. Through an offline evaluation, we show that the nearest-neighbour and popularity-based RSs excel in (offline) precision, yet are perceived as not novel by the users of a live-user study. On the contrary, despite a lower offline precision, the IRL-based RS, which learns the preferences of tourists for POI characteristics, can give better support to a tourist.

Keywords
recommender systems, tourism, user behaviour learning, evaluation




1. Introduction
This paper focuses on Recommender Systems (RSs) in the tourism domain and summarises the
results of Massimo and Ricci on Points of Interest (POIs) recommendation [1].
In [2], a next-POI-visit recommendation strategy named Q-BASE, which uses Inverse
Reinforcement Learning, was introduced. It consists of three steps. Firstly, it identifies different
tourist clusters based on observed POI-visit sequences. Then, by analysing the sequential
consumption patterns of the clustered users' POI visits, it learns their preferences for POI
features and the reward that a (generic) user belonging to a cluster obtains by conducting
certain POI visits. Finally, Q-BASE computes the state-action value function 𝑄, which tells how
much total reward the user will gain if she chooses to visit a given POI and then keeps choosing
POIs according to the visit-selection policy that is optimal for her cluster.
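For reference, this is the standard optimal state-action value function of reinforcement learning; a textbook formulation is

$$Q^{*}(s, a) \;=\; r(s, a) \;+\; \gamma \,\mathbb{E}_{s' \sim P(\cdot \mid s, a)}\Big[\max_{a'} Q^{*}(s', a')\Big],$$

where a state $s$ encodes the user's visit history, an action $a$ is the choice of the next POI to visit, $r$ is the cluster-specific reward learned via IRL, and $\gamma \in [0, 1)$ discounts future visits. The specific instantiation of reward, states and actions used by Q-BASE is defined in [2].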
In an offline evaluation [2], Q-BASE was compared to the sequence-aware recommendation
strategy SKNN [3]. Given a set of users' POI-visit sequences and a target user's (partial) POI-visit
sequence, SKNN recommends to the target user the next POI visits that are prevalent in the
POI-visit sequences of the most similar users.

IIR2022: 12th Italian Information Retrieval Workshop, June 29 - July 1, 2022, Milan, Italy
⋆ This paper presents some of the results published in: Massimo, D., Ricci, F.: Popularity, novelty and relevance in point of interest recommendation: an experimental analysis. Inf. Technol. Tourism 23, 473–508 (2021).
davmassimo@unibz.it (D. Massimo); fricci@unibz.it (F. Ricci)
© 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org), ISSN 1613-0073.
An offline evaluation showed that SKNN offers recommendations that more precisely predict the
expected user behaviour, but are not novel. Conversely, Q-BASE excelled in suggesting novel POI
visits that are also more rewarding than those of SKNN, at the cost of a lower precision. We
conjectured that in an online RS, when Q-BASE is again compared to SKNN, its recommendations
will be perceived as relevant because they contain novel POIs with a larger reward. Moreover, we
hypothesised that SKNN's higher precision is due to its bias towards suggesting popular items. We
therefore investigated two research questions. (RQ1) If SKNN achieves higher precision by being
biased towards popular items, can Q-BASE be modified, by biasing its recommendations towards
more popular items, to achieve a precision similar to that of SKNN? (RQ2) Will online users like the
precise recommendations of SKNN more than those generated by Q-BASE, which are more novel
and yet relevant?
   In order to investigate these questions we also proposed a novel and adaptable RS, called Q-POP
PUSH, derived from Q-BASE, which generates recommendations that simultaneously optimise
two criteria: the reward of the recommendations (as for Q-BASE) and their popularity. Q-POP
PUSH computes the weighted harmonic mean of the score given by Q-BASE to a POI visit and the
popularity of the POI (its visit occurrences in the observed data). To weigh the relative importance
of the original Q-BASE score and the popularity-based score, Q-POP PUSH uses a parameter 𝛼:
with 𝛼 > 0.5 the popularity of the POI visit has higher importance, whereas with 𝛼 < 0.5 the
Q-BASE component is weighted more; equal importance is given with 𝛼 = 0.5. A minimal sketch
of this combination is shown below.
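As a concrete illustration, the following Python sketch implements such an 𝛼-weighted harmonic-mean combination. The min-max normalisation of the two signals is our assumption (Q-BASE rewards can be negative, while a harmonic mean needs positive inputs); the exact definition used by Q-POP PUSH is given in [1].

```python
import numpy as np

def q_pop_push_scores(q_scores, pop_counts, alpha=0.5, eps=1e-9):
    """Combine Q-BASE scores and POI popularity with a weighted harmonic
    mean; alpha > 0.5 favours popularity, alpha < 0.5 favours Q-BASE."""
    # Min-max normalise both signals into (0, 1] so the harmonic mean is
    # well defined even when Q-BASE rewards are negative (our assumption).
    q = (q_scores - q_scores.min()) / (np.ptp(q_scores) + eps) + eps
    p = (pop_counts - pop_counts.min()) / (np.ptp(pop_counts) + eps) + eps
    # Weighted harmonic mean: alpha weights the popularity term,
    # (1 - alpha) weights the Q-BASE term.
    return 1.0 / (alpha / p + (1.0 - alpha) / q)

# POIs are then ranked by descending combined score, e.g.:
# top5 = np.argsort(-q_pop_push_scores(q, pop, alpha=0.009))[:5]
```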
We have tested Q-POP PUSH in an offline experiment by comparing its performance with Q-BASE,
a popularity baseline and two nearest-neighbour next-item RSs: SKNN and s-SKNN [3, 4]. Finally,
we have assessed the users' perception of the recommendations generated by the best performing
(offline) models in a user study.
   The rest of this paper summarises the offline experiment and the user study, and concludes
with a discussion.


2. Experimental analysis
The offline experiment was conducted following the classical Machine Learning evaluation approach,
with train and test sets [5, 6, 7]. The train set is used to train the models¹ and the test set is used
for recommendation generation and evaluation. In particular, the first 70% of each test POI-visit
sequence is used to generate the recommendations and the remaining 30% to assess the
recommendation performance; a minimal sketch of this protocol is given below. We used a dataset
of 1663 POI-visit sequences (over 500 POIs) in the Italian city of Florence, made by 1163 anonymous
users. The metrics used to assess Q-BASE, Q-POP PUSH, SKNN, s-SKNN and the popularity
baseline POP are: reward, as in [2]; popularity, intended as an inverse proxy of novelty; precision;
and the similarity of a recommendation list to the list generated by SKNN. Since SKNN is very
precise but also affected by a popularity bias, it is interesting to measure how much Q-BASE and
Q-POP PUSH deviate from SKNN. All the metrics range in [0, 1], where values close to 0 (1)
indicate a low (high) performance. For more details about the procedures and metrics, we refer
to [1].


¹ Nearest-neighbour models and POP do not need any training, but use the train data at test time.
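To make the protocol concrete, this minimal Python sketch (our illustration, not the authors' code) shows how precision@k can be computed from the 70/30 prefix-suffix split of one test sequence; `recommend` stands for any of the tested next-POI recommenders.

```python
def precision_at_k(seq, recommend, k=5, split=0.7):
    """Evaluate one test POI-visit sequence: the first 70% of the visits
    is the user profile, the remaining 30% is the ground truth."""
    cut = max(1, int(len(seq) * split))
    prefix, suffix = seq[:cut], seq[cut:]
    recs = recommend(prefix, k)              # top-k next-POI recommendations
    hits = len(set(recs) & set(suffix))      # recommended POIs actually visited
    return hits / k

# The reported precision is the average of precision_at_k over all test
# sequences (and, for the significance test, over 5 train-test splits).
```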
   In Table 1 we report the metric values for top-5 recommendations obtained in the offline
evaluation. SKNN and s-SKNN perform essentially the same on the reward, precision and
popularity metrics.

Table 1
Performance (top-5) of the considered RSs. The best value for each metric is shown in boldface.

                 Q-BASE    Q-POP PUSH                               SKNN      s-SKNN    POP
                           α=0.009   α=0.1     α=0.5     α=0.8
   Reward        0.032*    0.020     -0.001    -0.009    -0.009    -0.010    -0.010    -0.015
   Precision     0.045     0.062      0.063     0.060     0.060     0.068     0.063     0.050
   Popularity    0.319*    0.517      0.634     0.643     0.643     0.528     0.570     0.733
   SimKNN        0.061     0.307      0.441     0.451     0.450     -         0.530     0.352
* There is a significant difference (𝑝 < 0.05) between the best performing model and SKNN (two-tailed
   paired t-test). The test was run for the Reward, Precision and Popularity metrics over 5 repeated
   train-test splits.

Not surprisingly, s-SKNN produces, on average, recommendation lists that overlap up to 53%
with the lists of SKNN (SimKNN). When comparing SKNN to POP, we note that the overlap of
the recommendations is quite low; still, POP has a reasonably good precision, considering the
simplicity of the approach. Q-BASE suggests much less popular items than SKNN, and with a
higher reward.
   The analysis of Q-POP PUSH's performance allows us to address research question RQ1. By
introducing a popularity bias into Q-BASE, as is done in Q-POP PUSH, the generated
recommendations become more similar to those of SKNN. In fact, when 𝛼 is increased, the
popularity of the recommendations produced by Q-POP PUSH rises, reaching and then surpassing
that of SKNN. However, it is interesting to note that with the rather small value 𝛼 = 0.009,
Q-POP PUSH obtains a precision of 0.062, which is very close to the precision of SKNN (0.068)
and s-SKNN (0.063). With this setting Q-POP PUSH still has a positive reward (0.020), while
SKNN and s-SKNN have negative rewards. This means that a small popularity bias in Q-BASE
can be beneficial to improve the system's precision. We must also note that with larger values
of 𝛼 the reward of Q-POP PUSH becomes smaller and smaller, approaching that of the two
nearest-neighbour methods. Clearly, Q-POP PUSH can be tuned to balance two objectives:
precision and reward.
   To answer the second research question, RQ2, we designed a live user study aimed at
measuring the users' perceived novelty and appreciation of the recommendations generated by
Q-BASE, Q-POP PUSH with parameter 𝛼 = 0.5 (to give equal importance to the POIs' popularity
and reward), and the SKNN baseline. We developed an online system to assess the quality of
next-POI recommendations offered to users who have already visited some POIs. The system
initially asks the user to declare visited POIs, which are used to create a hypothetical itinerary
the user is supposed to have already completed. Then, the system generates next-POI
recommendations with the three tested RS algorithms, combines them in a single list, and asks
the user to evaluate them based on a description of the recommended items. The user does not
know which algorithm produced each displayed recommendation. The training data of the online
system is the same we used in the offline study. The user evaluates each POI by marking it with
one or more of the following labels: "I already visited it", "I like it" (for a next visit) and "I didn't
know it". The user study participants were recruited via social media and mailing lists. Out of
the 202 users that accessed the application, we identified 158 reliable recommendation sessions.

Table 2
Probability of evaluating a POI recommendation as visited, novel, liked, and both liked and novel.
    Recommender System                     Visited   Novel      Liked              Liked & Novel
    Q-BASE                                 0.165*    0.517*     0.361*                  0.091
    Q-POP PUSH                             0.245     0.376      0.464                   0.076
    SKNN                                   0.238     0.371      0.466                   0.082
    * indicates a significant difference from the performance of the other two RSs (two-proportion z-test, 𝑝 < 0.05)
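The significance marker in Table 2 refers to a standard two-proportion z-test; as a minimal sketch, it can be reproduced with statsmodels (the counts below are hypothetical placeholders for illustration, not the study's data):

```python
from statsmodels.stats.proportion import proportions_ztest

# Hypothetical counts for illustration: number of POIs marked "novel"
# and number of evaluated POIs, for two recommenders being compared.
novel_counts = [310, 223]   # placeholder successes per system
evaluated = [600, 600]      # placeholder trials per system

z_stat, p_value = proportions_ztest(count=novel_counts, nobs=evaluated)
print(f"z = {z_stat:.2f}, p = {p_value:.4f}")  # significant if p < 0.05
```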

   Table 2 shows the estimated probability that a user marks a POI recommended by a specific
RS as "visited", "novel", "liked", or both "liked" and "novel". Q-BASE recommends POIs that
are less likely to have been already visited by the user and more likely to be novel than those
suggested by Q-POP PUSH and SKNN. As in the offline experiment, Q-POP PUSH and SKNN
perform similarly. Interestingly, Q-BASE suggests fewer POIs that are liked when compared to
the other two strategies. Hence, apparently an RS that is more precise in an offline test also
recommends, online, items that the user will like more. Moreover, the obtained results falsify
our hypothesis that optimising the reward of a recommendation, as Q-BASE does, will produce
recommendations that the user will like more. However, Q-BASE suggests more novel POIs and,
interestingly, more recommendations that are both liked and novel (last column in Table 2).
   In a subsequent analysis we computed the probability that a user will like a recommendation
under three conditions: she knows the POI but has not yet visited it; she has already visited it;
the POI is novel (unknown). We derived the following conclusions. The users liked the novel
POI-visit suggestions generated by SKNN and Q-POP PUSH more than those produced by
Q-BASE. This is due to the tendency of Q-BASE to suggest POI visits that, even if they have the
properties typically liked by the user (e.g., they are of the same type as the POIs the user liked),
are not popular POIs; hence, these suggestions are hard to appreciate. The conclusion we derived
from the user study is that users tend to like more the items they are familiar with, e.g.,
previously visited items or items that are not novel.


3. Discussion
The results of our experiments seem to confirm that RSs that precisely predict user choices
(offline) are also the most liked by real users. In our case, this means that Q-POP PUSH and SKNN
are better RSs than Q-BASE. Our explanation of this result is that both the high offline precision
and the large probability of liked recommendations (online) are influenced by the popularity of
the recommended items: these popular items are often in the users' test sets, and users are likely
to be familiar with them.
   Despite its lower offline precision and the lower share of liked recommendations in the user
study, Q-BASE is the RS that may best accomplish the true goal of a tourism RS: it suggests more
next POIs that are both liked and novel. So, by optimising the reward, Q-BASE is capable of
discovering novel items that are also appreciated (when users are able to assess them).
References
[1] D. Massimo, F. Ricci, Popularity, novelty and relevance in point of interest recommendation:
    an experimental analysis, J. Inf. Technol. Tour. 23 (2021) 473–508. URL: https://doi.org/10.1007/s40558-021-00214-5. doi:10.1007/s40558-021-00214-5.
[2] D. Massimo, F. Ricci, Harnessing a generalised user behaviour model for next-poi recom-
    mendation, in: S. Pera, M. D. Ekstrand, X. Amatriain, J. O’Donovan (Eds.), Proceedings of
    the 12th ACM Conference on Recommender Systems, RecSys 2018, Vancouver, BC, Canada,
    October 2-7, 2018, ACM, 2018, pp. 402–406.
[3] D. Jannach, I. Kamehkhosh, L. Lerche, Leveraging multi-dimensional user models for
    personalized next-track music recommendation, in: A. Seffah, B. Penzenstadler, C. Alves,
    X. Peng (Eds.), Proceedings of the Symposium on Applied Computing, SAC 2017, Marrakech,
    Morocco, April 3-7, 2017, ACM, 2017, pp. 1635–1642. URL: https://doi.org/10.1145/3019612.3019756. doi:10.1145/3019612.3019756.
[4] M. Ludewig, D. Jannach, Evaluation of session-based recommendation algorithms, User
    Model. User-Adapt. Interact. 28 (2018) 331–390.
[5] A. Bellogín, A. Said, Recommender Systems Evaluation, Springer New York, New York, NY,
    2018, pp. 2095–2112. URL: https://doi.org/10.1007/978-1-4939-7131-2_110162. doi:10.1007/978-1-4939-7131-2_110162.
[6] J. L. Herlocker, J. A. Konstan, L. G. Terveen, J. T. Riedl, Evaluating collaborative filtering
    recommender systems, ACM Transactions on Information Systems (TOIS) 22 (2004) 5–53.
[7] P. Cremonesi, Y. Koren, R. Turrin, Performance of recommender algorithms on top-n
    recommendation tasks, in: Proceedings of the Fourth ACM Conference on Recommender
    Systems, RecSys '10, Association for Computing Machinery, New York, NY, USA, 2010,
    pp. 39–46. URL: https://doi.org/10.1145/1864708.1864721. doi:10.1145/1864708.1864721.