RecTour 2019, September 19th, 2019, Copenhagen, Denmark.

Users' Evaluation of Next-POI Recommendations

David Massimo, Free University of Bolzano, Italy, damassimo@unibz.it
Francesco Ricci, Free University of Bolzano, Italy, fricci@unibz.it

Copyright © 2019 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

ABSTRACT

The performance of a Recommender System (RS) is often assessed offline, by measuring the system's accuracy in predicting or reconstructing the observed user ratings or choices. As a consequence, RSs optimised for that performance measure may suggest items that the user would judge correct but uninteresting, because they lack novelty. In fact, these systems are hardly able to generalise the preferences directly derived from the user's observed behaviour. To overcome this problem a novel RS approach has been proposed. It applies clustering to users' observed sequences of choices in order to identify like-behaving users and to learn a user behavioural model for each cluster. It then leverages the learned behaviour model to generate novel and relevant recommendations, rather than directly the users' predicted choices. In this paper we assess, in a live user study, how users evaluate recommendations produced by more traditional approaches and by the proposed one along different dimensions. The obtained results illustrate the differences between the compared approaches and the benefits and limitations of the proposed RS.

CCS CONCEPTS

• Information systems → Recommender systems; • Human-centered computing → User studies.

KEYWORDS

recommender systems, inverse reinforcement learning, clustering, user study

1 INTRODUCTION

The tourism industry is grounded in fulfilling the needs, e.g., accommodation and transportation, of people travelling to a place for leisure or business purposes [14]. In this industry, companies offer tourists online a wide spectrum of services and activities, such as city tours, accommodation and food services [15]. However, the set of available options is often so rich that choosing suitable ones can be overwhelming. In order to address this problem, ICT practitioners and far-sighted industries started to develop and employ ad-hoc RS techniques. Nowadays, the business of companies such as Expedia (www.expedia.com), Booking (www.booking.com) and Kayak (www.kayak.com) is rooted in recommendation technologies.

In fact, recommender systems are software tools that aim at easing human decision making [16]. In the tourism domain some special dimensions of the recommendation process play an important role. First of all, the demand for activities varies in type and quantity across contexts: a tourist may prefer to relax in a park on a sunny day and to visit a museum when it is raining. In order to address this type of request, Context-Aware RSs (CARS) have been developed [1]. Moreover, since individuals typically consume more than one service or perform more than one activity in a single visit to a destination, session- and sequence-aware recommender systems have been introduced [10]. In tourism applications these methods are used to implement next-POI (Point of Interest) recommendation: recommending significant places that the user may be interested in visiting next, i.e., after she has already visited some other places (the same day or previously).

In previous research we developed a novel context-aware recommendation technology (here called Q-BASE) for suggesting a sequence of items after the user has already experienced some of them. It models with a reward function the "satisfaction" that a Point of Interest, with some given features, provides to a user [8, 9]. This technique learns the reward function by using only the observation of the users' sequences of visited POIs. This is an important advantage, since in online systems users typically provide scarce feedback on the used services or the visited places. The reward function is estimated by Inverse Reinforcement Learning (IRL), a behaviour learning approach that is widely used in automation and behavioural economics [3, 5]. Moreover, since it is hard to have at disposal the full knowledge, or a large part, of the user history of travel-related choices, which would be needed to learn the reward function of a single individual, in [8, 9] IRL is instead applied to clusters of users, and a single learned reward function is therefore shared by all the users in a cluster. For this reason we say that the system has learned a generalised tourist behaviour model, one per cluster, which identifies the action (POI visit) that a user in a cluster should try next.
We studied the proposed approach and compared it with popular baseline algorithms for next-item recommendation [4, 7]. In an offline analysis we have shown that a session-based nearest neighbour algorithm (SKNN) generates more precise recommendations, while Q-BASE, our technique, suggests POIs that are more novel and higher in reward. Hence, we conjectured that, in a real scenario, the latter recommendations may be more satisfying for the user.

In this paper we want to verify that hypothesis, i.e., that users like the recommendations produced by Q-BASE more. Moreover, we conjecture that an important difference between the Q-BASE method and those based on SKNN lies in the "popularity bias": SKNN tends to recommend items that have been chosen often by the observed users, while Q-BASE is not influenced directly by the popularity of the items, but rather by the popularity of their features. Hence, we introduce here two novel hybrid algorithms that are based on Q-BASE but deviate from it by using a popularity score: more popular items tend to be recommended more. These two hybrid algorithms are called Q-POP COMBINED and Q-POP PUSH. They both combine (in a slightly different way) the item score derived from the reward of the item with a score derived from the item popularity in the users' behaviour data set: more often chosen (popular) items receive a larger score. The items with the largest combined scores are recommended.

We have here repeated the offline analysis of the original Q-BASE algorithm and compared its performance with that of the two above-mentioned hybrid algorithms and of two kNN algorithms: SKNN, which recommends the next item to a user by considering her current session (e.g., visit trajectory) and seeking similar sessions in the dataset; and sequential session-based kNN (s-SKNN), which leverages a linear decay function to weight more, in the prediction formula, the neighbour trajectories that contain the user's last selected item. Repeating the offline analysis was necessary to validate the conjecture that a significant performance difference between the Q-BASE- and the SKNN-based models is due to the popularity bias of kNN methods. We measure the algorithms' offline performance in terms of reward, precision and novelty, as was done in [8]. Moreover, we investigate the effect of the above-mentioned hybridization of Q-BASE, i.e., whether this approach can generate recommendations similar to those computed by SKNN. To this end, we compare the Jaccard similarity of the recommendation sets produced by Q-BASE and the hybrid variants with the recommendations produced by SKNN.

The results of the offline evaluation confirm our conjecture: hybridizing Q-BASE with item popularity, although it reduces novelty, increases (offline) precision, approaching the precision of SKNN. Moreover, we show that Q-POP COMBINED can still achieve a high reward, whereas Q-POP PUSH loses some reward but obtains the same precision as SKNN. It is worth noting that as the precision of the proposed hybrid models increases, their recommendations overlap more and more with those generated by SKNN.

The second major contribution discussed in this paper is an interactive online system aimed at assessing with real users the novelty of, and the user satisfaction for, the recommendations generated by the original Q-BASE model, one of the two hybrid models (Q-POP PUSH) and the same SKNN baseline used in the previously conducted offline studies. In the online system the users can enter the set of POIs that they previously visited (in Florence) and can receive suggestions for next POIs to visit.

By analysing the users' evaluations of the POIs recommended in the online test, we found a confirmation that Q-BASE suggests more novel items, while SKNN, as well as the proposed hybrid model Q-POP PUSH, offers suggestions that the users like more. We conjecture that, since many items suggested by Q-BASE are novel for the users, they are difficult to evaluate (and to like). We further analyse this aspect by considering recommended items that have been evaluated as "liked and novel" by the users. The results show that Q-BASE is better than SKNN and Q-POP PUSH in suggesting novel and relevant items, which we believe is the primary goal of a recommender system.

In conclusion, in this paper we extend the state of the art in next-POI recommender systems with the following contributions:

• Two novel models, Q-POP COMBINED and Q-POP PUSH, that hybridize the IRL model presented in [8] with a score derived from item popularity.
• An offline study where we show that the proposed hybrid models can obtain precisions similar to those obtained by SKNN and s-SKNN.
• A user study where we show that, when the precision of an algorithm is estimated by leveraging real user feedback as ground truth, rather than the standard machine-learning splitting of the data into train and test sets, Q-BASE performs better than SKNN and Q-POP PUSH in recommending novel items that are liked by the user, but it is not better in recommending generic items that are liked.

The paper structure is as follows. In Section 2 the most related works are presented. Section 3 describes how the original IRL-based recommendations are generated [8] and introduces two IRL-based hybrid models. Then, we show how the proposed algorithms compare offline against the original IRL-based model and the kNN baselines. Section 5 introduces the system developed for the user evaluation and the evaluation procedure. Then, we present the evaluation results. Finally, in Section 7 the conclusions and future work of this study are discussed.
2 RELATED WORK

Our research is focussed on behaviour learning and on recommender systems that leverage such behaviour models. Our application scenario is tourism: the goal is to support tourists in identifying which POI they could visit next, given their current location and the information about their past visited places.

Processing and analysing sequences of actions in order to understand the user behaviour and support human decision-making has already been explored in previous research. In [10] a framework for online experience personalization is proposed that leverages users' interactions (e.g., clicks) in the form of a sequence. The approach is based on pattern mining techniques in order to identify candidate items, present in other users' sequences, that are suitable for recommendation. Another pattern-discovery approach applied to tourism is presented in [13]. Here, the authors propose a RS that identifies the next POI to visit relying on users' check-in sequence data. At first, a directed graph is built from the check-in data; it is then used to identify neighbours of a target user given her check-in data. When neighbours are identified, the POIs in their check-in data are scored, and the recommended POI is the one with the maximal score.

Other, more general, pattern-discovery methods are described in [4, 7]. Here the authors present nearest neighbour RS approaches that leverage user behaviour logs: session-based kNN (SKNN) and sequence-aware SKNN (s-SKNN). SKNN seeks similar users in the system's stored logs and identifies the next item to be recommended given the current user log (session). s-SKNN weights more the neighbour sessions containing the most recent (observed) items of the target user sequence. These methods have been applied to different next-item recommendation tasks showing good performance.

The common aspect of pattern-discovery approaches is that they extract common patterns from user behaviour logs and then learn a predictive model for the next most likely observed user action. That said, these approaches are opaque in explaining the predicted user behaviour, i.e., the users' preferences and their action-selection policy.

To fulfil the need of learning an explainable user behavioural model, imitation learning is a viable solution. It is typically addressed by solving Markov Decision Problems (MDP) via Inverse Reinforcement Learning (IRL) [12]. Given a demonstrated behaviour (e.g., user action sequences), IRL models solve the target MDP by computing a reward (utility) function that makes the behaviour induced by a policy (the learning objective) close to the demonstrated behaviour. In [21] the authors developed an IRL approach based on the principle of maximum entropy, applied in the scenario of road navigation. The approach is based on a probabilistic method that identifies a choice distribution over decision sequences (i.e., driving decisions) matching the reward obtained by the demonstrated behaviour. This technique is useful to model route preferences as well as to infer destinations from partial trajectories. In [3] the authors propose an IRL-based solution to the problem of learning a user behaviour at scale. The application scenario is migratory pastoralism, where learning involves spatio-temporal preferences and the target reward function represents the net income of the economic activity. Similarly, in [5] a method is proposed for computing the reward humans obtain from their movement decisions. The paper presents a tractable econometric model of optimal migration, focusing on expected income as the main economic influence on migration. The model covers optimal sequences of location decisions and allows for many alternative location choices. All these works focus on designing a choice model without studying its application to RSs.

In this work we present two variants of the IRL-based recommender system presented in [8]. There, a RS is proposed that first learns users' behaviour via IRL and then harnesses it to generate next-item recommendations. In an offline evaluation we showed that the approach excels in novelty and reward, whereas more precise recommendations are generated by SKNN-based techniques. In this paper we argue that the ability of pattern-discovery methods to score high in precision is related to the fact that they are discriminative and are influenced by the observed popularity of the items in the training data. Therefore, in order to leverage item popularity also in an IRL model, we extend it by hybridizing its scoring function (the Q function) with item popularity.
3 RECOMMENDATION TECHNIQUES

3.1 User Behavior Modelling

In this paper, user (tourist) behaviour modelling is based on Markov Decision Processes (MDP). An MDP is defined by a tuple (S, A, T, r, γ). S is the state space; in our scenario, a state models the visit to a POI in a specific context. The contextual dimensions are: the weather (visiting a POI during a sunny, rainy or windy time); the time of day (morning, afternoon or evening); and the visit temperature conditions (warm or cold). A is the action space; in our case it represents the decisions to move to a POI. Hence, POIs and actions are in a one-to-one relation. A user that is in a specific POI and context can reach all the other POIs in a new context. T is a finite set of probabilities: T(s′|s, a) is the probability of making a transition from state s to s′ when action a is performed. For example, a user that visits Museo del Bargello on a sunny morning (state s1) and wants to visit Giardino di Boboli (action a1) in the afternoon can arrive at the desired POI with either rainy weather (state s2) or a clear sky (state s3). The transition probabilities may be equal, T(s2|s1, a1) = 0.5 and T(s3|s1, a1) = 0.5. The function r : S → R models the reward a user obtains from visiting a state. This function is unknown and must be learnt. We make the restrictive assumption that we do not know the reward the user receives from visiting a POI (the user is not supposed to reveal it), but we assume that if the user visits a POI and not another (nearby) one, this signals that the first POI gives her a larger reward than the second. Finally, γ ∈ [0, 1] measures how future rewards are discounted with respect to immediate ones.
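To make the state and transition structure above concrete, the following is a minimal sketch of how it could be represented; the State class, the estimate_transitions helper and the field names are illustrative assumptions, not the authors' implementation.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class State:
    """A state couples a POI with the context in which it is visited."""
    poi: str          # e.g. "Museo del Bargello"
    weather: str      # "sunny" | "rainy" | "windy"
    daytime: str      # "morning" | "afternoon" | "evening"
    temperature: str  # "warm" | "cold"

def estimate_transitions(trajectories):
    """Estimate T(s'|s, a) by counting transitions observed in the set Z.

    Each trajectory is a list of State objects; since POIs and actions are
    in a one-to-one relation, the action is identified with the POI of the
    destination state."""
    counts = defaultdict(Counter)
    for traj in trajectories:
        for s, s_next in zip(traj, traj[1:]):
            action = s_next.poi            # action = "move to this POI"
            counts[(s, action)][s_next] += 1
    return {
        (s, a): {s2: n / sum(c.values()) for s2, n in c.items()}
        for (s, a), c in counts.items()
    }
```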
3.2 User Behavior Learning

Given an MDP, our goal is to find a policy π∗ : S → A that maximises the cumulative reward the decision maker obtains by acting according to π∗ (the optimal policy). The value of taking a specific action a in state s under the policy π is computed as

Q^π(s, a) = E_{s,a,π} [ Σ_{k=0}^{∞} γ^k r(s_k) ],

i.e., it is the expected discounted cumulative reward obtained by performing a in state s and then following the policy π. The optimal policy π∗ dictates to a user in state s to perform the action that maximizes Q. The problem of computing the optimal policy for an MDP is solved by reinforcement learning algorithms [18].

We denote with ζ_u a trajectory of user u, which is a temporally ordered list of states (POI visits). For instance, ζ_{u1} = (s10, s5, s15) represents a trajectory of user u1 starting from state s10, moving to s5 and ending at s15. With Z we represent the set of all the observed users' trajectories, which can be used to estimate the probabilities T(s′|s, a).

Since users of a recommender system typically provide scarce feedback on the consumed items (visited POIs), the reward a user gets by consuming an item is not known. Therefore, the MDP, which is essential to compute the user policy, cannot be solved by standard Reinforcement Learning techniques. Instead, having at disposal only the set of POI-visit observations of a user (i.e., the user's trajectories), an MDP for each user could be solved via Inverse Reinforcement Learning (IRL) [12]. In particular, IRL enables learning a reward function whose optimal policy (the learning objective) dictates actions close to the demonstrated behavior (the user trajectory). In this work we have used Maximum Likelihood IRL [2].
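Once a reward function has been learned, the Q values and the greedy policy can be obtained with standard dynamic programming. The sketch below, which assumes finite state and action sets and the transition table of the previous sketch, uses plain value iteration; the function name and the placement of the reward on the successor state are our own illustrative choices.

```python
def q_values(states, actions, T, r, gamma=0.9, n_iter=100):
    """Value-iteration sketch: compute Q(s, a) for a learned reward r.

    T maps (s, a) to a dict {s_next: probability}; r maps a state to its
    (learned) reward; gamma is the discount factor of the MDP."""
    V = {s: 0.0 for s in states}
    Q = {}
    for _ in range(n_iter):
        for s in states:
            for a in actions:
                Q[(s, a)] = sum(
                    p * (r[s2] + gamma * V[s2])
                    for s2, p in T.get((s, a), {}).items()
                )
            V[s] = max(Q[(s, a)] for a in actions)
    return Q

# The greedy policy pi*(s) = argmax_a Q(s, a) is the behaviour model
# that Q-BASE uses to pick the next POI visit (Section 3.4).
```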
3.3 Clustering Users with Similar Behavior

Knowledge of the full user history of travel-related choices, which would be needed to learn the reward function of a single individual, is generally hard to obtain. Therefore, IRL is here applied to clusters of users (trajectories) [8, 9]. This allows learning a reward function that is shared by all the users in a cluster. Hence, we say that the system has learned a generalized tourist behavior model, which identifies the action (POI visit) that a user in a cluster should try next.

The users' trajectories are clustered by grouping them according to a common semantic structure that can explain the resulting clusters. This is accomplished by employing Non-negative Matrix Factorization (NMF) [6]. NMF extracts topics, i.e., lists of words, that describe groups of documents. Therefore, in order to apply NMF, we build a document-like representation of a user trajectory that is based on the features (terms) describing the states visited in the trajectory. Hence, a document-like representation is built for each trajectory in the set Z.
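The clustering step can be sketched as follows, assuming the trajectory "documents" have already been built by concatenating the feature terms of the visited states; the use of scikit-learn, the tf-idf weighting and the dominant-topic assignment are our assumptions, since the paper does not specify these implementation details.

```python
from sklearn.decomposition import NMF
from sklearn.feature_extraction.text import TfidfVectorizer

def cluster_trajectories(trajectory_docs, n_clusters=5):
    """Group trajectories by the latent topics of their state features.

    trajectory_docs: one string per trajectory in Z, concatenating the
    feature terms (POI type, historic period, visit context, ...) of the
    states it contains."""
    vectorizer = TfidfVectorizer()
    X = vectorizer.fit_transform(trajectory_docs)
    nmf = NMF(n_components=n_clusters, init="nndsvd", random_state=0)
    W = nmf.fit_transform(X)       # trajectory-topic weights
    labels = W.argmax(axis=1)      # assign each trajectory to its dominant topic
    return labels, nmf, vectorizer
```

An IRL reward function is then learned separately on the trajectories of each resulting cluster.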
3.4 Recommending Next-POI Visits

Here we propose two new next-POI recommendation techniques, Q-POP COMBINED and Q-POP PUSH, that extend the pure IRL-based Q-BASE model, already introduced in [8] (where it was called CBR).

Q-BASE. The behavior model of the cluster the user belongs to is used to suggest the optimal action this user should take next, after the last visited POI. The optimal action is the action with the highest Q value in the user's current state [8].

Q-POP COMBINED. In order to recommend more popular items, we propose to hybridise the generalized tourist behavior model learnt for the cluster to which the user belongs with the item popularity. In particular, given the current state s of a user, for each possible POI-visit action a that the user can make, we apply the transformation

Q′(s, a) = Q(s, a) / Σ_{i=1}^{|A|} Q(s, a_i)

and then we multiply Q′(s, a) by the probability that the POI appears in a user trajectory (in the given data set Z). The result of the multiplication is a distribution that is used to sample the next POI-visit action recommended to the user. Sampling from a distribution derived from a composition of functions is widely done in simulation [17]. The approach tries to simulate the decision making process of a user that has all the elements to decide how to act next, i.e., she knows the reward of her future action (the Q values), but she is also biased to select popular items. We conjecture that this method recommends more popular items that have a large reward as well.

Q-POP PUSH. The second hybrid recommendation method introduces an even stronger popularity bias in the recommendations generated by Q-BASE. We conjecture that it can obtain an even better precision than Q-POP COMBINED, closer to the precision of the SKNN-based methods. Q-POP PUSH scores the visit action a in state s as follows:

score(s, a) = (1 + β²) · Q(s, a) · pop(a) / (Q(s, a) + β² · pop(a))

With β = 1 this is the harmonic mean of Q(s, a) and pop(a), where pop(a) is the min-max-scaled count c_Z(a), in the data set Z, of the occurrences of the POI visit corresponding to action a. The harmonic mean is widely used in information retrieval to compute the F1-score. In our case the parameter β was set to 1. The action recommended to the user is the one with the highest score.
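The following sketch contrasts the two hybrid scoring rules just described. The dictionary-based layout, the handling of missing popularity values and the assumption of non-negative Q values are illustrative choices, not part of the original method description.

```python
import random

def qpop_combined_sample(Q_s, pop):
    """Q-POP COMBINED sketch: normalise the Q values of the current state,
    weight them by item popularity and sample the recommended action.

    Q_s: dict action -> Q(s, a) for the current state s (assumed non-negative)
    pop: dict action -> probability that the POI appears in a trajectory of Z"""
    total_q = sum(Q_s.values())
    weights = {a: (q / total_q) * pop.get(a, 0.0) for a, q in Q_s.items()}
    norm = sum(weights.values())
    dist = {a: w / norm for a, w in weights.items()}
    actions = list(dist)
    return random.choices(actions, weights=[dist[a] for a in actions], k=1)[0]

def qpop_push_score(q, pop_a, beta=1.0):
    """Q-POP PUSH sketch: F-beta style combination of Q(s, a) and pop(a);
    with beta = 1 it is their harmonic mean."""
    return (1 + beta ** 2) * q * pop_a / (q + beta ** 2 * pop_a)
```

Q-POP COMBINED keeps some exploration by sampling, while Q-POP PUSH deterministically recommends the action with the highest combined score.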
4 OFF-LINE ALGORITHM ANALYSIS

4.1 Baselines

We compare here the performance of the recommendations generated by the above-mentioned methods with two nearest neighbour baselines: SKNN and s-SKNN.

SKNN [4] recommends the next item (visit action) to a user by considering her current session (trajectory) and seeking similar sessions (a neighbourhood) in the data set. The neighbourhood, i.e., the trajectories closest to the current trajectory ζ, is obtained by computing the binary cosine similarity c(ζ, ζ_i) between ζ and each trajectory ζ_i in the dataset. Given a set of nearest neighbours N_ζ, the score of a visit action a is computed as:

score_sknn(a, ζ) = Σ_{ζ_n ∈ N_ζ} c(ζ, ζ_n) · 1_{ζ_n}(a)

With 1_{ζ_n} we denote the indicator function: it is 1 if the POI selected by action a appears in the neighbour trajectory ζ_n, and 0 otherwise. In our data set we cross-validated the optimal number of neighbours, and this number is close to the full cardinality of the data set. The recommended actions are those with the highest scores.

s-SKNN [7] extends SKNN by employing a linear decay function w_ζ to weight more, in the prediction formula, the neighbour trajectories that contain the user's last observed visit action, and less those containing only earlier visits. The current user trajectory's neighbourhood is obtained as in SKNN, while the score of a visit action is computed as follows:

score_s-sknn(a, ζ) = Σ_{ζ_n ∈ N_ζ} w_{ζ_n}(a) · c(ζ, ζ_n) · 1_{ζ_n}(a)

For instance, let us say that a_3 is the third observed visit action in the user trajectory ζ (where |ζ| = 5) and that a_3 appears in the trajectory ζ_n ∈ N_ζ; then the weight defined by the decay function is w_{ζ_n} = 3/5. Also for s-SKNN, the recommended actions are those with the highest scores.
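A compact sketch of the two baselines, assuming trajectories are lists of POI identifiers. Following the example in the text, the decay weight of a neighbour trajectory is taken from the position, in the current trajectory, of the most recent visit action the two trajectories share; this reading, and all function names, are our assumptions.

```python
def binary_cosine(traj_a, traj_b):
    """Binary cosine similarity between two trajectories seen as POI sets."""
    A, B = set(traj_a), set(traj_b)
    return len(A & B) / ((len(A) * len(B)) ** 0.5) if A and B else 0.0

def decay_weight(current, neigh):
    """s-SKNN sketch: weight a neighbour by the position (in the current
    trajectory) of the most recent shared visit action, e.g. 3/5 when the
    third of five observed actions appears in the neighbour."""
    shared = [i + 1 for i, a in enumerate(current) if a in set(neigh)]
    return max(shared) / len(current) if shared else 0.0

def sknn_scores(current, neighbours, sequential=False):
    """Score candidate visit actions as in SKNN (sequential=False)
    or s-SKNN (sequential=True)."""
    scores = {}
    for neigh in neighbours:
        sim = binary_cosine(current, neigh)
        w = decay_weight(current, neigh) if sequential else 1.0
        for a in set(neigh):
            scores[a] = scores.get(a, 0.0) + w * sim
    return scores  # recommend the actions with the highest scores
```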
4.2 Evaluation Metrics

The evaluation metrics used to assess the algorithms' performance are reward, as defined in [8], precision, novelty and recommendation similarity. Let us denote with Rec_{u,s} a list of recommendations for the user u in state s, and with a_o the observed (next) POI visit (the test item).

Reward measures the average increase in reward that the recommended actions give compared to the observed one:

reward(Rec_{u,s}, a_o) = Σ_{a ∈ Rec_{u,s}} (Q(s, a) − Q(s, a_o)) / |Rec_{u,s}|

Novelty estimates how unpopular the recommended visit actions are and ranges in [0, 1]. A POI is assumed to be unpopular if its visit count is lower than the median of this variable in the training set. Let U be the set of unpopular POIs and 1_U(a) its indicator function (1 if a ∈ U and 0 otherwise); novelty is defined as follows:

novelty(Rec_{u,s}) = Σ_{a ∈ Rec_{u,s}} 1_U(a) / |Rec_{u,s}|

Let obs_u be the set of observed POI-visit actions in the user u trajectory (test set). The indicator function 1_{obs_u}(a) is 1 if a ∈ obs_u and 0 otherwise. Precision is then computed as follows:

precision(Rec_{u,s}) = Σ_{a ∈ Rec_{u,s}} 1_{obs_u}(a) / |Rec_{u,s}|

Finally, we estimate the similarity of two lists of recommendations by computing their Jaccard index. In this study, we compute the Jaccard index of the recommendations generated by our proposed methods and of those generated by SKNN. The goal is to verify whether the proposed hybrid methods, which recommend more popular items, improve some of the performances of the pure IRL method Q-BASE and whether they recommend items more similar to those recommended by SKNN.
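The four metrics translate directly into a few lines of code; this sketch assumes recommendations and observed actions are given as plain Python collections.

```python
def reward_gain(rec, observed, Q_s):
    """Average increase in Q of the recommended actions w.r.t. the observed one."""
    return sum(Q_s[a] - Q_s[observed] for a in rec) / len(rec)

def precision(rec, observed_set):
    """Fraction of recommended actions that the user actually performed."""
    return sum(1 for a in rec if a in observed_set) / len(rec)

def novelty(rec, unpopular_set):
    """Fraction of recommended actions whose POI is unpopular in the training set."""
    return sum(1 for a in rec if a in unpopular_set) / len(rec)

def jaccard(rec_a, rec_b):
    """Jaccard index of two recommendation sets (e.g. Q-BASE vs SKNN)."""
    A, B = set(rec_a), set(rec_b)
    return len(A & B) / len(A | B) if A | B else 0.0
```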
4.3 Off-line Study Results

In this study we used an extended version of the POI-visit data set presented in [11]. It consists of tourist trajectories reconstructed from the public photo albums of users of the Flickr platform (www.flickr.com). From the information about the GPS position and time of each photo in an album, the corresponding Wikipedia page is queried (geo query) in order to identify the name of the related POI. The time information is used to order the POI sequence derived from an album. In [9] the dataset was extended by adding information about the context of the visit (weather summary, temperature and part of the day), as well as POI content information (historic period of the POI, POI type and related public figure). In this paper we used an extended version of the dataset that contains 1668 trajectories and 793 POIs.

Trajectory clustering identified 5 different clusters, as in the previous study. In Table 1 we report the performance of Top-1 and Top-5 recommendations for the considered methods. We immediately observe that SKNN scores higher in precision, whereas Q-BASE suggests items that are more novel and have a higher reward. These results confirm previous analyses [8, 9]. SKNN and s-SKNN perform very similarly; hence, in this data set, the sequence-aware extension of SKNN seems not to offer any advantage.

Table 1: Recommendation performance

          Q-BASE   Q-POP C   Q-POP P   SKNN     s-SKNN
Rew@1      0.073    0.023    -0.002    -0.007   -0.009
Prec@1     0.043    0.057     0.099     0.109    0.109
Nov@1      0.061    0.029     0.000     0.000    0.000
Jacc@1     0.085    0.106     0.424       -      0.791
Rew@5      0.032    0.017    -0.009    -0.010   -0.010
Prec@5     0.045    0.049     0.060     0.068    0.063
Nov@5      0.122    0.040     0.000     0.000    0.000
Jacc@5     0.061    0.063     0.192       -      0.530

When comparing Q-POP COMBINED and Q-POP PUSH with the two SKNN-based methods, we found that Q-POP COMBINED offers a good trade-off between reward and precision. In particular, its reward is 4 times (Top-1) the reward of both SKNN and s-SKNN, while precision increases considerably with respect to Q-BASE. The same is observed for Top-5 recommendations. But novelty is penalised by the popularity bias of this method.

By looking at the performance of Q-POP PUSH we can confirm our study conjecture: a stronger popularity bias enables the algorithm to generate recommendations that are more precise; in particular, the precision of Q-POP PUSH is equal to that of SKNN and s-SKNN. But, as expected, reward and novelty are penalised.

With regard to the similarity (Jaccard index) of the recommendations generated by the proposed methods with those of SKNN, we can clearly see that the more the precision increases, the higher the Jaccard index becomes. So, the methods become more precise the more similar they are to SKNN.
5 ONLINE USER EVALUATION

We conducted an online user study in order to measure the users' perceived novelty of, and satisfaction for, the recommendations generated by the Q-BASE model, the hybrid model Q-POP PUSH and the SKNN baseline used in the offline study. We designed an online system which first profiles the user by asking her to enter as many previously visited POIs (in Florence) as possible. Then the user is asked to evaluate a list of recommendations generated by the aforementioned three models, without being informed of which algorithm recommends what. The data used by the system to train the models and compute recommendations is the same as in the offline study, a catalogue of 793 items.

5.1 Online Evaluation System

The interaction with the system unfolds as follows: landing phase; introduction to the experiment and start-up questions; preference elicitation phase; recommendation generation and evaluation.

Once the user accesses the website she can select the language (Italian or English); then, if she accepts to participate in the experiment, she is asked whether she has already been in Florence. If she replies "no" the procedure ends. Otherwise, the user is considered to have some experience of the city and can declare which POIs she has already visited. In this case, the preference elicitation phase is supported by a user interface (Figure 1) that enables the user to select as many POIs as she remembers having visited in Florence. The selection can be performed in two non-exclusive modalities: the first is a lookup bar with auto-completion, while the second is a selection pane that contains the 50 most popular POIs. If the user hovers or taps on an item, the system renders a media card presenting content extracted from Wikipedia: a picture and a textual description. When the user selects a POI as visited, it is added to an (editable) list. The selected POIs are meant to build a user profile, which is then used to identify the most representative trajectory cluster for the user, among the 5 clusters of previously collected training data (the details of this computation are explained in the next section).

Figure 1: POI selection UI detail.

Then the system generates a short itinerary (5 POIs) composed of a small sample of the POIs that the user previously declared to have visited (Figure 2). This is the itinerary that the user is supposed to have followed just before asking for a recommendation of a new point to visit. We decided to generate a fictitious itinerary because we did not want to ask the user to remember any previous visit itinerary, but we also tried to generate a trajectory that is likely to have been followed (by sampling among the POIs entered in the profiling stage). By showing a hypothetical itinerary to the user, followed up to the current point, we wanted to reinforce in the user the specific setting of the supported recommendation task: next-POI recommendation.

Figure 2: Itinerary detail.

That said, the recommendation generation and evaluation phase presents a user interface that is organized as follows. At the top of the page there is an information box containing the (previously mentioned) hypothetical 5-POI trajectory that the user should assume she has followed (Figure 2). Below, there is an info box that instructs the participant to assume that she has visited the selected attractions in the presented order. Finally, the participant is informed that the box beneath (Figure 3) contains a list of POIs that she can visit (recommendations) after the last POI in the itinerary. The user is asked to mark the recommendations with one or more of the following labels: "I already visited it" (eye icon), "I like it" for a next visit (thumbs-up icon) and "I didn't know it" (exclamation mark icon).

Figure 3: Evaluation UI detail.

We recruited the experiment participants via social media and mailing lists and collected over 300 responses, of which 202 are from users that had visited Florence. After excluding unreliable replies (e.g., surveys completed in less than 2 minutes) we counted 158 users. The number of recommended next-POI visits shown to the users is 1119 (approximately three by each of the three methods per user, excluding the items recommended by two or more methods simultaneously). Hence, on average a user has seen 7.1 recommendations.
In will find novel and also like with the highest probability (0.09%), fact, at first, we (randomly for each user) generate an order that we whereas SKNN and Q-POP PUSH recommends items that the user follow to pick items from the three lists of the top-3 suggestions will find novel and will like with a lower probability(0.08%). We generated by the three considered algorithms. Then we aggregate believe that the online computed “Liked & Novel” probability is the three ranked list by picking up, in turn, the items from the top a better measure of the precision of a RS. In fact, the standard to the bottom of the sorted lists. For instance, if the generated order offline estimation of precision, which is computed on the base of an is Q-BASE, kNN and Q-POP, then, the aggregated list of recommen- artificial split of the available liked items into train/test is not able dations (max length 9) that is shown to the user, contains in the to estimate how, not yet experienced items that the recommender first position the top recommendation of Q-BASE, then the top item suggests may be liked by the user. It is also worth noting the low suggested by kNN and then that suggested by Q-POP. The same scores of this metric: it is hard to observe a user that liked a novel pattern is applied for the remaining positions: the fourth item in item. This aspect is further discussed below. the aggregated list is the second best POI suggested by Q-BASE In order to further study the online user evaluation of the rec- and at the fifth and sixth positions are placed the second best POIs ommended items, we have computed the probability that a user suggested by kNN and Q-POP. In the case a POI is suggested by will like recommendations given the fact that she knows the item more than one algorithm, the item is shown only once. but has not yet visited it (“Known & Not Visited”), she visited it (“Visited”) or the item is “Novel” for her. The results of this analysis 6 RESULTS OF THE ONLINE USER STUDY are shown in Table 3. The novel POIs recommendations generated The results of the recommendation generation and evaluation phase by SKNN and Q-POP PUSH are liked more (20% and 22%) than those are shown in Table 2. We show here the probabilities that a user produced by Q-BASE (17%). We believe that this is because often marks as “visited”, “novel”, “liked” (for a next visit) or both “liked” Q-BASE suggests items that are very specific and users may find and “novel” an item recommended by an algorithm. They are com- hard to evaluate them. For instance, Q-BASE suggests often “Porta puted by dividing the total number of items marked as, visited, della Mandorla” which is a door of the “Duomo”. This POI can be liked, novel and both liked and novel, for each algorithm, by the perceived as a niche item and much less attractive than the “Duomo” total number of items shown by an algorithm. By construction, each itself. Moreover, by conducting post-survey activities participants algorithm contributes with 3 recommendations in the aggregated declared that it is difficult to like something that is unknown. list shown to each user. It is worth stressing that a user marked as In fact, the probability that a user likes a recommended POI that “liked” an item that she judged as a good candidate for a next POI she has visited tends to be much larger. This probability is 31% and visit. Hence, here a “like” is not a generic appreciation of the item, 28% for Q-POP PUSH and SKNN respectively. 
Whereas, Q-BASE but takes (partially) into account the visit context (what items the also here performs worse (26%). We think that the performance user has already visited). difference is again due to the fact that both SKNN and Q-POP tend We note that the POIs recommended by SKNN and Q-POP have to recommend popular POIs (easier to judge), whereas Q-BASE the highest probability (24%) that the user has already visited them, recommends more “niche” items. and the lowest probability to be considered as novel. Q-BASE scores Considering now the probability that a user will like an item that she knows but has not yet visited we see again a similar pattern 5 Still some recommendations can be not novel because the user will never declare all as before: Q-POP PUSH and SKNN suggest items that will be liked the POIS that she visited or she knows in the city. with a higher probability (81% and 80%) than Q-BASE (71%). These Copyright © 2019 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). RecTour 2019, September 19th, 2019, Copenhagen, Denmark. 8 Table 2: Probability to evaluate a recommendation of an al- Another future work direction is the analysis of the users’ percep- gorithm as visited, novel and liked. tion of the recommendations generated by the different algorithms given the possibly different users’ knowledge of the target destina- Q BASE Q POP SKNN tion. Visited 0.165 0.245 0.238 Novel 0.517 0.376 0.371 ACKNOWLEDGMENTS Liked 0.361 0.464 0.466 The research described in this paper was developed in the project Liked & Novel 0.091 0.076 0.082 Suggesto Market Space in collaboration with Ectrl Solutions and Fondazione Bruno Kessler. Table 3: Probability that a user likes a suggested item given that she visited, knew or is unaware of it. REFERENCES [1] G. Adomavicius and A. Tuzhilin. 2011. Context-Aware Recommender Systems. In Recommender Systems Handbook, F. Ricci, Lior Rokach, Bracha Shapira, and Q BASE Q POP SKNN Paul B. Kantor (Eds.). 217–253. P(Liked | Novel) 0.176 0.202 0.222 [2] M. Babes-Vroman, V. Marivate, K. Subramanian, and M. Littman. 2011. Appren- ticeship learning about multiple intentions. In Proceedings of the 28th International P(Liked | Visited) 0.256 0.310 0.283 Conference on Machine Learning - ICML’11. 897–904. P(Liked | Known & Not Visited) 0.717 0.810 0.806 [3] S. Ermon, Y. Xue, R. Toth, B. Dilkina, R. Bernstein, T. Damoulas, P. Clark, S. DeGloria, A. Mude, C. Barrett, and C. P. Gomes. 2015. Learning Large Scale Dy- namic Discrete Choice Models of Spatio-Temporal Preferences with Application to Migratory Pastoralism in East Africa. Proceedings of the Twenty-Ninth AAAI probabilities are very large. We conjecture that this is because these Conference on Artificial Intelligence Pattern, 644–650. are popular items that the user has not yet visited. In fact, if we [4] D. Jannach and L. Lerche. 2017. Leveraging Multi-Dimensional User Models for Personalized Next-Track Music Recommendation. In Proceedings of the Sympo- compare the probabilities that a user will like an item given that sium on Applied Computing - SAC’17. 1635–1642. is novel, visited or known but not yet visited, we see that it is the [5] J. Kennan and J. R. Walker. 2011. The Effect of Expected Income on Individual largest for the latter items (> 70%), it is lower for the visited items (> Migration Decisions. Econometrica 79, 1 (2011), 211–251. [6] D. D. Lee and H. S. Seung. 1999. 
6 RESULTS OF THE ONLINE USER STUDY

The results of the recommendation generation and evaluation phase are shown in Table 2. We show here the probabilities that a user marks an item recommended by an algorithm as "visited", "novel", "liked" (for a next visit) or both "liked" and "novel". They are computed by dividing the total number of items marked as visited, liked, novel, and both liked and novel, for each algorithm, by the total number of items shown by that algorithm. By construction, each algorithm contributes 3 recommendations to the aggregated list shown to each user. It is worth stressing that a user marked as "liked" an item that she judged as a good candidate for a next POI visit. Hence, here a "like" is not a generic appreciation of the item, but takes (partially) into account the visit context (what items the user has already visited).

Table 2: Probability to evaluate a recommendation of an algorithm as visited, novel and liked.

                 Q-BASE   Q-POP   SKNN
Visited           0.165   0.245   0.238
Novel             0.517   0.376   0.371
Liked             0.361   0.464   0.466
Liked & Novel     0.091   0.076   0.082

We note that the POIs recommended by SKNN and Q-POP have the highest probability (24%) of having already been visited by the user, and the lowest probability of being considered novel. Q-BASE scores a lower probability that the recommended item was already visited (16%) and the highest probability that the recommended item is novel (52%). This is in line with the offline study, where Q-BASE excels in recommending novel items.

Considering now the user satisfaction for the recommendations (likes), we conjectured that a high reward of an algorithm measured offline corresponds to a high perceived satisfaction (likes) measured online. But the results in Table 2 show a different outcome. Q-BASE, which has the highest offline reward, recommends items that an online user likes with the lowest probability (36%). Q-POP PUSH and SKNN recommend items that are more likely to be liked by the user (46%).

Another measure of system precision that we computed is the probability that a user likes a novel recommended POI, i.e., a POI that the recommender presented for the first time to the user ("Liked & Novel" in Table 2). We note that this is the primary goal of a recommender system: to enable users to discover items that are interesting for them, not to suggest items that the user likes but is already aware of or has already consumed; there is little utility in such a functionality. In this case, Q-BASE (highest reward and lowest precision offline) recommends items that a user will find novel and also like with the highest probability (0.09), whereas SKNN and Q-POP PUSH recommend items that the user will find novel and will like with a lower probability (0.08). We believe that the online-computed "Liked & Novel" probability is a better measure of the precision of a RS. In fact, the standard offline estimation of precision, which is computed on the basis of an artificial split of the available liked items into train and test sets, is not able to estimate how items not yet experienced by the user, and suggested by the recommender, may be liked. It is also worth noting the low scores of this metric: it is hard to observe a user that liked a novel item. This aspect is further discussed below.

In order to further study the online user evaluation of the recommended items, we have computed the probability that a user will like a recommendation given that she knows the item but has not yet visited it ("Known & Not Visited"), that she has visited it ("Visited"), or that the item is "Novel" for her. The results of this analysis are shown in Table 3.

Table 3: Probability that a user likes a suggested item given that she visited, knew or is unaware of it.

                                   Q-BASE   Q-POP   SKNN
P(Liked | Novel)                    0.176   0.202   0.222
P(Liked | Visited)                  0.256   0.310   0.283
P(Liked | Known & Not Visited)      0.717   0.810   0.806

The novel POI recommendations generated by SKNN and Q-POP PUSH are liked more (20% and 22%) than those produced by Q-BASE (17%). We believe that this is because Q-BASE often suggests items that are very specific, and users may find them hard to evaluate. For instance, Q-BASE often suggests "Porta della Mandorla", which is a door of the "Duomo". This POI can be perceived as a niche item and much less attractive than the "Duomo" itself. Moreover, in post-survey activities participants declared that it is difficult to like something that is unknown.

In fact, the probability that a user likes a recommended POI that she has visited tends to be much larger. This probability is 31% and 28% for Q-POP PUSH and SKNN respectively, whereas Q-BASE also here performs worse (26%). We think that the performance difference is again due to the fact that both SKNN and Q-POP tend to recommend popular POIs (easier to judge), whereas Q-BASE recommends more "niche" items.

Considering now the probability that a user will like an item that she knows but has not yet visited, we see a similar pattern as before: Q-POP PUSH and SKNN suggest items that will be liked with a higher probability (81% and 80%) than Q-BASE (71%). These probabilities are very large. We conjecture that this is because these are popular items that the user has not yet visited. In fact, if we compare the probabilities that a user will like an item given that it is novel, visited, or known but not yet visited, we see that it is the largest for the latter items (> 70%), lower for the visited items (> 26%) and the lowest for the novel items (< 22%). This reinforces the conjecture that users tend to like items they are familiar with (but have not yet consumed).
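As a recap of how the figures in Table 2 can be derived from the collected labels, here is an illustrative sketch; the log format and field names are assumptions, not the actual study pipeline. The conditional probabilities of Table 3 follow from the same counts, restricted to the items with the conditioning label.

```python
def label_probabilities(evaluations):
    """Compute, per algorithm, the share of shown items marked with each label.

    evaluations: list of dicts such as
      {"algorithm": "Q-BASE", "visited": False, "novel": True, "liked": True}"""
    per_algo = {}
    for ev in evaluations:
        stats = per_algo.setdefault(
            ev["algorithm"],
            {"shown": 0, "visited": 0, "novel": 0, "liked": 0, "liked_novel": 0},
        )
        stats["shown"] += 1
        stats["visited"] += ev["visited"]
        stats["novel"] += ev["novel"]
        stats["liked"] += ev["liked"]
        stats["liked_novel"] += ev["liked"] and ev["novel"]
    return {
        algo: {k: v / s["shown"] for k, v in s.items() if k != "shown"}
        for algo, s in per_algo.items()
    }
```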
7 CONCLUSION AND FUTURE WORK

In this paper we extend the state of the art in IRL-based next-POI RSs. We started our analysis by hypothesising that users like more the recommendations produced by IRL models and that the poor offline accuracy of these models, compared to kNN approaches, is due to the total absence of a popularity bias in the recommendation generation. For that reason we designed two new hybrid models that bias the pure IRL model Q-BASE towards suggesting more popular items: Q-POP COMBINED and Q-POP PUSH.

We show with an offline experiment that the hybridization of Q-BASE results in an increase of precision: Q-POP PUSH performs equally to the SKNN-based approaches.

With an online test we show that the Q-BASE model excels in suggesting novel items, whereas SKNN and Q-POP PUSH suggest items that are "liked" more. We also show that if we consider the combined feedback "liked and novel", i.e., recommendations that are both liked and novel to the user, Q-BASE outperforms both SKNN and Q-POP PUSH. Hence, Q-BASE may be able to better accomplish the most important task of a RS for tourism: suggesting POIs that are both unknown to the user and relevant.

We emphasize that the objective of this research is a next-POI RS that harnesses a generalized tourist behavior model. While in this work we showed the benefits of such a RS through a web-based study, we are now conducting a novel user study with real tourists interacting with a system while visiting a destination (South Tyrol; http://wondervalley.unibz.it, https://beacon.bz.it/wp-6/beaconrecommender/). Another future work direction is the analysis of the users' perception of the recommendations generated by the different algorithms given the possibly different users' knowledge of the target destination.

ACKNOWLEDGMENTS

The research described in this paper was developed in the project Suggesto Market Space in collaboration with Ectrl Solutions and Fondazione Bruno Kessler.

REFERENCES

[1] G. Adomavicius and A. Tuzhilin. 2011. Context-Aware Recommender Systems. In Recommender Systems Handbook, F. Ricci, L. Rokach, B. Shapira, and P. B. Kantor (Eds.). 217–253.
[2] M. Babes-Vroman, V. Marivate, K. Subramanian, and M. Littman. 2011. Apprenticeship Learning About Multiple Intentions. In Proceedings of the 28th International Conference on Machine Learning (ICML '11). 897–904.
[3] S. Ermon, Y. Xue, R. Toth, B. Dilkina, R. Bernstein, T. Damoulas, P. Clark, S. DeGloria, A. Mude, C. Barrett, and C. P. Gomes. 2015. Learning Large Scale Dynamic Discrete Choice Models of Spatio-Temporal Preferences with Application to Migratory Pastoralism in East Africa. In Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence. 644–650.
[4] D. Jannach and L. Lerche. 2017. Leveraging Multi-Dimensional User Models for Personalized Next-Track Music Recommendation. In Proceedings of the Symposium on Applied Computing (SAC '17). 1635–1642.
[5] J. Kennan and J. R. Walker. 2011. The Effect of Expected Income on Individual Migration Decisions. Econometrica 79, 1 (2011), 211–251.
[6] D. D. Lee and H. S. Seung. 1999. Learning the Parts of Objects by Non-negative Matrix Factorization. Nature 401, 6755 (1999), 788–791.
[7] M. Ludewig and D. Jannach. 2018. Evaluation of Session-Based Recommendation Algorithms. User Modeling and User-Adapted Interaction 28, 4-5 (2018), 331–390.
[8] D. Massimo and F. Ricci. 2018. Harnessing a Generalised User Behaviour Model for Next-POI Recommendation. In Proceedings of the 12th ACM Conference on Recommender Systems (RecSys 2018), Vancouver, BC, Canada. 402–406.
[9] D. Massimo and F. Ricci. 2019. Clustering Users' POIs Visit Trajectories for Next-POI Recommendation. In Information and Communication Technologies in Tourism 2019 (ENTER 2019), Nicosia, Cyprus. 3–14.
[10] B. Mobasher, H. Dao, T. Luo, and M. Nakagawa. 2002. Using Sequential and Non-Sequential Patterns in Predictive Web Usage Mining Tasks. In Proceedings of the IEEE International Conference on Data Mining (ICDM '02). 669–672.
[11] C. I. Muntean, F. M. Nardini, F. Silvestri, and R. Baraglia. 2015. On Learning Prediction Models for Tourists Paths. ACM Transactions on Intelligent Systems and Technology 7, 1 (2015), 1–34.
[12] A. Ng and S. Russell. 2000. Algorithms for Inverse Reinforcement Learning. In Proceedings of the 17th International Conference on Machine Learning (ICML '00). 663–670.
[13] S. Oppokhonov, S. Park, and I. K. E. Ampomah. 2017. Current Location-based Next POI Recommendation. In Proceedings of the International Conference on Web Intelligence (WI '17). ACM, New York, NY, USA, 831–836.
[14] World Tourism Organization. 1995. Collection of Tourism Expenditure Statistics. World Tourism Organization (UNWTO).
[15] Revfine.com. 2019. Travel and Tourism Industry; A Complete Overview of All Activities. https://www.revfine.com/travel-and-tourism
[16] F. Ricci, L. Rokach, and B. Shapira. 2015. Recommender Systems: Introduction and Challenges. In Recommender Systems Handbook, F. Ricci, L. Rokach, and B. Shapira (Eds.). 1–34.
[17] C. P. Robert and G. Casella. 2010. Introducing Monte Carlo Methods with R. Springer, New York, NY.
[18] R. S. Sutton and A. G. Barto. 2014. Reinforcement Learning: An Introduction (second edition, in progress). The MIT Press.
[19] E. C. Teppan and M. Zanker. 2015. Decision Biases in Recommender Systems. Journal of Internet Commerce 14, 2 (2015), 255–275.
[20] X. Wang, M. Bendersky, D. Metzler, and M. Najork. 2016. Learning to Rank with Selection Bias in Personal Search. In Proceedings of the 39th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '16). ACM, New York, NY, USA, 115–124.
[21] B. D. Ziebart, A. Maas, J. A. Bagnell, and A. K. Dey. 2008. Maximum Entropy Inverse Reinforcement Learning. In Proceedings of the 23rd National Conference on Artificial Intelligence (AAAI '08). 1433–1438.