Sequential Recommendations in IoT Scenarios with a Generalized User Behaviour Model

David Massimo and Francesco Ricci
Free University of Bozen-Bolzano, Bolzano, Italy
damassimo@inf.unibz.it, fricci@unibz.it

IIR 2018, May 28-30, 2018, Rome, Italy. Copyright held by the author(s).

Abstract. Recommender Systems (RSs) are typically used to support users in finding web content of their interest. We consider here an alternative scenario: supporting human decision making in the physical world. In particular, we focus on the Internet of Things (IoT), where, for instance, users' exploration of a sensor-enabled city can be tracked and the knowledge of their choices (visits to points of interest, POIs) can be used to generate recommendations for not yet visited POIs. We leverage two distinct components: a generalised user behavioural model and a complementary recommender system; here recommendations can deviate from the usual approach of directly using the learned behaviour model to suggest the most likely actions the user will take next. We also propose techniques for simulating user behaviour and analysing the collective dynamics of a population of users. Moreover, we tackle the lack of data produced by interactions of users with IoT augmented areas by designing a simulator that can be used to collect user preferences and monitor their decisions.

1 Introduction

RSs are techniques and tools that support human decision making by identifying items or actions relevant to a user [10]. Since users' preferences and behaviour may also be influenced by contextual factors, such as the items they previously experienced, context-aware RSs have been introduced [1]. Moreover, in order to leverage the knowledge derived from the order in which the user has consumed items, pattern-discovery [6, 2, 9] and reinforcement learning [11, 7] approaches have also been proposed. The first approach extracts common patterns from users' behaviour logs and learns a predictive model of the next user action. The second generates recommendations by using the optimal choice model (policy) of the user. Their common feature is to recommend items predicted by the learnt user behaviour, i.e., they suggest the user's predicted next choice. Moreover, while the first approach can only suggest items that have already been observed (new item problem), the second assumes that the system knows the utility the user receives from actions, even though, in practice, users rarely provide explicit feedback (e.g., ratings).

We believe that these solutions tend to generate recommendations that lack novelty and are therefore often not interesting for the user (especially the first one, since it suffers from the cold-start problem). We therefore propose a recommendation technique that leverages a more general and explainable user behavioural model, which is learnt by observing users that are "similar" to the target one, and decouples behaviour learning from recommendation generation, so that recommendations can deviate from the strategy of recommending the most likely next action of the user. We see this as a prerequisite to design novel and more compelling recommendation strategies for the user.

In order to learn the generalised user behaviour (choice policy) we exploit the observation of the choices of a population that acts in physical areas. Observations are acquired via Internet of Things (IoT) technologies, i.e., sensor networks that can register users' activities (e.g., movements) and environment conditions (e.g., weather) [3]. We also propose to reuse the learnt behavioural model to simulate the collective behaviour of a population under alternative environment conditions. Moreover, since testing IoT solutions in real life can be expensive (development and infrastructure costs) and suitable datasets for RS research in IoT augmented areas are scarcely available, we propose a simulation environment for generating realistic data logs of user/object interactions.

2 Research Progress Up To Date

In order to learn the generalised user behaviour, we have used Inverse Reinforcement Learning (IRL) [8], which has already been successfully applied in economics and robotics to learn an observed agent's behaviour, but not yet in RSs. IRL, compared with other techniques for learning/predicting action sequences, such as recurrent neural networks, has two distinguishing aspects. Firstly, it learns to what extent item and context features are "liked" by a user (contribute to the acquired reward), whereas the other mentioned techniques do not. Secondly, it is able to learn even from small samples. We use IRL to learn the sequential decision making behaviour of a group of users by leveraging observations of their action sequences, i.e., without knowing the users' reward (e.g., rating feedback). For instance, in the POI visit scenario the observations are the temporally ordered attraction-visit actions. Moreover, IRL assumes that the user's reward (utility) for a state (e.g., visiting a POI) is a linear function r = ϑ^T·ϕ (reward function), where ϕ is the state feature vector and ϑ is the (unknown) user preference vector for the state features. State features can describe the location or the visit context of the user, e.g., the crowdedness level of the POI. From the reward function r, by assuming that a user chooses actions that maximise her reward, the user's (optimal) action-selection policy can be derived, i.e., the policy that, given the current user's state, tells her how to act so that her expected utility is maximised.
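To make this derivation concrete, the following minimal sketch (our illustration, not the implementation used in the case studies described below) computes an optimal policy by value iteration, assuming a small discrete MDP with a known transition model P, a discount factor gamma, and a preference vector theta already estimated by IRL; all variable names are hypothetical.

```python
import numpy as np

def derive_policy(theta, phi, P, gamma=0.9, n_iter=200):
    """Derive the optimal action-selection policy from a linear reward.

    theta : (d,)     learnt user preference vector (output of IRL)
    phi   : (S, d)   state feature matrix, one row per state (POI + context)
    P     : (A, S, S) assumed transition model P[a, s, s'] of the MDP
    """
    r = phi @ theta                    # reward per state: r = theta^T . phi
    V = np.zeros(len(r))               # state values
    for _ in range(n_iter):            # value iteration
        Q = r + gamma * (P @ V)        # Q[a, s] = r(s) + gamma * E[V(s')]
        V = Q.max(axis=0)
    return Q.argmax(axis=0)            # policy: best action in each state

# Toy usage (hypothetical numbers): 3 states, 2 actions, 2 features.
# theta = np.array([1.0, -0.5]); phi = np.random.rand(3, 2)
# P = np.full((2, 3, 3), 1/3); policy = derive_policy(theta, phi, P)
```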
IRL typically learns the user reward function from a set of observed user action sequences. Since in many cases the observed actions of a single user are few and noisy, a better model of the user behaviour can be learned by grouping (e.g., clustering) similar users (e.g., users sharing the same visit goals) and then learning the behaviour from the grouped observations (the generalised user behaviour of the group). We have implemented this approach to behaviour learning in two case studies in the tourism domain: the first one in an indoor environment, i.e., a museum [4], and the second one in an outdoor environment, i.e., a tourist area (under development). Our initial results show that IRL is capable of learning a group of users' rewards for actions and their action-selection policy even from small and noisy datasets.

After the generalised user behaviour model of a group (group action-selection policy) is learnt, action recommendations for a user can be generated by considering the observed sequence of her actions, as follows. If there are just a few observations (e.g., the sequence of actions performed by the user so far), then the generalised user behaviour (predictive model) of the group the user belongs to is used to suggest the optimal action that this user should take after the last visited POI. Conversely, in case there are more action observations for a user, recommendations can be generated by aggregating the group's generalised user behaviour with an individual user preference model (built from the user's observations). In practice, the two action rankings derived from the generalised user behaviour and the user preference model are aggregated, and the suggested action is the top one in the combined ranking. For instance, if a commuter is understood to like art because she has mostly visited museums (individual user preference model), and a close-by exhibition is estimated to be an optimal choice for the group members, then the system could recommend it to her.
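The paper does not prescribe a specific aggregation scheme, so the sketch below uses a simple Borda-style fusion of the two rankings as one plausible realisation; the function and its inputs are assumptions for illustration.

```python
def aggregate_rankings(group_ranking, user_ranking):
    """Fuse two action rankings (best first) with a Borda count.

    group_ranking : actions ordered by the group's generalised policy
    user_ranking  : actions ordered by the individual preference model
    Returns all actions ordered by their combined score.
    """
    n = len(group_ranking)
    score = {}
    for rank, action in enumerate(group_ranking):
        score[action] = score.get(action, 0) + (n - rank)
    for rank, action in enumerate(user_ranking):
        score[action] = score.get(action, 0) + (n - rank)
    return sorted(score, key=score.get, reverse=True)

# Hypothetical example mirroring the prose above:
# aggregate_rankings(["exhibition", "cafe", "park"],
#                    ["park", "exhibition", "cafe"])[0]  -> "exhibition"
```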
We are currently studying an approach to grouping users' action sequences according to a common "semantic" structure that motivates the resulting bundles. Grouping is done by first representing the action sequences by their state features and then performing clustering (i.e., topic modelling). The groups formed in this way match distinct tourist types, and their differences can be further illustrated by the diverse reward functions that are learnt for the groups.

The proposed techniques are going to be implemented in a mobile application that exploits an IoT infrastructure to support tourists' decision making by enabling them to discover new places or shops in an Italian alpine valley. As mentioned above, since the testing of IoT solutions is expensive and suitable datasets are missing, in order to bootstrap the application we have developed a simulation tool that allows individuals or groups to experience a simulated itinerary and visit to POIs [5] (demo: www.inf.unibz.it/~damassimo/video/IoT_demo.mp4). The collected data will be used to initially assess the proposed behaviour learning and recommendation approaches. Evaluations will be performed in both on-line (e.g., A/B testing, questionnaires) and off-line (e.g., algorithm performance) settings.

Finally, we intend to use the learnt behavioural model to generate synthetic action sequences of a population under conditions of interest (e.g., the opening of a new road or of a new mall). In this way the global system dynamics and the recommender performance can be evaluated even in novel IoT configurations, further reducing the testing costs.
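As a sketch of this idea (under our own assumptions, not the authors' tooling), the snippet below samples synthetic visit sequences from a Boltzmann (softmax) policy computed from the learnt reward; theta, phi, P are the hypothetical names of the earlier sketch, and phi would be edited beforehand to encode the condition of interest (e.g., features of a new POI).

```python
import numpy as np

def simulate_population(theta, phi, P, n_users=1000, horizon=10,
                        beta=2.0, gamma=0.9, seed=0):
    """Sample synthetic action sequences from the learnt behaviour model.

    Actions are drawn from a softmax over Q-values, so the simulated
    population is stochastic rather than strictly optimal (an assumed
    choice model; the paper does not fix one).
    """
    rng = np.random.default_rng(seed)
    r = phi @ theta
    V = np.zeros(len(r))
    for _ in range(200):                    # value iteration, as before
        Q = r + gamma * (P @ V)
        V = Q.max(axis=0)
    logs = []
    for _ in range(n_users):
        s = rng.integers(len(r))            # random starting state
        seq = []
        for _ in range(horizon):
            p = np.exp(beta * Q[:, s])      # softmax action choice
            p /= p.sum()
            a = rng.choice(len(p), p=p)
            seq.append(int(a))
            s = rng.choice(len(r), p=P[a, s])  # environment transition
        logs.append(seq)
    return logs
```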
Acknowledgments

The research described in this paper is supported by the project Suggesto Market Space, funded by the Autonomous Province of Trento with Ectrl Solutions and Fondazione Bruno Kessler.

References

1. Adomavicius, G., Tuzhilin, A.: Context-Aware Recommender Systems, pp. 217–253. Springer US, Boston, MA (2011)
2. Jannach, D., Kamehkhosh, I., Lerche, L.: Leveraging multi-dimensional user models for personalized next-track music recommendation. In: Proceedings of the Symposium on Applied Computing, SAC '17, pp. 1635–1642. ACM, New York, NY, USA (2017)
3. Li, S., Xu, L.D., Zhao, S.: The internet of things: a survey. Information Systems Frontiers 17(2), 243–259 (2015)
4. Massimo, D., Elahi, M., Ricci, F.: Learning user preferences by observing user-items interactions in an IoT augmented space. In: 2017 Conf. on User Modeling, Adaptation and Personalization. ACM (2017)
5. Massimo, D., Not, E., Ricci, F.: User behaviour analysis in a simulated IoT augmented space. In: Proceedings of the 23rd International Conference on Intelligent User Interfaces Companion, IUI '18 Companion. ACM, New York, NY, USA (2018)
6. Mobasher, B., Dao, H., Luo, T., Nakagawa, M.: Using sequential and non-sequential patterns in predictive web usage mining tasks. In: Proceedings of the 2002 IEEE International Conference on Data Mining, pp. 669–672 (2002)
7. Moling, O., Baltrunas, L., Ricci, F.: Optimal radio channel recommendations with explicit and implicit feedback. In: Proceedings of the 6th ACM Conference on Recommender Systems, RecSys '12, p. 75 (2012)
8. Ng, A.Y., Russell, S.: Algorithms for inverse reinforcement learning. In: 17th Int. Conf. on Machine Learning, pp. 663–670. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA (2000)
9. Palumbo, E., Rizzo, G., Baralis, E.: Predicting your next stop-over from location-based social network data with recurrent neural networks. In: 2nd ACM International Workshop on Recommenders in Tourism (RecTour'17), RecSys 2017, CEUR Proceedings Vol. 1906, pp. 1–8. Como, Italy (2017)
10. Ricci, F., Rokach, L., Shapira, B.: Recommender Systems: Introduction and Challenges, pp. 1–34. Springer US, Boston, MA (2015)
11. Shani, G., Heckerman, D., Brafman, R.I.: An MDP-based recommender system. Journal of Machine Learning Research 6, 1265–1295 (2005)