J. Hlaváčová (Ed.): ITAT 2017 Proceedings, pp. 240–245, CEUR Workshop Proceedings Vol. 1885, ISSN 1613-0073, © 2017 L. Peska

Multimodal Implicit Feedback for Recommender Systems

Ladislav Peska
Department of Software Engineering, Faculty of Mathematics and Physics
Charles University, Prague, Czech Republic

Abstract. In this paper, we present an overview of our work towards the utilization of multimodal implicit feedback in recommender systems for small e-commerce enterprises. We focus on a deeper understanding of implicit user feedback as a rich source of heterogeneous information. We present a model of implicit feedback for e-commerce, discuss important contextual features affecting its values and describe ways to utilize it in the process of user preference learning and recommendation. We also briefly report on our previous experiments within this scope and describe a publicly available dataset containing such multimodal implicit feedback.

1 Introduction

Recommender systems belong to the class of automated content-processing tools aiming to provide users with unknown, surprising, yet relevant objects without the necessity of explicitly querying for them. At the core of recommender systems are machine learning algorithms applied on the matrix of user-to-object preferences. In large enterprises, user preference is primarily derived from explicit user ratings (also referred to as explicit feedback), and collaborative-filtering algorithms [11] usually outperform other approaches [3].

In our research, however, we focus on small or medium-sized e-commerce enterprises. This domain introduces several specific problems and obstacles making the deployment of recommender systems more challenging. Let us briefly list the key challenges:

- High competition has a negative impact on user loyalty. Typical sessions are very short; users quickly leave to other vendors if their early experience is not satisfactory enough. Only a fraction of users ever returns.
- For such single-time visitors, it is not reasonable to expect them to provide any unnecessary information (e.g., ratings, reviews, registration details).
- The consumption rate is low; users often visit only a handful (0–5) of objects.

All the mentioned factors contribute to the data sparsity problem. Although the total number of users may be relatively large (hundreds or thousands per day), explicit feedback is very scarce. Also, the volume of visited objects per user is limited, and utilizing popularity-based approaches w.r.t. purchases is questionable at best. Furthermore, the identification of a unique user is quite challenging. Despite these obstacles, the potential benefit of recommender systems is considerable: they can contribute towards a better user experience, increase user loyalty and consumption, and thus also improve the vendor's key success metrics.

Figure 1: Simplified state diagram of human-computer interaction in e-commerce: the user enters the site via a query or an object's detail page. He/she can navigate through category or search result pages, implicitly updating his/her query, or proceed to evaluate details of selected objects and eventually execute steps to buy them.

Our work within this framework aims to bridge the data sparsity problem and the lack of relevant feedback by modelling, combining and utilizing novel or enhanced sources of information, foremost various implicit feedback features, i.e., features based on the observed user behavior. Contrary to explicit feedback, the usage of implicit feedback [5], [17], [18], [28] requires no additional effort from the users. Monitoring of implicit feedback varies from simple features like user visits or play counts to more sophisticated ones like scrolling or mouse movement tracking [12], [29]. Due to its effortlessness, the data are obtained in much larger quantities for each user. On the other hand, the data are inherently noisy, messy and harder to interpret [10]. Figure 1 depicts a simplified view of human-computer interaction on small e-commerce enterprises with an accent on the implicit feedback provided by the user.

Our work lies a bit further from the mainstream of implicit feedback research. To the best of our knowledge, the vast majority of researchers focus on interpreting a single type of implicit feedback [6], proposing various latent factor models [10], [26] and their adjustments [9], [19], or focusing on other aspects of recommendation using implicit feedback based datasets [2], [25]. Also, papers using binary implicit feedback derived from explicit user ratings are quite common [16], [19].

In contrast to the majority of research trends, we consider implicit feedback as multimodal and context-dependent. As our aim in this direction is a long-term one, we have already published some of our findings [17], [18], [20], [22], [23]. In our aim towards improving recommender systems on small e-commerce enterprises, we focused on the following aspects of implicit feedback:

- Covering the multimodality of implicit feedback.
- Proposing a relevant context of the collected feedback.
- Deriving models of negative preference based on implicit feedback.

We reserve Section 2.1 for the description of multimodal implicit feedback, Section 2.2 for the contextualization of user feedback and Section 2.3 for the problem of learning negative preference. For each problem, we describe the relevant state of the art and current challenges as well as our proposed methods and models. Finally, we remark on the evaluation of the proposed methods in Section 3 and conclude in Section 4.

2 Materials and Methods

2.1 Multimodal Implicit Feedback

Despite the large volume of research based on a single implicit feedback feature, we consider implicit feedback to be inherently multimodal.
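Before turning to the concrete features, a minimal sketch of how such multimodal feedback could be represented: one record per user-object pair that aggregates several observed signals across sessions. All field and method names below are illustrative only; they merely mirror the kinds of features discussed in this section, not an actual implementation from our system.

```python
from dataclasses import dataclass

# Hypothetical container for multimodal implicit feedback collected
# for one (user, object) pair; field names are illustrative only.
@dataclass
class FeedbackRecord:
    user_id: str
    object_id: str
    page_views: int = 0
    dwell_time: float = 0.0          # seconds spent on the object's detail page
    mouse_distance: float = 0.0      # pixels travelled by the cursor
    mouse_motion_time: float = 0.0   # seconds the cursor was in motion
    scrolled_distance: float = 0.0   # pixels scrolled
    scroll_time: float = 0.0         # seconds spent scrolling
    clicks: int = 0
    purchased: bool = False

    def merge(self, other: "FeedbackRecord") -> None:
        """Aggregate another session's feedback for the same user-object pair."""
        assert (self.user_id, self.object_id) == (other.user_id, other.object_id)
        self.page_views += other.page_views
        self.dwell_time += other.dwell_time
        self.mouse_distance += other.mouse_distance
        self.mouse_motion_time += other.mouse_motion_time
        self.scrolled_distance += other.scrolled_distance
        self.scroll_time += other.scroll_time
        self.clicks += other.clicks
        self.purchased = self.purchased or other.purchased
```

Keeping one such record per user-object pair corresponds to the per-session, per-object aggregation restriction introduced below.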
Users utilize various I/O devices (mouse, keyboard) to interact with a webpage's GUI controls, so there is an abundant number of potentially interesting user actions. As the complexity of such an environment is overwhelming, we imposed some restrictions:

- Limit to feedback related directly to some specific object, i.e., collect the feedback only from the object's detail page.
- Aggregate the same types of user actions on a per-session and per-object basis.
- Focus only on user actions which can be numerically aggregated, i.e., the desired feedback features have a numerical domain.

Figure 2: An example of mouse movement-based feedback collection on an e-commerce product detail page. Cursor positions (red line) are sampled periodically. Based on the samples, the approximated mouse in-motion time (green boxes) and travelled distance (blue line) are calculated. The cursor motion log is also stored for later reasoning.

See Figure 2 for an example of feedback features derived from user actions. In the following experiments, we consider these implicit feedback features¹:

- Number of views of the page with the object
- Dwell time (i.e., the time spent on the object)
- Total distance travelled by the mouse cursor
- Total mouse in-motion time
- Total scrolled distance
- Scroll time (i.e., the time spent scrolling)
- Clicks count (i.e., the volume of mouse clicks)
- Purchase (i.e., binary information on whether the user bought the object)

Although multimodal implicit feedback is not a mainstream research topic, we were able to trace some research papers. One of the first papers mentioning implicit feedback was Claypool et al. [5], which compared three implicit preference indicators against explicit user ratings. More recently, Yang et al. [29] analyzed several types of user behavior on YouTube. The authors described both positive and negative implicit indicators of preference and proposed a linear model to combine them. Also Lai et al. [12] work on an RSS feed recommender utilizing multiple reading-related user actions.

However, the lack of publicly available datasets containing multimodal implicit feedback significantly hinders advances in the area. Some work towards bridging this data gap was done quite recently in the RecSys Challenges 2016 and 2017². Both challenges' datasets focused on job recommendation and contained several types of positive and negative user feedback. Although the datasets were not made publicly available, some approaches proposed relevant methods to deal with multimodal implicit feedback, e.g., a fixed weighting scheme [31], a hierarchical model of features [15] or utilizing features separately [28]. Some authors also mention the probability of re-interaction with objects on some domains [4], [33].

2.1.1 Methods Utilizing Multimodal Feedback

The vast majority of state-of-the-art approaches transform multimodal implicit feedback into a single numeric output r̄, which can be viewed as a proxy for user rating. However, these methods mostly use some fixed model of implicit feedback (i.e., predefined weights or a hierarchy of feedback features), or perform predictions based on each feedback feature separately [28].

In contrast to these approaches, we aim at estimating r̄ via machine learning methods applied on a purchase prediction task. Our approach is based on the fact that the only measurable implicit feedback with a direct interpretation of preference is buying an object. Such events are, however, too scarce to be used as a sole user preference indicator. Still, we can define a classification task to determine, based on the values of the other feedback types, whether the object will be purchased by the user. The estimated rating r̄ is then defined as the probability of the purchased class.

We evaluated several machine learning methods, such as decision trees, random forests, boosting, lasso regression and linear regression.
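The classification approach described above can be sketched as follows. Since the paper does not fix one particular learner, this minimal example trains a plain logistic regression by gradient descent (a stand-in for the decision trees, random forests and regressions mentioned) and takes the predicted purchase probability as the estimated rating r̄. The feature choice and all toy data are hypothetical.

```python
import math

def train_logreg(X, y, lr=0.1, epochs=2000):
    """Tiny logistic regression trained with SGD; returns weights and bias."""
    n = len(X[0])
    w, b = [0.0] * n, 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            z = sum(wj * xj for wj, xj in zip(w, xi)) + b
            p = 1.0 / (1.0 + math.exp(-z))
            g = p - yi  # gradient of the log-loss w.r.t. z
            w = [wj - lr * g * xj for wj, xj in zip(w, xi)]
            b -= lr * g
    return w, b

def estimated_rating(w, b, x):
    """r-bar = predicted probability of the 'purchased' class."""
    z = sum(wj * xj for wj, xj in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))

# Toy training data: [dwell_time, mouse_distance, scrolled_distance]
# per user-object pair; label = purchased or not (hypothetical values).
X = [[120.0, 3000.0, 900.0], [5.0, 200.0, 0.0],
     [90.0, 2500.0, 700.0], [8.0, 150.0, 50.0]]
# Scale features roughly into [0, 1] so gradient descent behaves.
X = [[x[0] / 120.0, x[1] / 3000.0, x[2] / 900.0] for x in X]
y = [1, 0, 1, 0]

w, b = train_logreg(X, y)
# Estimated rating for an unseen user-object pair with strong feedback.
r_bar = estimated_rating(w, b, [100.0 / 120.0, 2800.0 / 3000.0, 800.0 / 900.0])
```

Because r̄ is a probability, it can directly serve as the rating proxy fed into a downstream ranking step.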
We also evaluated approaches based on the "more feedback, the better" heuristics, i.e., a higher value of a particular feedback feature implies a higher user preference. In order to make the domains of all feedback features comparable, we utilized either standardization of the feedback values (denoted as Heuristical with STD in the results), or used the empirical cumulative distribution instead of the raw feedback values (Heuristical with CDF). The estimated rating r̄ is then defined as the mean of the STD or CDF values of all feedback features for the respective user and object. For more details on the heuristical approaches please refer to [23]; for more details on the machine learning approaches please refer to [18].

¹ Please note that the dataset used for the experiments also contains other feedback features, such as the number of page prints, the followed links count, several non-numeric feedback features etc. These features seemed not relevant for the current task; however, they may be utilized in the future.
² [2016|2017].recsyschallenge.com
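The Heuristical with CDF variant described above can be sketched in a few lines: each feedback feature value is replaced by its empirical cumulative distribution value, and r̄ is the mean over features. The STD variant would analogously replace the ECDF with z-score standardization. All sample values below are hypothetical.

```python
def ecdf_value(sample, x):
    """Empirical CDF: fraction of observed values <= x."""
    return sum(1 for v in sample if v <= x) / len(sample)

def rating_cdf(feature_samples, observation):
    """Heuristical-with-CDF style estimate: mean ECDF value over features.

    feature_samples: {feature_name: observed values over all user-object pairs}
    observation:     {feature_name: value for the current user-object pair}
    """
    scores = [ecdf_value(feature_samples[f], observation[f]) for f in observation]
    return sum(scores) / len(scores)

# Hypothetical observed distributions of two feedback features.
samples = {
    "dwell_time": [5, 10, 20, 40, 80, 160],
    "scrolled_distance": [0, 100, 200, 400, 800, 1600],
}
# A user-object pair with above-average feedback on both features.
r_bar = rating_cdf(samples, {"dwell_time": 80, "scrolled_distance": 800})
```

Mapping every feature through its ECDF makes otherwise incomparable domains (seconds vs. pixels) directly averageable, which is exactly the motivation stated above.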
2.2 Context of User Feedback

Although the user feedback may be processed directly, the perceived feedback values are significantly affected by the presentation of the page (i.e., device parameters) and also by the amount of contained information. Both the displaying device and the page's complexity can be described by several numeric parameters, which we generally denote as the presentation context. See Figure 4 for some examples of relevant presentation context.

Figure 4: Two examples of relevant presentation context. A: the overall length of the text (and possibly also the amount of other objects, e.g., images) affects the necessary reading time, accessed through the dwell time feedback feature. B: the difference between the page dimensions and the device display size affects the necessity to scroll the content and thus, e.g., the scrolled distance feature.

We can trace some notions of presentation context in the literature. Yi et al. [30] proposed to use dwell time as an indicator of user engagement. The authors discussed the usage of several contextual features, e.g., content type, device type or article length, as baseline dwell time estimators. Furthermore, Radlinski et al. [24] and Fang et al. [8] considered object position as a relevant context for clickstream events. The presentation context differs significantly from the more commonly used user context [1] in its definition, methods for feature collection as well as ways to incorporate it into the recommending pipeline. While the nature of user context is rather restrictive [32], we interpret presentation context as a baseline predictor or an input feature for the machine learning process.

In our work, we considered the following presentation context features:

- Volumes of text, images and links on the page.
- Page dimensions (width, height).
- Browser's visible area dimensions (width, height).
- Visible area ratio.
- Hand-held device indicator.

We evaluated several approaches utilizing biases for feedback feature values based on the current presentation context. Such approaches, however, were not very successful and in particular often did not improve over the baselines without any context at all. On the other hand, we were able to significantly improve over the baseline methods when using presentation context features as an additional input of the machine learning methods described in Section 2.1.1.

2.3 Negative Implicit Feedback and Preferential Relations

One of the open problems in implicit feedback utilization is learning negative preference from implicit feedback. Several approaches were proposed for this task, including uniform negative preference of all unvisited objects [10], considering a low volume of feedback as negative preference [20] or defining a special negative feedback feature [13], [29], [31].

We propose to utilize negative preference as relations among less and more preferred objects, i.e., to model a partial ordering o1 <p o2. This model is based on the work of Eckhardt et al. [7], who propose to consider user ratings (implicit feedback in our scenario) in the context of the other objects available on the current page. Implicit preferential relations can be naturally obtained from implicit feedback collected on category pages, search result pages or similar pages. In such cases, the user usually selects one (or more) objects out of the list of available objects for further inspection. By this behavior, the user also implicitly provides negative feedback on the ignored choices and thus induces a preferential relation ignored <p selected.

However, we need to approach such negative feedback with caution, as some of the options might not be visible to the user at all, or only for a very short time. This is a quite serious problem because, on average, only 47% of the catalogue page content was visible in the browser window in our dataset. Thus, we also introduce an intensity of the relation <p based on the visibility of the ignored object. Figure 3 illustrates this.

Figure 3: An example of preference relations based on the feedback on a category page. Initially, objects o1–o4 are visible. After some time, the user scrolls and also object o5 gets into the visible window. However, objects o6 and o7 remain outside of the visible area. If the user clicks on object o3, his/her behavior induces negative feedback on objects o1, o2, o4, o5 and we collect the relations o1,2,4,5 <p o3. However, the intensity of o5 <p o3 is smaller than for the other objects, because o5 was visible only for a short period of time and thus it is more probable that the user did not notice it.

We incorporate preferential relations into the recommendation pipeline by extending the collected relations along the content-based similarity of both the ignored and the selected objects (a decreasing level of similarity effectively decreases also the intensity of the relation). Afterwards, we apply a re-ranking approach that takes the output of some baseline recommender and re-orders the objects so that the relations with higher intensity hold. The re-ranking algorithm considers the relations in order of increasing intensity and corrects the ordering induced by each relation; thus, more intense relations are preferred in case of conflicts. Details of the re-ranking algorithm can be found in [22].
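The visibility-weighted relations and the re-ranking step can be sketched as follows. The intensity formula (fraction of time the ignored object was visible) and all names are our own illustrative choices, a simplified stand-in for the actual algorithm of [22].

```python
def relation_intensity(visible_time, total_time):
    """Intensity of an 'ignored <p selected' relation: grows with how long
    the ignored object was actually visible (illustrative formula only)."""
    return visible_time / total_time if total_time else 0.0

def rerank(baseline_order, relations):
    """Re-rank a baseline recommendation list so that preferential relations
    (ignored, selected, intensity) hold. Relations are applied in order of
    increasing intensity, so stronger relations win in case of conflicts."""
    order = list(baseline_order)
    for ignored, selected, _ in sorted(relations, key=lambda r: r[2]):
        i, s = order.index(ignored), order.index(selected)
        if s > i:  # relation violated: selected object ranked below ignored one
            order.insert(i, order.pop(s))  # move selected above ignored
    return order

# Baseline ranking and relations collected on a category page:
# o1 was fully visible yet ignored, o5 was visible only briefly.
baseline = ["o1", "o2", "o3", "o4", "o5"]
relations = [
    ("o1", "o3", relation_intensity(30, 30)),  # strong relation
    ("o5", "o3", relation_intensity(5, 30)),   # weak relation
]
reranked = rerank(baseline, relations)
```

Here the strong relation o1 <p o3 moves the selected object o3 above o1, while the already-satisfied weak relation leaves o5 untouched.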
3 Evaluation

In this section, we report on the experiments conducted to evaluate the models and methods utilizing multimodal implicit feedback. However, let us first briefly describe the dataset and the evaluation procedures.

3.1 Evaluation Procedure

The dataset of multimodal user feedback (including presentation context) was collected by observing real visitors of a mid-sized Czech travel agency. The dataset was collected by the IPIget tool [17] over a period of more than one year, contains over 560K records and is available for research purposes³. In addition to the feedback features, the dataset also contains several content-based attributes of the objects and thus enables the usage of content-based recommender systems as well.

³ http://bit.ly/2tWtRg2

In the evaluation of the methods, we considered the following tasks:

- Purchase prediction based on the other feedback available for a particular user-object pair. This scenario provides preliminary results for the methods aiming to estimate the user rating r̄.
- Recommending purchased objects. In this scenario, we employ a leave-one-out cross-validation protocol on purchased objects (i.e., for each purchased object, all other feedback is used as a train set and we aim to recommend the object which was actually purchased by the user).
- Recommending "future" user actions. In this scenario, we use older user feedback (usually 2/3 of the available feedback per user) as a train set. During the recommendation phase, we recommend top-k objects to each user, while the objects from the test set visited by the user should appear on top of the list.

Table 1: Results (nDCG) of the purchase prediction task based on multimodal implicit feedback and presentation context. The task was considered as ranking (i.e., purchased objects should appear on top of the list of all visited objects).

Method               | Dwell Time | Multimodal feedback | Feedback + Context | Feedback + aggregated BP | Feedback + individual BP
Heuristical with CDF | 0.663      | 0.712               | 0.780              | 0.696                    | 0.690
Heuristical with STD | 0.747      | 0.703               | 0.856              | 0.695                    | 0.704
Linear Regression    | 0.747      | 0.789               | 0.917              | 0.804                    | 0.925
J48 decision tree    | 0.663      | 0.722               | 0.908              | 0.839                    | 0.876

Table 2: Results (nDCG) of the recommending purchased objects task. A combination of the most-popular and VSM recommenders was used to derive the list of objects. Aggregated BP denotes baseline predictors aggregated for a particular feedback feature over all available contexts, individual BP introduces a baseline predictor for each pair of contextual and feedback features, and Feedback + Context treats contextual features as an additional input of the methods estimating r̄. The binary feedback baseline (identical for all methods) scored 0.255.

Method              | Dwell Time | Multimodal feedback | Feedback + Context | Feedback + aggregated BP | Feedback + individual BP
Heuristics with CDF | 0.255      | 0.257               | 0.253              | 0.258                    | 0.257
Heuristics with STD | 0.208      | 0.174               | 0.196              | 0.161                    | 0.158
Linear Regression   | 0.256      | 0.254               | 0.176              | 0.252                    | 0.251
J48 decision tree   | 0.238      | 0.256               | 0.273              | 0.240                    | 0.248

In several of our previous works (see, e.g., [21] or the results of Matrix Factorization [11] in Table 3) we have shown that purely collaborative methods are not very suitable for small e-commerce enterprises due to the ongoing cold-start problem. Thus, we mostly focus on content-based and hybrid recommending techniques. More specifically, we utilized the Vector Space Model (VSM) [14],
its combination with the most popular recommendations, and a hybrid algorithm proposing the most popular objects from the categories similar (based on collaborative filtering) to the visited ones [22]. As we consider the recommending problem as a ranking optimization, all methods were evaluated w.r.t. the normalized discounted cumulative gain (nDCG).

3.2 Results and Discussion

Results of the several methods aiming to learn the estimated rating r̄ based on various feature sets are displayed in Table 1 (purchase prediction task) and Table 2 (recommending purchased objects task). As we can see in Table 1, multimodal feedback significantly improves the purchase prediction capability across all methods. Usage of the presentation context can further improve the results if used as an additional input feature (Feedback + Context). However, if the contextual features are used as baseline predictors, the results across all methods are inferior to the results of Feedback + Context with just one exception; in several cases, the results are even worse than using multimodal feedback alone. This observation indicates that some more complex dependence exists between implicit feedback, presentation context and user preference. Although it seems that the examined machine learning methods can partially discover this relation, another option to try is to hand-pick only several relevant contextual scenarios instead of the global model applied so far. Results of the recommendation task also revealed a potential problem of overfitting on the purchase prediction task: linear regression, although it performed best in the purchase prediction scenario, did not improve over the binary feedback baseline. On the other hand, we can conclude that if a suitable rating prediction method is selected, multimodal implicit feedback together with presentation context can improve the list of recommended objects.

Results of the re-ranking approach based on preferential relations are depicted in Table 3. Re-ranking based on preferential relations improved the results of all evaluated recommending algorithms, although the improvement was rather modest in the case of VSM. During the evaluation, we observed that in the case of VSM, only the relations with the highest intensity should be applied to improve the results. For the matrix factorization approach, on the other hand, also relations with very low intensities should be incorporated. Another point is that the offline evaluation naturally focuses on mere learning of past user behavior, and both VSM and Popular SimCat are largely biased towards exploitation in the exploration vs. exploitation problem [27]. Hence, there may be further benefits of using preferential relations in online scenarios.

Table 3: Results (nDCG) of the recommending future interactions task with re-ranking based on the preferential relations.

Method                                       | nDCG
VSM + Preferential Relations                 | 0.4381
VSM                                          | 0.4376
Popular SimCat + Preferential Relations      | 0.3982
Popular SimCat                               | 0.3962
Matrix Factorization + Preferential Relations | 0.220
Matrix Factorization                         | 0.138

4 Conclusions

In this paper, we described our work in progress towards utilizing multimodal implicit feedback in small e-commerce enterprises. Specifically, we focused on three related tasks: integrating multiple types of feedback collected on the detail page of an object into an estimated user rating r̄, incorporating presentation context into the previous model, and utilizing negative implicit feedback collected on category pages. We proposed models and methods for each of the tasks and also provided an evaluation w.r.t. top-k ranking. Although the proposed methods statistically significantly improved over the baselines, the relative improvement is not too large, so our work is not finished yet. One of the important future tasks is to perform an online evaluation, as the offline evaluation was focused on exploitation only. Further tasks are to propose context incorporation models specific for some context-feedback feature pairs, to explore other possibilities to incorporate negative feedback, and also to evaluate a unified approach integrating all presented methods.

Acknowledgment

The work on this project was supported by the Czech grant P46. Source codes and datasets for the incorporation of presentation context can be obtained from http://bit.ly/2rJZzg3, source codes of the preferential relations approach can be obtained from http://bit.ly/2symm17 and the raw dataset can be obtained from http://bit.ly/2tWtRg2.

References

[1] Adomavicius, G. & Tuzhilin, A.: Context-Aware Recommender Systems. Recommender Systems Handbook, Springer US, 2015, 191-226
[2] Baltrunas, L. & Amatriain, X.: Towards time-dependant recommendation based on implicit feedback. In CARS 2009 (RecSys)
[3] de Campos, L. M.; Fernandez-Luna, J. M.; Huete, J. F. & Rueda-Morales, M. A.: Combining content-based and collaborative recommendations: A hybrid approach based on Bayesian networks. International Journal of Approximate Reasoning, 2010, 51, 785-799
[4] Carpi, T.; Edemanti, M.; Kamberoski, E.; Sacchi, E.; Cremonesi, P.; Pagano, R. & Quadrana, M.: Multi-stack Ensemble for Job Recommendation. In Proceedings of the Recommender Systems Challenge, ACM, 2016, 8:1-8:4
[5] Claypool, M.; Le, P.; Wased, M. & Brown, D.: Implicit interest indicators. In IUI '01, ACM, 2001, 33-40
[6] Cremonesi, P.; Garzotto, F. & Turrin, R.: User-Centric vs. System-Centric Evaluation of Recommender Systems. In INTERACT 2013, Springer LNCS 8119, 2013, 334-351
[7] Eckhardt, A.; Horvath, T. & Vojtas, P.: PHASES: A User Profile Learning Approach for Web Search. In WI-IAT '07, IEEE Computer Society, 2007, 780-783
[8] Fang, Y. & Si, L.: A Latent Pairwise Preference Learning Approach for Recommendation from Implicit Feedback. In CIKM 2012, ACM, 2012, 2567-2570
[9] Hidasi, B. & Tikk, D.: Initializing Matrix Factorization Methods on Implicit Feedback Databases. J. UCS, 2013, 19, 1834-1853
[10] Hu, Y.; Koren, Y. & Volinsky, C.: Collaborative Filtering for Implicit Feedback Datasets. In ICDM 2008, IEEE, 2008, 263-272
[11] Koren, Y.; Bell, R. & Volinsky, C.: Matrix Factorization Techniques for Recommender Systems. Computer, IEEE Computer Society Press, 2009, 42, 30-37
[12] Lai, Y.; Xu, X.; Yang, Z. & Liu, Z.: User interest prediction based on behaviors analysis. International Journal of Digital Content Technology and its Applications, 6 (13), 2012, 192-204
[13] Lee, D. H. & Brusilovsky, P.: Reinforcing Recommendation Using Implicit Negative Feedback. In UMAP 2009, Springer LNCS 5535, 2009, 422-427
[14] Lops, P.; de Gemmis, M. & Semeraro, G.: Content-based Recommender Systems: State of the Art and Trends. Recommender Systems Handbook, Springer, 2011, 73-105
[15] Mishra, S. K. & Reddy, M.: A Bottom-up Approach to Job Recommendation System. In Proceedings of the Recommender Systems Challenge, ACM, 2016, 4:1-4:4
[16] Ostuni, V. C.; Di Noia, T.; Di Sciascio, E. & Mirizzi, R.: Top-N recommendations from implicit feedback leveraging linked open data. In RecSys 2013, ACM, 2013, 85-92
[17] Peska, L.: IPIget - The Component for Collecting Implicit User Preference Indicators. In ITAT 2014, Ustav informatiky AV CR, 2014, 22-26, http://itat.ics.upjs.sk/workshops.pdf
[18] Peska, L.: Using the Context of User Feedback in Recommender Systems. In MEMICS 2016, EPTCS 233, 2016, 1-12
[19] Peska, L.: Linking Content Information with Bayesian Personalized Ranking via Multiple Content Alignments. To appear in HT 2017, ACM, 2017
[20] Peska, L. & Vojtas, P.: Negative implicit feedback in e-commerce recommender systems. In WIMS 2013, ACM, 2013, 45:1-45:4
[21] Peska, L. & Vojtas, P.: Recommending for Disloyal Customers with Low Consumption Rate. In SOFSEM 2014, Springer LNCS 8327, 2014, 455-465
[22] Peska, L. & Vojtas, P.: Using Implicit Preference Relations to Improve Recommender Systems. J. Data Semant., 6(1), 2017, 15-30
[23] Peska, L. & Vojtas, P.: Towards Complex User Feedback and Presentation Context in Recommender Systems. In BTW 2017 Workshops, GI-Edition LNI P-266, 2017, 203-207
[24] Radlinski, F. & Joachims, T.: Query chains: learning to rank from implicit feedback. In ACM SIGKDD 2005, ACM, 2005, 239-248
[25] Raman, K.; Shivaswamy, P. & Joachims, T.: Online Learning to Diversify from Implicit Feedback. In KDD 2012, ACM, 2012, 705-713
[26] Rendle, S.; Freudenthaler, C.; Gantner, Z. & Schmidt-Thieme, L.: BPR: Bayesian Personalized Ranking from Implicit Feedback. In UAI 2009, AUAI Press, 2009, 452-461
[27] Rubens, N.; Kaplan, D. & Sugiyama, M.: Active Learning in Recommender Systems. In Recommender Systems Handbook, Springer US, 2011, 735-767
[28] Xiao, W.; Xu, X.; Liang, K.; Mao, J. & Wang, J.: Job Recommendation with Hawkes Process: An Effective Solution for RecSys Challenge 2016. In Proceedings of the Recommender Systems Challenge, ACM, 2016, 11:1-11:4
[29] Yang, B.; Lee, S.; Park, S. & Lee, S.: Exploiting Various Implicit Feedback for Collaborative Filtering. In WWW 2012, ACM, 2012, 639-640
[30] Yi, X.; Hong, L.; Zhong, E.; Liu, N. N. & Rajan, S.: Beyond Clicks: Dwell Time for Personalization. In RecSys 2014, ACM, 2014, 113-120
[31] Zhang, C. & Cheng, X.: An Ensemble Method for Job Recommender Systems. In Proceedings of the Recommender Systems Challenge, ACM, 2016, 2:1-2:4
[32] Zheng, Y.; Burke, R. & Mobasher, B.: Recommendation with Differential Context Weighting. In UMAP 2013, Springer, 2013, 152-164
[33] Zibriczky, D.: A Combination of Simple Models by Forward Predictor Selection for Job Recommendation. In Proceedings of the Recommender Systems Challenge, ACM, 2016, 9:1-9:4