Real-Time Recommendations in a Multi-Domain Environment

Emanuel Lacic
KTI, Graz University of Technology
Graz, Austria
elacic@know-center.at

ABSTRACT

Recommender systems are acknowledged as an essential instrument to support users in finding relevant information. However, adapting to different domain-specific data models is a challenge which many recommender frameworks neglect. Moreover, the advent of the big data era has created the need for high scalability and real-time processing of frequent data updates, and has thus brought new challenges for the recommender systems research community. In this work, we show how different item, social and location data features can be utilized and supported to provide real-time recommendations. We further show how to process data updates online and capture the user’s real-time interest without recalculating recommendations. The presented recommendation framework provides a scalable and customizable architecture suited for providing real-time recommendations to multiple domains. We further investigate the impact of an increasing request load and show how the runtime can be decreased by scaling the framework.

Keywords

scalability; real-time recommendations; Apache Solr; multi-domain

1. MOTIVATION

In the past decade, there has been a vast amount of research in the field of recommender systems. Most of that work focuses on developing novel approaches [22] and improving accuracy [16]. Thus, many well-known methods are available, such as Content-Based Filtering [15], Collaborative Filtering [21] or Matrix Factorization [13], all having their unique strengths and weaknesses. These approaches are traditionally adapted and applied with the focus on a single domain model (e.g., marketplace, hotel, conference, etc.). However, supporting a diverse set of domains is becoming an important issue for modern recommender systems [12].

In most domains, the prediction task is usually viewed as a two-dimensional problem (e.g., utilizing user-item interactions). But nowadays it is not enough to support multiple domains on the basis of only one common data feature. With the arrival of the big data era, recommender systems are expected to analyze a lot of data, to support various data types and to handle streams of new data (i.e., the volume, variety and velocity that define the Big Data problem). In such large-scale settings, traditional recommender systems usually analyze the data offline and update the generated model at regular time intervals. However, in many domains, choices made by users depend on factors which can change at any time. Take a shopping mall, for example, where a user triggers frequent indoor location updates via a smartphone application while moving through the mall. An offline model update strategy that takes hours or days may miss the current location context of the user and fail to provide the right recommendations to match the user’s real-time demand. As a consequence, being able to capture the user’s real-time interests is gaining momentum and is currently in high demand [5, 16, 4].

2. BACKGROUND

Most existing work which focuses on real-time recommendations (e.g., Netflix [1] and Microsoft [17], among others such as [3, 23]) uses offline batch processing frameworks like Apache Hadoop, Mahout or Spark. Other approaches use a relational database system to provide near real-time recommendations by querying the recommendations from a generated data model [19]. However, to capture the user’s real-time interest, streaming data needs to be processed online, which means tackling the conflicting accuracy, real-time and big data requirements. For example, recent research from Huang et al. [5] and Chandramouli et al. [4] goes in that direction by utilizing a scalable Item-based Collaborative Filtering approach to provide real-time recommendations.

But by focusing on the common user-item interactions, additional contextual information is usually neglected. As such, the research community has also looked into exploiting social or location data (e.g., [2, 13]). In doing so, personalized recommendations using Matrix Factorization dominate the literature. Jamali et al. [6] predicted ratings using a Matrix Factorization model that incorporates social relations. Ma et al. [13] improved both Mean Absolute Error and Root-Mean-Square Error by incorporating social information, using social regularization in two Matrix Factorization models. In general, Matrix Factorization based approaches need to be retrained when the data changes. This tends to be time-consuming, especially in the case of frequent data updates, where retraining fails to capture the user’s real-time demand. Furthermore, empirical studies showed that a large number of factors is needed so that Matrix Factorization based approaches can deal with sparse data [18].

3. APPROACH AND METHODS

In this work, we are interested in finding out to what extent different data features (i.e., item, social or location) can be utilized or even be combined for real-time recommendation. To perform this task, we rely on data crawled from the virtual world of SecondLife¹ and perform an extensive evaluation in terms of nDCG and User Coverage [7] of the different content- and network-based data features. The reasons for choosing SecondLife data over other sources are manifold, but the main one is that currently no other datasets are available that comprise extensive item, social and location data of users at the same time. Building on these results, the aim is to provide a general framework which can (1) process streaming data online while providing real-time recommendations, (2) support a multi-domain environment and the corresponding data features, and (3) provide a scalable architecture to cope with increasing request loads.

Recently, search engines have gained attention in the context of recommender systems [14]. While the results are promising, they do not provide explanations and evaluations of how such an approach would perform in a big data or a real-time multi-domain environment. As such, the aim of this work is to prove the benefits of using search engines to support different data features while providing real-time recommendations. One issue, however, is that in this way scalability problems are only tackled on the data side of the domain. In order to truly be able to support multiple domains, a recommender framework is needed which can additionally (1) be customized with domain-specific models and approaches, and (2) cope with an increasing request load a domain could experience. Using the already mentioned SecondLife dataset, but also a much larger Foursquare dataset [20], we simulate an increasing recommendation request load which such a framework needs to handle.

4. OUTCOMES

In [7] we showed to what extent different data features (derived from item, social and location data) can be utilized for recommending items, low-level categories and top-level categories. In our results, we showed that approaches which utilize social data features can outperform the ones based on item or location features when recommending items. When recommending categories, these differences get substantially smaller and even change in favor of item and location data. Moreover, our results suggest that combining the data sources should result in more robust recommendations, especially for recommendation tasks on different levels of specialization (i.e., categories). In a similar fashion, we also showed in [10] that location data can be especially helpful in tackling cold-start users who have no interaction data whatsoever.

In [8], we demonstrated the benefits of using the search engine Apache Solr² to provide real-time recommendations. We showed that a recommender system is able to process data updates in real-time and immediately consider these updates (i.e., the user’s real-time interest) in the recommendation process without the need for recalculations. In [9] we also presented the first open-source recommender framework based on the Apache Solr search engine. But as previously mentioned, we considered the scalability issues only on the data side of a domain, and not within the framework (e.g., handling an increased request rate). For that purpose, we recently presented ScaR [11]. ScaR adopts the microservices architecture and was built with the focus on providing a scalable and customizable architecture suited for providing real-time recommendations to multiple domains. Different domains can run (and scale) the framework in isolated environments. The domain-specific data features and recommendation approaches can be dynamically customized using a dedicated microservice which synchronizes the change to all domain-relevant nodes.

To demonstrate ScaR’s scalability performance, Figure 1 reports a runtime experiment on the Foursquare dataset with an increasing request load. As described in [11], we requested five different recommendation approaches for 65,001 users, making 325,005 independent recommendation requests to process. We performed this experiment by simulating an increasing request rate (load) to the system, having 16, 32, 64 and 128 threads simultaneously requesting recommendations. The experiment was run in three configurations: first with 1 local processing node, and then scaled to 2 and 4 distributed nodes. As seen, the local deployment has an exponential increase in the runtime as the load grows. Such behaviour is somewhat expected, as the number of incoming recommendation requests causes a load spike and the processing threads consequently cause too much context switching. But as we deploy additional nodes, we can see a significant decrease in the growth of the mean processing runtime when compared to the local deployment, which is crucial in cases when a maximal runtime needs to be guaranteed.

[Figure 1 plots: three panels, (a) 1 processing node, (b) 2 processing nodes, (c) 4 processing nodes. x-axis: Threadpool size (16, 32, 64, 128); y-axis: Time [ms] (0 to 1000); series: MP, CFR, CFCL, CFLN, Hyb.]

Figure 1: Scalability experiment with five recommendation approaches (having the hybrid run the four approaches in parallel), making 325,005 independent recommendation requests to process. The exponentially increasing request loads (simulated by threadpools that continuously fire requests) are handled in three scenarios: (a) locally with only 1 processing node being deployed, (b) scaling the framework with 2 distributed nodes, and (c) having 4 distributed nodes to process the incoming recommendation requests.

5. PLAN AND TIMELINE

With respect to the future research workplan, the aim is to further look into feasible strategies to balance the trade-off between accuracy and runtime in a multi-domain environment. For a thesis conclusion, the idea is to find out how recent the utilized history data and the candidate recommendations need to be (i.e., by considering the exact time or a sliding window approach) in order to recommend the user’s real-time interest even better. This would not only lead to better accuracy but also to better performance, as less data will need to be processed.

¹ http://secondlife.com/
² http://lucene.apache.org/solr/

6. REFERENCES

[1] X. Amatriain. Big & personal: Data and models behind Netflix recommendations. In Proc. of BigMine ’13.
[2] K. Bischoff. We love rock ’n’ roll: Analyzing and predicting friendship links in Last.fm. In Proceedings of the 4th Annual ACM Web Science Conference, WebSci ’12, pages 47–56. ACM, 2012.
[3] S. Chan, T. Stone, K. P. Szeto, and K. H. Chan. PredictionIO: A distributed machine learning server for practical software development. In Proc. of CIKM ’13.
[4] B. Chandramouli, J. J. Levandoski, A. Eldawy, and M. F. Mokbel. StreamRec: A real-time recommender system. In Proceedings of the 2011 ACM SIGMOD International Conference on Management of Data, SIGMOD ’11, pages 1243–1246, 2011.
[5] Y. Huang, B. Cui, W. Zhang, J. Jiang, and Y. Xu. TencentRec: Real-time stream recommendation in practice. In Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data, SIGMOD ’15, pages 227–238, 2015.
[6] M. Jamali and M. Ester. A matrix factorization technique with trust propagation for recommendation in social networks. In Proceedings of the Fourth ACM Conference on Recommender Systems, pages 135–142. ACM, 2010.
[7] E. Lacic, D. Kowald, L. Eberhard, C. Trattner, D. Parra, and L. Marinho. Utilizing online social network and location-based data to recommend products and categories in online marketplaces. In Mining, Modeling, and Recommending ’Things’ in Social Media, pages 96–115. Springer, 2015.
[8] E. Lacic, D. Kowald, D. Parra, M. Kahr, and C. Trattner. Towards a scalable social recommender engine for online marketplaces: The case of Apache Solr. In Proceedings of the Companion Publication of the 23rd International Conference on World Wide Web Companion, WWW Companion ’14, pages 817–822. International World Wide Web Conferences Steering Committee, 2014.
[9] E. Lacic, D. Kowald, and C. Trattner. SocRecM: A scalable social recommender engine for online marketplaces. In Proceedings of the 25th ACM Conference on Hypertext and Social Media, HT ’14, pages 308–310, 2014.
[10] E. Lacic, D. Kowald, M. Traub, G. Luzhnica, J. Simon, and E. Lex. Tackling cold-start users in recommender systems with indoor positioning systems.
[11] E. Lacic, M. Traub, D. Kowald, and E. Lex. ScaR: Towards a real-time recommender framework following the microservices architecture.
[12] Q. Liu and D. R. Karger. Kibitz: End-to-end recommendation system builder. In Proceedings of the 9th ACM Conference on Recommender Systems. ACM, 2015.
[13] H. Ma, D. Zhou, C. Liu, M. R. Lyu, and I. King. In Proceedings of the Fourth ACM International Conference on Web Search and Data Mining, WSDM ’11, pages 287–296. ACM, 2011.
[14] D. Parra, P. Brusilovsky, and C. Trattner. User controllability in an hybrid talk recommender system. In Proceedings of the ACM 2014 International Conference on Intelligent User Interfaces, IUI ’14, pages 305–308. ACM, 2014.
[15] M. J. Pazzani and D. Billsus. Content-based recommendation systems. In The Adaptive Web, pages 325–341. Springer, 2007.
[16] C. Rana and S. K. Jain. A study of the dynamic features of recommender systems. Artificial Intelligence Review, 43(1):141–153, 2015.
[17] R. Ronen, N. Koenigstein, E. Ziklik, M. Sitruk, R. Yaari, and N. Haiby-Weiss. Sage: Recommender engine as a cloud service. In Proceedings of the 7th ACM Conference on Recommender Systems, RecSys ’13, pages 475–476, 2013.
[18] R. Salakhutdinov and A. Mnih. Bayesian probabilistic matrix factorization using Markov chain Monte Carlo. In Proceedings of the 25th International Conference on Machine Learning, ICML ’08, pages 880–887. ACM, 2008.
[19] M. Sarwat, J. Avery, and M. F. Mokbel. RecDB in action: Recommendation made easy in relational databases. Proc. VLDB Endow., 6(12):1242–1245, Aug. 2013.
[20] M. Sarwat, J. J. Levandoski, A. Eldawy, and M. F. Mokbel. LARS*: An efficient and scalable location-aware recommender system. IEEE Trans. on Knowl. and Data Eng., 26(6):1384–1399, June 2014.
[21] J. B. Schafer, D. Frankowski, J. Herlocker, and S. Sen. The Adaptive Web, chapter Collaborative Filtering Recommender Systems, pages 291–324. Springer-Verlag, 2007.
[22] G. Shani and A. Gunawardana. Evaluating recommendation systems. In F. Ricci, L. Rokach, B. Shapira, and P. B. Kantor, editors, Recommender Systems Handbook, pages 257–297. Springer US, 2011.
[23] S. G. Walunj and K. Sadafale. An online recommendation system for e-commerce based on Apache Mahout framework. In Proceedings of the 2013 annual conference on Computers and people research, pages 153–158. ACM, 2013.
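
The load simulation behind the scalability experiment (thread pools of size 16, 32, 64 and 128 that continuously fire recommendation requests, with the mean runtime recorded per pool size) can be illustrated with a minimal Python sketch. This is not ScaR's code or API: fake_recommend, timed_request and run_load_test are hypothetical names, and the request itself is a stub that merely sleeps for about one millisecond. Because the stub causes no real contention, the sketch shows only the shape of the measurement harness and is not expected to reproduce the runtime growth reported in Figure 1.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from statistics import mean


def fake_recommend(user_id):
    """Hypothetical stand-in for one recommendation request (not ScaR's API)."""
    time.sleep(0.001)  # assumption: pretend the backend needs about 1 ms
    return [user_id + offset for offset in range(1, 4)]


def timed_request(user_id):
    """Issue one request and return its runtime in milliseconds."""
    start = time.perf_counter()
    fake_recommend(user_id)
    return (time.perf_counter() - start) * 1000.0


def run_load_test(num_requests, pool_sizes=(16, 32, 64, 128)):
    """Map each thread pool size to the mean per-request runtime in ms."""
    results = {}
    for size in pool_sizes:
        # Each pool fires all requests using `size` concurrent worker threads.
        with ThreadPoolExecutor(max_workers=size) as pool:
            runtimes = list(pool.map(timed_request, range(num_requests)))
        results[size] = mean(runtimes)
    return results


if __name__ == "__main__":
    for size, runtime_ms in run_load_test(num_requests=500).items():
        print(f"threadpool size {size:>3}: mean runtime {runtime_ms:.2f} ms")
```

To approximate the three deployment scenarios, fake_recommend would have to be replaced by a real call against a deployment with 1, 2 or 4 processing nodes; how such a call looks depends on the framework and is left open here.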