Real-Time Recommendations in a Multi-Domain Environment

                                                            Emanuel Lacic
                                                                KTI
                                                    Graz University of Technology
                                                           Graz, Austria
                                                      elacic@know-center.at

ABSTRACT
Recommender systems are acknowledged as an essential instrument to support users in finding relevant information. However, adapting to different domain-specific data models is a challenge which many recommender frameworks neglect. Moreover, the advent of the big data era has created a need for high scalability and real-time processing of frequent data updates, and has thus brought new challenges for the recommender systems research community. In this work, we show how different item, social and location data features can be utilized and supported to provide real-time recommendations. We further show how to process data updates online and capture users’ real-time interests without recalculating recommendations. The presented recommendation framework provides a scalable and customizable architecture suited for providing real-time recommendations to multiple domains. We further investigate the impact of an increasing request load and show how the runtime can be decreased by scaling the framework.

Keywords
scalability; real-time recommendations; Apache Solr; multi-domain;

1.   MOTIVATION
   In the past decade, there has been a vast amount of research in the field of recommender systems. Most of that work focuses on developing novel approaches [22] and improving accuracy [16]. Thus, many well-known methods are available, such as Content-Based Filtering [15], Collaborative Filtering [21] or Matrix Factorization [13], all having their unique strengths and weaknesses. These approaches are traditionally adapted and applied with a focus on a single domain model (e.g., marketplace, hotel, conference, etc.). However, supporting a diverse set of domains is becoming an important issue for modern recommender systems [12].
   In most domains, the prediction task is usually viewed as a two-dimensional problem (e.g., utilizing user-item interactions). But nowadays it is not enough to support multiple domains on the basis of only one common data feature. With the arrival of the big data era, recommender systems are expected to analyze large amounts of data, to support various data types and to handle streams of new data (i.e., the volume, variety and velocity that define the Big Data problem). In such large-scale settings, traditional recommender systems usually analyze the data offline and update the generated model at regular time intervals. However, in many domains, choices made by users depend on factors which are susceptible to change at any time. Take a shopping mall, for example, where a user triggers frequent indoor location updates via a smartphone application while moving through the mall. An offline model update strategy that takes hours or days may miss the current location context of the user and fail to provide recommendations that match the user’s real-time demand. As a consequence, being able to capture users’ real-time interests is gaining momentum and is currently in high demand [5, 16, 4].

2.   BACKGROUND
   Most existing work that focuses on real-time recommendations (e.g., Netflix [1], Microsoft [17], among others, e.g., [3, 23]) uses offline batch processing frameworks like Apache Hadoop, Mahout or Spark. Other approaches use a relational database system to provide near real-time recommendations by querying the recommendations from a generated data model [19]. However, to capture users’ real-time interests, streaming data needs to be processed online, which means tackling the conflicting accuracy, real-time and big data requirements. For example, recent research from Huang et al. [5] and Chandramouli et al. [4] goes in that direction by utilizing a scalable Item-based Collaborative Filtering approach to provide real-time recommendations.
   However, by focusing on the common user-item interactions, additional contextual information is usually neglected. As such, the research community has also looked into exploiting social or location data (e.g., [2, 13]). In doing so, personalized recommendations using Matrix Factorization dominate the literature. Jamali and Ester [6] predicted ratings using a Matrix Factorization model that incorporates social relations. Ma et al. [13] improved both Mean Absolute Error and Root-Mean-Square Error by incorporating social information, using social regularization in two Matrix Factorization models. In general, Matrix Factorization based approaches need to be retrained when the data changes. This tends to be time-consuming, especially in the case of frequent data updates, where retraining fails to capture users’ real-time demand. Furthermore, empirical studies showed that a large number of factors is needed so that Matrix Factorization based approaches can deal with sparse data [18].
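
   To illustrate what processing updates online can look like for item-based Collaborative Filtering, the following is a minimal sketch and not the method of [5] or [4]. It keeps item co-occurrence counts that are updated for each incoming interaction, so recommendations reflect new data without retraining a model. The class `IncrementalItemCF` and its methods are hypothetical names introduced for illustration, and raw co-occurrence counts stand in for a proper similarity measure.

```python
from collections import defaultdict


class IncrementalItemCF:
    """Toy item-based CF that is updated one interaction at a time (hypothetical)."""

    def __init__(self):
        self.user_items = defaultdict(set)  # user -> items seen so far
        self.cooccur = defaultdict(lambda: defaultdict(int))  # item -> item -> count

    def update(self, user, item):
        """Process a single streamed (user, item) interaction online."""
        seen = self.user_items[user]
        if item in seen:
            return  # repeated interaction, nothing to update
        for other in seen:
            self.cooccur[item][other] += 1
            self.cooccur[other][item] += 1
        seen.add(item)

    def recommend(self, user, k=3):
        """Rank unseen items by summed co-occurrence with the user's items."""
        seen = self.user_items[user]
        scores = defaultdict(int)
        for item in seen:
            for other, count in self.cooccur[item].items():
                if other not in seen:
                    scores[other] += count
        # Sort by score (descending), then by item id for deterministic output.
        ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
        return [item for item, _ in ranked[:k]]


model = IncrementalItemCF()
for user, item in [("u1", "a"), ("u1", "b"), ("u2", "a"), ("u2", "b"), ("u2", "c")]:
    model.update(user, item)

before = model.recommend("u3")  # u3 has no history yet
model.update("u3", "a")         # a new event arrives on the stream
after = model.recommend("u3")   # the update is visible immediately
```

   Each update only touches the counters of the items the user has already interacted with, which is what makes this style of model easier to keep current than a factorization that has to be retrained.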

3.   APPROACH AND METHODS
   In this work, we are interested in finding out to what extent different data features (i.e., item, social or location) can be utilized or even be combined for real-time recommendation. To perform this task, we rely on data crawled from the virtual world of SecondLife (http://secondlife.com/) and perform an extensive evaluation of the different content- and network-based data features in terms of nDCG and User Coverage [7]. We chose SecondLife data over other sources mainly because there are currently no other datasets available that comprise extensive item, social and location data of users at the same time. Building on these results, the aim is to provide a general framework which can (1) process streaming data online while providing real-time recommendations, (2) support a multi-domain environment and the corresponding data features, and (3) provide a scalable architecture to cope with increasing request loads.
   Recently, search engines have gained attention in the context of recommender systems [14]. While the results are promising, they do not provide explanations and evaluations of how such an approach would perform in a big data or a real-time multi-domain environment. As such, the aim of this work is to prove the benefits of using search engines to support different data features while providing real-time recommendations. One issue, however, is that in this way scalability problems are only tackled on the data side of the domain. In order to truly be able to support multiple domains, a recommender framework is needed which can additionally (1) be customized with domain-specific models and approaches, and (2) cope with an increasing request load a domain could experience. Using the already mentioned SecondLife dataset, but also a much larger Foursquare dataset [20], we simulate an increasing recommendation request load which such a framework needs to handle.

4.   OUTCOMES
   In [7] we showed to what extent different data features (derived from item, social and location data) can be utilized for recommending items, low-level and top-level categories. Our results showed that approaches which utilize social data features can outperform the ones based on item or location features when recommending items. When recommending categories, these differences get substantially smaller and even change in favor of item and location data. Moreover, our results suggest that combining the data sources should result in more robust recommendations, especially for recommendation tasks on different levels of specialization (i.e., categories). In a similar fashion, we also showed in [10] that location data can be especially helpful in tackling cold-start users who have no interaction data whatsoever.
   In [8], we demonstrated the benefits of using the search engine Apache Solr (http://lucene.apache.org/solr/) to provide real-time recommendations. We showed that a recommender system is able to process data updates in real-time and immediately consider these updates (i.e., the user’s real-time interest) in the recommendation process without the need for recalculations. In [9] we also presented the first open-source recommender framework based on the Apache Solr search engine. But as previously mentioned, we considered the scalability issues only on the data side of a domain, and not within the framework (e.g., handling an increased request rate). For that purpose, we recently presented ScaR [11]. ScaR adopts the microservices architecture and was built with the focus on providing a scalable and customizable architecture suited for providing real-time recommendations to multiple domains. Different domains can run (and scale) the framework in isolated environments. The domain-specific data features and recommendation approaches can be dynamically customized using a dedicated microservice which synchronizes the change to all domain-relevant nodes.
   To demonstrate ScaR’s scalability performance, Figure 1 reports a runtime experiment on the Foursquare dataset with an increasing request load. As described in [11], we requested five different recommendation approaches for 65,001 users, resulting in 325,005 independent recommendation requests to process. We performed this experiment by simulating an increasing request rate (load) to the system, having 16, 32, 64 and 128 threads simultaneously requesting recommendations. The experiment was run in three settings: first with 1 local processing node, and then scaled to 2 and 4 distributed nodes. As seen, the local deployment shows an exponential increase in the runtime as the load grows. Such behaviour is somewhat expected, as the number of incoming recommendation requests causes a load spike and the processing threads consequently cause too much context switching. But as we deploy additional nodes, we can see a significant decrease in the growth of the mean processing runtime compared to the local deployment, which is crucial in cases where a maximal runtime needs to be guaranteed.

[Figure 1, three plots: (a) 1 processing node, (b) 2 processing nodes, (c) 4 processing nodes. x-axis: Threadpool size (16, 32, 64, 128); y-axis: Time [ms] (0 to 1000); series: MP, CFR, CFCL, CFLN, Hyb.]

Figure 1: Scalability experiment with five recommendation approaches (with the hybrid running the four other approaches in parallel), resulting in 325,005 independent recommendation requests to process. The exponentially increasing request loads (simulated by threadpools that continuously fire requests) are handled in three scenarios: (a) locally with only 1 processing node deployed, (b) scaling the framework with 2 distributed nodes, and (c) having 4 distributed nodes to process the incoming recommendation requests.

5.   PLAN AND TIMELINE
   With respect to the future research work plan, the aim is to further look into feasible strategies to balance the trade-off between accuracy and runtime in a multi-domain environment. To conclude the thesis, the idea is to find out how recent the utilized history data and the candidate recommendations need to be (i.e., by considering the exact time or a sliding window approach) in order to better capture the user’s real-time interest. This would not only lead to better accuracy but also to better performance, as less data will need to be processed.
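
   One way to picture the sliding window idea is the sketch below, which is an illustration under assumptions rather than the planned method. It keeps only interactions whose timestamp falls within a fixed window before the request time, so older history is not processed at all. The function `recent_interactions`, the `(timestamp, item)` tuple layout and the window lengths are hypothetical choices made for this example.

```python
from datetime import datetime, timedelta


def recent_interactions(history, now, window):
    """Return items of interactions with now - window < timestamp <= now."""
    cutoff = now - window
    return [item for ts, item in history if cutoff < ts <= now]


# Hypothetical interaction history of one user as (timestamp, item) pairs.
now = datetime(2016, 1, 1, 12, 0)
history = [
    (datetime(2016, 1, 1, 9, 30), "shop_a"),   # too old for a 1-hour window
    (datetime(2016, 1, 1, 11, 15), "shop_b"),
    (datetime(2016, 1, 1, 11, 55), "shop_c"),
]

window_items = recent_interactions(history, now, timedelta(hours=1))
all_items = recent_interactions(history, now, timedelta(days=1))
```

   A smaller window means fewer interactions to process per request, at the risk of discarding history that is still relevant, which is the accuracy and runtime trade-off the plan sets out to study.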
6.   REFERENCES
 [1] X. Amatriain. Big & personal: Data and models behind Netflix recommendations. In Proc. of BigMine ’13.
 [2] K. Bischoff. We love rock ’n’ roll: Analyzing and predicting friendship links in last.fm. In Proceedings of the 4th Annual ACM Web Science Conference, WebSci ’12, pages 47–56. ACM, 2012.
 [3] S. Chan, T. Stone, K. P. Szeto, and K. H. Chan. PredictionIO: A distributed machine learning server for practical software development. In Proc. of CIKM ’13.
 [4] B. Chandramouli, J. J. Levandoski, A. Eldawy, and M. F. Mokbel. StreamRec: A real-time recommender system. In Proceedings of the 2011 ACM SIGMOD International Conference on Management of Data, SIGMOD ’11, pages 1243–1246, 2011.
 [5] Y. Huang, B. Cui, W. Zhang, J. Jiang, and Y. Xu. TencentRec: Real-time stream recommendation in practice. In Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data, SIGMOD ’15, pages 227–238, 2015.
 [6] M. Jamali and M. Ester. A matrix factorization technique with trust propagation for recommendation in social networks. In Proceedings of the Fourth ACM Conference on Recommender Systems, pages 135–142. ACM, 2010.
 [7] E. Lacic, D. Kowald, L. Eberhard, C. Trattner, D. Parra, and L. Marinho. Utilizing online social network and location-based data to recommend products and categories in online marketplaces. In Mining, Modeling, and Recommending ’Things’ in Social Media, pages 96–115. Springer, 2015.
 [8] E. Lacic, D. Kowald, D. Parra, M. Kahr, and C. Trattner. Towards a scalable social recommender engine for online marketplaces: The case of Apache Solr. In Proceedings of the Companion Publication of the 23rd International Conference on World Wide Web Companion, WWW Companion ’14, pages 817–822. International World Wide Web Conferences Steering Committee, 2014.
 [9] E. Lacic, D. Kowald, and C. Trattner. SocRecM: A scalable social recommender engine for online marketplaces. In Proceedings of the 25th ACM Conference on Hypertext and Social Media, HT ’14, pages 308–310, 2014.
[10] E. Lacic, D. Kowald, M. Traub, G. Luzhnica, J. Simon, and E. Lex. Tackling cold-start users in recommender systems with indoor positioning systems.
[11] E. Lacic, M. Traub, D. Kowald, and E. Lex. ScaR: Towards a real-time recommender framework following the microservices architecture.
[12] Q. Liu and D. R. Karger. Kibitz: End-to-end recommendation system builder. In Proceedings of the 9th ACM Conference on Recommender Systems. ACM, 2015.
[13] H. Ma, D. Zhou, C. Liu, M. R. Lyu, and I. King. Recommender systems with social regularization. In Proceedings of the Fourth ACM International Conference on Web Search and Data Mining, WSDM ’11, pages 287–296. ACM, 2011.
[14] D. Parra, P. Brusilovsky, and C. Trattner. User controllability in an hybrid talk recommender system. In Proceedings of the ACM 2014 International Conference on Intelligent User Interfaces, IUI ’14, pages 305–308. ACM, 2014.
[15] M. J. Pazzani and D. Billsus. Content-based recommendation systems. In The adaptive web, pages 325–341. Springer, 2007.
[16] C. Rana and S. K. Jain. A study of the dynamic features of recommender systems. Artificial Intelligence Review, 43(1):141–153, 2015.
[17] R. Ronen, N. Koenigstein, E. Ziklik, M. Sitruk, R. Yaari, and N. Haiby-Weiss. Sage: Recommender engine as a cloud service. In Proceedings of the 7th ACM Conference on Recommender Systems, RecSys ’13, pages 475–476, 2013.
[18] R. Salakhutdinov and A. Mnih. Bayesian probabilistic matrix factorization using Markov chain Monte Carlo. In Proceedings of the 25th International Conference on Machine Learning, ICML ’08, pages 880–887. ACM, 2008.
[19] M. Sarwat, J. Avery, and M. F. Mokbel. RecDB in action: Recommendation made easy in relational databases. Proc. VLDB Endow., 6(12):1242–1245, Aug. 2013.
[20] M. Sarwat, J. J. Levandoski, A. Eldawy, and M. F. Mokbel. LARS*: An efficient and scalable location-aware recommender system. IEEE Trans. on Knowl. and Data Eng., 26(6):1384–1399, June 2014.
[21] J. B. Schafer, D. Frankowski, J. Herlocker, and S. Sen. Collaborative filtering recommender systems. In The adaptive web, pages 291–324. Springer-Verlag, 2007.
[22] G. Shani and A. Gunawardana. Evaluating recommendation systems. In F. Ricci, L. Rokach, B. Shapira, and P. B. Kantor, editors, Recommender Systems Handbook, pages 257–297. Springer US, 2011.
[23] S. G. Walunj and K. Sadafale. An online recommendation system for e-commerce based on Apache Mahout framework. In Proceedings of the 2013 annual conference on Computers and people research, pages 153–158. ACM, 2013.