=Paper= {{Paper |id=Vol-1887/paper7 |storemode=property |title=Tailoring Recommendations for a Multi-Domain Environment |pdfUrl=https://ceur-ws.org/Vol-1887/paper7.pdf |volume=Vol-1887 |authors=Emanuel Lacic,Dominik Kowald,Elisabeth Lex |dblpUrl=https://dblp.org/rec/conf/recsys/LacicKL17 }} ==Tailoring Recommendations for a Multi-Domain Environment== https://ceur-ws.org/Vol-1887/paper7.pdf
 Tailoring Recommendations for a Multi-Domain Environment
                Emanuel Lacic                                       Dominik Kowald                                       Elisabeth Lex
               Know-Center Graz                                   Know-Center Graz                                 Graz University of Technology
                  Graz, Austria                                     Graz, Austria                                          Graz, Austria
             elacic@know-center.at                             dkowald@know-center.at                                 elisabeth.lex@tugraz.at
ABSTRACT
Recommender systems are acknowledged as an essential instrument to support users in finding relevant information. However, the adaptation of recommender systems to multiple domain-specific requirements and data models remains an open challenge. In the present paper, we contribute to this sparse line of research with guidance on how to design a customizable recommender system that accounts for multiple domains with heterogeneous data. Using concrete showcase examples, we demonstrate how to set up a multi-domain system on the item and system level, and we report evaluation results for the domains of (i) LastFM, (ii) Foursquare, and (iii) MovieLens. We believe that our findings and guidelines can help developers and researchers of recommender systems to easily adapt and deploy a recommender system in distributed environments, as well as to develop and evaluate algorithms suited for multi-domain settings.

KEYWORDS
Recommender Systems; Multi-Domain Recommendation; Heterogeneous Data; Customizing Recommendation Approaches

Figure 1: Providing recommendations in multiple domains requires supporting heterogeneous data structures, allowing for domain-specific algorithm customization, and supporting service isolation and fault tolerance. Panels: (a) E-commerce, (b) Hotels, (c) Scientific talks, (d) News articles.

1    INTRODUCTION
In the past decade, there has been a vast amount of research in the field of recommender systems. Most of these systems offer recommendations adapted for items belonging to a single domain (e.g., movies, music, news, etc.). However, supporting different domain-specific data models is still an open challenge, which is neglected by many recommender systems and has only recently been taken up by the recommender systems research community.

The work of [2] was the first attempt to define the concept of a domain in the context of recommender systems. The authors distinguish between four different domain notions, namely (i) attribute level, where items are of the same type (e.g., movie genres), (ii) type level, where items are of similar types and share some attributes (e.g., movies and TV shows), (iii) item level, where items are not of the same type and differ in most or all attributes (e.g., movies and books), and (iv) system level, where items and users belong to different systems (e.g., LastFM and MovieLens). Moreover, they distinguish between the concept of multi-domain recommendations and cross-domain recommendations. The goal of cross-domain recommendation is to utilize the knowledge derived from one or more source domains in order to generate predictions for a target domain. However, they also state that multi-domain approaches have mainly focused on the provision of cross-domain recommendations by jointly considering user preferences for items in various systems.

Other related works implicitly share this definition [2] and focus on cross-domain recommendations rather than on multi-domain ones. Thus, the main focus has been to tackle the data sparsity problem (e.g., via active transfer learning [14]) by utilizing Collaborative Filtering [4, 7, 10] and Content-based Filtering approaches [3, 11].

In the present paper, we build upon these works and extend the scope of multi-domain recommender systems by introducing several guidelines concerning topics such as data heterogeneity and customization that should be taken into consideration. We base our findings on real-world applications (e.g., e-commerce¹, hotels², scientific talks³, or news articles⁴, to name a few) and propose a practical approach on how to support multi-domain recommendations on both the item and system level as described by [2].

To the best of our knowledge, this is the first work that addresses the question of which design decisions should be taken into consideration when building a recommender system for a multi-domain environment.

¹ http://www.mymanou.com/
² https://www.triprebel.com/
³ http://uscn.me/rr220
⁴ http://www.clef-newsreel.org/

RecSysKTL Workshop @ ACM RecSys ’17, August 27, 2017, Como, Italy
© 2017 Copyright is held by the author(s).




Figure 2: Proposed system architecture of a multi-domain recommendation environment showing how the various modules work
together. Each module is a standalone HTTP server, which is aware of the location (i.e., URL) of its communicating partners. In case
of an item level multi-domain scenario, the same data storage is shared between the domains and customization via recommender
profiles provides the domain-specific algorithm configuration.


2    A MULTI-DOMAIN RECOMMENDATION APPROACH
In this section, we categorize and propose guidelines to extend the notion of a multi-domain recommender system. We base our guidelines on our previous work [5, 6, 12], which we have already applied in live settings in various domains (e.g., e-commerce, hotels, conferences, or news, as shown in Figure 1).

Thus, we propose four issues that should be addressed when providing multi-domain recommendations, i.e., (i) service isolation, (ii) data heterogeneity, (iii) recommender customization, and (iv) fault tolerance. While we focus on the item and system level domain notions, our findings can be adapted for both the attribute and type level by means of additional recommender customization (e.g., filtering by the item category).

2.1    Service Isolation
When supporting multi-domain recommendations on the system level, effective hardware utilization is crucial. Since each domain has different requirements with respect to the request load, the hardware utilization rate can be improved by sharing the same hardware resources across multiple domains. Additionally, in such a scenario, performance isolation is crucial. Specifically, a high request load in combination with possibly performance-intensive operations needed for one domain should not impact the performance in another domain. For example, news recommender systems usually have a requirement of providing session-based recommendations within 100 milliseconds and need to cope with challenging load peaks during morning hours and the lunch break on working days [13]. Thus, in cases where the request load is too large for a particular domain, it should be possible to dynamically scale the system and handle such performance-intensive load peaks.

Correspondingly, we propose to separate a recommender system’s logic into several microservices by adopting the Microservices architecture design pattern⁵. An example of such an architecture is shown in Figure 2. Here, five different modules take care of (i) data handling, (ii) calculating recommendations, (iii) balancing incoming recommendation requests, (iv) domain-specific customization, and (v) evaluating recommendations. To support horizontal scaling and to coordinate all deployed modules, as well as the corresponding system level domain assignments, we use Apache ZooKeeper⁶. This overall approach can be extended with virtualization and container technologies such as Docker⁷ or LXC⁸. These lightweight resource containers provide features such as portability, more efficient scheduling and resource management, as well as less virtualization overhead, which are beneficial when implementing a multi-domain recommender on the system level.

2.2    Heterogeneous Data
As the amount of data doubles approximately every 40 months [8], many recommender systems migrate from traditional databases to distributed systems that can scale more easily and handle massive streams of heterogeneous data. As such, a multi-domain recommender system needs to handle a diverse set of data (e.g., ratings, views, likes, check-ins, etc.), while at the same time enabling an easy integration of new types by modifying the underlying schema.

In our case, we leverage the Apache Solr search engine and found its schema-less mode⁹ to be a great fit to support multi-domain recommendations on the item level, as it allows dynamic schema construction by indexing data without the need to edit the schema manually.

⁵ http://microservices.io/patterns/microservices.html
⁶ http://zookeeper.apache.org/
⁷ http://www.docker.com
⁸ https://linuxcontainers.org/
⁹ https://cwiki.apache.org/confluence/display/solr/Schemaless+Mode
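As a rough illustration of the item-level strategy described in Section 2.2, the following Python sketch keeps records with different fields in one shared store and derives a per-domain schema from the data itself. It is a plain in-memory stand-in and does not use the Solr API. The record ids and the helper names `infer_schema` and `by_domain` are hypothetical, and the field names follow Listing 1 with underscores assumed.

```python
# Records modelled on Listing 1: different fields per domain, one shared store.
# The ids and values are made up for illustration.
store = [
    {"id": "doc-1", "item": "artist-42", "users_listened": [3907, 57017],
     "users_listened_count": 2, "domain": "LastFM"},
    {"id": "doc-2", "user": "23861", "item": 4105, "rating": 1.0,
     "domain": "MovieLens"},
]


def infer_schema(records):
    """Derive a 'field name -> type name' mapping per domain from the data."""
    schema = {}
    for rec in records:
        fields = schema.setdefault(rec["domain"], {})
        for name, value in rec.items():
            # The first value seen for a field decides its type,
            # loosely mimicking type guessing at indexing time.
            fields.setdefault(name, type(value).__name__)
    return schema


def by_domain(records, domain):
    """Return only the records that belong to the given item-level domain."""
    return [r for r in records if r["domain"] == domain]


schema = infer_schema(store)
lastfm_docs = by_domain(store, "LastFM")
```

Adding a new domain in this sketch only requires appending records with a new `domain` value; no schema has to be declared up front, which is the property the text attributes to the schema-less setup.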




    [
      {
        "id": "e12c4fb-ba85-46d5-896d-af65d1f3b48c",
        "item": "5a2bc423-15dc-47d3-8a8c-f543dc267a7c",
        "users_listened": [3907, 57017],
        "users_listened_count": 2,
        "domain": "LastFM"
      },
      {
        "id": "8fe89bab-9631-4e87-81a0-a8dd3ffba774c",
        "user": "23861",
        "item": 4105,
        "rating": 1.0,
        "domain": "MovieLens"
      }
    ]

Listing 1: Example of different schema strategies to store data in the same place for item level multi-domain recommendations.

For example, as shown in Listing 1, we could easily derive different data structures to store data and generate recommendations in the corresponding music and movie domains. Moreover, we found that the capability for horizontal scaling (i.e., creating shards and replicas) is extremely important when integrating new domains on the item level (see Figure 2), as such a strategy quickly increases the amount of data that needs to be stored and processed.

2.3    Customizing Recommendation Approaches
An important aspect of a multi-domain recommender system is customization. Typically, different application domains have different domain-specific data features, which means that, for example, a music recommender could solely use implicit user interactions (e.g., listened songs), whereas an e-commerce recommender could use explicit ones (e.g., ratings). On top of that, one would also need to separately determine and set up the correct algorithmic parameters for each domain. For example, in the case of the Collaborative Filtering algorithm, the similarity function (e.g., Cosine or Jaccard similarity) and the neighbourhood size need to be determined for each domain. For other domains, one may need to set up custom filtering criteria (e.g., recommend items that are suited for minors or are part of a specific category). Thus, a multi-domain recommendation approach should be aware of the underlying data structures and domain-specific parameters.

As such, we propose to outsource the domain-specific algorithm setup into so-called recommender profiles. In our applications, we defined a customizer module (see Figure 2), which allows setting domain-specific algorithm configurations and transferring them to the modules utilized in a specific domain environment. This way, we can manage domain-specific configurations and dynamically integrate additional domains. An example of such recommender profiles for the music domain is given in Listings 2, 3 and 4.

    id: ktl_mp_lastfm
    # reference for a MP implementation
    algorithm: MostPopularGeneric

    # algorithm specific parameters
    parameters:
      # item-level domain?
      domain: lastFM
      # on what do I calculate the popularity?
      count_fields: [users_listened_count]
      user_action_fields: [users_listened]

Listing 2: A MostPopular recommender profile for the LastFM domain.

    id: ktl_ub_cf_lastfm
    # reference for a user-based-CF implementation
    algorithm: GenericUBCF

    # algorithm specific parameters
    parameters:
      # item-level domain?
      domain: lastFM
      # How to calculate user similarity?
      similarity_function: OVERLAP # JACCARD, COSINE, etc.
      neighbourhood_size: 40
      user_action_fields: [users_listened]

Listing 3: A Collaborative Filtering recommender profile for the LastFM domain.

    id: ktl_hybrid_cs_lastfm
    # reference for a hybrid implementation
    algorithm: CrossSourceHybrid

    # algorithm specific parameters
    parameters:
      # item-level domain given by combining profiles
      profile_ids: [ktl_mp_lastfm, ktl_ub_cf_lastfm]
      recommender_weights: [0.1, 0.9]

Listing 4: A Cross-Source Hybrid recommender profile for the LastFM domain.

Here, we define a configuration by a unique reference id, a reference to the algorithm implementation (e.g., class name) and the specific domain-relevant parameters. In case a recommender profile is created or updated for a particular domain at runtime, the changes need to be propagated throughout the whole system to every domain-dependent module. As such, each module from the same domain will be informed about the changes, and the updated profile will be used as soon as a new recommendation request is received.

2.4    Fault Tolerance
Deploying multiple modules in a distributed manner increases the probability of unexpected behaviour (e.g., hardware shutdown, I/O problems, software bugs, etc.). As we have proposed to use microservices, it is not necessary to cope with central node failures, as is the case with a master-slave architecture. In case a module fails, ZooKeeper, or any other orchestration service like Eureka or Consul, should remove the faulty module from its list of “live nodes” so that no further requests are redirected to it.

Thus, a failed module will not necessarily cause any major problems as long as there is another module of the same type available. When experiencing a high request load, it should be possible to deploy an additional module and register it with ZooKeeper on the fly. To further improve the reliability of the system, multiple ZooKeeper instances can be used in a cluster in order to overcome the outage of single instances. In such a way, the runtime performance can be guaranteed for both item and system level multi-domain recommendations.
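To make the recommender profiles of Listings 2 to 4 more concrete, the following Python sketch implements the three configured algorithms on toy data. It is an illustration under stated assumptions, not the implementation used in the paper: the interaction data is invented, and both the CF scoring rule (summing neighbour similarities) and the hybrid rule (weighted sum of max-normalised scores) are assumed. Only the overlap similarity, the neighbourhood size of 40 and the weights [0.1, 0.9] are taken from the listings and the text.

```python
from collections import Counter

# Toy interaction data (hypothetical): user -> set of items the user interacted with.
interactions = {
    "u1": {"a", "b", "c"},
    "u2": {"a", "b", "d"},
    "u3": {"b", "e"},
    "u4": {"f"},
}


def overlap(items_t, items_c):
    """Overlap similarity: number of items two users have in common."""
    return len(items_t & items_c)


def most_popular(data, user):
    """MostPopular: score each unseen item by its number of interacting users."""
    counts = Counter(item for items in data.values() for item in items)
    return {i: float(c) for i, c in counts.items() if i not in data[user]}


def user_based_cf(data, user, neighbourhood_size, similarity=overlap):
    """User-based CF: sum neighbour similarities per unseen item (assumed rule)."""
    sims = {u: similarity(data[user], items)
            for u, items in data.items() if u != user}
    # Keep the most similar users with non-zero similarity; ties broken by id.
    neighbours = sorted((u for u in sims if sims[u] > 0),
                        key=lambda u: (-sims[u], u))[:neighbourhood_size]
    scores = {}
    for u in neighbours:
        for item in data[u] - data[user]:
            scores[item] = scores.get(item, 0.0) + sims[u]
    return scores


def hybrid(score_maps, weights):
    """Hybrid: weighted sum of max-normalised scores (assumed combination rule)."""
    combined = {}
    for scores, weight in zip(score_maps, weights):
        top = max(scores.values(), default=0.0)
        if top <= 0:
            continue
        for item, score in scores.items():
            combined[item] = combined.get(item, 0.0) + weight * score / top
    return combined


def top_n(scores, n=10):
    """Return the n best item ids, highest score first."""
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:n]]


mp = most_popular(interactions, "u1")
cf = user_based_cf(interactions, "u1", neighbourhood_size=40)
ranked = top_n(hybrid([mp, cf], weights=[0.1, 0.9]))  # ['d', 'e', 'f']
```

In this sketch, swapping `overlap` for a Jaccard or cosine function, or changing `neighbourhood_size` and the weights, corresponds to editing a profile rather than touching the algorithm code, which is the separation the recommender profiles aim for.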




3    DOMAIN EXPERIMENTS
In order to demonstrate the application of our guidelines for providing recommendations in multiple domains, we performed several experiments on well-known datasets used in recommender systems research (i.e., LastFM¹⁰, Foursquare¹¹ and MovieLens20M¹²). The LastFM dataset consists of 359,348 users, 268,736 artists and 17,559,530 implicit user interactions that denote the listening relationship between users and artists. Foursquare provides 2,809,581 ratings on a 5-star scale for different venues (i.e., restaurants) and contains 2,153,471 users as well as 1,143,092 venues in general. The MovieLens20M dataset has 138,493 users, 27,032 movies as well as 19,999,603 movie reviews on a 5-star rating scale with a step size of 0.5 (i.e., effectively a 10-point scale).

We focused on providing recommendations for cold-start users and, as such, we removed all users that interacted with more than 20 items. Next, we split the remaining data into two different sets (i.e., training and test sets) using a method similar to the one described in [9]. Thus, for each user, we withheld 10 items that were used for testing and the rest was used for training. This resulted in an evaluation set of 2,409 users for LastFM, 41,628 users for Foursquare and 4,486 users for the MovieLens20M dataset. On these datasets, we evaluated a simple MostPopular (MP) approach (e.g., Listing 2), a user-based Collaborative Filtering (CF) approach (e.g., Listing 3) as well as a hybrid combination [1] of both (e.g., Listing 4). Additionally, we evaluated different neighbourhood sizes N for the CF approach in order to optimize the results. For reasons of simplicity, we used a naive item overlap metric to measure the similarity between users (i.e., $OV(u_t, u_c) = |\Delta(u_t) \cap \Delta(u_c)|$, where $\Delta(u)$ corresponds to the set of items some user $u$ has interacted with in the past). However, as shown in Listing 3, this is a domain-specific parameter that could be easily adapted.

The results of our evaluation are shown in Table 1 by means of the nDCG@10 and User Coverage (UC) metrics. The aim of this simple experiment was to show how different parameter setups can impact the performance in different domains. As shown, the neighbourhood size is one example of a parameter that needs to be optimized for a specific domain. By choosing the best parameter combination for the hybrid approach, we can provide more robust recommendations for all users in each domain.

    Dataset      Approach       nDCG@10    UC
    LastFM       MP             .0180      100%
    LastFM       CF, N = 20     .1113      93.70%
    LastFM       CF, N = 30     .1129      93.70%
    LastFM       CF, N = 40     .1135      93.70%
    LastFM       CF, N = 50     .1120      93.70%
    LastFM       CF, N = 60     .1112      93.70%
    LastFM       Hybrid         .1005      100%
    Foursquare   MP             .0256      100%
    Foursquare   CF, N = 20     .0364      49.58%
    Foursquare   CF, N = 30     .0403      49.58%
    Foursquare   CF, N = 40     .0426      49.58%
    Foursquare   CF, N = 50     .0440      49.58%
    Foursquare   CF, N = 60     .0452      49.58%
    Foursquare   Hybrid         .0339      100%
    MovieLens    MP             .0658      100%
    MovieLens    CF, N = 20     .0910      100%
    MovieLens    CF, N = 30     .0945      100%
    MovieLens    CF, N = 40     .0981      100%
    MovieLens    CF, N = 50     .0965      100%
    MovieLens    CF, N = 60     .0984      100%
    MovieLens    Hybrid         .0999      100%

Table 1: Evaluation results of our multi-domain experiment. The UC value of the CF approach is reported once per dataset in the original table and is repeated here for each neighbourhood size N.

4    CONCLUSION
In this work, we presented our approach to providing recommendations in a multi-domain environment. Specifically, we introduced the concept of recommender profiles in order to customize existing algorithms with domain-specific configuration. Apart from that, we provided guidelines with respect to service isolation, heterogeneous data and fault tolerance. Finally, we provided customization examples as well as evaluation results for the domains of (i) LastFM, (ii) Foursquare, and (iii) MovieLens. We believe that our findings and proposed guidelines are of use for developers and researchers of recommender systems to tailor and develop recommendations for multi-domain and distributed environments.

For future work, we plan to perform a more elaborate study using the proposed recommender profiles to study differences in domain-specific configurations (e.g., does a semantic relationship impact the choice of domain-specific parameters like the similarity metric or different filtering criteria?). Moreover, we plan to investigate datasets with textual contents in order to further explore how hybrid weightings may impact the performance in a specific domain with respect to not only accuracy but also diversity.

Acknowledgment. This work was funded by the Horizon 2020 project MoreGrasp (643955).

REFERENCES
[1] S. Bostandjiev, J. O’Donovan, and T. Höllerer. Tasteweights: a visual interactive hybrid recommender system. In Proc., RecSys ’12, pages 35–42. ACM, 2012.
[2] I. Cantador, I. Fernández-Tobías, S. Berkovsky, and P. Cremonesi. Cross-domain recommender systems. In Recommender Systems Handbook. Springer, 2015.
[3] A. M. Elkahky, Y. Song, and X. He. A multi-view deep learning approach for cross domain user modeling in recommendation systems. In Proc. WWW’15.
[4] S. Gao, H. Luo, D. Chen, S. Li, P. Gallinari, and J. Guo. Cross-domain recommendation via cluster-level latent factor model. In Proc. of ECML-PKDD’13.
[5] E. Lacic, D. Kowald, and C. Trattner. Socrecm: A scalable social recommender engine for online marketplaces. In Proc. of the ACM Hypertext 2014.
[6] E. Lacic, M. Traub, D. Kowald, and E. Lex. Scar: Towards a real-time recommender framework following the microservices architecture. In Proc. of LSRS’15.
[7] B. Loni, Y. Shi, M. Larson, and A. Hanjalic. Cross-domain collaborative filtering with factorization machines. In ECIR, pages 656–661. Springer, 2014.
[8] A. McAfee and E. Brynjolfsson. Big Data: The management revolution. Harvard Business Review, 90(10):60–68, 2012.
[9] D. Parra-Santander and P. Brusilovsky. Improving collaborative filtering in social tagging systems for the recommendation of scientific articles. In Proc. of WI-IAT ’10, pages 136–142. IEEE Computer Society.
[10] S. Sahebi and P. Brusilovsky. It takes two to tango: An exploration of domain pairs for cross-domain collaborative filtering. In Proc. of ACM RecSys 2015.
[11] S. Sahebi and T. Walker. Content-based cross-domain recommendations using segmented models. In CBRecSys@ RecSys, pages 57–64, 2014.
[12] M. Traub, D. Kowald, E. Lacic, P. Schoen, G. Supp, and E. Lex. Smart booking without looking: Providing hotel recommendations in the triprebel portal. In Proc. of i-KNOW ’15.
[13] S. Werner and A. Lommatzsch. Optimizing and evaluating stream-based news recommendation algorithms. In CLEF (Working Notes), pages 813–824, 2014.
[14] L. Zhao, S. J. Pan, E. W. Xiang, E. Zhong, Z. Lu, and Q. Yang. Active transfer learning for cross-system recommendation. In AAAI, 2013.

¹⁰ http://mtg.upf.edu/node/1671
¹¹ https://archive.org/details/201309_foursquare_dataset_umn
¹² https://grouplens.org/datasets/movielens/20m/



