RecTour 2019, September 19th, 2019, Copenhagen, Denmark.

A Framework for Recommender Systems Based on a Finite Multidimensional Model Space

Leonhard Seyfang, Research Unit of E-Commerce, TU Wien, Vienna, Austria, seyfang@ec.tuwien.ac.at
Julia Neidhardt, Research Unit of E-Commerce, TU Wien, Vienna, Austria, neidhardt@ec.tuwien.ac.at

ABSTRACT
In this conceptual paper we suggest a framework for flexible and efficient recommender systems. It is based on a unified finite multivariate model space for both users and products. Association functions map each entity to each model dimension fuzzily. Finally, distance and learning operations allow efficient operation. The main differences to existing approaches are the reduced model space and the fuzzy location of entities. The reduced model space is most advantageous where item features are inconsistently structured or sparse. The association function allows expressing a distribution of agreement, not just a single location.

CCS CONCEPTS
• Information systems → Personalization; Recommender systems; Collaborative search; Similarity measures.

KEYWORDS
recommendation, personalization, feature-based recommendation, similarity measurement, fuzzy mapping

Copyright © 2019 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

1 INTRODUCTION
Tourism is for many reasons an interesting and challenging field for recommender systems: Travel experiences are complex and include various physical and mental aspects. Decisions are mainly based on subconscious, abstract ideas and the emotions attached to them. At the same time, hard constraints, such as the available time frame and budget, have to be met. Also, multiple persons are usually involved in the decision-finding process. Products are very diverse; they are often inconsistently and incompletely documented. More often than not, products themselves do not satisfy the tourist's need directly, but are prerequisites for the tourist's dreams to be fulfilled. With all these challenges in mind, we reach for a flexible, generic solution.

Generally, recommender systems aim to provide useful suggestions to their users. They use any combination of user, item, and context information. We suggest a recommendation framework that:
• Reduces the feature space to few interpretable (user-related) and manageable dimensions.
• Maps users, products, and other entities of interest to the model space.
• Treats the entity-dimension relationship fuzzily.
• Provides a heuristic to efficiently compute distances between entities.
• Provides self-learning procedures in near real-time.
• Is memory-efficient (is specified by as few parameters as possible).

2 CORE CONCEPTS
In this section we introduce the essential concepts in theory. Practical aspects are treated in sections 3 and 4.

2.1 Model Space
In this framework we use a multidimensional, finite model space. All entities, i.e., users, products, or whatever abstract or actual items are of interest, are fuzzily located in the very same model space. In most cases the number and interpretation of the dimensions will be defined domain-specifically. This can be done through domain knowledge or by dimension-reduction techniques such as factor analysis (see [3] for a related approach). The latter of course requires a suitable data corpus. For tourism, seven factors have already been identified [5], [4].

Alternatively, a generic, user-oriented data model can be used to obtain a cross-domain recommender system. For example, the Big Five personality traits [2] could be used straightforwardly as dimensions. For a comprehensive work on cross-domain recommendations see [1], and for thoughts on personality and recommender systems see [6].

2.2 Association Function
Association functions express the degree of accordance between entities and model dimensions. They are most comparable to membership functions in fuzzy logic but should not be confused with probability density functions. Dimensions are treated independently, so each entity has a separate association function for each dimension.

In our model space, we think of each dimension as a closed interval between 0 and 1. We believe that placing an entity on a single point on each dimension is an oversimplification. Instead it should be possible to express the spread of conformity over an adjustable range. Hence we were looking for a function that:
• Is defined on the closed interval [0, 1];
• Takes values between 0 and 1;
• Is continuous (sufficiently small changes in x result in arbitrarily small changes in f(x));
• Allows specifying location and dispersion independently of each other, hence takes (at least) two parameters.

We found that the association function defined in equation (1) fulfills all requirements above:

    f_{a,b}(x) = 1                                                    if a = b = 0
    f_{a,b}(x) = x^a (1 − x)^b / [ (a/(a+b))^a (1 − a/(a+b))^b ]      otherwise     (1)

f is fully specified by two real parameters a ≥ 0 and b ≥ 0. An alternative, more intuitively comprehensible parametrization is given by the location parameter µ ∈ [0, 1] and the precision parameter ρ ≥ 0. Both parametrizations can easily be converted into each other via (2), (3), (4), and (5). Examples of f are shown in figure 1.

    µ = a / (a + b),  for a + b > 0     (2)
    ρ = a + b                           (3)
    a = µρ                              (4)
    b = (1 − µ)ρ                        (5)

Figure 1: Two examples of association functions. Solid line: µ = 0.2, ρ = 10, a = 2, b = 8; dashed line: µ = 0.4, ρ = 4, a = 1.6, b = 2.4.

The value of f_{a,b}(x) is in [0, 1] for all valid a, b, and x ∈ [0, 1]. If a = 0 and b = 0, f(x) is constant 1. We call f_{0,0} the non-informative case. µ is not defined in the non-informative case and is not needed either. Note: f_{a,b} is proportional to the density of the beta distribution Beta(a + 1, b + 1), but density functions are scaled to an area of 1 while the association function is scaled to the range [0, 1]. Further, Beta(0.5, 0.5) is called the non-informative prior in the context of Bernoulli trials in Bayesian statistics. Our case f_{0,0} is not intended to possess the same non-informativeness and should not be confused with it. Realistically, ρ should not be too small, since f gets increasingly vague as ρ approaches 0. On the other hand, ρ should not be too large either, as that would suggest a non-existing precision.

There are several ways an entity can get its association functions:
(1) Per mapping algorithm: For products, or whatever entities are considered for recommendations, mapping functions can be defined. A mapping function translates the available feature description into an association function. Mapping algorithms can also be used for users: in [5] users are mapped according to pictures they have selected. A mapping based on demographic features is also possible.
(2) Manually: The graph of f can be used to set up an easy-to-use human interface. Using two sliders, one for the mode and one for the precision, one could alter the association function until the desired properties are reached. This option is favorable if no mapping algorithm exists. In cases where the recommendation is in the foreground, it might be attractive to offer a tool for user self-classification.
(3) Self-learning: Entities, typically users, can learn their position in the model space based on interaction with other entities, typically products, that have already been classified (see 2.4 for details).

The association function can also be used to retrieve item properties, particularly after a self-learning phase.

2.3 Distance
We define the distance d between two association functions as

    d(f_{a1,b1}, f_{a2,b2}) = 0                   if ρ1 = 0 or ρ2 = 0
    d(f_{a1,b1}, f_{a2,b2}) = 1 − f_{a1,b1}(x)    otherwise     (6)

where x is uniquely defined by the two properties (without loss of generality we assume from now on that µ1 ≤ µ2):

    µ1 ≤ x ≤ µ2                        (7)
    f_{a1,b1}(x) = f_{a2,b2}(x)        (8)

In words: x is the place between both modes where the two association functions intersect. d is 1 minus the value of f at x. The basic idea of d is illustrated in figure 2.

Figure 2: In the left example the association functions are very dissimilar, hence the distance d is large. In the right example the association functions are somewhat similar, so the distance is rather small.

Determining x requires numerical optimization, but a good approximation is given by d̂:

    d̂(f_{a1,b1}, f_{a2,b2}) = 0                                             if ρ1 = 0 or ρ2 = 0
    d̂(f_{a1,b1}, f_{a2,b2}) = 1 − [f_{a1,b1}(x̂) + f_{a2,b2}(x̂)] / 2        otherwise     (9)

with

    x̂ = (µ1 w1 + µ2 w2) / (w1 + w2)     (10)

and

    w1 = 1 + 0.4 (1 + s1) √ρ1     (11)
    w2 = 1 + 0.4 (1 − s2) √ρ2     (12)

where s (skewness) is defined as

    s = 2 (b − a) √(a + b + 3) / [ (a + b + 4) √((a + 1)(b + 1)) ]     (13)

The closed solution d̂ is easy to compute, and the deviation |d − d̂| is limited for a given range of ρ, e.g. |d − d̂| ≤ 0.039 under the reasonable assumption 0.5 ≤ ρ ≤ 10 (without proof).
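As an illustration, the association function (1), the parameter conversions (2)-(5), and the approximate distance (9)-(13) can be sketched in a few lines of Python. This is a minimal sketch; the function names are ours and are not part of the framework specification.

```python
import math

def assoc(a, b, x):
    """Association function f_{a,b}(x), equation (1)."""
    if a == 0 and b == 0:
        return 1.0                      # non-informative case f_{0,0}
    mode = a / (a + b)                  # mode of f, equal to mu
    return (x ** a * (1 - x) ** b) / (mode ** a * (1 - mode) ** b)

def to_mu_rho(a, b):
    """(a, b) -> (mu, rho), equations (2) and (3); requires a + b > 0."""
    return a / (a + b), a + b

def from_mu_rho(mu, rho):
    """(mu, rho) -> (a, b), equations (4) and (5)."""
    return mu * rho, (1 - mu) * rho

def skewness(a, b):
    """Skewness s of Beta(a + 1, b + 1), equation (13)."""
    return (2 * (b - a) * math.sqrt(a + b + 3)) / (
        (a + b + 4) * math.sqrt((a + 1) * (b + 1)))

def dist_hat(a1, b1, a2, b2):
    """Approximate distance d-hat, equations (9)-(12)."""
    rho1, rho2 = a1 + b1, a2 + b2
    if rho1 == 0 or rho2 == 0:
        return 0.0
    mu1, mu2 = a1 / rho1, a2 / rho2
    if mu1 > mu2:                       # enforce mu1 <= mu2 (w.l.o.g.)
        return dist_hat(a2, b2, a1, b1)
    w1 = 1 + 0.4 * (1 + skewness(a1, b1)) * math.sqrt(rho1)
    w2 = 1 + 0.4 * (1 - skewness(a2, b2)) * math.sqrt(rho2)
    x_hat = (mu1 * w1 + mu2 * w2) / (w1 + w2)
    return 1 - (assoc(a1, b1, x_hat) + assoc(a2, b2, x_hat)) / 2
```

For the two functions of figure 1 (a = 2, b = 8 and a = 1.6, b = 2.4), `dist_hat` yields a value strictly between 0 and 1; the exact d of equation (6) would additionally require numerically locating the intersection point x.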
Obvious properties of d are (also without proof):

    µ1 = µ2  ⇒  d(f_{a1,b1}, f_{a2,b2}) = 0                                (14)
    d(f_{µ1,ρ1}, f_{µ2,ρ2}) < d(f_{µ1,ρ1}, f_{µ2+ϵ,ρ2}),   ϵ > 0           (15)
    d(f_{µ1,ρ1}, f_{µ2,ρ2}) > d(f_{µ1,ρ1}, f_{µ2,ρ2+ϵ}),   ϵ > 0, µ1 ≠ µ2  (16)

The overall distance D between two entities is the weighted mean of the distances over all k dimensions:

    D = Σ_{i=1..k} d_i v_i     (17)

The weights v are chosen proportional to the importance of the corresponding dimension.

2.4 Learning Procedure
The learning procedure allows entities (usually users) to adapt their location in the model space according to their interaction with other entities (usually products). It is based on the merge operation.

The merge operation m translates an ordered set of association functions F into a single association function:

    F  →m  f_{a_new, b_new}     (18)

We assume that no element of F is the non-informative function (otherwise those elements are simply removed, as they do not hold information anyway). The cardinality of F (the number of elements in F) is denoted by n. The new parameter a_new is defined as

    a_new = 0                                   if n = 0
    a_new = a_1                                 if n = 1
    a_new = g(h(F)) · Σ_{i=1..n} (a_i w_i)      if n > 1     (19)

and b_new is defined accordingly.

Here w is a vector of weights associated with the elements of F, with Σ_{i=1..n} w_i = 1. h is a function that represents the dissimilarity of F. We currently use the weighted mean of all pairwise distances within F for h (see equation 20), but other definitions are certainly possible.

    h(F) = [ 1 / Σ_{i=1..n−1} Σ_{j=i+1..n} w_i w_j ] · Σ_{i=1..n−1} Σ_{j=i+1..n} d(f_i, f_j) w_i w_j     (20)

The function g transforms the result of h into a reasonable shrinking factor, such as

    g = (1 − h(F))^λ     (21)

where λ ≥ 0 is a tuning parameter. For larger lambdas the penalty for dissimilarity increases. If λ = 0 there is no shrinking at all. In this case a_new and b_new are simply the weighted averages of the input parameters (figure 3, left side). With a sufficient shrinkage factor, on the other hand, m acts more like a union operation (figure 3, right side). Note that shrinking refers to a and b and consequently to the precision ρ, whereas the spread of f works in the opposite direction. The merge operation is commutative but generally not associative:

    m(f_{a1,b1}, f_{a2,b2}) = m(f_{a2,b2}, f_{a1,b1})                              (22)
    m(m(f_{a1,b1}, f_{a2,b2}), f_{a3,b3}) ≠ m(f_{a1,b1}, m(f_{a2,b2}, f_{a3,b3}))  (23)

Figure 3: The tuning parameter λ controls the extent to which the dissimilarity h diminishes a_new and b_new. On the left, λ = 0 and the precision parameter of the resulting function is simply the average of the precision parameters of the input functions (dashed line). On the right, λ = 6 and the resulting function covers roughly the same area that is covered by the two input functions in conjunction.

3 USAGE
A standard application works as follows: The model space (the number and interpretation of the dimensions) would be determined based on expert knowledge or dimensionality-reduction methods or both. As mentioned earlier, seven factors have already been determined for the scope of tourism [5], [4].

Once the model space is specified, mappings from item descriptions to the model dimensions must be implemented (see section 2.2). In tourism, items are very diverse, including travel packages, hotels, flights, events, sights, natural phenomena, destinations, cities, forms of sport, and many others. Some of them are real products, meaning bookable; others are not. The latter are still important for recommender systems, as they serve as connections to actual products. Sometimes strong intangible aspects such as culture-dependent attributions or emotional concepts are involved. (The decision process might roughly be: honeymoon + love + Europe → city of love → Paris → hotel → room/suite, not right away the hotel room.)

Users obtain their profile in a self-learning way as they interact with items (or even other users). Depending on the particular domain and application, interactions can include book, buy, like, rate, comment, view, listen-to, read, search, compare, and other actions. Using the learning procedure from section 2.4, defined interactions modify the user's profile towards the items interacted with. Defining relevant interactions can be straightforward in some cases and sophisticated in others.

The initial association functions might be: the non-informative association function, the (dimension-wise) grand mean, or a contextual a-priori association function (for example based on known or estimated demographic characteristics).

The recommendation service itself calculates distances between users and products, sorts the results, and holds a list of the most appropriate items ready. Computations can be done on demand or in advance. Filters might be implemented additionally to meet the user's constraints. Implementing stochastic components can increase serendipity and diversity but destroys predictability and reproducibility.
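The learning procedure of section 2.4 can be sketched as follows. This is a minimal sketch, not the reference implementation: the function and parameter names are ours, and for brevity the pairwise distance is passed in as a callable, with a crude stand-in based on |µ1 − µ2| used in place of the distance d of section 2.3.

```python
def merge(funcs, weights, dist, lam=0.0):
    """Merge operation m, equations (19)-(21).

    funcs   : list of (a, b) parameter pairs, none of them (0, 0)
    weights : weights w_i summing to 1
    dist    : distance function on two (a, b) pairs
    lam     : tuning parameter lambda >= 0
    """
    n = len(funcs)
    if n == 0:
        return (0.0, 0.0)          # no information at all
    if n == 1:
        return funcs[0]            # nothing to merge
    # h(F): weighted mean of all pairwise distances, equation (20)
    num = den = 0.0
    for i in range(n - 1):
        for j in range(i + 1, n):
            wij = weights[i] * weights[j]
            num += dist(funcs[i], funcs[j]) * wij
            den += wij
    h = num / den
    g = (1.0 - h) ** lam           # shrinking factor g, equation (21)
    a_new = g * sum(a * w for (a, _), w in zip(funcs, weights))
    b_new = g * sum(b * w for (_, b), w in zip(funcs, weights))
    return (a_new, b_new)

def mu_distance(f1, f2):
    """Stand-in distance: absolute difference of the location parameters."""
    (a1, b1), (a2, b2) = f1, f2
    return abs(a1 / (a1 + b1) - a2 / (a2 + b2))

# lam = 0: plain weighted average of the parameters; lam > 0: dissimilar
# inputs shrink a and b, lowering the precision rho = a + b and thereby
# widening the merged association function.
merged = merge([(2, 8), (1.6, 2.4)], [0.5, 0.5], mu_distance, lam=6)
```

Any implementation of d, exact or approximate, can be plugged in for `dist`; the shrinking behavior controlled by λ is independent of that choice.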
4 WORKED EXAMPLE
For a simple example we assume that we have a travel recommendation system with two dimensions, Action and Culture, both equally important, meaning equally weighted.

Our user is inclined towards exciting activities as long as they are not too extreme (figure 4, left column). The user is not really interested in culture (figure 4, right column).

We have three items to suggest: a skydiving holiday, a city trip to Rome, and a sailboat cruise in the Mediterranean. The skydiving holiday is about as exciting as it gets, with virtually no cultural options (figure 4, first row). The city trip to Rome offers ample cultural sights, but besides that it is not terribly exciting (figure 4, second row). Finally, the sailboat cruise is exciting at times (although not as thrilling as skydiving), and the old Mediterranean cities also provide the opportunity to get in touch with old cultures (figure 4, bottom row).

Figure 4: Example with two dimensions (columns) and three products (rows). The filled shapes display the user's preferences, the dashed lines indicate the product properties. (The per-dimension distances shown in the figure are d = 0.35 and 0.36 for the skydiving holiday, d = 0.37 and 0.54 for the city trip to Rome, and d = 0.01 and 0.29 for the sailboat cruise.)

In this toy example, the Mediterranean sailboat cruise would clearly be the best recommendation according to our distance measure D (see equation 17), followed by the skydiving holiday. However, if we had used the location parameter µ in conjunction with the Euclidean distance or the Manhattan distance, the skydiving holiday would have appeared to be the closest to the user. The reason for this divergence is the different spread of the associations.

In table 1 all user-item distances are presented, according to Euclidean, Manhattan, and D distance. Figure 5 illustrates the locations of all items in R².

Table 1: Euclidean, Manhattan, and D distance for all items.

    Item             Euclidean   Manhattan   D
    Skydiving        0.35        0.50        0.36
    Rome             0.86        1.20        0.45
    Sailboat Cruise  0.49        0.55        0.15

Figure 5: All item locations in R² (Action axis from Relaxation to Excitement, Culture axis from Little to Rich). The dotted lines are drawn approximately at f = 0.75 to indicate the spread along both dimensions.

5 DISCUSSION
The framework presented here offers interesting possibilities, as it is flexible, possibly cross-domain, and self-learning, and the entity-dimension membership relation is easy to understand. It has no cold-start problem with new items, and it is not necessary to match a user to other similar users. It can serve as a basis for multivariate outlier detection and for cluster analysis. Deviations in the product and user distributions can be revealed as a side effect.

However, this approach comes with two downsides: Firstly, the dimensions of the model space must be defined in advance and are hard to modify in a running system. Hence setting up the model space is the crucial task. Secondly, the mapping from the original feature space to the model dimensions must be implemented. Manual input is simple but time-consuming, thus expensive with large quantities. The next steps will be the utilization in an operating recommender system and measuring and reporting the performance, ideally in comparison with an established system.

REFERENCES
[1] Iván Cantador, Ignacio Fernández-Tobías, Shlomo Berkovsky, and Paolo Cremonesi. 2015. Recommender Systems Handbook (2 ed.). Springer, New York Heidelberg Dordrecht London, Chapter 27, 919–959.
[2] Oliver P. John and Sanjay Srivastava. 2008. Handbook of Personality: Theory and Research (3 ed.). The Guilford Press, New York, Chapter 4, 114–158.
[3] Yehuda Koren, Robert Bell, and Chris Volinsky. 2009. Matrix factorization techniques for recommender systems. Computer 42 (08 2009), 30–37.
[4] Julia Neidhardt, Rainer Schuster, Leonhard Seyfang, and Hannes Werthner. 2014. Eliciting the Users' Unknown Preferences. In Proceedings of the 8th ACM Conference on Recommender Systems (RecSys '14). ACM, New York, NY, USA, 309–312.
[5] Julia Neidhardt, Leonhard Seyfang, Rainer Schuster, and Hannes Werthner. 2015. A picture-based approach to recommender systems. Information Technology & Tourism 15-1 (2015), 49–69.
[6] Marko Tkalcic and Li Chen. 2015. Recommender Systems Handbook (2 ed.). Springer, New York Heidelberg Dordrecht London, Chapter 21, 715–739.