
A Framework for Recommender Systems Based on a Finite Multidimensional Model Space

Leonhard Seyfang

seyfang@ec.tuwien.ac.at

Julia Neidhardt

neidhardt@ec.tuwien.ac.at

Research Unit of E-Commerce, TU Wien, Vienna, Austria

2019

27 31

In this conceptual paper we suggest a framework for flexible and efficient recommender systems. It is based on a unified, finite, multivariate model space for both users and products. Association functions map each entity fuzzily to each model dimension. Distance and learning operations then allow efficient operation. The main differences from existing approaches are the reduced model space and the fuzzy location of entities. The reduced model space is most advantageous where item features are inconsistently structured or sparse. The association function makes it possible to express a distribution of agreement, not just a single location.

CCS CONCEPTS

• Information systems → Personalization; Recommender systems; Collaborative search; Similarity measures.

1 INTRODUCTION

Tourism is, for many reasons, an interesting and challenging field for recommender systems. Travel experiences are complex and include various physical and mental aspects. Decisions are mainly based on subconscious, abstract ideas and the emotions attached to them. At the same time, hard constraints such as the available time frame and budget have to be met. Usually several persons are involved in the decision-making process. Products are very diverse and are often inconsistently and incompletely documented. More often than not, products do not satisfy the tourist's needs directly but are prerequisites for the tourist's dreams to be fulfilled. With all these challenges in mind, we aim for a flexible, generic solution.

Generally, recommender systems aim to provide useful suggestions to their users. They use any combination of user, item, and context information.

We suggest a recommendation framework that:
• Reduces the feature space to a few interpretable (user-related) and manageable dimensions.
• Maps users, products, and other entities of interest to the model space.
• Treats the entity-dimension relationship fuzzily.
• Provides a heuristic to efficiently compute distances between entities.
• Provides self-learning procedures in near real-time.

2 CORE CONCEPTS

In this section we introduce the essential concepts in theory. Practical aspects are treated in sections 3 and 4.

2.1 Model Space

In this framework we use a multidimensional, finite model space. All entities (users, products, or whatever abstract or actual items are of interest) are fuzzily located in the very same model space. In most cases the number and interpretation of the dimensions will be defined specifically for the domain. This can be done through domain knowledge or by dimension reduction techniques such as factor analysis (see [3] for a related approach). The latter of course requires a suitable data corpus. For tourism, seven factors have already been identified [5], [4].

Alternatively, a generic, user-oriented data model can be used to obtain a cross-domain recommender system. For example, the Big Five personality traits [2] could be used directly as dimensions. For a comprehensive work on cross-domain recommendations see [1], and for thoughts on personality and recommender systems see [6].

2.2 Association Function

Association functions express the degree of accordance between entities and model-dimensions. They are most comparable to membership functions in fuzzy logic but should not be confused with probability density functions. Dimensions are treated independently, so each entity has a separate association function for each dimension.

In our model space, we think of each dimension as the closed interval between 0 and 1. We believe that placing an entity on a single point on each dimension is an oversimplification. Instead, it should be possible to express the spread of conformity over an adjustable range. Hence we were looking for a function that:
• Is defined on the closed interval [0, 1];
• Takes values between 0 and 1;
• Is continuous (sufficiently small changes in x result in arbitrarily small changes in f(x));
• Allows location and dispersion to be specified independently of each other, and hence takes (at least) two parameters;
• Is memory-efficient (is specified by as few parameters as possible).

The association function defined in equation (1) fulfills all of the requirements above.

\[
f_{a,b}(x) =
\begin{cases}
1 & \text{if } a = b = 0 \\[4pt]
\dfrac{x^{a}(1-x)^{b}}{\left(\frac{a}{a+b}\right)^{a}\left(1-\frac{a}{a+b}\right)^{b}} & \text{otherwise}
\end{cases}
\tag{1}
\]

f is fully specified by two real parameters a ≥ 0 and b ≥ 0. An alternative, more human-comprehensible parametrization is given by the location parameter µ ∈ [0, 1] and the precision parameter ρ ≥ 0. Both parametrizations can easily be converted into each other via (2), (3), (4), and (5). Examples for f are shown in figure 1.

\[ \mu = \frac{a}{a+b}, \quad a + b > 0 \tag{2} \]
\[ \rho = a + b \tag{3} \]
\[ a = \mu\rho \tag{4} \]
\[ b = (1-\mu)\rho \tag{5} \]

[Figure 1: examples of the association function f_{a,b}(x), with x from 0.0 to 1.0 on the horizontal axis and f_{a,b}(x) from 0.0 to 1.0 on the vertical axis.]

The value of f_{a,b}(x) is in [0, 1] for all valid a, b, and x ∈ [0, 1]. If a = 0 and b = 0, f(x) is constant 1. We call f_{0,0} the non-informative case. µ is not defined in the non-informative case and is not needed either. Note: f_{a,b} is proportional to the density of the beta distribution Beta(a + 1, b + 1), but density functions are scaled to an area of 1 while the association function is scaled to the range [0, 1]. Further, Beta(0.5, 0.5) is called the non-informative prior in the context of Bernoulli trials in Bayesian statistics. Our case f_{0,0} is not intended to possess the same non-informativeness, and the two should not be confused.
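The association function and the two parametrizations can be sketched in a few lines of Python. This is a minimal illustration of equations (1) to (5), not a reference implementation; the function names `association`, `to_location_precision`, and `to_a_b` are our own choices and do not come from the paper.

```python
def association(a: float, b: float, x: float) -> float:
    """Association function f_{a,b}(x): x^a (1-x)^b rescaled so its peak equals 1."""
    if a < 0 or b < 0:
        raise ValueError("a and b must be non-negative")
    if not 0.0 <= x <= 1.0:
        raise ValueError("x must lie in [0, 1]")
    if a == 0 and b == 0:
        return 1.0  # non-informative case f_{0,0}
    mu = a / (a + b)  # mode of f, equation (2)
    # Value of x^a (1-x)^b at the mode; Python evaluates 0.0 ** 0 as 1.0,
    # which covers the boundary cases a = 0 or b = 0.
    peak = mu ** a * (1.0 - mu) ** b
    return x ** a * (1.0 - x) ** b / peak


def to_location_precision(a: float, b: float):
    """Convert (a, b) to (mu, rho), equations (2) and (3); mu is None if rho = 0."""
    rho = a + b
    if rho == 0:
        return None, 0.0
    return a / rho, rho


def to_a_b(mu: float, rho: float):
    """Convert (mu, rho) to (a, b), equations (4) and (5)."""
    return mu * rho, (1.0 - mu) * rho


if __name__ == "__main__":
    a, b = to_a_b(0.75, 4.0)
    print(a, b, association(a, b, 0.75), association(a, b, 0.5))
```

For very large ρ the normalizing term can underflow in floating point; a production version would probably work in log space, which this sketch does not attempt.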

Realistically, ρ should not be too small, since f gets increasingly vague as ρ approaches 0. On the other hand, ρ should not be too large either, as this would suggest a precision that does not exist.

There are several ways in which an entity can obtain its association functions:
(1) Per mapping algorithm: For products, or whatever entities are considered for recommendations, mapping functions can be defined. A mapping function translates the available feature description into an association function. Mapping algorithms can also be applied to users: in [5] users are mapped according to pictures they have selected. A mapping based on demographic features is also possible.
(2) Manually: The graph of f can be used to set up an easy-to-use human interface. Using two sliders, one for the mode and one for the precision, one could alter the association function until the desired properties are reached. This option is favorable if no mapping algorithm exists. In cases where the recommendation is in the foreground, it might be attractive to offer a tool for user self-classification.
(3) Self-learning: Entities (typically users) can learn their position in the model space based on interaction with other entities (typically products) that have already been classified (see 2.4 for details).

The association function can also be used to retrieve item properties, particularly after a self-learning phase.

2.3 Distance

We define the distance d between two association functions as

\[
d(f_{a_1,b_1}, f_{a_2,b_2}) =
\begin{cases}
0 & \text{if } \rho_1 = 0 \text{ or } \rho_2 = 0 \\
1 - f_{a_1,b_1}(x) & \text{otherwise}
\end{cases}
\tag{6}
\]

where x is uniquely defined by the two properties (without loss of generality we assume from now on that µ1 ≤ µ2):

\[ \mu_1 \le x \le \mu_2 \tag{7} \]
\[ f_{a_1,b_1}(x) = f_{a_2,b_2}(x) \tag{8} \]

In words: x is the place between both modes where the two association functions intersect.

[Figure: two panels with d on the vertical axis, scaled from 0.0 to 1.0.]

[Equations (9) to (13) define a closed-form approximation d̃ of d, with d̃ = 0 if ρ1 = 0 or ρ2 = 0; of the remaining case only the fragment "(a + b + 4)√((a + 1)(b + 1))" survives.]

The closed solution for d̃ is easy to compute, and the deviation |d − d̃| is limited for a given range of ρ, e.g. |d − d̃| ≤ 0.039 for the reasonable assumption 0.5 ≤ ρ ≤ 10 (without proof). Obvious properties of d are (also without proof):

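\[ \mu_1 = \mu_2 \;\Rightarrow\; d(f_{a_1,b_1}, f_{a_2,b_2}) = 0 \tag{14} \]
\[ d(f_{\mu_1,\rho_1}, f_{\mu_2,\rho_2}) < d(f_{\mu_1,\rho_1}, f_{\mu_2+\epsilon,\rho_2}), \quad \epsilon > 0 \tag{15} \]
\[ d(f_{\mu_1,\rho_1}, f_{\mu_2,\rho_2}) < d(f_{\mu_1,\rho_1}, f_{\mu_2,\rho_2+\epsilon}), \quad \epsilon > 0,\ \mu_1 \ne \mu_2 \tag{16} \]

Property (16) follows from the definition of d: a larger ρ2 lowers f_{a_2,b_2} everywhere except at its mode, so the intersection with f_{a_1,b_1} lies lower and d grows.

The overall distance D between two entities is the weighted mean of the distances of all k dimensions.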

\[ D = \sum_{i=1}^{k} d_i v_i \tag{17} \]

The weights v are chosen proportional to the importance of the corresponding dimension.
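The exact distance d can be evaluated numerically, because between the two modes one association function is falling and the other is rising, so their difference changes sign exactly once. The following sketch locates the intersection by bisection; note that this is a plain numerical root search and not the closed-form approximation mentioned in the text. The names `association`, `distance`, and `overall_distance` are hypothetical, and dividing by the sum of the weights is our assumption so that D is a weighted mean even if the weights are not normalized.

```python
def association(a, b, x):
    """f_{a,b}(x) = x^a (1-x)^b, rescaled so that the value at the mode is 1."""
    if a == 0 and b == 0:
        return 1.0
    mu = a / (a + b)
    return x ** a * (1.0 - x) ** b / (mu ** a * (1.0 - mu) ** b)


def distance(p, q, iterations=60):
    """d = 1 - f_p(x*), where x* is the intersection of both functions between the modes."""
    (a1, b1), (a2, b2) = p, q
    if a1 + b1 == 0 or a2 + b2 == 0:
        return 0.0  # at least one function is non-informative
    mu1, mu2 = a1 / (a1 + b1), a2 / (a2 + b2)
    if mu1 > mu2:  # without loss of generality mu1 <= mu2
        (a1, b1), (a2, b2) = (a2, b2), (a1, b1)
        mu1, mu2 = mu2, mu1
    lo, hi = mu1, mu2
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        # Left of the intersection the first function is still the larger one.
        if association(a1, b1, mid) > association(a2, b2, mid):
            lo = mid
        else:
            hi = mid
    return 1.0 - association(a1, b1, 0.5 * (lo + hi))


def overall_distance(entity_p, entity_q, weights):
    """Weighted mean of the per-dimension distances; entities are lists of (a, b) pairs."""
    total = sum(weights)
    weighted = sum(v * distance(p, q) for p, q, v in zip(entity_p, entity_q, weights))
    return weighted / total


if __name__ == "__main__":
    print(distance((1.0, 4.0), (4.0, 1.0)))
    print(overall_distance([(1.0, 4.0), (2.0, 2.0)], [(4.0, 1.0), (2.0, 2.0)], [1.0, 1.0]))
```

Sixty bisection steps shrink the search interval far below floating-point resolution, so the iteration count is a generous default rather than a tuned value.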

2.4 Learning Procedure

The learning procedure allows entities (usually users) to adapt their location in the model space according to their interaction with other entities (usually products). It is based on the merge operation.

The merge-operation m translates an ordered set of association functions F into a single association function:

\[ m: F \longrightarrow f_{a_{new},b_{new}} \tag{18} \]

We assume that no element of F is the non-informative function (otherwise those elements are simply removed, as they do not hold information anyway). The cardinality of F (the number of elements in F) is denoted by n. The new parameter a_new is defined as

\[
a_{new} =
\begin{cases}
0 & \text{if } n = 0 \\
a_1 & \text{if } n = 1 \\
g(h(F)) \sum_{i=1}^{n} a_i w_i & \text{if } n > 1
\end{cases}
\tag{19}
\]

and b_new is defined accordingly.

Here w is a vector of weights associated with the elements of F, with \(\sum_{i=1}^{n} w_i = 1\). h is a function that represents the dissimilarity of F. We currently use the weighted mean of all pairwise distances within F for h (see equation 20), but other definitions are certainly possible.

\[
h(F) = \frac{1}{\sum_{i=1}^{n-1}\sum_{j=i+1}^{n} w_i w_j} \sum_{i=1}^{n-1}\sum_{j=i+1}^{n} d(f_i, f_j)\, w_i w_j
\tag{20}
\]

The function g transforms the result of h into a reasonable shrinking factor, such as

\[ g = \left(1 - h(F)\right)^{\lambda} \tag{21} \]

where λ ≥ 0 is a tuning parameter. For larger values of λ the penalty for dissimilarity increases. If λ = 0 there is no shrinking at all. In this case a_new and b_new are simply the weighted averages of the input parameters (figure 3, left side). With a sufficient shrinkage factor, on the other hand, m acts more like a union operation (figure 3, right side). Note that shrinking refers to a and b and consequently to the precision ρ, whereas the spread of f works in the opposite direction. The merge operation is commutative but generally not associative.

[Figure 3: result of the merge operation without shrinking (left side) and with a sufficient shrinkage factor (right side); vertical axis from 0.0 to 1.0.]
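The merge operation can be sketched as follows. This is a minimal illustration under stated assumptions: the pairwise distance is found by bisection rather than by the closed-form approximation, the shrinking factor is taken as g = (1 − h(F))^λ (the form that gives no shrinking for λ = 0), and the weights are renormalized after non-informative elements are dropped. All function names are hypothetical.

```python
def association(a, b, x):
    """f_{a,b}(x) = x^a (1-x)^b, rescaled so that the value at the mode is 1."""
    if a == 0 and b == 0:
        return 1.0
    mu = a / (a + b)
    return x ** a * (1.0 - x) ** b / (mu ** a * (1.0 - mu) ** b)


def distance(p, q, iterations=60):
    """d = 1 - f_p(x*), with x* the intersection between the modes (bisection)."""
    (a1, b1), (a2, b2) = p, q
    if a1 + b1 == 0 or a2 + b2 == 0:
        return 0.0
    mu1, mu2 = a1 / (a1 + b1), a2 / (a2 + b2)
    if mu1 > mu2:
        (a1, b1), (a2, b2) = (a2, b2), (a1, b1)
        mu1, mu2 = mu2, mu1
    lo, hi = mu1, mu2
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if association(a1, b1, mid) > association(a2, b2, mid):
            lo = mid
        else:
            hi = mid
    return 1.0 - association(a1, b1, 0.5 * (lo + hi))


def merge(functions, weights, lam=1.0):
    """Merge (a, b) pairs into one (a_new, b_new); weights are assumed positive."""
    # Non-informative elements carry no information and are removed.
    kept = [(p, w) for p, w in zip(functions, weights) if p[0] + p[1] > 0]
    n = len(kept)
    if n == 0:
        return (0.0, 0.0)
    if n == 1:
        return kept[0][0]
    total = sum(w for _, w in kept)
    params = [p for p, _ in kept]
    ws = [w / total for _, w in kept]  # assumption: renormalize to sum 1
    num = den = 0.0
    for i in range(n - 1):
        for j in range(i + 1, n):
            num += distance(params[i], params[j]) * ws[i] * ws[j]
            den += ws[i] * ws[j]
    h = num / den              # dissimilarity of F, equation (20)
    g = (1.0 - h) ** lam       # shrinking factor, equation (21)
    a_new = g * sum(w * a for (a, _), w in zip(params, ws))
    b_new = g * sum(w * b for (_, b), w in zip(params, ws))
    return (a_new, b_new)


if __name__ == "__main__":
    print(merge([(1.0, 4.0), (4.0, 1.0)], [0.5, 0.5], lam=0.0))
    print(merge([(1.0, 4.0), (4.0, 1.0)], [0.5, 0.5], lam=1.0))
```

In a learning setting, the first element would typically be the user's current association function and the second the function of the item interacted with, with the weights controlling how quickly the profile moves.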

3 USAGE

A standard application works as follows: The model space (the number and interpretation of the dimensions) would be determined based on expert knowledge or dimensionality reduction methods or both. As mentioned earlier, seven factors have already been determined for the scope of tourism [ 5 ], [ 4 ].

Once the model space is specified, mappings from item-descriptions to the model dimensions must be implemented (see section 2.2).

In tourism, items are very diverse, including travel packages, hotels, flights, events, sights, natural phenomena, destinations, cities, forms of sport, and many others. Some of them are real products, meaning that they are bookable; others are not. The latter are still important for recommender systems, as they serve as connections to actual products. Sometimes strong intangible aspects such as culture-dependent attributions or emotional concepts are involved. (The decision process might roughly be: honeymoon + love + Europe → city of love → Paris → hotel → room / suite, rather than leading straight to the hotel room.)

Users obtain their profile in a self-learning way as they interact with items (or even other users). Depending on the particular domain and application, interactions can include book-, buy-, like-, rate-, comment-, view-, listen-to-, read-, search-, compare-, and other actions. Using the learning procedure from section 2.4, defined interactions modify the user's profile towards the items interacted with. Defining relevant interactions can be straightforward in some cases and sophisticated in others.

The initial association functions might be the non-informative association function, the (dimension-wise) grand mean, or a contextual a priori association function (for example based on known or estimated demographic characteristics).

The recommendation service itself calculates distances between users and products, sorts the results, and holds a list of the most appropriate items ready. Computations can be done on demand or in advance. Filters might be implemented additionally to meet the user's constraints.
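The ranking step itself is simple once a distance is available. The sketch below is a hypothetical outline: `recommend`, `distance_fn`, and `constraint` are names of our own, `distance_fn` stands in for the overall distance D, and the one-dimensional toy profiles in the usage part only serve to make the example runnable.

```python
import heapq


def recommend(user, items, distance_fn, top_n=3, constraint=None):
    """Return the names of the top_n items closest to the user.

    items maps an item name to its profile; constraint is an optional
    predicate on the item name that implements hard filters.
    """
    scored = [
        (distance_fn(user, profile), name)
        for name, profile in items.items()
        if constraint is None or constraint(name)
    ]
    # nsmallest avoids sorting the full list when only a few items are needed.
    return [name for _, name in heapq.nsmallest(top_n, scored)]


if __name__ == "__main__":
    # Toy stand-in for D: absolute difference of one-dimensional locations.
    toy_distance = lambda u, p: abs(u - p)
    catalogue = {"item a": 0.6, "item b": 0.95, "item c": 0.3}
    print(recommend(0.7, catalogue, toy_distance))
    print(recommend(0.7, catalogue, toy_distance, top_n=1,
                    constraint=lambda name: name != "item a"))
```

Applying the filter before the distance computation keeps the cost proportional to the number of admissible items, which matters when computations are done on demand.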

Implementing stochastic components can increase serendipity and diversity but reduces predictability and reproducibility.

4 WORKED EXAMPLE

For a simple example, we assume a travel recommendation system with two dimensions, Action and Culture, which are equally important and therefore equally weighted.

Our user is inclined towards exciting activities as long as they are not too extreme (figure 4, left column). The user is not really interested in culture (figure 4, right column).

We have three items to suggest: a skydiving holiday, a city trip to Rome, and a sailboat cruise in the Mediterranean.

The skydiving holiday is about as exciting as it gets, with virtually no cultural options (figure 4, first row).

The city trip to Rome offers ample cultural sights, but beyond that it is not terribly exciting (figure 4, second row).

[Figure 4: association functions for Action (left column) and Culture (right column); rows: skydiving holiday, city trip to Rome, sailboat cruise in the Mediterranean.]

Finally, the sailboat cruise is exciting at times (although not as thrilling as skydiving), and the old Mediterranean cities also provide the opportunity to get in touch with old cultures (figure 4, bottom row).

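[Figure 5: locations of the user, the skydiving holiday, the city trip to Rome, and the sailboat cruise in the plane spanned by Action (axis labels: Relaxation, Excitement) and Culture (axis labels: Little, Rich).]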

In this toy example, the Mediterranean sailboat cruise would clearly be the best recommendation according to our measurement D (see equation 17), followed by the skydiving holiday. However, if we had used the location parameter µ in conjunction with the Euclidean distance or the Manhattan distance, the skydiving holiday would have appeared to be the closest to the user. The reason for this divergence is the different spread of the associations.
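The effect can be reproduced with a small script. The parameter values below are purely hypothetical: they were chosen by us to match the verbal description of the user and the three items and are not the values behind figure 4 or table 1. The distance d is found by bisection instead of the closed-form approximation, and both dimensions are weighted equally.

```python
import math


def association(a, b, x):
    """f_{a,b}(x) = x^a (1-x)^b, rescaled so that the value at the mode is 1."""
    if a == 0 and b == 0:
        return 1.0
    mu = a / (a + b)
    return x ** a * (1.0 - x) ** b / (mu ** a * (1.0 - mu) ** b)


def distance(p, q, iterations=60):
    """d = 1 - f_p(x*), with x* the intersection between the modes (bisection)."""
    (a1, b1), (a2, b2) = p, q
    if a1 + b1 == 0 or a2 + b2 == 0:
        return 0.0
    mu1, mu2 = a1 / (a1 + b1), a2 / (a2 + b2)
    if mu1 > mu2:
        (a1, b1), (a2, b2) = (a2, b2), (a1, b1)
        mu1, mu2 = mu2, mu1
    lo, hi = mu1, mu2
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if association(a1, b1, mid) > association(a2, b2, mid):
            lo = mid
        else:
            hi = mid
    return 1.0 - association(a1, b1, 0.5 * (lo + hi))


def to_a_b(mu, rho):
    """Equations (4) and (5)."""
    return mu * rho, (1.0 - mu) * rho


# Hypothetical (mu, rho) per dimension, in the order (Action, Culture).
user = [(0.70, 8.0), (0.20, 4.0)]
items = {
    "Skydiving holiday": [(0.95, 30.0), (0.05, 30.0)],  # extreme and precise
    "City trip to Rome": [(0.30, 10.0), (0.90, 10.0)],
    "Sailboat cruise": [(0.60, 2.0), (0.60, 2.0)],      # moderate and broad
}


def D(p, q):
    """Overall distance with equal weights on both dimensions."""
    ds = [distance(to_a_b(*u), to_a_b(*v)) for u, v in zip(p, q)]
    return sum(ds) / len(ds)


def euclid(p, q):
    """Euclidean distance between the location parameters mu only."""
    return math.sqrt(sum((u[0] - v[0]) ** 2 for u, v in zip(p, q)))


by_D = sorted(items, key=lambda name: D(user, items[name]))
by_euclid = sorted(items, key=lambda name: euclid(user, items[name]))

if __name__ == "__main__":
    print("ranked by D:        ", by_D)
    print("ranked by Euclidean:", by_euclid)
```

With these made-up values the broad association functions of the sailboat cruise overlap the user's functions strongly, so it ranks first under D, while the Euclidean distance on µ alone puts the skydiving holiday first.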

Table 1 presents all user-item distances according to the Euclidean distance, the Manhattan distance, and D. Figure 5 illustrates the locations of all items in R².

The framework presented here offers interesting possibilities: it is flexible, possibly cross-domain, and self-learning, and the entity-dimension membership relation is easy to understand. It has no cold-start problem with new items, and it is not necessary to match a user to other similar users. It can serve as a basis for multivariate outlier detection and for cluster analysis. Deviations in the product and user distributions can be revealed as a side effect.

However, this approach comes with two downsides. Firstly, the dimensions of the model space must be defined in advance and are hard to modify in a running system. Hence setting up the model space is the crucial task. Secondly, the mapping from the original feature space to the model dimensions must be implemented. Manual input is simple but time-consuming and thus expensive for large quantities. The next steps will be to use the framework in an operating recommender system and to measure and report its performance, ideally in comparison with an established system.

[1] Iván Cantador, Ignacio Fernández-Tobías, Shlomo Berkovsky, and Paolo Cremonesi. 2015. Recommender Systems Handbook (2 ed.). Springer, New York Heidelberg Dordrecht London, Chapter 27, 919-959.

[2] Oliver P. John and Sanjay Srivastava. 2008. Handbook of Personality: Theory and Research (3 ed.). The Guilford Press, New York, Chapter 4, 114-158.

[3] Yehuda Koren, Robert Bell, and Chris Volinsky. 2009. Matrix factorization techniques for recommender systems. Computer 42 (08 2009), 30-37.

[4] Julia Neidhardt, Rainer Schuster, Leonhard Seyfang, and Hannes Werthner. 2014. Eliciting the Users' Unknown Preferences. In Proceedings of the 8th ACM Conference on Recommender Systems (RecSys '14). ACM, New York, NY, USA, 309-312.

[5] Julia Neidhardt, Leonhard Seyfang, Rainer Schuster, and Hannes Werthner. 2015. A picture-based approach to recommender systems. Information Technology & Tourism 15-1 (2015), 49-69.

[6] Marko Tkalcic and Chen. 2015. Recommender Systems Handbook (2 ed.). Springer, New York Heidelberg Dordrecht London, Chapter 21, 715-739.