
Moving From Item Rating to Features Relevance in Top-N Recommendation

Vito Walter Anelli

Tommaso Di Noia

Eugenio Di Sciascio

Pasquale Lops

pasquale.lops@uniba.it 1

Joseph Trotta

joseph.trottag@poliba.it 0

0 Polytechnic University of Bari, Via E. Orabona, 4, Bari, Italy

1 University of Bari "Aldo Moro", Via E. Orabona, 4, Bari, Italy

Although very effective in computing accurate recommendations, collaborative algorithms, due to their inner nature, work very well with dense matrices but show their limits when they deal with sparse ones. In these cases, using only past ratings may lead to unsatisfactory results in the recommendation list. In this paper we show how to move from a user-item to a user-feature matrix by exploiting original user ratings. We then use matrix factorization techniques to compute recommendations.

Introduction

Matrix factorization techniques have proven their effectiveness in improving the performance of recommendation engines in a pure collaborative approach and are implemented in many industrial and commercial systems [2]. Whenever available, descriptions of the items can be used as a valuable source of information to augment the knowledge injected into and exploited by the system to compute the recommendation list. More recently, thanks to the Linking Open Data initiative, a large amount of structured data has become freely available to represent the content of items in different knowledge domains and to feed recommendation engines [3]. Several works have tried to build recommender systems by exploiting Linked Open Data (LOD) as side information for representing items, in addition to the user preferences usually collected through ratings. Properties gathered from DBpedia, the cornerstone dataset of the LOD cloud, may be used in different ways: (1) to define semantic similarity measures for providing more accurate recommendations [8, 4]; (2) to deal with problems such as limited content analysis or cold-start, e.g. by introducing new relevant features to improve item representations [10], or to cope with increasing data sparsity [5]; (3) to provide a good balance between different recommendation objectives, such as accuracy and diversity [5]. In [7], for instance, effective strategies to incorporate item features for top-N recommender systems are developed. Recently, an interesting approach called Feature Preferences Matrix Factorization (FPMF) has been proposed in [6]. FPMF incorporates user feature preferences in a matrix factorization to predict user likes. It is worth noting that the previously mentioned approaches do not rely on features coming from the Linked Open Data cloud.

An extended version of this paper has been published at [1]. IIR 2018, May 28-30, 2018, Rome, Italy. Copyright held by the author(s).

Features composing the description of an item, whatever the source, are not considered per se in the recommendation process but are usually exploited to evaluate the similarity between items or users. We believe that more attention should be paid to modeling the recommendation problem with a focus on recommending features rather than items. Expanding an item into its features brings with it some interesting side effects. On the one hand, features may represent relations that, e.g., latent factor models are not able to look at. On the other hand, features give us a new set of explicit connections between items to be exploited with collaborative filtering algorithms. Finally, recommending items via feature recommendation may lead to an easier generation of explanations for the recommended list of items. Unfortunately, moving from items to features is not that straightforward: in a forest of many features, most of them may turn out to be irrelevant to a user. Moreover, once we design an algorithm able to compute a recommendation list of features, we have to go back to the item space, as the ultimate goal of a recommender system is to suggest items to a user.

In this paper we present FF (for Features Factorization), a top-N recommendation algorithm originally introduced in [1] that relies on users' feature preferences and collaborative filtering information in the feature space.
The main goal of FF is to compute an ordered list of features preferred by the user and, starting from that list, to reassemble the relevance values of each returned feature to produce a top-N list of items to recommend. All the side information adopted by FF with reference to a specific item $i$ is retrieved from DBpedia in the form of triples $\langle i, p, e \rangle$. For each item in the user profile we retrieve its features by querying DBpedia, thus getting them as a set of entities $e$.
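As an illustration of this data model, the following minimal Python sketch groups triples by item so that each item is represented by its set of (predicate, entity) features. The triples, the identifiers, and the helper name `features_by_item` are hypothetical and hard-coded for illustration; no actual DBpedia query is performed.

```python
from collections import defaultdict

# Hypothetical <item, predicate, entity> triples. In FF they would be
# retrieved from DBpedia; here they are hand-written toy data.
TRIPLES = [
    ("i1", "director", "d1"),
    ("i1", "starring", "a1"),
    ("i2", "director", "d1"),
    ("i2", "starring", "a2"),
    ("i3", "director", "d2"),
]


def features_by_item(triples):
    """Group triples by item, keeping each feature as a (predicate, entity) pair."""
    features = defaultdict(set)
    for item, predicate, entity in triples:
        features[item].add((predicate, entity))
    return dict(features)


ITEM_FEATURES = features_by_item(TRIPLES)
```

Keeping the predicate together with the entity means that the same entity reached through two different predicates is treated as two distinct features.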

The remainder of the paper is structured as follows. In the next section we introduce and describe FF. We then close the paper with a section devoted to conclusion and future works.

Proposed Approach

Motivation. This work investigates the role of feature rating and relevance in the item rating process. The main intuition behind FF is that items can be handled as collections of features on which the recommendation process is then performed. If we want to discover the contribution of each single feature to the evaluation, we first need to unpack each item into its component features. Then, by combining the overall popularity of each feature in the user profile (feature relevance) with the ratings assigned to items containing that feature, we may estimate the implicit rating the user is giving to that specific feature. The second observation on which we based our work is that the relevance of an item in the user profile cannot be entirely encoded in its rating, as a single rating only represents a degree of liking for the specific item.

Data Model. Each item in the user profile is associated with a relevance function we denote with $\rho_{ui}(\cdot)$. Its value represents an estimate of how important a particular item is to the user $u$. Analogously, each feature $f$ in the profile has a relevance value computed via the function $\rho_{uf}(\cdot)$. Each feature is also associated with a rating $r_{uf}(\cdot)$, which is inferred by considering the ratings of all the items containing $f$.

Problem Formulation. Given the data associated with the user profile as described above, we can move from a rating matrix connecting users and items to a user-feature matrix where each value is represented by the pair $\langle \rho_{uf}(\cdot), r_{uf}(\cdot) \rangle$. In other words, we may consider two user-feature matrices: one, $P$, containing the relevance values $\rho_{uf}(\cdot)$, and the other, $R$, containing the inferred ratings $r_{uf}(\cdot)$.

In FF, the relevance of a feature $pe$ is computed as its probability of belonging to the set $I_u$ of items already rated by a user $u$. More formally, we have:

$$\rho_{uf}(pe) = \frac{\sum_{i \in I_u} |\{\langle i, p, e \rangle \mid \langle i, p, e \rangle \in \text{DBpedia}\}|}{|I_u|}$$

The idea behind this computation is quite straightforward: the more a feature is connected to the items in the user profile, the higher its relevance for the user.
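The feature relevance just described can be sketched in a few lines of Python. This is only an illustrative reading of the formula: the function name `feature_relevance`, the toy profile, and the representation of features as (predicate, entity) tuples are assumptions, not part of the original implementation.

```python
from collections import Counter


def feature_relevance(profile_items, item_features):
    """Relevance of each feature: fraction of the rated items I_u that carry it."""
    if not profile_items:
        return {}
    counts = Counter()
    for item in profile_items:
        # A (predicate, entity) pair counts at most once per item.
        counts.update(set(item_features.get(item, ())))
    n_items = len(profile_items)
    return {feature: count / n_items for feature, count in counts.items()}


# Toy data (hypothetical): each item maps to its set of features.
item_features = {
    "i1": {("director", "d1"), ("starring", "a1")},
    "i2": {("director", "d1"), ("starring", "a2")},
    "i3": {("director", "d2")},
}
rated_items = ["i1", "i2", "i3"]  # the set I_u of items rated by the user
rho_uf = feature_relevance(rated_items, item_features)
```

In this toy profile the feature `("director", "d1")` appears in two of the three rated items, so its relevance is 2/3, while every other feature appears once and gets 1/3.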

Once we have computed the relevance of all the features in the user profile, we can move to the computation of the relevance of the items $i \in I_u$. This is computed as the normalized sum of the relevance of all the features the item is composed of. In formulas, we have:

$$\rho_{ui}(i) = \frac{\sum_{\langle i, p, e \rangle \in \text{DBpedia}} \rho_{uf}(pe)}{|\{\langle i, p, e \rangle \mid \langle i, p, e \rangle \in \text{DBpedia}\}|}$$

Given a feature $pe$, the computation of the feature rating $r_{uf}(pe)$ exploits both the rating and the relevance of each item $i \in I_u$ containing $pe$.

$$r_{uf}(pe) = \frac{\sum_{\langle i, p, e \rangle \in \text{DBpedia}} r_{ui} \cdot \rho_{ui}(i)}{\sum_{\langle i, p, e \rangle \in \text{DBpedia}} \rho_{ui}(i)} \qquad (1)$$

Top-N Recommendation. The profiles we built contain only the features the user has already encountered, but the number of those features is usually dramatically smaller than the overall number of features, which makes $P$ and $R$ very sparse. In order to complete the information they contain, we compute, via Biased Matrix Factorization, the missing values $\hat{\rho}_{uf}(pe)$ for $P$ and $\hat{r}_{uf}(pe)$ for $R$. We run matrix factorization independently on $P$ and $R$. The values $\hat{\rho}_{uf}(pe)$ and $\hat{r}_{uf}(pe)$ represent the predicted relevance and the predicted rating for all those features not belonging to any of the items in $I_u$. As the resulting matrices contain both content-based and collaborative information (due to the matrix factorization), we refer to them as the hybrid profile.
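To make the completion step concrete, here is a minimal pure-Python sketch of a generic biased matrix factorization trained by stochastic gradient descent on the observed entries of one user-feature matrix. It is not the implementation used by the authors: the model form (global mean plus user bias, feature bias, and a dot product of latent factors), the hyperparameters, the function names, and the toy entries are all assumptions made for illustration. The same routine would be run once on $P$ and once on $R$.

```python
import random


def train_biased_mf(entries, n_factors=2, epochs=200, lr=0.05, reg=0.02, seed=0):
    """Fit value ~ mu + b_u + b_f + p_u . q_f on (user, feature, value) entries."""
    rng = random.Random(seed)
    users = sorted({u for u, _, _ in entries})
    feats = sorted({f for _, f, _ in entries})
    mu = sum(v for _, _, v in entries) / len(entries)  # global mean
    b_u = {u: 0.0 for u in users}
    b_f = {f: 0.0 for f in feats}
    p = {u: [rng.uniform(-0.1, 0.1) for _ in range(n_factors)] for u in users}
    q = {f: [rng.uniform(-0.1, 0.1) for _ in range(n_factors)] for f in feats}

    def predict(u, f):
        # Only defined for users and features seen during training.
        dot = sum(pu * qf for pu, qf in zip(p[u], q[f]))
        return mu + b_u[u] + b_f[f] + dot

    order = list(entries)
    for _ in range(epochs):
        rng.shuffle(order)
        for u, f, v in order:
            err = v - predict(u, f)
            b_u[u] += lr * (err - reg * b_u[u])
            b_f[f] += lr * (err - reg * b_f[f])
            for k in range(n_factors):
                pu, qf = p[u][k], q[f][k]
                p[u][k] += lr * (err * qf - reg * pu)
                q[f][k] += lr * (err * pu - reg * qf)
    return predict, mu


def rmse(entries, predict):
    """Root mean squared error of a predictor on the given entries."""
    return (sum((v - predict(u, f)) ** 2 for u, f, v in entries) / len(entries)) ** 0.5


# Toy observed relevance entries of P (hypothetical values).
P_OBSERVED = [
    ("u1", "f1", 0.9), ("u1", "f2", 0.2),
    ("u2", "f1", 0.8), ("u2", "f3", 0.3),
    ("u3", "f2", 0.1), ("u3", "f3", 0.4),
]
predict_rho, global_mean = train_biased_mf(P_OBSERVED)
baseline_rmse = rmse(P_OBSERVED, lambda u, f: global_mean)
trained_rmse = rmse(P_OBSERVED, predict_rho)
# Predicted relevance for a feature that user u1 has never encountered.
rho_hat_u1_f3 = predict_rho("u1", "f3")
```

Comparing `trained_rmse` with `baseline_rmse` gives a quick sanity check that the model fits the observed entries better than the global mean alone. The call `predict_rho("u1", "f3")` plays the role of a missing entry $\hat{\rho}_{uf}(pe)$.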

With the hybrid profile we can estimate a ranked list of all the remaining items within the collection. The ranking of an item in the list is computed by considering the ratings of the features belonging to the item and their relevance:

$$\hat{r}_{ui}(i) = \sum_{(\langle i, p, e \rangle \in \text{DBpedia}) \wedge (i \in I_u)} \rho_{uf}(pe) \cdot r_{uf}(pe) + \sum_{(\langle i, p, e \rangle \in \text{DBpedia}) \wedge (i \notin I_u)} \hat{\rho}_{uf}(pe) \cdot \hat{r}_{uf}(pe)$$

It is important to point out that these estimates do not correspond to actual ratings, but the correct item ranking is nevertheless preserved. In order to improve the results of the final recommendation process, we may reduce the number of features considered while computing the final rank, based on their relevance and popularity [1].
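The path from a user profile to a ranked list can be traced end to end on a toy example. The sketch below computes item relevance as the mean relevance of an item's features, feature ratings as the relevance-weighted mean of Eq. (1), and a ranking score that sums relevance times rating over the features of a candidate item. It assumes that observed values are used for features already in the user profile and predicted values for the others, which is one reading of the ranking rule. All names and numbers are hypothetical, and the predicted values stand in for the output of the factorization step.

```python
def item_relevance(item, item_features, rho_uf):
    """Mean relevance of the features describing a rated item."""
    feats = item_features[item]
    return sum(rho_uf[f] for f in feats) / len(feats)


def feature_ratings(ratings, item_features, rho_uf):
    """Eq. (1): relevance-weighted mean rating of the items containing a feature."""
    num, den = {}, {}
    for item, rating in ratings.items():
        weight = item_relevance(item, item_features, rho_uf)
        for f in item_features[item]:
            num[f] = num.get(f, 0.0) + rating * weight
            den[f] = den.get(f, 0.0) + weight
    return {f: num[f] / den[f] for f in num}


def score(item, item_features, rho_uf, r_uf, rho_hat, r_hat):
    """Sum relevance * rating over the item's features.

    Observed values are used for features in the user profile,
    predicted values (rho_hat, r_hat) for all the others.
    """
    total = 0.0
    for f in item_features[item]:
        if f in rho_uf:
            total += rho_uf[f] * r_uf[f]
        else:
            total += rho_hat[f] * r_hat[f]
    return total


# Toy data (hypothetical). The user rated i1 and i2, so I_u = {i1, i2}.
item_features = {"i1": {"f1", "f2"}, "i2": {"f1"}, "i3": {"f1", "f3"}, "i4": {"f3"}}
ratings = {"i1": 5.0, "i2": 3.0}
# Feature relevance on this profile: f1 is in 2 of 2 rated items, f2 in 1 of 2.
rho_uf = {"f1": 1.0, "f2": 0.5}
r_uf = feature_ratings(ratings, item_features, rho_uf)
# Stand-ins for values predicted by matrix factorization for the unseen f3.
rho_hat = {"f3": 0.2}
r_hat = {"f3": 4.0}

candidates = [i for i in item_features if i not in ratings]
ranked = sorted(
    candidates,
    key=lambda i: score(i, item_features, rho_uf, r_uf, rho_hat, r_hat),
    reverse=True,
)
```

Here `i3` ranks above `i4` because it shares the highly relevant feature `f1` with the rated items, whereas `i4` only carries the unseen feature `f3`. The scores are useful for ordering only and are not on the rating scale.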

Conclusion and Future Works

In this paper we presented FF, a novel algorithm that relies on feature recommendation as an intermediate step for computing top-N item recommendation lists. The main idea behind FF is that feature relevance in a user profile plays a key role in the selection and rating of an item in a collection. As future work, we are investigating the behavior of FF with respect to the novelty and diversity of results. We are also interested in exploring the behavior of the FF approach with different collaborative filtering algorithms, other than factorization techniques in the item-feature space, and in particular with Factorization Machines [9].

References

1. Anelli, V.W., Di Noia, T., Lops, P., Di Sciascio, E.: Feature factorization for top-n recommendation: From item rating to features relevance. In: Proceedings of the 1st Workshop on Intelligent Recommender Systems by Knowledge Transfer & Learning co-located with ACM Conference on Recommender Systems (RecSys 2017), Como, Italy, August 27, 2017. pp. 16–21 (2017)

2. Bell, R.M., Koren, Y.: Lessons from the Netflix prize challenge. ACM SIGKDD Explorations Newsletter 9(2), 75–79 (2007)

3. Di Noia, T., Mirizzi, R., Ostuni, V.C., Romito, D., Zanker, M.: Linked open data to support content-based recommender systems. In: Proceedings of the 8th International Conference on Semantic Systems. pp. 1–8. ACM (2012)

4. Di Noia, T., Ostuni, V.C., Rosati, J., Tomeo, P., Di Sciascio, E., Mirizzi, R., Bartolini, C.: Building a relatedness graph from linked open data: A case study in the IT domain. Expert Syst. Appl. 44, 354–366 (2016). https://doi.org/10.1016/j.eswa.2015.08.038

5. Musto, C., Basile, P., Lops, P., de Gemmis, M., Semeraro, G.: Introducing linked open data in graph-based recommender systems. Information Processing & Management 53(2), 405–435 (2017)

6. Nasery, M., Braunhofer, M., Ricci, F.: Recommendations with optimal combination of feature-based and item-based preferences. In: Proceedings of the 2016 Conference on User Modeling Adaptation and Personalization, UMAP. pp. 269–273 (2016)

7. Ning, X., Karypis, G.: Sparse linear methods with side information for top-n recommendations. In: Sixth ACM Conference on Recommender Systems, RecSys. pp. 155–162 (2012)

8. Piao, G., Breslin, J.G.: Measuring semantic distance for linked open data-enabled recommender systems. In: Proceedings of the 31st Annual ACM Symposium on Applied Computing. pp. 315–320 (2016)

9. Rendle, S.: Factorization machines. In: Proceedings of the 2010 IEEE International Conference on Data Mining. pp. 995–1000. ICDM '10, IEEE Computer Society, Washington, DC, USA (2010). https://doi.org/10.1109/ICDM.2010.127

10. Schmachtenberg, M., Strufe, T., Paulheim, H.: Enhancing a location-based recommendation system by enrichment with structured data from the web. In: 4th International Conference on Web Intelligence, Mining and Semantics, WIMS. pp. 17:1–17:12 (2014)