HOPE (High Order Profile Expansion) for the new user problem on recommender systems

Diego Fernández, Vreixo Formoso, Fidel Cacheda, Victor Carneiro
CITIC Research Center, University of A Coruña, A Coruña, Spain
diego.fernandez@udc.es, vformoso@udc.es, fidel.cacheda@udc.es, victor.carneiro@udc.es

"Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0)."

ABSTRACT
The new user problem is well known in recommender systems and refers to the difficulty of providing accurate recommendations to a user who has just arrived at the system or who has provided very few ratings. Profile Expansion techniques aim to increase the size of the user profile by obtaining information about user preferences in distinct ways, and they have already been used to expand the information provided to recommendation algorithms and improve results. In this work, discussed in more detail in [1], we present the High Order Profile Expansion techniques, which combine the Profile Expansion methods in different ways. The results show a 110% improvement in precision over the algorithm without Profile Expansion, and a 10% improvement over Profile Expansion techniques.

CCS CONCEPTS
• Information systems → Information retrieval; Recommender systems.

KEYWORDS
collaborative filtering, new user problem, profile expansion

1 INTRODUCTION
In this work we explore the Profile Expansion (PE) techniques [2], which were proposed to deal with the new user problem without involving the user. We propose a new set of methods called High Order Profile Expansion (HOPE) techniques: instead of using just one PE technique, two or more techniques can be combined so that new and diverse ratings are added to user profiles with few ratings, as discussed in more detail in [1]. In this work we show how the results obtained by PE techniques are improved by combining different PE techniques.

2 HIGH ORDER PROFILE EXPANSION
The idea of HOPE is to combine profile expansion techniques in order to take advantage of their benefits and minimize their errors. By using more than one algorithm to expand the user profile, the expansion will be more heterogeneous and, at the same time, still related to the initial user profile. The aim is also to expand the user profile in order to improve recommendations when facing the new user problem.

In our experiments, we focus on the Item Global (IG) and User Local (UL) PE alternatives because they represent the best PE techniques [2]. IG expands the user profile by searching the whole system for the items most similar to those rated by the user, while UL only uses the neighborhood computed for the main algorithm as a source of information to find new items. For the UL technique, different approaches are considered: most rated (MR) selects the most rated items in the neighborhood, local item neighbors (LN) uses the items most similar to those in the user profile, and user-local clustering (LC) attempts to find similarities between items while taking into account only the ratings given by the local user neighborhood. These techniques are combined according to two different proposals: serial and parallel.

In a serial process, the PE techniques are applied one after the other. The kNN algorithm then uses the doubly expanded profile to compute the recommendation list.

The parallel approach attempts to minimize errors in the item selection and rating prediction: the initial user profile is the input for all the expansion algorithms used. Since every expansion algorithm obtains a new profile and the kNN technique needs a unique profile as input, it is necessary to combine the different profiles. Regarding the item selection, two alternatives have been considered: union and intersection. As far as rating selection is concerned, when an item appears in more than one expanded profile, the final rating is the mean of the ratings.
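The serial and parallel combination schemes can be sketched as follows. This is a minimal illustration and not the authors' implementation: the representation of a profile as a dictionary of item ratings, the Expander signature and the parameter l (how many items each technique adds) are assumptions introduced for the example, and concrete IG, MR, LN or LC expanders would have to be plugged in.

```python
from typing import Callable, Dict, Hashable, List

# Assumed representation: a profile maps item ids to ratings; an expander
# returns the l new (item, predicted rating) pairs it would add to a profile.
Profile = Dict[Hashable, float]
Expander = Callable[[Profile, int], Profile]


def serial_hope(profile: Profile, first: Expander, second: Expander, l: int) -> Profile:
    """Serial HOPE: the second technique expands the profile that has already
    been expanded by the first one, producing a doubly expanded profile."""
    once = {**profile, **first(profile, l)}
    return {**once, **second(once, l)}


def parallel_hope(profile: Profile, expanders: List[Expander], l: int,
                  mode: str = "union") -> Profile:
    """Parallel HOPE: every technique expands the original profile; item
    selection merges the expansions by union or intersection, and the rating
    of an item found in several expansions is the mean of its ratings."""
    expansions = [expand(profile, l) for expand in expanders]
    if mode == "union":
        selected = set().union(*(e.keys() for e in expansions))
    else:  # "intersection"
        selected = set.intersection(*(set(e) for e in expansions))
    merged = dict(profile)
    for item in selected - set(profile):
        ratings = [e[item] for e in expansions if item in e]
        merged[item] = sum(ratings) / len(ratings)  # mean of the ratings
    return merged
```

In both cases the kNN recommender is then run on the returned profile exactly as it would be on the original, unexpanded one.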
3 EXPERIMENTS
The experiments have been performed on a reduced version of the Netflix dataset, consisting of 8,362 items and 478,458 users who have provided 48,715,350 ratings. Regarding the methodology, we have randomly selected 1,000 users for evaluation purposes. The new user problem has been simulated using a Given-N strategy. We focus on N = 2, since that is when information is most scarce and HOPE techniques can be most useful. Moreover, we study the evolution of the algorithms according to N. From the remaining users, we have randomly chosen 90% of the ratings for the training subset.

Regarding profile expansion size variations, we have used different l values (2, 5, 10 and 15) to check how the l combinations affect the results. Moreover, it is also interesting to know how the distinct algorithms evolve according to N, that is, according to the amount of information available. As baselines, the MR algorithm and the algorithm without PE are considered.
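As a rough sketch of this evaluation protocol, the code below performs a Given-N split and computes P@5 and the per-user average precision whose mean over the test users gives MAP. The helper names and the choice of treating every hidden rating as relevant are assumptions made for the example; the paper does not detail its evaluation code.

```python
import random
from typing import Dict, List, Set, Tuple


def given_n_split(profile: Dict[str, float], n: int, seed: int = 0) -> Tuple[Dict[str, float], Set[str]]:
    """Given-N: keep only n ratings as the observed (new user) profile and
    hide the rest, which serve as relevance judgements for evaluation."""
    rng = random.Random(seed)
    observed = rng.sample(list(profile), n)
    hidden = {item for item in profile if item not in observed}
    return {item: profile[item] for item in observed}, hidden


def precision_at_k(ranking: List[str], relevant: Set[str], k: int = 5) -> float:
    """P@k: fraction of the top-k recommended items that are relevant."""
    return sum(1 for item in ranking[:k] if item in relevant) / k


def average_precision(ranking: List[str], relevant: Set[str]) -> float:
    """AP for a single user; MAP is the mean of this value over all test users."""
    hits, score = 0, 0.0
    for rank, item in enumerate(ranking, start=1):
        if item in relevant:
            hits += 1
            score += hits / rank
    return score / len(relevant) if relevant else 0.0
```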
3.1 Serial Combinations
In serial combinations, two Profile Expansion techniques are used, one after the other. The best results were obtained with l = 2, and Table 1 shows the P@5 and MAP values.

Table 1: P@5 and MAP for High Order Profile Expansion serial combinations, with l = 2. Significant improvements over baselines are highlighted.

Algorithms            N = 2            N = 10
1        2            P@5     MAP      P@5     MAP
LC       LC           0.153   0.026    0.146   0.034
LC       LN           0.147   0.026    0.148   0.036
LC       MR           0.144   0.027    0.152   0.034
LC       IG           0.149   0.032    0.147   0.040
LC       -            0.140   0.030    0.140   0.036
LN       LC           0.154   0.027    0.151   0.035
LN       LN           0.142   0.024    0.140   0.037
LN       MR           0.158   0.025    0.151   0.033
LN       IG           0.156   0.032    0.127   0.041
LN       -            0.139   0.029    0.141   0.039
MR       LC           0.163   0.026    0.153   0.034
MR       LN           0.152   0.025    0.146   0.032
MR       MR           0.155   0.024    0.151   0.034
MR       IG           0.160   0.030    0.137   0.039
MR       -            0.147   0.029    0.139   0.035
IG       LC           0.139   0.030    0.144   0.040
IG       LN           0.137   0.031    0.128   0.041
IG       MR           0.145   0.029    0.143   0.041
IG       IG           0.133   0.035    0.129   0.045
IG       -            0.122   0.035    0.118   0.045
No PE    -            0.078   0.029    0.140   0.036

Regarding the results, the LN algorithm works better when it is used as the first algorithm. Moreover, the IG algorithm performs better when it is the second algorithm, significantly improving the MAP values with respect to the baselines. The MR and LC algorithms depend on which algorithm is combined with them. These results make sense, because when the information about a user is scarce, global information is more prone to errors. However, when working with local information, although there is less information available, it tends to be related to the user profile.

3.2 Parallel
Table 2 shows the P@5 and MAP results for the union and intersection variants. Note that the intersection alternative may lead to profiles with few items in common; therefore, the profile expansion sizes have been increased with respect to the serial alternative, so that it is more probable that two different techniques have items in common. While l = 5 is enough to obtain good results when only UL algorithms are considered, a larger profile size is required for the IG technique (l = 100). From Table 2, we can observe that the MAP values are quite good, but the P@5 results for the intersection variation do not improve those obtained with the serial alternative.

Table 2: P@5 and MAP for High Order Profile Expansion parallel alternatives. Significant improvements over baselines are highlighted.

Variant        1    2         N = 2            N = 10
                              P@5     MAP      P@5     MAP
Intersection   IG   LC        0.128   0.029    0.148   0.038
               IG   LN        0.139   0.028    0.141   0.039
               IG   MR        0.133   0.027    0.143   0.038
               LC   LN        0.146   0.029    0.141   0.038
               LC   MR        0.143   0.029    0.152   0.036
               LN   MR        0.147   0.028    0.141   0.038
Union          IG   LC        0.142   0.031    0.140   0.041
               IG   LN        0.134   0.030    0.125   0.040
               IG   MR        0.144   0.030    0.148   0.040
               LC   LN        0.137   0.025    0.147   0.034
               LC   MR        0.156   0.025    0.153   0.036
               LN   MR        0.138   0.024    0.148   0.032
Simple PE      LC   -         0.140   0.030    0.140   0.036
               LN   -         0.139   0.029    0.141   0.039
               MR   -         0.147   0.029    0.139   0.035
               IG   -         0.122   0.035    0.118   0.045
No PE          -    -         0.078   0.029    0.140   0.036

On the other hand, the union variation puts together the expansions obtained with the different algorithms. We have used l = 2 as the profile expansion size. As shown in Table 2, the results do not improve those obtained with the serial alternative either. In fact, no combination gets a P@5 value better than the MR baseline algorithm.

So, despite the fact that with the parallel alternatives the second expansion algorithm is not affected by the errors committed by the first one, the serial results are better because the second algorithm is provided with more information.

4 CONCLUSIONS
In this work, we present the HOPE techniques, in particular the serial and parallel alternatives. The experiments have shown that the serial alternative performs better than the parallel one. The reason is that in the serial alternative an algorithm is supplied with the information obtained by another algorithm. That does not happen with the parallel alternative, where the algorithms use the same information. As future work, we plan to combine more than two algorithms and to use other merging techniques for the parallel method.

ACKNOWLEDGMENTS
This research was supported by the Xunta de Galicia (Centro Singular de Investigación de Galicia accreditation ED431G/01 2016-2019) and the European Union (European Regional Development Fund - ERDF).

REFERENCES
[1] Diego Fernández, Vreixo Formoso, Fidel Cacheda, and Victor Carneiro. 2019. High Order Profile Expansion to tackle the new user problem on recommender systems. PLoS ONE 14, 11 (2019).
[2] Vreixo Formoso, Diego Fernández, Fidel Cacheda, and Victor Carneiro. 2013. Using profile expansion techniques to alleviate the new user problem. Inf. Process. Manage. 49, 3 (May 2013), 659–672. https://doi.org/10.1016/j.ipm.2012.07.005