HOPE (High Order Profile Expansion) for the new user problem on recommender systems

Diego Fernández, Vreixo Formoso, Fidel Cacheda, Victor Carneiro
CITIC Research Center, University of A Coruña, A Coruña, Spain
diego.fernandez@udc.es, vformoso@udc.es, fidel.cacheda@udc.es, victor.carneiro@udc.es

"Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0)."

ABSTRACT
The new user problem is well known in recommender systems and refers to the difficulty of providing accurate recommendations to a user who has just arrived at the system or who has provided very few ratings. Profile Expansion techniques aim to increase the size of the user profile by obtaining information about user preferences in distinct ways, and they have already been used to expand the information provided to recommendation algorithms and improve results. In this work, discussed in more detail in [1], we present the High Order Profile Expansion techniques, which combine the Profile Expansion methods in different ways. The results show a 110% improvement in precision over the algorithm without Profile Expansion, and a 10% improvement over Profile Expansion techniques.

CCS CONCEPTS
• Information systems → Information retrieval; Recommender systems.

KEYWORDS
collaborative filtering, new user problem, profile expansion

1 INTRODUCTION
In this work we explore the Profile Expansion (PE) techniques [2], which were proposed to deal with the new user problem without involving the user. We propose a new set of methods called High Order Profile Expansion (HOPE) techniques: instead of using just one PE technique, two or more techniques can be combined so that new and diverse ratings are added to user profiles with few ratings, as discussed in more detail in [1]. In this work we show how the results obtained by PE techniques are improved by combining different PE techniques.

2 HIGH ORDER PROFILE EXPANSION
The idea of HOPE is to combine profile expansion techniques in order to take advantage of their benefits and minimize their errors. By using more than one algorithm to expand the user profile, the expansion will be more heterogeneous and, at the same time, still related to the initial user profile. The aim is also to expand the user profile in order to improve recommendations when facing the new user problem.

In our experiments, we focus on the Item Global (IG) and User Local (UL) PE alternatives because they represent the best PE techniques [2]. IG expands the user profile by searching the whole system for the items most similar to those rated by the user, while UL only uses the neighborhood computed for the main algorithm as a source of information to find new items. For the UL technique, different approaches are considered: most rated (MR) selects the most rated items in the neighborhood, local item neighbors (LN) uses the items most similar to those in the user profile, and user-local clustering (LC) attempts to find similarities between items while taking into account only the ratings given by the local user neighborhood. These techniques are combined according to two different proposals: serial and parallel.

In a serial process, the PE techniques are applied one after the other. The kNN algorithm then uses the doubly expanded profile to compute the recommendation list.

The parallel approach attempts to minimize errors in the item selection and rating prediction: the initial user profile is the input for all the expansion algorithms used. Since every expansion algorithm obtains a new profile and the kNN technique needs a unique profile as input, it is necessary to combine the different profiles. Regarding the item selection, two alternatives have been considered: union and intersection. As far as rating selection is concerned, when an item appears in more than one expanded profile, the final rating is the mean of the ratings.
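The serial and parallel combination schemes can be sketched as follows. This is a minimal illustration and not the authors' implementation: the representation of a profile as a dictionary of item ratings, the Expander signature and the parameter l (how many items each technique adds) are assumptions introduced for the example, and concrete IG, MR, LN or LC expanders would have to be plugged in.

```python
from typing import Callable, Dict, Hashable, List

# Assumed representation: a profile maps item ids to ratings; an expander
# returns the l new (item, predicted rating) pairs it would add to a profile.
Profile = Dict[Hashable, float]
Expander = Callable[[Profile, int], Profile]


def serial_hope(profile: Profile, first: Expander, second: Expander, l: int) -> Profile:
    """Serial HOPE: the second technique expands the profile that has already
    been expanded by the first one, producing a doubly expanded profile."""
    once = {**profile, **first(profile, l)}
    return {**once, **second(once, l)}


def parallel_hope(profile: Profile, expanders: List[Expander], l: int,
                  mode: str = "union") -> Profile:
    """Parallel HOPE: every technique expands the original profile; item
    selection merges the expansions by union or intersection, and the rating
    of an item found in several expansions is the mean of its ratings."""
    expansions = [expand(profile, l) for expand in expanders]
    if mode == "union":
        selected = set().union(*(e.keys() for e in expansions))
    else:  # "intersection"
        selected = set.intersection(*(set(e) for e in expansions))
    merged = dict(profile)
    for item in selected - set(profile):
        ratings = [e[item] for e in expansions if item in e]
        merged[item] = sum(ratings) / len(ratings)  # mean of the ratings
    return merged
```

In both cases the kNN recommender is then run on the returned profile exactly as it would be on the original, unexpanded one.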
3 EXPERIMENTS
The experiments have been performed on a reduced version of the Netflix dataset, consisting of 8,362 items and 478,458 users who have provided 48,715,350 ratings. Regarding the methodology, we have randomly selected 1,000 users for evaluation purposes. The new user problem has been simulated using a Given-N strategy. We focus on N = 2, since that is when information is most scarce and HOPE techniques can be most useful. Moreover, we study the evolution of the algorithms according to N. From the remaining users, we have randomly chosen 90% of the ratings for the training subset.

Regarding profile expansion size variations, we have used different l values (2, 5, 10 and 15) to check how the l combinations affect the results. Moreover, it is also interesting to know how the distinct algorithms evolve according to N, that is, according to the amount of information available. As baselines, the MR algorithm and the algorithm without PE are considered.
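As a rough sketch of this evaluation protocol, the code below performs a Given-N split and computes P@5 and the per-user average precision whose mean over the test users gives MAP. The helper names and the choice of treating every hidden rating as relevant are assumptions made for the example; the paper does not detail its evaluation code.

```python
import random
from typing import Dict, List, Set, Tuple


def given_n_split(profile: Dict[str, float], n: int, seed: int = 0) -> Tuple[Dict[str, float], Set[str]]:
    """Given-N: keep only n ratings as the observed (new user) profile and
    hide the rest, which serve as relevance judgements for evaluation."""
    rng = random.Random(seed)
    observed = rng.sample(list(profile), n)
    hidden = {item for item in profile if item not in observed}
    return {item: profile[item] for item in observed}, hidden


def precision_at_k(ranking: List[str], relevant: Set[str], k: int = 5) -> float:
    """P@k: fraction of the top-k recommended items that are relevant."""
    return sum(1 for item in ranking[:k] if item in relevant) / k


def average_precision(ranking: List[str], relevant: Set[str]) -> float:
    """AP for a single user; MAP is the mean of this value over all test users."""
    hits, score = 0, 0.0
    for rank, item in enumerate(ranking, start=1):
        if item in relevant:
            hits += 1
            score += hits / rank
    return score / len(relevant) if relevant else 0.0
```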
3.1 Serial Combinations
In serial combinations, two Profile Expansion techniques are used, one after the other. The best results were obtained with l = 2, and Table 1 shows the P@5 and MAP values.

Table 1: P@5 and MAP for High Order Profile Expansion serial combinations, with l = 2. Significant improvements over baselines are highlighted.

Algorithms            N = 2            N = 10
1        2            P@5     MAP      P@5     MAP
LC       LC           0.153   0.026    0.146   0.034
LC       LN           0.147   0.026    0.148   0.036
LC       MR           0.144   0.027    0.152   0.034
LC       IG           0.149   0.032    0.147   0.040
LC       -            0.140   0.030    0.140   0.036
LN       LC           0.154   0.027    0.151   0.035
LN       LN           0.142   0.024    0.140   0.037
LN       MR           0.158   0.025    0.151   0.033
LN       IG           0.156   0.032    0.127   0.041
LN       -            0.139   0.029    0.141   0.039
MR       LC           0.163   0.026    0.153   0.034
MR       LN           0.152   0.025    0.146   0.032
MR       MR           0.155   0.024    0.151   0.034
MR       IG           0.160   0.030    0.137   0.039
MR       -            0.147   0.029    0.139   0.035
IG       LC           0.139   0.030    0.144   0.040
IG       LN           0.137   0.031    0.128   0.041
IG       MR           0.145   0.029    0.143   0.041
IG       IG           0.133   0.035    0.129   0.045
IG       -            0.122   0.035    0.118   0.045
No PE    -            0.078   0.029    0.140   0.036

Regarding the results, the LN algorithm works better when it is used as the first algorithm. Moreover, the IG algorithm performs better when it is the second algorithm, significantly improving the MAP values with respect to the baselines. The MR and LC algorithms depend on which algorithm is combined with them. These results make sense, because when the information about a user is scarce, global information is more prone to errors. However, when working with local information, although there is less information available, it tends to be related to the user profile.

3.2 Parallel
Table 2 shows the P@5 and MAP results for the union and intersection variants. Note that the intersection alternative may lead to profiles with few items in common; therefore, the profile expansion sizes have been increased with respect to the serial alternative, so that it is more probable that two different techniques have items in common. While l = 5 is enough to obtain good results when only UL algorithms are considered, a larger profile size is required for the IG technique (l = 100). From Table 2, we can observe that the MAP values are quite good, but the P@5 results for the intersection variation do not improve those obtained with the serial alternative.

Table 2: P@5 and MAP for High Order Profile Expansion parallel alternatives. Significant improvements over baselines are highlighted.

Variant        1    2         N = 2            N = 10
                              P@5     MAP      P@5     MAP
Intersection   IG   LC        0.128   0.029    0.148   0.038
               IG   LN        0.139   0.028    0.141   0.039
               IG   MR        0.133   0.027    0.143   0.038
               LC   LN        0.146   0.029    0.141   0.038
               LC   MR        0.143   0.029    0.152   0.036
               LN   MR        0.147   0.028    0.141   0.038
Union          IG   LC        0.142   0.031    0.140   0.041
               IG   LN        0.134   0.030    0.125   0.040
               IG   MR        0.144   0.030    0.148   0.040
               LC   LN        0.137   0.025    0.147   0.034
               LC   MR        0.156   0.025    0.153   0.036
               LN   MR        0.138   0.024    0.148   0.032
Simple PE      LC   -         0.140   0.030    0.140   0.036
               LN   -         0.139   0.029    0.141   0.039
               MR   -         0.147   0.029    0.139   0.035
               IG   -         0.122   0.035    0.118   0.045
No PE          -    -         0.078   0.029    0.140   0.036

On the other hand, the union variation puts together the expansions obtained with the different algorithms. We have used l = 2 as the profile expansion size. As shown in Table 2, the results do not improve those obtained with the serial alternative either. In fact, no combination gets a P@5 value better than the MR baseline algorithm.

So, despite the fact that with the parallel alternatives the second expansion algorithm is not affected by the errors committed by the first one, the serial results are better because the second algorithm is provided with more information.

4 CONCLUSIONS
In this work, we present the HOPE techniques, in particular the serial and parallel alternatives. The experiments have shown that the serial alternative performs better than the parallel one. The reason is that in the serial alternative an algorithm is supplied with the information obtained by another algorithm. That does not happen with the parallel alternative, where the algorithms use the same information. As future work, we plan to combine more than two algorithms and to use other merging techniques for the parallel method.

ACKNOWLEDGMENTS
This research was supported by the Xunta de Galicia (Centro Singular de Investigación de Galicia accreditation ED431G/01 2016-2019) and the European Union (European Regional Development Fund - ERDF).

REFERENCES
[1] Diego Fernández, Vreixo Formoso, Fidel Cacheda, and Victor Carneiro. 2019. High Order Profile Expansion to tackle the new user problem on recommender systems. PLoS ONE 14, 11 (2019).
[2] Vreixo Formoso, Diego Fernández, Fidel Cacheda, and Victor Carneiro. 2013. Using profile expansion techniques to alleviate the new user problem. Inf. Process. Manage. 49, 3 (May 2013), 659–672. https://doi.org/10.1016/j.ipm.2012.07.005