<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Estimate features relevance for groups of users</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Stefano Cereda</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Leonardo Cella</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Paolo Cremonesi</string-name>
        </contrib>
      </contrib-group>
      <abstract>
<p>In item cold-start, collaborative filtering techniques cannot be used directly since newly added items have no interactions with users. Hence, content-based filtering is usually the only viable option left. In this paper we propose a feature-based machine learning model that addresses the item cold-start problem by jointly exploiting item content features, past user preferences and interactions of similar users. The proposed solution learns the relevance of each content feature with respect to a community of similar users. In our experiments, the proposed approach outperforms classical content-based filtering on an enriched version of the Netflix dataset.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
<title>Introduction</title>
<p>Traditional Content-Based recommender systems (CBF) need to represent user
and item profiles in order to recommend items similar to those previously liked
by users. Their main advantage is the capability of recommending previously
unseen items; thus they solve the item cold-start issue. On the contrary,
Collaborative Filtering (CF) algorithms usually reach better prediction
performance, especially with many interactions between users and items [1]. Their
downside consists of the inability to recommend items with no previous
interactions. Even if CBF approaches are able to solve the item cold-start problem,
they are affected by at least two relevant limitations: recommended items tend
to be too similar to previously rated items (over-specialization problem), and
recommendations do not depend on the preferences of similar users.</p>
<p>Some attempts to improve the recommendation quality of CBF consist of
filtering methods and embedded approaches. The main drawback of the former is
that they do not take into account the ratings of users, therefore ignoring
whether the feature-based similarity between items is aligned with the user
perception of similarity ([2], [3]). Embedded approaches perform feature
weighting during the learning process and use the objective function to guide
the search for relevant features. Instances of this methodology are SSLIM [5],
UFSM [4] and Factorization Machines. The main drawback of embedded methods is
the coupling between the collaborative and content components of the model:
when used on datasets with unstructured user-generated features (e.g., tags),
the noise from the features propagates to the collaborative part, affecting the
overall prediction quality.</p>
<p>As a first solution to this problem, we previously developed a machine learning
algorithm whose aim is to compute global feature weights (global in the sense
that the relevance scores are shared by all the different users) based on a pure
item collaborative filtering approach. Its main objective was to embed in item
features also information regarding user interests. In this research we propose
an extension to this approach; the main contribution brought by this work is a
general, straightforward wrapper that makes content-based methods rate-aware
and based on communities of similar users. Our experiments are conducted on the
Netflix dataset in a version enriched with IMDB attributes. The experiments show
that the proposed solution outperforms classical pure content-based approaches.</p>
    </sec>
    <sec id="sec-2">
      <title>Clustered Feature Weighting</title>
<p>Our objective is to recommend items from a set $I$ to users in a set $U$. Items
are described by the set of features $F$. User interactions are collected in the
feedback matrix $R \in \mathbb{R}^{|U| \times |I|}$. Item features are described by the
item-feature matrix $A \in \{0,1\}^{|I| \times |F|}$, with $a_{ij} = 1$ iff item $i$ has feature $j$.
In general, user-cluster based recommender systems rely on a cluster-dependent
similarity matrix $S^{p} \in \mathbb{R}^{|I| \times |I|}$, where $p$ denotes the considered subset of users.</p>
<p>The predicted rating of user $u$, who belongs to group $p_u$, for item $i$ is computed
as follows:
$$\hat{r}_{ui} = \frac{\sum_{j \in N_k^{p_u}(i)} r_{uj}\, s_{ij}^{p_u}}{\sum_{j \in N_k^{p_u}(i)} s_{ij}^{p_u}} \qquad (1)$$
where $s_{ij}^{p_u}$ is a local item-item similarity derived from the user subset $p_u$ to
which the target user $u$ belongs, and $N_k^{p_u}(i)$ is the set of $k$ nearest neighbors
of item $i$ according to the similarity model of cluster $p_u$. Starting from this
model, we would recommend the items whose predicted ratings are the largest.
Feature weighting aims to derive a feature vector $w^{p_u} \in \mathbb{R}^{|F|}$ such that each
entry $w_l^{p_u} \in w^{p_u}$ reflects the relevance of the $l$-th feature for the subset of
users $p_u$. We define the weighted similarity $s_{ij}^{p_u}$ between items $i$ and $j$ for the
cluster $p_u$ as:
$$s_{ij}^{p_u} = \sum_{f \in F} w_f^{p_u}\, a_{if}\, a_{jf} = \langle w^{p_u},\, a_i \odot a_j \rangle \qquad (2)$$
where $a_i, a_j \in \{0,1\}^{|F|}$ are the feature vectors of items $i$ and $j$ respectively and
$\odot$ is the element-wise product. We propose to compute the feature weights by
solving the following least-squares (LSQ) problem for each cluster of users $p_u$:
$$\operatorname*{argmin}_{w^{p_u}} \sum_{i \in I} \sum_{j \in I \setminus \{i\}} \left\| s_{ij}^{CF} - s_{ij}^{p_u} \right\|^2 \qquad (3)$$
More specifically, in our experiments we have adopted LSLIMr0 [6] as the local
similarity matrix $S^{CF}$ and CLUTO [7] to derive the user subsets $p_u$ (this choice
is based on the methodology followed in [6]). Since our goal is to learn a set of
feature weights so that CBF similarities mimic CF similarities as closely as
possible, there is no need to add a regularization term, thus greatly simplifying
the optimization. Experimental results confirmed this hypothesis.</p>
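<p>Because the weighted similarity of Equation 2 is linear in the weights, the per-cluster problem of Equation 3 reduces to an ordinary least-squares fit. The following is a minimal NumPy sketch of that idea (the function names are ours, not the authors' implementation):</p>

```python
import numpy as np

def fit_feature_weights(A, S_cf):
    """Solve Eq. 3 for one cluster: find the weight vector w minimizing
    sum over i != j of (s_ij^CF - w . (a_i * a_j))^2, with no regularizer."""
    n_items, _ = A.shape
    rows, targets = [], []
    for i in range(n_items):
        for j in range(n_items):
            if i != j:
                rows.append(A[i] * A[j])        # element-wise product a_i * a_j
                targets.append(S_cf[i, j])
    X = np.asarray(rows, dtype=float)
    y = np.asarray(targets, dtype=float)
    w, *_ = np.linalg.lstsq(X, y, rcond=None)   # plain unregularized LSQ
    return w

def weighted_similarity(A, w):
    """Eq. 2 in matrix form: S = A diag(w) A^T."""
    return (A * w) @ A.T
```

Enumerating all item pairs is quadratic in the catalogue size; in practice the pair set would be restricted (e.g. to the nonzero entries of the CF similarity), but the sketch shows the structure of the fit.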
<p>When a new item is added to the catalog, we use $w^{p_u}$ to compute its
weighted similarity w.r.t. the previously existing items. Then, it can be
recommended to users belonging to subset $p_u$ by using Equation 1. We call the
proposed approach CLFW (Clustered Least-square Features Weighting).</p>
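<p>Once a cold item's weighted similarities to the existing catalogue are available, the prediction of Equation 1 is a similarity-weighted average over the user's rated neighbours. A small sketch, with our own naming (not the reference code):</p>

```python
import numpy as np

def predict_rating(user_ratings, sims_to_i, k=10):
    """Eq. 1: average the user's ratings over the k rated items most
    similar to the target item i, weighted by those similarities."""
    rated = np.flatnonzero(user_ratings)                  # items the user rated
    top = rated[np.argsort(sims_to_i[rated])[::-1][:k]]   # k nearest rated neighbours
    denom = sims_to_i[top].sum()
    if denom == 0.0:
        return 0.0                                        # no similar rated item
    return float(np.dot(user_ratings[top], sims_to_i[top]) / denom)
```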
    </sec>
    <sec id="sec-3">
      <title>Experimental Evaluation</title>
<p>Dataset. For our experiments, we used a version of the Netflix dataset
enriched with structured and unstructured attributes extracted from IMDB.
This dataset has 186K users, 6.5K movies and 6.7M ratings on a 1-5 scale. The
rating data is enriched with 16803 binary attributes representing various
kinds of meta-information on movies, such as director, actors, genres and
user-generated tags (the set of content features was significantly augmented
with respect to our previous unclustered work). To investigate the new-item
scenario, we performed a 70/30 random hold-out split over items, as shown in
Figure 1: the 4866 warm items form the sub-matrix A and the 1623 cold items form
the sub-matrix B. The sub-matrix A has then been divided by moving
30% of positive (&gt;3) ratings into A2 and everything else into A1. A1 has then
been used to compute LSLIMr0 and therefore to fit CLFW. When evaluating
the warm-start scenario we used A1 as user profiles and A2 as ground truth,
whereas for the cold-item scenario we used the positive ratings of B as ground
truth and A1 as user profiles.</p>
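<p>The partitioning just described can be sketched as follows (a hypothetical reconstruction with our own function names and a fixed seed; the paper does not publish its splitting code):</p>

```python
import numpy as np

def split_items(n_items, warm_frac=0.7, seed=0):
    """70/30 random hold-out over items: warm item ids form A, cold ids form B."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n_items)
    n_warm = int(warm_frac * n_items)
    return perm[:n_warm], perm[n_warm:]

def split_positive(A, pos_thr=3, test_frac=0.3, seed=0):
    """Move test_frac of the positive entries (rating above pos_thr) of A
    into A2; everything else stays in A1."""
    rng = np.random.default_rng(seed)
    A1, A2 = A.copy(), np.zeros_like(A)
    pos = np.argwhere(A > pos_thr)                    # candidate positive ratings
    moved = pos[test_frac > rng.random(len(pos))]     # ~30% of them
    for u, i in moved:
        A2[u, i] = A[u, i]
        A1[u, i] = 0
    return A1, A2
```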
<p>Baselines. As in the previous work, we have used simple unweighted cosine
similarity (Cos) and TF-IDF-weighted cosine similarity (CosIDF) as CBF baselines
to evaluate the performance of CLFW in both scenarios.</p>
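<p>Both baselines can be sketched in a few lines (a minimal illustration; the IDF variant uses the usual log(N/df) weighting, which may differ in detail from the authors' exact setup):</p>

```python
import numpy as np

def cosine_sim(A, weights=None):
    """Item-item cosine similarity over the binary feature matrix A.
    Pass per-feature weights (e.g. IDF) to obtain the CosIDF baseline."""
    W = A if weights is None else A * weights
    norms = np.linalg.norm(W, axis=1, keepdims=True)
    norms[norms == 0] = 1.0                 # guard items with no features
    Wn = W / norms
    return Wn @ Wn.T

def idf(A):
    """Inverse document frequency log(N / df_f) of each feature f."""
    df = np.maximum(A.sum(axis=0), 1)
    return np.log(A.shape[0] / df)
```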
<p>Performance Analysis. In Table 1, we report the RMSE computed over predicted
ratings for different neighborhood sizes $k$ in the new-item scenario. The
warm-start scenario is instead represented by Table 2.</p>
      <p>In both scenarios CLFW consistently outperforms both baselines on RMSE
at every value of $k$. Moreover, in the warm-start scenario, it is nearly as good
as LSLIMr0. We also want to highlight that CLFW differs from the other CBF
baselines solely in the feature weighting scheme. Therefore, the improvement in
performance must be due to a better feature weighting discovered by our
approach. By comparing the CLFW column with the regCLFW one (which contains the
results of our algorithm when the feature weights are computed by adding an l2
regularization term to Equation 3), we can observe that the regularization does
not bring a performance improvement. This is reasonable and in agreement with
our prediction: the data from which we are learning do not contain noise and,
furthermore, the number of weights that we learn does not allow the model
complexity to be overestimated.</p>
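<p>For completeness, the RMSE figures in Tables 1 and 2 follow the standard definition over the held-out (user, item) pairs (a sketch; the argument names are ours):</p>

```python
import numpy as np

def rmse(preds, truth):
    """Root mean squared error over the (user, item) pairs of the test set."""
    preds = np.asarray(preds, dtype=float)
    truth = np.asarray(truth, dtype=float)
    return float(np.sqrt(np.mean((preds - truth) ** 2)))
```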
    </sec>
    <sec id="sec-4">
      <title>Conclusions and Future Work</title>
<p>With this research we investigated the possibility of deriving a user-based
feature weighting. We have presented ongoing results of an extended approach
that addresses the item cold-start issue by defining personalized feature
relevance. The ongoing development is focused on the usage of different
personalization methodologies and the extension to other datasets. Moreover, we
are interested in combining this clustered approach with the already developed
global one.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>István Pilászy, Domonkos Tikk: Recommending new movies: even a few ratings are more valuable than metadata. RecSys '09: Proceedings of the Third ACM Conference on Recommender Systems, 93-100 (2009)</mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>Pasquale Lops, Marco de Gemmis, Giovanni Semeraro: Content-based recommender systems: State of the art and trends. Recommender Systems Handbook, 73-105 (2011)</mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>Panagiotis Symeonidis, Alexandros Nanopoulos, Yannis Manolopoulos: Feature-Weighted User Model for Recommender Systems. UM '07: Proceedings of the 11th International Conference on User Modeling, Springer, 97-106 (2007). doi:10.1007/978-3-540-73078-1-13</mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>Asmaa Elbadrawy, George Karypis: User-Specific Feature-Based Similarity Models for Top-n Recommendation of New Items. ACM Trans. Intell. Syst. Technol. 6(33), 33:1-33:20 (May 2015). doi:10.1145/2700495</mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>Xia Ning, George Karypis: Sparse Linear Methods with Side Information for Top-N Recommendations. WWW '12 Companion: Proceedings of the 21st International Conference on World Wide Web, ACM, 581-582 (2012)</mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>Evangelia Christakopoulou, George Karypis: Local Item-Item Models for Top-N Recommendation. RecSys '16: 10th ACM Conference on Recommender Systems, 67-74 (2016)</mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>CLUTO: http://glaros.dtc.umn.edu/gkhome/cluto/cluto/overview</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>