CEUR Workshop Proceedings, Vol-1911, paper 17 — https://ceur-ws.org/Vol-1911/17.pdf
Disjunctive Boolean Kernel-based Collaborative
   Filtering for Top-N Item Recommendation

                          Mirko Polato and Fabio Aiolli

                University of Padova - Department of Mathematics
                       Via Trieste, 63, 35121 Padova - Italy
                        {mpolato,aiolli}@math.unipd.it



      Abstract. In many real-world recommendation tasks the available data consist only of simple interactions between users and items, such as clicks and views, called implicit feedback. In this kind of scenario, model-based pairwise methods have proven to be among the most promising approaches. In this paper, we propose a principled and efficient kernel-based collaborative filtering method for top-N item recommendation inspired by pairwise preference learning. We also propose a new boolean kernel, called the Monotone Disjunctive Kernel, which is able to alleviate the sparsity issue, one of the main problems in collaborative filtering contexts. The embedding of this kernel is composed of all the combinations of a given degree d of the input variables, and these combined features are semantically interpreted as disjunctions of the input variables. Experiments on several CF datasets show the effectiveness and the efficiency of the proposed kernel-based method.

      Keywords: collaborative filtering, implicit feedback, top-n recommen-
      dation, kernel methods, boolean kernels


1   Introduction

Collaborative Filtering (CF) is the de facto approach for making personalized recommendations. Even though, in the past, the rating prediction task received most of the attention from the research community, the focus has recently shifted towards the implicit feedback scenario, where the task is the so-called top-N recommendation task. This drift in favour of the implicit setting is due to the fact that in real-world recommendation scenarios (i) implicit data are much easier to gather, as they do not require any active action by the user, and (ii) they are simply more common.
    In the literature, many model-based methods for implicit feedback data have been proposed, such as SLIM [1], WRMF [2], BPR [3], and CF-OMD [4]. Despite the complexity of the above-mentioned approaches, most of them are linear models and, in general, in CF contexts simpler models tend to achieve very good results. This behaviour is typical of CF datasets because they usually share two characteristics: (i) they are very sparse (∼99% sparsity), and (ii) the distribution of the interactions over the items and/or over the users is long-tailed. For these reasons, for this kind of data, it is not ideal to use more complex and, consequently, even sparser representations.
    In this paper, we present a kernel-based CF framework for top-N item recommendation, based on the seminal work [4]. Even though it is kernel-based, the method is very efficient, highly scalable, and well suited to datasets with few positive and many negative/unlabeled examples (e.g., CF datasets).
    We also propose a new representation for boolean-valued data which is less expressive than the linear one. Specifically, we propose a (boolean) kernel, called the Monotone Disjunctive Kernel (mD-Kernel), in which the feature space is composed of all the combinations of the input variables of a given degree d, and these combined features are semantically interpreted as logical disjunctions of the input variables. The idea underpinning the proposed mD-Kernel is to define higher-level features that are a kind of generalization of the linear ones, so as to obtain more general representations that can hopefully alleviate the sparsity issue. Moreover, since the kernel is based on boolean logic, we also aim to build an efficient and effective algorithm able to extract from it the most relevant features, namely the most relevant boolean rules. In this way we will be able to provide explanations for the recommendations.


2    The Proposed Framework

In this section we present a collaborative filtering method called CF-KOMD [5] for top-N recommendation, which is inspired by preference learning [4] and designed to explicitly maximize the AUC (Area Under the ROC Curve).
    In the following we give a brief description of the algorithm; for further details please refer to [5]. Let $W \in \mathbb{R}^{n \times k}$ be the embeddings of the users ($U$) in a latent factor space and $X \in \mathbb{R}^{k \times m}$ be the embeddings of the items ($I$) in that space. Given a user $u$, a ranking over the items can be induced by the factorization $\hat{R} = WX$, where $\hat{r}_{ui} = \mathbf{w}_u^\top \mathbf{x}_i$ with the constraint $\|\mathbf{w}_u\| = \|\mathbf{x}_i\| = 1$.
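As a quick illustration of this factorization, the following minimal NumPy sketch (with hypothetical random embeddings, not the embeddings actually learned by the method) shows how the unit-norm constraints bound every score $\hat{r}_{ui}$ in $[-1, 1]$ and how a user's ranking is read off a row of $\hat{R}$:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, k = 5, 8, 3                  # users, items, latent factors

# hypothetical random embeddings, normalized as required by the constraint
W = rng.normal(size=(n, k))
W /= np.linalg.norm(W, axis=1, keepdims=True)   # each row w_u has norm 1
X = rng.normal(size=(k, m))
X /= np.linalg.norm(X, axis=0, keepdims=True)   # each column x_i has norm 1

R_hat = W @ X                      # r_hat[u, i] = w_u . x_i, lies in [-1, 1]
ranking_u = np.argsort(-R_hat[0])  # item ranking (best first) for user u = 0
```

By Cauchy–Schwarz, each entry of `R_hat` is a cosine-like similarity, so the induced ranking depends only on the angles between user and item embeddings.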
    CF-KOMD tries to learn (implicitly) the user representation wu∗ , by solving
the optimization problem


\[
\alpha_{u+}^{*} = \operatorname*{arg\,min}_{\alpha_{u+}} \; \alpha_{u+}^{\top} K_{u+} \alpha_{u+} + \lambda_p \|\alpha_{u+}\|^{2} - 2\,\alpha_{u+}^{\top} \mathbf{q}_u, \tag{1}
\]

    where $\alpha_{u+}$ is a probability distribution over the positive examples (i.e., rated items) of a given user $u$, $K \in \mathbb{R}^{m \times m}$ is a kernel matrix between items induced by a given kernel function $\kappa$, and the elements of the vector $\mathbf{q}_u \in \mathbb{R}^{|I_u|}$ are defined by $q_{ui} = \frac{1}{|I|} \sum_{j \in I} \kappa(\mathbf{x}_i, \mathbf{x}_j)$.
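Problem (1) is a convex quadratic program over the probability simplex, so it can be approximated, for instance, by projected gradient descent. The sketch below is only our illustration under that assumption; the function names, step size, and iteration count are hypothetical and do not reproduce the optimizer used in [5]:

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection of v onto the probability simplex (sort-based)."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u + (1.0 - css) / (np.arange(len(v)) + 1) > 0)[0][-1]
    theta = (1.0 - css[rho]) / (rho + 1)
    return np.maximum(v + theta, 0.0)

def solve_alpha(K_pos, q_pos, lam=0.1, lr=0.01, iters=500):
    """Projected-gradient sketch for problem (1):
    minimize a'Ka + lam*||a||^2 - 2*a'q over probability distributions a."""
    a = np.full(len(q_pos), 1.0 / len(q_pos))   # start from uniform
    for _ in range(iters):
        grad = 2.0 * (K_pos @ a + lam * a - q_pos)
        a = project_simplex(a - lr * grad)
    return a

# toy instance with two positive items
K_pos = np.array([[1.0, 0.2], [0.2, 1.0]])
q_pos = np.array([0.5, 0.3])
alpha = solve_alpha(K_pos, q_pos)
```

Note that larger values of $\lambda_p$ push the solution towards the uniform distribution over the positive items, while smaller values let $\alpha_{u+}$ concentrate on the most informative ones.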

The induced ranking is obtained using the scoring function $\hat{\mathbf{r}}_u = X^{\top} \mathbf{w}_u^{*} = K_{u+:}^{\top} \alpha_{u+} - \mathbf{q}$, where $K_{u+:} \in \mathbb{R}^{|I_u| \times |I|}$ is the matrix obtained by taking the subset of rows of $K$ corresponding to the positive set of items of the user $u$, and $\mathbf{q} \in \mathbb{R}^{|I|}$ is defined as $\mathbf{q}_u$ but over the whole set of items.
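For concreteness, the scoring step can be sketched in NumPy as follows; the function name and the uniform choice of $\alpha_{u+}$ are our simplifications for illustration (in the actual method $\alpha_{u+}$ solves problem (1)):

```python
import numpy as np

def cfkomd_scores(K, pos, alpha=None):
    """Sketch of the CF-KOMD scoring step for one user.

    K     : (m, m) item-item kernel matrix.
    pos   : indices of the user's positive items (I_u).
    alpha : distribution over pos; uniform here as a simplification
            (in the paper it is the solution of problem (1)).
    """
    if alpha is None:
        alpha = np.full(len(pos), 1.0 / len(pos))
    q = K.mean(axis=1)                # q_i = (1/|I|) * sum_j kappa(x_i, x_j)
    return K[pos, :].T @ alpha - q    # r_hat = K_{u+:}' alpha - q

# toy binary rating matrix: 3 users x 4 items
R = np.array([[1, 0, 1, 0],
              [1, 1, 0, 0],
              [0, 1, 1, 1]], dtype=float)
X = R / np.linalg.norm(R, axis=0)     # unit-norm item columns
K = X.T @ X                           # normalized linear kernel between items
scores = cfkomd_scores(K, pos=[0, 2]) # a user who liked items 0 and 2
```

Subtracting $\mathbf{q}$ penalizes items that are similar to everything (popular items), which is what makes the score suitable for AUC-oriented ranking.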

2.1             Monotone Disjunctive Kernel

Since we are working in a binary input space (i.e., a binary rating matrix $R \in \{0,1\}^{n \times m}$), the kernel $\kappa$ can be a boolean kernel function, i.e., $\kappa : \{0,1\}^n \times \{0,1\}^n \to \mathbb{N}$. From our previous experiments [6] and other studies, e.g., [7], it can be observed that linear models generally achieve very good results in CF contexts. For this reason, instead of using more expressive kernels, such as the polynomial or the Gaussian, we propose a new kernel which is less expressive than the linear kernel.
    The key idea of the Monotone Disjunctive Kernel (mD-kernel) is the creation of boolean disjunctions of the input variables. Specifically, the kernel creates all the combinations of variables of a given degree $d$ and interprets them as disjunctions of boolean variables, e.g., $x_1 x_3 x_7 \equiv x_1 \vee x_3 \vee x_7$, assuming 1 as true and 0 as false. A disjunction is satisfied if and only if at least one of its literals is true, so in the feature space a feature is active if and only if at least one of its (input) variables is active. Formally, the embedding of the mD-kernel of degree $d$ is given by $\phi_{\vee}^{d} : \mathbf{x} \mapsto (\phi_{\vee}^{(\mathbf{b})}(\mathbf{x}))_{\mathbf{b} \in B_d}$, where $B_d$ is the set of binary vectors $\mathbf{b} \in \{0,1\}^n$ with exactly $d$ active entries, and $\phi_{\vee}^{(\mathbf{b})}(\mathbf{x}) = [\![\langle \mathbf{x}, \mathbf{b} \rangle]\!] = [\![\sum_{i=1}^{n} x_i b_i]\!]$, with $[\![\cdot]\!]$ the indicator function, equal to 1 when its argument is positive. The dimension of the mD-kernel embedding is thus $\binom{n}{d}$. It can be shown [8] that the mD-kernel of degree $d$ between $\mathbf{x}$ and $\mathbf{z}$, i.e., $\kappa_{\vee}^{d}(\mathbf{x}, \mathbf{z})$, can be computed as

\[
\kappa_{\vee}^{d}(\mathbf{x}, \mathbf{z}) = \binom{n}{d} - \binom{n - \|\mathbf{x}\|_2^2}{d} - \binom{n - \|\mathbf{z}\|_2^2}{d} + \binom{n - \|\mathbf{x}\|_2^2 - \|\mathbf{z}\|_2^2 + \langle \mathbf{x}, \mathbf{z} \rangle}{d}.
\]

This kernel has many interesting properties, which are described in [8].
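As a sanity check of the closed form above, the following Python sketch (our own, not the authors' implementation) computes the mD-kernel both via binomial coefficients and by explicitly enumerating the $\binom{n}{d}$ disjunctive features; the two must agree:

```python
from math import comb
from itertools import combinations

def md_kernel(x, z, d):
    """mD-kernel of degree d between binary vectors x and z.

    Inclusion-exclusion count of the degree-d disjunctions that are
    true (contain at least one active variable) in both x and z.
    Note: comb(a, d) = 0 whenever a < d, as required here."""
    n = len(x)
    nx, nz = sum(x), sum(z)                 # ||x||^2 = #ones for binary x
    dot = sum(a * b for a, b in zip(x, z))  # <x, z>
    return (comb(n, d)
            - comb(n - nx, d)               # disjunctions false on x
            - comb(n - nz, d)               # disjunctions false on z
            + comb(n - nx - nz + dot, d))   # false on both, added back

def md_kernel_explicit(x, z, d):
    """Brute-force check: dot product of the explicit disjunctive embeddings."""
    n = len(x)
    phi = lambda v: [int(any(v[i] for i in c))
                     for c in combinations(range(n), d)]
    return sum(a * b for a, b in zip(phi(x), phi(z)))
```

For $d = 1$ the closed form collapses to $\langle \mathbf{x}, \mathbf{z} \rangle$, i.e., the linear kernel on binary data, which is consistent with the view of the mD-kernel as a generalization of the linear features.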


3               Results

In this section we report some of the empirical results obtained with our method; for more details and more extensive empirical results please refer to [5, 6, 8].
    Figure 1 shows the effect of the degree of the mD-kernel on the AUC performance of CF-KOMD on three CF datasets.


[Figure 1: three panels plotting AUC versus the mD-kernel degree d, comparing the D-Kernel against the Tanimoto kernel on (a) BookCrossing, (b) Ciao, and (c) FilmTrust.]

                      Fig. 1. Performance of different mD-kernel degrees.

    Table 1 reports the AUC performance achieved by our method with different kernels (only the best results are reported) against WRMF and BPR.


                          MovieLens  Netflix  FilmTrust  Ciao
                CF-KOMD        89.6     94.1       96.4  73.4
                   WRMF        87.0     80.4       94.7  56.5
                    BPR        85.4     84.3       95.4  54.9
               Table 1. AUC results. Best results are reported in bold.


4    Conclusions and future work
We proposed a kernel-based CF method for top-N recommendation. The method has proven to be efficient and very effective on many CF datasets. Moreover, we proposed a new boolean kernel, called the mD-kernel, which is able to deal with the sparsity issue in CF contexts. We leveraged the observation made in our previous work [6] to come up with the idea of creating a data representation less expressive than the linear one, in order to mitigate the sparsity and long-tail issues. In the future we aim to build an algorithm able to efficiently extract from this kernel the most relevant features, i.e., the most relevant rules. In this way we will be able to provide explanations for the given recommendations, improving the user experience.

References
1. Xia Ning and George Karypis. SLIM: sparse linear methods for top-n recommender
   systems. In 11th IEEE International Conference on Data Mining, ICDM 2011,
   Vancouver, BC, Canada, December 11-14, 2011, pages 497–506, 2011.
2. Yifan Hu, Yehuda Koren, and Chris Volinsky. Collaborative filtering for implicit
   feedback datasets. In ICDM, pages 263–272, 2008.
3. Steffen Rendle, Christoph Freudenthaler, Zeno Gantner, and Lars Schmidt-Thieme.
   BPR: Bayesian personalized ranking from implicit feedback. In Proceedings of the
   Twenty-Fifth Conference on Uncertainty in Artificial Intelligence, UAI ’09, pages
   452–461, Arlington, Virginia, United States, 2009. AUAI Press.
4. Fabio Aiolli. Convex AUC optimization for top-N recommendation with implicit
   feedback. In ACM Recommender Systems Conference, pages 293–296, New York,
   USA, 2014.
5. Mirko Polato and Fabio Aiolli. Kernel based collaborative filtering for very large
   scale top-n item recommendation. In Proceedings of the European Symposium
   on Artificial Neural Networks, Computational Intelligence and Machine Learning
   (ESANN), pages 11–16, 2016.
6. Mirko Polato and Fabio Aiolli. Exploiting sparsity to build efficient kernel based
   collaborative filtering for top-n item recommendation. 2017 forthcoming.
7. J. Zhou and T. Luo. A novel approach to solve the sparsity problem in collaborative
   filtering. In 2010 International Conference on Networking, Sensing and Control
   (ICNSC), pages 165–170, 2010.
8. Mirko Polato and Fabio Aiolli. Disjunctive boolean kernels for collaborative filtering
   in top-n recommendation. https://arxiv.org/abs/1612.07025, 2016.