Federated Recommender Systems with Learning to Rank

Vito Walter Anelli¹, Yashar Deldjoo¹, Tommaso Di Noia¹, Antonio Ferrara¹ and Fedelucio Narducci¹
¹ Politecnico di Bari, Via E. Orabona, 4, 70126 Bari, Italy
(vitowalter.anelli@poliba.it, yashar.deldjoo@poliba.it, tommaso.dinoia@poliba.it, antonio.ferrara@poliba.it, fedelucio.narducci@poliba.it)

SEBD 2021: The 29th Italian Symposium on Advanced Database Systems, September 5-9, 2021, Pizzo Calabro (VV), Italy

Abstract
Recommendation services are extensively adopted in several user-centered applications as a tool to alleviate the information overload problem and help users orient themselves in a vast space of possible choices. In such scenarios, data ownership is a crucial concern, since users may not be willing to share their sensitive preferences (e.g., visited locations, read books, bought items) with a central server. Unfortunately, data harvesting and collection are at the basis of modern, state-of-the-art approaches to recommendation. To address this issue, we extend Federated Pair-wise Learning (FPL), an architecture in which users collaborate in training a central factorization model while controlling the amount of sensitive data leaving their devices. The proposed approach implements pair-wise learning-to-rank optimization by following the Federated Learning principles, originally conceived to mitigate the privacy risks of traditional machine learning.

Keywords
Federated Learning, Recommender Systems, Information Retrieval, Learning to Rank

1. Introduction

Collaborative filtering (CF) models have been mainstream research in the recommender system (RS) community over the last two decades thanks to their accuracy [1, 2]. Among them, a prominent class uses the matrix factorization (MF) approach as the inference model. The main aim of an MF model is to uncover user and item latent representations whose linear interaction explains the observed feedback. To date, the majority of existing MF models are trained in a centralized fashion, raising several concerns about the privacy of user data. The consequent data scarcity dilemma can thereby jeopardize the training of MF models: training high-quality MF models strongly relies on sufficient in-domain interaction data, so that enough co-occurrence information exists to shape similar behavioral/preference patterns in a user community. Although cross-domain recommendation approaches allow combating the issue of data scarcity, their applicability largely depends upon the availability of data providers that can collect/supply cross-domain data in their platform (e.g., Amazon). However, these approaches remain out of the focus of this work. In recent years, federated learning (FL) was proposed by Google as a means to offer a privacy-by-design solution [3, 4, 5] for machine-learned models. Federated learning aims to address ML privacy shortcomings by horizontally distributing the model's training over user devices; thus, clients exploit private data without sharing them [5].
Beyond its original formulation, the FL concept has been extended to a more comprehensive idea of privacy-preserving, decentralized, collaborative ML techniques [6], in which different data partitions share the same feature space (horizontal federation) or not (vertical federation). Weiss [7] states that privacy can be preserved by limiting data collection, which is one of the main privacy concerns [8]. Indeed, the accuracy of RSs based on the CF paradigm strictly depends on the amount of available user preferences. Our idea is to put users in control of their sensitive data by allowing them to choose the amount of information to share with the server. Hence, if data collection on the server side is reduced, other threats related to retention, sale, and unauthorized data browsing are limited. The proposed system extends FPL [9, 10] (short for Federated Pair-wise Learning), a federated factorization model for collaborative recommendation (a public implementation of FPL is available at https://github.com/sisinflab/FedBPR/). It extends state-of-the-art factorization approaches to build a RS that puts users in control of their sensitive data. Users participating in the federation process can decide if and to what extent they are willing to disclose their sensitive private data (i.e., what they liked/consumed). FPL mainly leverages non-sensitive information (e.g., places the user has not visited), which is typically abundant, to reach competitive accuracy and, at the same time, a satisfactory balance between accuracy and privacy. We have carried out extensive experiments on real-world datasets [11] in the Point-of-Interest (PoI) domain, considering recommendation accuracy and diversity metrics. The experimental evaluation shows that FPL can provide high-quality recommendations while putting the user in control of the amount of sensitive data to share.

2. Approach

In this section, after a brief introduction of the background technologies, we describe how we extend FPL [9, 10] (depicted in Fig. 1). To the best of our knowledge, FPL is the first attempt to bring pair-wise optimization to federated recommender systems and to give users the possibility to select the trade-off between data disclosure and recommendation utility.

2.1. Background Technologies

Federated Learning. Federated learning (FL) is a paradigm initially envisioned by Google [3, 12, 5] to train a machine-learning model from data distributed among a loose federation of users' devices (e.g., personal mobile phones). The rationale is to face the increasing issues of ownership and locality of data, mitigating the privacy risks (and leaks) resulting from centralized machine learning [13, 14]. In particular, with Θ denoting the parameters of a machine learning model, we consider a learning scenario where the objective is to minimize a generic loss function 𝐺(Θ). FL is a learning paradigm in which the users 𝑢 ∈ 𝒰 of a federation collaborate to solve the learning problem under the coordination of a central server 𝑆, without sharing or exchanging their raw data with 𝑆. From an algorithmic point of view, we start with 𝑆 sharing Θ with the federation of devices. Then, specific methods solve a local optimization problem on each single device, i.e., using its own data and exploiting Θ. Afterwards, the client shares the parameters of its local model with 𝑆. The parameters provided by the clients are used to update Θ, which is sent back to the devices in a new iteration step.
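To make the round structure concrete, the following minimal sketch simulates the loop just described on a toy one-parameter least-squares problem. It is illustrative only: the class and function names are ours, and the local solver is a single plain gradient step rather than the optimization procedure used by FPL.

```python
import numpy as np

class Client:
    """A federation member holding private data (x, y) that never leaves the device."""
    def __init__(self, x, y):
        self.x, self.y = x, y

    def local_update(self, theta):
        # Local optimization: gradient of the private loss (theta*x - y)^2,
        # computed on-device; only this update is shared with the server S.
        return -2.0 * self.x * (theta * self.x - self.y)

def federated_round(theta, clients, alpha=0.01):
    # S distributes theta, the clients compute local updates, S aggregates them.
    updates = [c.local_update(theta) for c in clients]
    return theta + alpha * sum(updates)

clients = [Client(1.0, 2.0), Client(2.0, 4.1), Client(3.0, 5.8)]
theta = 0.0
for _ in range(200):
    theta = federated_round(theta, clients)
print(round(theta, 2))  # converges to the global least-squares slope (~1.97)
```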
Factorization Models and Pair-Wise Recommendation. A recommendation problem over a set of users 𝒰 and a set of items ℐ is defined as the activity of finding for each user 𝑢 ∈ 𝒰 an item 𝑖 ∈ ℐ that maximizes a utility function 𝑔 : 𝒰 × ℐ → ℝ. In this context, $\mathbf{X} \in \mathbb{R}^{|\mathcal{U}| \times |\mathcal{I}|}$ is the user-item matrix whose entry $x_{ui}$ contains the explicit or implicit feedback (e.g., a rating or a check-in, respectively) of user 𝑢 ∈ 𝒰 for item 𝑖 ∈ ℐ. In the work at hand, an implicit feedback scenario is considered, i.e., feedback such as purchases, visits, clicks, views, or check-ins, with X containing binary values. Therefore, $x_{ui} = 1$ and $x_{ui} = 0$ denote whether user 𝑢 has consumed item 𝑖 or not, respectively. In FPL, the underlying data model is a factorization model, inspired by MF [15], a recommendation model that became popular in the last decade thanks to its state-of-the-art recommendation accuracy [16]. This technique aims to build a model Θ in which each user 𝑢 and each item 𝑖 are represented by the embedding vectors $\mathbf{p}_u$ and $\mathbf{q}_i$, respectively, in a shared latent space $\mathbb{R}^F$. The algorithm relies on the assumption that X can be factorized such that the dot product between $\mathbf{p}_u$ and $\mathbf{q}_i$ can explain any observed user-item interaction $x_{ui}$, and that any non-observed interaction can be estimated as $\hat{x}_{ui}(\Theta) = b_i(\Theta) + \mathbf{p}_u^T(\Theta) \cdot \mathbf{q}_i(\Theta)$, where $b_i$ is a term denoting the bias of item 𝑖. Among pair-wise approaches for learning to rank the items of a catalog, Bayesian Personalized Ranking (BPR) [17] is one of the most broadly adopted, thanks to its capability to rank correctly with acceptable computational complexity. In detail, given a training set defined by $\mathcal{K} = \{(u, i, j) \mid x_{ui} = 1 \wedge x_{uj} = 0\}$, BPR solves the optimization problem via the criterion $\max_\Theta \sum_{(u,i,j) \in \mathcal{K}} \ln \sigma(\hat{x}_{uij}(\Theta)) - \lambda \|\Theta\|^2$, where $\hat{x}_{uij}(\Theta) = \hat{x}_{ui}(\Theta) - \hat{x}_{uj}(\Theta)$ is a real value modeling the relation between user 𝑢, item 𝑖, and item 𝑗, 𝜎(·) is the sigmoid function, and 𝜆 is a model-specific regularization parameter to prevent overfitting. Pair-wise optimization can be applied to a wide range of recommendation models, including factorization models. Hereafter, we denote the model as Θ = ⟨P, Q, b⟩, where $\mathbf{P} \in \mathbb{R}^{|\mathcal{U}| \times F}$ is a matrix whose 𝑢-th row corresponds to the vector $\mathbf{p}_u$, and $\mathbf{Q} \in \mathbb{R}^{|\mathcal{I}| \times F}$ is a matrix whose 𝑖-th row corresponds to the vector $\mathbf{q}_i$. Finally, $\mathbf{b} \in \mathbb{R}^{|\mathcal{I}|}$ is a vector whose 𝑖-th element corresponds to the value $b_i$.
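As a reference point for the federated variant described next, the following sketch shows a centralized BPR stochastic step on a single triple (𝑢, 𝑖, 𝑗) with the factorization model above. Shapes and hyper-parameter values are illustrative; the update follows the gradient of ln σ(x̂ᵤᵢⱼ) with L2 regularization.

```python
import numpy as np

rng = np.random.default_rng(42)
F, n_users, n_items = 10, 100, 500
P = rng.normal(0.0, 0.1, (n_users, F))   # user embeddings p_u
Q = rng.normal(0.0, 0.1, (n_items, F))   # item embeddings q_i
b = np.zeros(n_items)                    # item biases b_i

def x_hat(u, i):
    # \hat{x}_ui = b_i + p_u^T q_i
    return b[i] + P[u] @ Q[i]

def bpr_step(u, i, j, lr=0.05, lam=0.01):
    # One stochastic ascent step on ln sigma(x_uij) - lam * ||Theta||^2,
    # where x_uij = x_ui - x_uj, item i is consumed and item j is not.
    x_uij = x_hat(u, i) - x_hat(u, j)
    g = 1.0 / (1.0 + np.exp(x_uij))      # equals e^{-x_uij} / (1 + e^{-x_uij})
    p_u = P[u].copy()                    # cache p_u before updating it
    P[u] += lr * (g * (Q[i] - Q[j]) - lam * P[u])
    Q[i] += lr * (g * p_u - lam * Q[i])
    Q[j] += lr * (-g * p_u - lam * Q[j])
    b[i] += lr * (g - lam * b[i])
    b[j] += lr * (-g - lam * b[j])

bpr_step(u=0, i=3, j=7)                  # item 3 consumed by user 0, item 7 not
```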
2.2. FPL: Federated Pair-wise Learning for Recommendation

Following the aforementioned federated learning principles, let us assume that users in 𝒰 consume items from a catalog ℐ and give feedback about them. 𝑆 is aware of the whole catalog ℐ, while only user 𝑢 knows her own set of consumed items. Given these conditions, the classic BPR-MF learning procedure [17] cannot be applied to the federated learning scheme [5]. Instead, we propose a novel learning paradigm (depicted in Figure 1) that is executed for a number 𝐸 of epochs and works by rounds of communication, each envisaging a Distribution→Computation→Transmission→Aggregation sequence between the server and the clients.

Figure 1: The Item-Factor Matrix Q is sent by the server to the federation of devices, which perform the local training phase (BPR optimization on the local training data 𝒦ᵤ, combining the received global model with the local user-factor vector pᵤ); the local outputs Δ(·) are sent back to the server 𝑆, which aggregates them.

In the FPL setting, a global model Θ_S is built on 𝑆 such that Θ_S = ⟨Q, b⟩, where $\mathbf{Q} \in \mathbb{R}^{|\mathcal{I}| \times F}$ and $\mathbf{b} \in \mathbb{R}^{|\mathcal{I}|}$ are the item-factor matrix and the bias vector introduced in Section 2.1. On the other hand, on each device in the federation, FPL builds a model Θ_u = ⟨p_u⟩, which corresponds to the representation of user 𝑢 in a latent space of dimensionality 𝐹. It is noteworthy that, in FPL, only user 𝑢 holds the embedding vector $\mathbf{p}_u$; therefore, each user 𝑢 autonomously computes her personalized item ranking by combining the global model Θ_S, sent by 𝑆 to the devices in the federation, with her local model Θ_u. In such a setting, each user 𝑢 holds her own private feedback dataset $\mathbf{x}_u \in \mathbb{R}^{|\mathcal{I}|}$, which, compared with a centralized recommender system, corresponds to the 𝑢-th row of the matrix X. Each FPL client 𝑢 hosts a user-specific training set $\mathcal{K}_u \subseteq \mathcal{U} \times \mathcal{I} \times \mathcal{I}$ defined by $\mathcal{K}_u = \{(u, i, j) \mid x_{ui} = 1 \wedge x_{uj} = 0\}$, where $x_{ui}$ represents the 𝑖-th element of $\mathbf{x}_u$. Please note that, in the following, we refer to $X^+ = \sum_{u \in \mathcal{U}} |\{x_{ui} \mid x_{ui} = 1\}|$ as the number of positive interactions. The number of rounds of communication performed in each learning epoch is a parameter denoted by the symbol rpe (rounds per epoch). Each round of communication is envisioned as a four-step protocol, described in the following.

1. Distribution. 𝑆 randomly selects a subset of users 𝒰⁻ ⊆ 𝒰 and delivers them the model Θ_S.

2. Computation. Each user 𝑢 generates 𝑇 triples (𝑢, 𝑖, 𝑗) from her dataset 𝒦_u and, for each of them, performs BPR stochastic optimization to compute the updates for the local $\mathbf{p}_u$ vector of Θ_u, and for $\mathbf{q}_i$, $b_i$, $\mathbf{q}_j$, and $b_j$ of the received Θ_S, following:

$$\Delta\theta = \frac{e^{-\hat{x}_{uij}}}{1 + e^{-\hat{x}_{uij}}} \cdot \frac{\partial}{\partial\theta} \hat{x}_{uij} - \lambda\theta, \quad \text{with} \quad \frac{\partial}{\partial\theta} \hat{x}_{uij} = \begin{cases} \mathbf{q}_i - \mathbf{q}_j & \text{if } \theta = \mathbf{p}_u, \\ \mathbf{p}_u & \text{if } \theta = \mathbf{q}_i, \\ -\mathbf{p}_u & \text{if } \theta = \mathbf{q}_j, \\ 1 & \text{if } \theta = b_i, \\ -1 & \text{if } \theta = b_j. \end{cases} \tag{1}$$

It is worth noticing that Rendle et al. [17] suggest, in a centralized scenario, adopting a uniform distribution over 𝒦 to choose the training triples randomly. The purpose is to avoid traversing the data item-wise or user-wise, since this may lead to slow convergence. Conversely, in a federated approach, we are required to train the model user-wise, since the training in each round of communication is performed separately on each client 𝑢, which knows only the data in 𝒦_u. This is the reason why, in FPL, the designer can control the number of triples 𝑇 used for training, to tune the degree of local computation, i.e., the extent to which the sampling traverses the data user-wise.

3. Transmission. The clients in 𝒰⁻ send back to 𝑆 a portion of the computed updates for the item-factor vectors and item biases. In more detail, the training output of a triple (𝑢, 𝑖, 𝑗) in BPR lets the server distinguish the consumed item 𝑖 from the non-consumed one 𝑗 (for example, by the opposite signs of Δb_i and Δb_j, which share the same absolute value). We argue that sending all the updates computed by 𝑢 may therefore allow 𝑆 to reconstruct 𝒦_u, thus raising a privacy issue. Since our primary goal is to put users in control of their data, FPL proposes a solution to overcome this vulnerability. By sending only the update (Δq_j, Δb_j) for each training triple (𝑢, 𝑖, 𝑗), user 𝑢 shares with 𝑆 values that refer indistinguishably to negative or missing feedback, which is assumed to be non-sensitive. Furthermore, in FPL we introduce the parameter 𝜋, which allows users to control the number of consumed items shared with the central server 𝑆: 𝜋 acts as the probability with which a client also sends the positive item update (Δq_i, Δb_i) in addition to (Δq_j, Δb_j).

4. Global aggregation. 𝑆 aggregates all the received updates in Q and b to build the new model $\Theta_S \leftarrow \Theta_S + \alpha \sum_{u \in \mathcal{U}^-} \Delta\Theta_u$, with 𝛼 being the learning rate (each row of the matrix Q and each element of b is updated by summing up the contributions of all clients in 𝒰⁻ for the corresponding item).
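The four steps above can be condensed into the following sketch of one round of communication. It is a simplified illustration under our own assumptions: the client applies the learning rate locally to pᵤ (which never leaves the device), while the raw deltas for Q and b are transmitted and scaled by α on the server, as in the aggregation rule of step 4; the π-masking of positive updates follows step 3. All names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def client_round(p_u, Q, b, triples, pi, lr=0.05, lam=0.01):
    """Computation and Transmission on one device: train on T triples, always
    disclose the negative-item update (dq_j, db_j), and disclose the
    positive-item update (dq_i, db_i) only with probability pi."""
    dQ, db = np.zeros_like(Q), np.zeros_like(b)
    for i, j in triples:                       # user u is implicit: her device
        x_uij = (b[i] + p_u @ Q[i]) - (b[j] + p_u @ Q[j])
        g = 1.0 / (1.0 + np.exp(x_uij))        # sigmoid factor of Eq. (1)
        p_old = p_u.copy()
        p_u += lr * (g * (Q[i] - Q[j]) - lam * p_u)   # Theta_u: never transmitted
        if rng.random() < pi:                  # positive update shared with prob. pi
            dQ[i] += g * p_old - lam * Q[i]
            db[i] += g - lam * b[i]
        dQ[j] += -g * p_old - lam * Q[j]       # negative/missing: non-sensitive
        db[j] += -g - lam * b[j]
    return dQ, db

def aggregate(Q, b, client_updates, alpha=0.05):
    """Global aggregation on S: Theta_S <- Theta_S + alpha * sum_u Delta(Theta_u)."""
    for dQ, db in client_updates:
        Q += alpha * dQ
        b += alpha * db
```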
3. Experimental Setup

In this section, we introduce the experimental setting designed to analyze the performance of FPL. To this extent, we introduce the choice of the datasets with a brief analysis of their characteristics. Then, we describe the state-of-the-art algorithms we have involved. For the sake of reproducibility, we report the explored hyper-parameters of each method in a specific section. Lastly, we present the evaluation protocol and the metrics considered in the study.

3.1. Datasets

The evaluation of FPL needs to meet some particular constraints: the availability of transaction data to obtain a reliable experimental setting, and a domain that guarantees the presence of data the user may prefer to protect. In our view, the optimal domain is that of Point-of-Interest (PoI) recommendation, which concerns data that users usually perceive as sensitive. Among the many available datasets, a very good candidate is the Foursquare dataset [11], which is often considered a reference for evaluating PoI recommendation models. To mimic a federation of devices in a single country, we have extracted check-ins for three countries, namely Brazil, Canada, and Italy. Since our only constraint was to obtain datasets with different size/sparsity characteristics, we took the liberty of choosing three countries of recent RecSys conference venues. To fairly evaluate FPL against the baselines, we have kept users with more than 20 interactions (the limitations of collaborative filtering in a cold-start user setting are well known in the literature). Moreover, we have split the datasets by adopting a realistic temporal hold-out 80-20 splitting on a per-user basis [18, 19]. Table 1 shows the characteristics of the resulting training sets.

Table 1: Characteristics of the evaluation datasets used in the offline experiment: |𝒰| is the number of users, |ℐ| the number of items, X⁺ the number of records.

Dataset   |𝒰|      |ℐ|      X⁺        X⁺/|𝒰|   X⁺/|ℐ|   X⁺/(|ℐ|·|𝒰|) %
Brazil    17,473   47,270   599,958   34.34    12.69    0.00073%
Canada    1,340    29,518   63,514    47.40    2.15     0.00161%
Italy     1,353    25,522   54,088    39.98    2.20     0.00157%
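A per-user temporal hold-out of this kind can be sketched as follows. The column names ("user", "timestamp") and the pandas-based implementation are our own assumptions, not the paper's code.

```python
import pandas as pd

def per_user_temporal_holdout(df, ratio=0.8, min_interactions=20):
    """Keep users with more than `min_interactions` check-ins, then place each
    user's earliest `ratio` fraction of interactions in the training set and
    her most recent ones in the test set."""
    df = df.groupby("user").filter(lambda g: len(g) > min_interactions)
    df = df.sort_values(["user", "timestamp"])
    train_parts, test_parts = [], []
    for _, g in df.groupby("user"):
        cut = int(len(g) * ratio)              # temporal 80/20 cut per user
        train_parts.append(g.iloc[:cut])
        test_parts.append(g.iloc[cut:])
    return pd.concat(train_parts), pd.concat(test_parts)
```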
3.2. Baselines

To evaluate the efficacy of FPL, we have conducted the experiments by considering non-personalized methods (random and most-popular recommendation) and different recommendation approaches, including the centralized BPR-MF implementation [17], VAE [20], and FCF [21], which is, to date, the only federated recommendation approach based on MF (since no source code is available, we reimplemented it in the reader's interest). To evaluate the impact of feedback deprivation on recommendation accuracy, we have evaluated different values of 𝜋 in the range [0.0, 1.0]. Recall that 𝜋 = 0.0 means 𝑢 is not sharing any update (Δq_i, Δb_i) with 𝑆 regarding her positive item feedback, while 𝜋 = 1.0 means 𝑢 is sharing the updates on all positive items. Hence, we have considered four different configurations regarding computation and communication:

• sFPL: it aims to reproduce the stochastic learning approach of a centralized factorization model with pair-wise learning, where the central model is updated sequentially; therefore, we set |𝒰⁻| = 1 to involve just one random client per round, and the selected client extracts a single triple (𝑢, 𝑖, 𝑗) from its dataset (𝑇 = 1) for the training phase;
• sFPL+: we increase client local computation by raising to $X^+/|\mathcal{U}|$ the number of triples 𝑇 extracted from 𝒦_u by each client involved in the round of communication;
• pFPL: we enable parallelism by involving all clients in each round of communication (𝒰⁻ = 𝒰); we keep 𝑇 = 1;
• pFPL+: we extend pFPL by letting each client sample $T = X^+/|\mathcal{U}|$ triples from 𝒦_u; the rationale is that the overall number of training samples is exactly X⁺, as in centralized BPR-MF.

Rendle et al. [17] suggest setting the number of triples in one epoch of BPR to X⁺, which corresponds to the number of optimization steps. A particular choice is to randomly sample $T = X^+/|\mathcal{U}|$ triples per user. To make a federated training epoch of FPL comparable to BPR, and comparable among the different configurations, we set rpe so as to always obtain the same number of interactions 𝜌 between clients and server in one epoch. This value is equal to the overall number of optimization steps in one epoch of centralized pair-wise learning. In detail, we set $\rho = X^+$ and then $rpe = \rho / (T \cdot |\mathcal{U}^-|)$, which results in $rpe = X^+$ for sFPL and $rpe = X^+/|\mathcal{U}^-|$ for pFPL; by the same formula, sFPL+ yields rpe = |𝒰| and pFPL+ yields rpe = 1.

3.3. Reproducibility

For the splitting strategy, we have adopted a temporal hold-out 80/20 to separate our datasets into training and test sets. Moreover, to find the most promising learning rate 𝛼, we have further split the training set, adopting a temporal hold-out 80/20 strategy on a per-user basis to extract each user's validation set. VAE has been trained by considering three autoencoder topologies, with the following numbers of neurons per layer: 200-100-200, 300-100-300, and 600-200-600. We have chosen candidate models by considering the best models after training for 50, 100, and 200 epochs, respectively. For the factorization models, we have performed a grid search in BPR-MF for 𝛼 ∈ {0.005, 0.05, 0.5}, varying the number of latent factors in {10, 20, 50}. Then, to ensure a fair comparison, we have exploited the same learning rate and number of latent factors to train FPL and FCF, and we have explored the models in the range of {10, . . . , 50} training iterations. We have set the user- and positive item-regularization parameters to 1/20 of the learning rate. The negative item-regularization parameter is 1/200 of the learning rate, as suggested in the MyMediaLite implementation (http://www.mymedialite.net/) as well as by Anelli et al. [22].

3.4. Evaluation Metrics

We have evaluated the performance of FPL under the accuracy and diversity perspectives. The accuracy of the models is measured by exploiting Precision (P@N) and Recall (R@N). They respectively represent, for each user, the proportion of relevant recommended items in the recommendation list, and the fraction of relevant items that have been suggested altogether. We have assessed the statistical significance of the results by adopting Student's paired t-test, considering p-values < 0.05 (the complete results are available at https://github.com/sisinflab/fpl-results/). The results are in general statistically significant, except for the differences among BPR-MF, sFPL, and pFPL, which is a very important result. To measure the diversity of recommendations, we have measured the Item Coverage (IC@N) and the Gini Index (Gini@N). IC provides the number of diverse items recommended to users; it also conveys the sense of the degree of personalization [23]. Gini measures distributional inequality, i.e., how unequally a RS provides users with different items [24]. A higher value of Gini [18] corresponds to higher personalization.
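For clarity, the following sketch shows one common way to compute these metrics on top-N lists; it reflects the standard definitions rather than the exact evaluation code used in our experiments.

```python
import numpy as np

def precision_recall_at_n(rec_lists, relevant_sets, n=10):
    """rec_lists: one ranked item list per user; relevant_sets: test items per user."""
    hits = [len(set(recs[:n]) & rel) for recs, rel in zip(rec_lists, relevant_sets)]
    precision = np.mean([h / n for h in hits])
    recall = np.mean([h / len(rel) for h, rel in zip(hits, relevant_sets) if rel])
    return precision, recall

def item_coverage(rec_lists, n=10):
    # Number of distinct items appearing in at least one top-N list.
    return len({item for recs in rec_lists for item in recs[:n]})

def gini_index(rec_lists, n_items, n=10):
    # Distributional inequality of recommendation frequencies over the catalog.
    counts = np.zeros(n_items)
    for recs in rec_lists:
        for item in recs[:n]:
            counts[item] += 1
    p = np.sort(counts / counts.sum())            # ascending frequency shares
    k = np.arange(1, n_items + 1)
    return float(np.sum((2 * k - n_items - 1) * p) / (n_items - 1))
```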
Table 2: Results of accuracy and beyond-accuracy metrics for the baselines and FPL on the three datasets. For each configuration of FPL and for each dataset, the experiment with the best 𝜋 is shown (see the notes below for details). For all metrics, the greater the better.

            Brazil                               Canada                              Italy
            PR       RE       IC     Gini        PR       RE       IC     Gini       PR       RE       IC     Gini
Random      0.00013  0.00015  46120  0.709455    0.00030  0.00035  10815  0.26809    0.00030  0.00029  10478  0.28914
Top-Pop     0.01909  0.02375  19     0.000203    0.04239  0.04679  18     0.00030    0.04634  0.05506  19     0.00035
VAE*        0.10320  0.13153  5503   0.02117     0.06060  0.06317  1044   0.00652    0.10421  0.21324  165    0.02336
BPR-MF      0.07702  0.09494  2552   0.00756     0.03694  0.03650  1216   0.00998    0.04560  0.05458  19     0.00036
FCF         0.03089  0.03749  911    0.00095     0.03724  0.03836  504    0.00174    0.03126  0.03708  403    0.00158
sFPL        0.07757  0.09581  1581   0.00561     0.04515  0.04550  451    0.00243    0.04701  0.05600  18     0.00036
sFPL+       0.08682  0.11004  5200   0.01449     0.05701  0.05665  1510   0.01259    0.05595  0.06229  932    0.00789
pFPL        0.07771  0.09582  2114   0.00638     0.04582  0.04637  425    0.00213    0.04642  0.05465  96     0.00056
pFPL+       0.08733  0.11085  3820   0.01106     0.05761  0.05755  1214   0.00981    0.05565  0.06291  936    0.00725

The best 𝜋 obtained for each of the proposed FPL variations for Brazil, Canada, and Italy, respectively, is: sFPL = (0.5, 0.1, 0.4), pFPL = (0.8, 0.1, 1).
* VAE does not always produce recommendations for all the users. For Italy, the reported results cover 14% of the users. For this reason, it is not marked in bold in the table.

Figure 2: F1 performance at different values of 𝜋 in the range [0.1, 1] for (a) Brazil, (b) Canada, and (c) Italy. Dark blue is sFPL, dark green is sFPL+, light blue is pFPL, light green is pFPL+.

4. Discussion

The main goal of the experiments is to assess whether it is possible to obtain a recommendation performance comparable to a centralized pair-wise learning approach while allowing users to control their data. In this respect, Table 2 shows the accuracy and diversity results of the comparison between the state-of-the-art baselines and the four configurations of FPL presented in Section 3. By focusing on accuracy metrics, we may notice that User-kNN outperforms the other approaches on the three datasets, while the performance of Item-kNN and BPR-MF approximately settles in the same range of values. This is possibly due to the user-item ratio [25], which favors user-based schemes (see Table 1). On the other hand, it is important to investigate the differences of FPL with respect to BPR-MF, which is a pair-wise centralized approach, since FPL is the first federated pair-wise recommender based on a factorization model. The comparison of BPR-MF against FPL, in the sFPL configuration, shows that sFPL slightly outperforms BPR-MF in precision and recall on the three datasets.
This result is surprising, since the two methods share the same sequential training, but sFPL exploits a 𝜋 reduced to 0.5, 0.1, and 0.4, respectively, for Brazil, Canada, and Italy. This behavior is more evident in Figure 2, where the harmonic mean of Precision and Recall (F1) is plotted for different values of 𝜋. If we look at the dark blue line, we may observe that the best result does not correspond to 𝜋 = 1. In the last three rows of Table 2, we explore an increase in local computation (sFPL+), increased parallelism (pFPL), and a combination of both (pFPL+). In detail, we observe that sFPL+ takes advantage of the increased local computation, and FPL significantly outperforms BPR-MF on the three datasets; for instance, for Canada, we observe an interesting increase in precision. Instead, when comparing pFPL with sFPL, we observe that the increased parallelism does not affect the performance significantly. Even then, the increased local computation boosts the Precision and Recall performance, up to 24% for precision on the Italy dataset. The results confirm that the proposed system can generate recommendations with a quality that is comparable with the centralized pair-wise learning approach. Moreover, the increased local computation causes a considerable improvement in recommendation accuracy, while the training parallelism does not significantly affect the results. Finally, when local computation is combined with parallelism, the results show a further improvement. Afterwards, we varied 𝜋 in the range {0.1, . . . , 1.0} to assess how removing the updates for consumed items affects the final recommendation accuracy, and we plotted the accuracy performance by considering F1 in Figure 2. As previously observed, the best performance rarely corresponds to 𝜋 = 1. On the contrary, a general trend can be observed: the training reaches a peak for a certain value of 𝜋, depending on the dataset, and then accuracy decays as the amount of shared positive updates increases. In rare cases, e.g., sFPL and pFPL on the Brazil dataset, the decay is absent, but the results are very close for different values of 𝜋. The general behavior suggests that the system exploits the updates of positive items to absorb information about popularity. This consideration is coherent with the mathematical formulation of the learning procedure, and it is also supported by the observation that, for Canada and Italy, FPL reaches the peak earlier than for Brazil. Indeed, the Canada and Italy datasets are less sparse than Brazil, and the increase of information about positive items may excessively push up popular items (a known characteristic of pair-wise learning), while the same behavior in Brazil can be observed only for values of 𝜋 very close to 1. The same mathematical background explains, for sFPL+ and pFPL+ on the very sparse Brazil dataset, the higher value of 𝜋 needed to reach good performance. Here, the lack of positive information over a vast catalog of items confuses the training, which cannot exploit item popularity. We can thus assert that a user can receive high-quality recommendations even when she decides to disclose only a small amount of her sensitive data. However, it should be noted that the sparser the dataset, the larger the amount of sensitive data that should be shared.
5. Conclusion and Future Work

Inspired by the potential ubiquity of the federated learning paradigm, we have extended FPL, a federated learning framework that exploits pair-wise learning for factorization models. To this purpose, we have designed a model that leaves the user-specific information of the original factorization model on the clients' devices. With FPL, a user may be completely in control of her sensitive data and could share no positive feedback at all with the central server. The framework can be envisioned as a general factorization model in which clients can tune the amount of information shared among devices. We have conducted an exploratory, but extensive, experimental evaluation to analyze the degree of accuracy, the diversity of the recommendation results, and the trade-off between accuracy and the amount of shared transactions. We have assessed that the proposed model shows performance comparable with several state-of-the-art baselines and with the classic centralized factorization model with pair-wise learning. The evaluation shows that clients may share a small portion of their data with the server and still receive high-quality recommendations. We believe that the proposed privacy-oriented paradigm may open the doors to a new class of ubiquitous recommendation engines.

References

[1] B. McFee, L. Barrington, G. R. G. Lanckriet, Learning content similarity for music recommendation, IEEE Trans. Audio, Speech & Language Processing 20 (2012) 2207–2218.
[2] J. Yuan, W. Shalaby, M. Korayem, D. Lin, K. AlJadda, J. Luo, Solving cold-start problem in large-scale recommendation engines: A deep learning approach, in: 2016 IEEE Int. Conf. on Big Data, BigData 2016, Washington DC, USA, December 5-8, 2016, IEEE Computer Society, 2016, pp. 1901–1910.
[3] J. Konecný, B. McMahan, D. Ramage, Federated optimization: Distributed optimization beyond the datacenter, CoRR abs/1511.03575 (2015). arXiv:1511.03575.
[4] V. W. Anelli, Y. Deldjoo, T. Di Noia, A. Ferrara, Towards effective device-aware federated learning, in: International Conference of the Italian Association for Artificial Intelligence, Springer, 2019, pp. 477–491.
[5] H. B. McMahan, E. Moore, D. Ramage, S. Hampson, et al., Communication-efficient learning of deep networks from decentralized data, arXiv preprint arXiv:1602.05629 (2016).
[6] Q. Yang, Y. Liu, T. Chen, Y. Tong, Federated machine learning: Concept and applications, ACM TIST 10 (2019) 12:1–12:19.
[7] S. Weiss, The need for a paradigm shift in addressing privacy risks in social networking applications, in: IFIP International Summer School on the Future of Identity in the Information Society, Springer, 2007, pp. 161–171.
[8] A. J. P. Jeckmans, M. Beye, Z. Erkin, P. H. Hartel, R. L. Lagendijk, Q. Tang, Privacy in recommender systems, in: N. Ramzan, R. van Zwol, J. Lee, K. Clüver, X. Hua (Eds.), Social Media Retrieval, Computer Communications and Networks, Springer, 2013, pp. 263–281.
[9] V. W. Anelli, Y. Deldjoo, T. Di Noia, A. Ferrara, F. Narducci, How to put users in control of their data in federated top-n recommendation with learning to rank, in: Proceedings of the 36th Annual ACM Symposium on Applied Computing, SAC '21, Association for Computing Machinery, New York, NY, USA, 2021, pp. 1359–1362. URL: https://doi.org/10.1145/3412841.3442010. doi:10.1145/3412841.3442010.
[10] V. W. Anelli, Y. Deldjoo, T. Di Noia, A. Ferrara, F. Narducci, FedeRank: User controlled feedback with federated recommender systems, in: D. Hiemstra, M. Moens, J. Mothe, R. Perego, M. Potthast, F.
Sebastiani (Eds.), Advances in Information Retrieval - 43rd European Conference on IR Research, ECIR 2021, Virtual Event, March 28 - April 1, 2021, Proceedings, Part I, volume 12656 of Lecture Notes in Computer Science, Springer, 2021, pp. 32–47. URL: https://doi.org/10.1007/978-3-030-72113-8_3. doi:10.1007/978-3-030-72113-8_3.
[11] D. Yang, D. Zhang, B. Qu, Participatory cultural mapping based on collective behavior data in location-based social networks, ACM TIST 7 (2016) 30:1–30:23.
[12] J. Konecný, H. B. McMahan, D. Ramage, P. Richtárik, Federated optimization: Distributed machine learning for on-device intelligence, CoRR abs/1610.02527 (2016). arXiv:1610.02527.
[13] K. Bonawitz, H. Eichner, W. Grieskamp, D. Huba, A. Ingerman, V. Ivanov, C. Kiddon, J. Konecný, S. Mazzocchi, H. B. McMahan, T. V. Overveldt, D. Petrou, D. Ramage, J. Roselander, Towards federated learning at scale: System design, CoRR abs/1902.01046 (2019). arXiv:1902.01046.
[14] P. Kairouz, H. B. McMahan, B. Avent, A. Bellet, M. Bennis, A. N. Bhagoji, K. Bonawitz, Z. Charles, G. Cormode, R. Cummings, et al., Advances and open problems in federated learning, arXiv preprint arXiv:1912.04977 (2019).
[15] Y. Koren, R. M. Bell, C. Volinsky, Matrix factorization techniques for recommender systems, IEEE Computer 42 (2009) 30–37.
[16] D. K. Bokde, S. Girase, D. Mukhopadhyay, Role of matrix factorization model in collaborative filtering algorithm: A survey, CoRR abs/1503.07475 (2015). arXiv:1503.07475.
[17] S. Rendle, C. Freudenthaler, Z. Gantner, L. Schmidt-Thieme, BPR: Bayesian personalized ranking from implicit feedback, in: J. A. Bilmes, A. Y. Ng (Eds.), UAI 2009, Proc. of the Twenty-Fifth Conf. on Uncertainty in Artificial Intelligence, Montreal, QC, Canada, June 18-21, 2009, AUAI Press, 2009, pp. 452–461.
[18] A. Gunawardana, G. Shani, Evaluating recommender systems, in: F. Ricci, L. Rokach, B. Shapira (Eds.), Recommender Systems Handbook, Springer, 2015, pp. 265–308.
[19] V. W. Anelli, T. Di Noia, E. Di Sciascio, A. Ragone, J. Trotta, Local popularity and time in top-n recommendation, in: European Conf. on Information Retrieval, volume 11437, Springer, 2019, pp. 861–868.
[20] D. Liang, R. G. Krishnan, M. D. Hoffman, T. Jebara, Variational autoencoders for collaborative filtering, in: Proceedings of the 2018 World Wide Web Conference, 2018, pp. 689–698.
[21] M. Ammad-ud-din, E. Ivannikova, S. A. Khan, W. Oyomno, Q. Fu, K. E. Tan, A. Flanagan, Federated collaborative filtering for privacy-preserving personalized recommendation system, CoRR abs/1901.09888 (2019). arXiv:1901.09888.
[22] V. W. Anelli, T. Di Noia, E. Di Sciascio, C. Pomo, A. Ragone, On the discriminative power of hyper-parameters in cross-validation and how to choose them, in: Proc. of the 13th ACM Conf. on Recommender Systems, ACM, 2019, pp. 447–451.
[23] G. Adomavicius, Y. Kwon, Improving aggregate recommendation diversity using ranking-based techniques, IEEE Trans. Knowl. Data Eng. 24 (2012) 896–911.
[24] P. Castells, N. J. Hurley, S. Vargas, Novelty and diversity in recommender systems, in: F. Ricci, L. Rokach, B. Shapira (Eds.), Recommender Systems Handbook, Springer, 2015, pp. 881–918.
[25] G. Adomavicius, J. Zhang, Impact of data characteristics on recommender systems performance, ACM Trans. Management Inf. Syst. 3 (2012) 3:1–3:17. URL: https://doi.org/10.1145/2151163.2151166. doi:10.1145/2151163.2151166.