Addressing Privacy in Recommender Systems with Federated Learning

Discussion Paper

Vito Walter Anelli¹, Tommaso Di Noia¹, Eugenio Di Sciascio¹, Antonio Ferrara¹·∗ and Alberto Carlo Maria Mancino¹·∗

¹ Politecnico di Bari, Bari, Italy

Abstract
In recent years, recommender systems have successfully assisted user decision-making in various user-centered applications. In such scenarios, modern approaches rely on collecting sensitive user preferences. However, data collection has become a critical issue, since users increasingly worry about the privacy risks of sharing their data. This work presents a recommendation approach based on the Federated Learning paradigm, a distributed, privacy-oriented approach to recommendation in which users collaborate on the training while retaining control over the amount of sensitive data they share. Specifically, this paper presents FPL, a pair-wise learning-to-rank approach based on Federated Learning. We show that it puts users in control of their data while achieving recommendation performance competitive with centralized state-of-the-art approaches. The public implementation is available at https://split.to/sisinflab-fpl.

Keywords
Federated Learning, Collaborative Filtering, Pair-wise Learning, Matrix Factorization

IIR2022: 12th Italian Information Retrieval Workshop, June 29 - June 30th, 2022, Milan, Italy
Extended version [1] published in the 58th volume of Journal of Intelligent Information Systems
∗ Corresponding author.
antonio.ferrara@poliba.it (A. Ferrara); alberto.mancino@poliba.it (A. C. M. Mancino)

1. Introduction

Recommender Systems (RSs) have emerged as a solution for mitigating the information-overload problem by assisting users with personalized recommendations. Generally, these models are trained in a centralized fashion, where massive amounts of proprietary sensitive data are hosted on a single server. In the last two decades, the mainstream RS research line has focused on Collaborative Filtering (CF) [2, 3], content-based [4, 5], and hybrid [6] approaches. However, these models need sufficient data to provide accurate recommendations by exploiting similarities in users' behavior. Proposed by Google, Federated Learning (FL) has emerged as a privacy-by-design solution for machine learning models [7, 8, 9, 10]. FL addresses the privacy limitations of centralized machine learning by horizontally distributing the training: clients train the global model on their local devices without sharing their private data [10]. Recent works on Federated Learning-based RSs have exhibited benefits for users' privacy [11]. Federated Pair-wise Learning (FPL) [8] shows how a federated recommender system can exploit the competitive performance of learning-to-rank approaches while still letting users control their data. Indeed, one of the most significant advantages of FPL is that the users participating in the federated learning process can independently decide how much of their sensitive private preferences they are willing to disclose. In this paper, we formally show how users can control their data in FPL. We investigate the risks related to the transmission of gradients and how we address this drawback by putting users in control of their data.
Moreover, we show that adopting FPL can lead to competitive accuracy while providing users with a trustworthy model [12].

2. Foundations of Federated Pair-wise Learning

FPL: Federated Pair-wise Learning for Recommendation. Let 𝒰 and ℐ be the sets of users and items, respectively, and let X ∈ ℝ^{|𝒰|×|ℐ|} be the user-item matrix, whose entries x_ui encode implicit feedback (1 for an observed interaction, 0 otherwise). Inspired by state-of-the-art MF [13, 14, 15, 16], each user u and item i are represented by the embedding vectors p_u and q_i, respectively. The dot product between p_u and q_i can explain any observed user-item interaction x_ui, so that any non-observed interaction can be estimated as x̂_ui = b_i + p_u^T q_i, where b_i is a bias term. FPL builds a global model Θ_S = ⟨Q, b⟩ on a server S, which is aware of the whole catalog ℐ, and a private local model Θ_u = ⟨p_u⟩ on each client of the federation. In our federated setting, each user u holds her own feedback dataset x_u ∈ ℝ^{|ℐ|}, which, compared with a centralized recommender system, corresponds to the u-th row of matrix X, so only user u knows her own set of consumed items. In FPL, the model is trained through rounds of communication composed of a four-step protocol (Distribution → Computation → Transmission → Aggregation), described in the following; a client-side code sketch follows the protocol description.

1. Distribution. S delivers the model Θ_S to a subset of selected users 𝒰⁻ ⊆ 𝒰.

2. Computation. Each user u generates T triples (u, i, j) from her local dataset and, for each of them, performs BPR stochastic optimization to compute the updates for the local vector p_u of Θ_u, and for q_i, b_i, q_j, and b_j of the received Θ_S, following:

$$
\Delta\theta = \frac{e^{-\hat{x}_{uij}}}{1 + e^{-\hat{x}_{uij}}} \cdot \frac{\partial \hat{x}_{uij}}{\partial\theta} - \lambda\theta,
\qquad
\frac{\partial \hat{x}_{uij}}{\partial\theta} =
\begin{cases}
\mathbf{q}_i - \mathbf{q}_j & \text{if } \theta = \mathbf{p}_u,\\
\mathbf{p}_u & \text{if } \theta = \mathbf{q}_i,\\
-\mathbf{p}_u & \text{if } \theta = \mathbf{q}_j,\\
1 & \text{if } \theta = b_i,\\
-1 & \text{if } \theta = b_j,
\end{cases}
\tag{1}
$$

where x̂_uij = x̂_ui − x̂_uj.

3. Transmission. Each client u ∈ 𝒰⁻ sends the updates for the computed item embeddings and item biases back to the server S. Here, we should focus on how BPR computes the training output. Since, for a triple (u, i, j), the server could distinguish the consumed item i from the non-consumed one j (for instance, by analyzing the positive and negative signs of Δb_i and Δb_j), sending all the updates computed by u may raise a privacy issue. FPL overcomes this vulnerability by sending only the update (Δq_j, Δb_j) for each training triple (u, i, j). In this way, user u shares only updates for items that are either negative or missing, which are mutually indistinguishable and assumed to be non-sensitive. Furthermore, FPL lets users decide how many consumed items to share with the central server S through the parameter π: the probability that a user also sends the update for the positive item, (Δq_i, Δb_i), in addition to (Δq_j, Δb_j).

4. Global aggregation. All the received updates are aggregated by S into Q and b to build the new model Θ_S ← Θ_S + α ∑_{u∈𝒰⁻} ΔΘ_u, with α being the learning rate.
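To make the Computation and Transmission steps concrete, the following is a minimal client-side sketch in NumPy. It is illustrative rather than the reference implementation: the names (`local_round`, `stash`) and the hyper-parameter choices are ours, and we assume plain SGD with the update of Eq. (1) and the π-based masking of positive updates described in step 3.

```python
import numpy as np

rng = np.random.default_rng(42)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def local_round(p_u, Q, b, consumed, T, pi, lr=0.05, reg=0.01):
    """One Computation + Transmission step for a single client.

    p_u      : (F,) private user embedding, updated locally only
    Q, b     : (|I|, F) item embeddings and (|I|,) biases received from S
    consumed : set of item ids the user has interacted with
    T        : number of (u, i, j) training triples to sample
    pi       : probability of also disclosing a positive-item update
    Returns the masked updates {item_id: (delta_q, delta_b)} to transmit.
    """
    n_items = Q.shape[0]
    updates = {}

    def stash(k, dq, db):
        # Accumulate updates when the same item occurs in several triples.
        dq0, db0 = updates.get(k, (np.zeros_like(p_u), 0.0))
        updates[k] = (dq0 + dq, db0 + db)

    for _ in range(T):
        i = int(rng.choice(list(consumed)))        # consumed (positive) item
        j = int(rng.integers(n_items))             # non-consumed (negative)
        while j in consumed:
            j = int(rng.integers(n_items))
        x_uij = (b[i] - b[j]) + p_u @ (Q[i] - Q[j])
        w = sigmoid(-x_uij)                        # e^{-x} / (1 + e^{-x})
        # Updates for the global model, following Eq. (1).
        dq_i, db_i = w * p_u - reg * Q[i], w - reg * b[i]
        dq_j, db_j = -w * p_u - reg * Q[j], -w - reg * b[j]
        # Transmission: the negative-item update is always shared; the
        # positive-item update is disclosed only with probability pi.
        stash(j, dq_j, db_j)
        if rng.random() < pi:
            stash(i, dq_i, db_i)
        # The user embedding never leaves the device; update it locally.
        p_u += lr * (w * (Q[i] - Q[j]) - reg * p_u)
    return updates
```

On the server side, step 4 then reduces to accumulating the received contributions, e.g., Q[k] += α · Δq_k and b[k] += α · Δb_k for every update received for item k from the clients in 𝒰⁻.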
Privacy Analysis of FPL. FPL has not been conceived as a privacy-preserving framework, but as a tool to control the trade-off between potentially exposed sensitive data and recommendation quality. While federated learning hides, by design, users' raw data from the server, malicious actors might still try to learn sensitive information: federated learning alone does not provide privacy guarantees to users. In the context of FPL, the aim is to protect the existence of user-item transactions. Attempts at active reconstruction of the user profile are out of the scope of this work; here, we focus on the presence of an honest-but-curious server. Regarding Eq. (1), consider a pair of positive and negative items i and j and the gradients received at the t-th round of communication. The notation of Δq^t_i and Δq^t_j can be expanded by focusing on a single latent factor f:

$$\Delta q_{i,f}^{t} = p_{u,f}^{t-1}\,\sigma\!\left(p_{u,f}^{t-1}\left(q_{i,f}^{t-1} - q_{j,f}^{t-1}\right)\right), \tag{2}$$

$$\Delta q_{j,f}^{t} = -p_{u,f}^{t-1}\,\sigma\!\left(p_{u,f}^{t-1}\left(q_{i,f}^{t-1} - q_{j,f}^{t-1}\right)\right), \tag{3}$$

where σ(·) returns values in the range (0, 1). These equations show that the magnitudes of Δq^t_{i,f} and Δq^t_{j,f} (which must be sent to the server) are identical, while their signs are opposite. Moreover, the sign of each update depends both on the existence/absence of a transaction for the item and on sgn(p^{t−1}_{u,f}). Therefore, the sign of a gradient does not directly reveal the presence or absence of an item in the user's training set. However, the pairs of positive and negative gradients disclose user preference patterns: within a round of communication, all the updates for consumed items share one sign, and all the updates for non-consumed items share the opposite sign, which one being determined by sgn(p^{t−1}_{u,f}). If the server S is honest-but-curious (i.e., it may inspect the updates to obtain user information), as soon as it gathers enough information to identify one or more consumed/non-consumed items, the entire user dataset is exposed. To avoid this problem, FPL puts users in control of their data: by adopting the privacy-oriented masking procedure during the Transmission phase, they can decide the fraction of updates for positive items to send, so that in case of exposure only that fraction of the user's transactions is given up. As a consequence, FPL has to work in a data-scarcity scenario, where the fraction of used data is defined by the parameter π (which may be fixed by the system designer or actively decided by the users). In the experimental section, we empirically show that FPL is resilient to missing data in the federated scenario.
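A few lines of NumPy make this leakage argument tangible. The toy example below (illustrative names, a single latent factor, assumed random initialization) shows that, within a round, every positive-item update carries the sign of p_{u,f} and every negative-item update the opposite sign; a server that manages to label even one item could therefore label all the others, unless positive updates are masked with probability 1 − π.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
n_items = 10
p_uf = rng.normal()                    # factor f of the private user vector
q_f = rng.normal(size=n_items)         # factor f of the item embeddings
consumed = {0, 1, 2}                   # the user's (secret) positive items

# Per-factor updates following Eqs. (2)-(3), pairing every consumed item i
# with a sampled non-consumed item j.
grads = {}
for i in consumed:
    j = int(rng.choice([k for k in range(n_items) if k not in consumed]))
    s = sigmoid(p_uf * (q_f[i] - q_f[j]))  # sigma(.) is always in (0, 1)
    grads[i] = p_uf * s                    # update for the consumed item
    grads[j] = -p_uf * s                   # update for the non-consumed item

# An honest-but-curious server observing only the signs sees two camps:
for k, g in sorted(grads.items()):
    tag = "(consumed)" if k in consumed else "(non-consumed)"
    print(f"item {k}: sign {np.sign(g):+.0f} {tag}")
```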
3. Experimental Results

In the following, we report the accuracy performance of FPL. It has been evaluated on the Foursquare dataset [17] in the Point-of-Interest domain, since it contains data usually perceived as sensitive. To mimic a federation of devices in a single country, we have extracted check-ins for three countries, namely Brazil, Canada, and Italy. Moreover, we have split each dataset by adopting a realistic temporal hold-out 80-20 splitting on a per-user basis [18, 19] (a sketch of this splitting is provided at the end of this section). FPL has been evaluated with values of π in [0.0, 1.0] with step 0.1, in order to assess the impact of users sharing more (high π) or fewer (low π) positive feedbacks. Four configurations have been considered, which vary in computation and communication. In sFPL and pFPL, the model is updated at each round of communication involving one client or all the clients, respectively; in these configurations, the clients' local training involves only one triple (u, i, j) from their local dataset. In contrast, in the corresponding sFPL+ and pFPL+ configurations, the number of selected triples is proportional to the number of each user's positive feedbacks. FPL has been compared against six centralized models (Random, Most Popular, BPR-MF [20], User-kNN and Item-kNN [21], VAE [22]) and a federated recommendation approach based on MF (FCF [23]).

Table 1: Accuracy metrics for the baselines and FPL on the three datasets. The metrics refer to top-10 recommendation lists.

| Type        | Model    | Brazil P@10 | Brazil R@10 | Canada P@10 | Canada R@10 | Italy P@10 | Italy R@10 |
|-------------|----------|-------------|-------------|-------------|-------------|------------|------------|
| Centralized | Random   | 0.00013     | 0.00015     | 0.00030     | 0.00035     | 0.00030    | 0.00029    |
| Centralized | Top-Pop  | 0.01909     | 0.02375     | 0.04239     | 0.04679     | 0.04634    | 0.05506    |
| Centralized | User-kNN | 0.10600     | 0.13480     | 0.07639     | 0.07533     | 0.06881    | 0.07833    |
| Centralized | Item-kNN | 0.07716     | 0.09607     | 0.04006     | 0.03881     | 0.04663    | 0.05356    |
| Centralized | VAE      | 0.10320     | 0.13153     | 0.06060     | 0.06317     | 0.10421    | 0.21324    |
| Centralized | BPR-MF   | 0.07702     | 0.09494     | 0.03694     | 0.03650     | 0.04560    | 0.05458    |
| Federated   | FCF      | 0.03089     | 0.03749     | 0.03724     | 0.03836     | 0.03126    | 0.03708    |
| Federated   | sFPL*    | 0.07757     | 0.09581     | 0.04515     | 0.04550     | 0.04701    | 0.05600    |
| Federated   | sFPL+*   | 0.08682     | 0.11004     | 0.05701     | 0.05665     | 0.05595    | 0.06229    |
| Federated   | pFPL*    | 0.07771     | 0.09582     | 0.04582     | 0.04637     | 0.04642    | 0.05465    |
| Federated   | pFPL+*   | 0.08733     | 0.11085     | 0.05761     | 0.05755     | 0.05565    | 0.06291    |

* Best π obtained for each of the proposed FPL variants, reported as (Brazil, Canada, Italy): sFPL = (0.5, 0.1, 0.4), sFPL+ = (0.9, 0.4, 0.2), pFPL = (0.8, 0.1, 1.0), pFPL+ = (0.8, 0.3, 0.1).

The accuracy results of the four FPL configurations and the state-of-the-art baselines are reported in Table 1. We notice that VAE and User-kNN outperform the other models, while Item-kNN and BPR-MF show similar results. Regarding FPL, when comparing sFPL and sFPL+ with their parallelized configurations (pFPL and pFPL+), we observe that the increased parallelism does not affect the performance significantly. On the other hand, increasing the local computation (sFPL+ and pFPL+) boosts the performance by up to 24%. The results also show that FPL performs better than BPR-MF in both precision and recall. These performances are remarkable considering that FPL exploits less feedback per round, since positive updates are reduced by π. It is also notable that FPL outperforms FCF while preserving privacy to a greater extent, since sharing the gradients of all rated items in FCF can result in a data leak [24]. In summary, the proposed system can generate recommendations with a quality comparable to the centralized pair-wise learning approach. Moreover, increased local computation yields a considerable improvement in recommendation accuracy, while training parallelism alone does not significantly affect the results. Finally, when local computation is combined with parallelism, the results show a further improvement.
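As referenced above, this is a minimal sketch, in pandas, of the per-user temporal 80-20 hold-out; the column names (`user_id`, `item_id`, `timestamp`) are illustrative and not taken from the FPL implementation.

```python
import pandas as pd

def temporal_holdout(df: pd.DataFrame, ratio: float = 0.8):
    """Per-user temporal hold-out: for each user, the earliest `ratio`
    fraction of interactions (by timestamp) forms the training set and
    the most recent ones form the test set."""
    df = df.sort_values(["user_id", "timestamp"])
    rank = df.groupby("user_id").cumcount() + 1            # 1-based position
    size = df.groupby("user_id")["item_id"].transform("size")
    in_train = rank <= (size * ratio)
    return df[in_train], df[~in_train]

# Example: checkins is a DataFrame of Foursquare check-ins for one country.
# train, test = temporal_holdout(checkins, ratio=0.8)
```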
References

[1] V. W. Anelli, Y. Deldjoo, T. D. Noia, A. Ferrara, F. Narducci, User-controlled federated matrix factorization for recommender systems, J. Intell. Inf. Syst. 58 (2022) 287–309.
[2] B. McFee, L. Barrington, G. R. G. Lanckriet, Learning content similarity for music recommendation, IEEE Trans. Audio, Speech & Language Processing 20 (2012) 2207–2218.
[3] J. Yuan, W. Shalaby, M. Korayem, D. Lin, K. AlJadda, J. Luo, Solving cold-start problem in large-scale recommendation engines: A deep learning approach, in: 2016 IEEE Int. Conf. on Big Data, BigData 2016, Washington DC, USA, December 5-8, 2016, IEEE Computer Society, 2016, pp. 1901–1910.
[4] V. Bellini, G. M. Biancofiore, T. D. Noia, E. D. Sciascio, F. Narducci, C. Pomo, Guapp: A conversational agent for job recommendation for the Italian public administration, in: 2020 IEEE Conference on Evolving and Adaptive Intelligent Systems, EAIS 2020, Bari, Italy, May 27-29, 2020, IEEE, 2020, pp. 1–7. doi:10.1109/EAIS48028.2020.9122756.
[5] V. W. Anelli, A. Bellogín, A. Ferrara, D. Malitesta, F. A. Merra, C. Pomo, F. M. Donini, T. D. Noia, V-elliot: Design, evaluate and tune visual recommender systems, in: H. J. C. Pampín, M. A. Larson, M. C. Willemsen, J. A. Konstan, J. J. McAuley, J. Garcia-Gathright, B. Huurnink, E. Oldridge (Eds.), RecSys '21: Fifteenth ACM Conference on Recommender Systems, Amsterdam, The Netherlands, 27 September 2021 - 1 October 2021, ACM, 2021, pp. 768–771. doi:10.1145/3460231.3478881.
[6] V. W. Anelli, T. D. Noia, E. D. Sciascio, A. Ragone, J. Trotta, How to make latent factors interpretable by feeding factorization machines with knowledge graphs, in: ISWC (1), volume 11778 of Lecture Notes in Computer Science, Springer, 2019, pp. 38–56.
[7] J. Konecný, B. McMahan, D. Ramage, Federated optimization: Distributed optimization beyond the datacenter, CoRR abs/1511.03575 (2015). arXiv:1511.03575.
[8] V. W. Anelli, Y. Deldjoo, T. D. Noia, A. Ferrara, F. Narducci, How to put users in control of their data in federated top-n recommendation with learning to rank, in: SAC '21: The 36th ACM/SIGAPP Symposium on Applied Computing, Virtual Event, Republic of Korea, March 22-26, 2021, ACM, 2021, pp. 1359–1362. doi:10.1145/3412841.3442010.
[9] J. Konecný, H. B. McMahan, D. Ramage, P. Richtárik, Federated optimization: Distributed machine learning for on-device intelligence, CoRR abs/1610.02527 (2016). arXiv:1610.02527.
[10] B. McMahan, E. Moore, D. Ramage, S. Hampson, B. A. y Arcas, Communication-efficient learning of deep networks from decentralized data, in: Proc. of 20th Int. Conf. on Artificial Intelligence and Stat., 2017, pp. 1273–1282. URL: http://proceedings.mlr.press/v54/mcmahan17a.html.
[11] V. W. Anelli, Y. Deldjoo, T. D. Noia, A. Ferrara, F. Narducci, Federank: User controlled feedback with federated recommender systems, in: ECIR (1), volume 12656 of Lecture Notes in Computer Science, Springer, 2021, pp. 32–47.
[12] C. Ardito, T. D. Noia, E. D. Sciascio, D. Lofú, G. Mallardi, C. Pomo, F. Vitulano, Towards a trustworthy patient home-care thanks to an edge-node infrastructure, in: R. Bernhaupt, C. Ardito, S. Sauer (Eds.), Human-Centered Software Engineering - 8th IFIP WG 13.2 International Working Conference, HCSE 2020, Eindhoven, The Netherlands, November 30 - December 2, 2020, Proceedings, volume 12481 of Lecture Notes in Computer Science, Springer, 2020, pp. 181–189. doi:10.1007/978-3-030-64266-2_11.
[13] Y. Koren, R. M. Bell, C. Volinsky, Matrix factorization techniques for recommender systems, IEEE Computer 42 (2009) 30–37.
[14] V. W. Anelli, T. D. Noia, E. D. Sciascio, A. Ferrara, A. C. M. Mancino, Sparse feature factorization for recommender systems with knowledge graphs, in: H. J. C. Pampín, M. A. Larson, M. C. Willemsen, J. A. Konstan, J. J. McAuley, J. Garcia-Gathright, B. Huurnink, E. Oldridge (Eds.), RecSys '21: Fifteenth ACM Conference on Recommender Systems, Amsterdam, The Netherlands, 27 September 2021 - 1 October 2021, ACM, 2021, pp. 154–165. doi:10.1145/3460231.3474243.
[15] V. W. Anelli, T. D. Noia, E. D. Sciascio, A. Ragone, J. Trotta, Semantic interpretation of top-n recommendations, IEEE Trans. Knowl. Data Eng. 34 (2022) 2416–2428. doi:10.1109/TKDE.2020.3010215.
[16] V. W. Anelli, T. D. Noia, P. Lops, E. D. Sciascio, Feature factorization for top-n recommendation: From item rating to features relevance, in: RecSysKTL, volume 1887 of CEUR Workshop Proceedings, CEUR-WS.org, 2017, pp. 16–21.
[17] D. Yang, D. Zhang, B. Qu, Participatory cultural mapping based on collective behavior data in location-based social networks, ACM TIST 7 (2016) 30:1–30:23.
[18] A. Gunawardana, G. Shani, Evaluating recommender systems, in: Recommender Systems Handbook, Springer, 2015, pp. 265–308.
[19] V. W. Anelli, T. D. Noia, E. D. Sciascio, A. Ragone, J. Trotta, Local popularity and time in top-n recommendation, in: European Conf. on Information Retrieval, volume 11437, Springer, 2019, pp. 861–868.
[20] S. Rendle, C. Freudenthaler, Z. Gantner, L. Schmidt-Thieme, BPR: Bayesian personalized ranking from implicit feedback, in: Proc. of the 25th Conf. on Uncertainty in Artificial Intelligence, 2009, pp. 452–461.
[21] Y. Koren, Factor in the neighbors: Scalable and accurate collaborative filtering, ACM Transactions on Knowledge Discovery from Data (TKDD) 4 (2010) 1–24.
[22] D. Liang, R. G. Krishnan, M. D. Hoffman, T. Jebara, Variational autoencoders for collaborative filtering, in: Proc. of 2018 WWW Conf., 2018, pp. 689–698.
[23] M. Ammad-ud-din, E. Ivannikova, S. A. Khan, W. Oyomno, Q. Fu, K. E. Tan, A. Flanagan, Federated collaborative filtering for privacy-preserving personalized recommendation system, CoRR abs/1901.09888 (2019). arXiv:1901.09888.
[24] D. Chai, L. Wang, K. Chen, Q. Yang, Secure federated matrix factorization, CoRR abs/1906.05108 (2019). arXiv:1906.05108.