=Paper=
{{Paper
|id=Vol-2758/OHARS-paper5
|storemode=property
|title=Making Session-based News Recommenders Diversity-aware
|pdfUrl=https://ceur-ws.org/Vol-2758/OHARS-paper5.pdf
|volume=Vol-2758
|authors=Alireza Gharahighehi,Celine Vens
|dblpUrl=https://dblp.org/rec/conf/recsys/GharahighehiV20
}}
==Making Session-based News Recommenders Diversity-aware==
Alireza Gharahighehi (a,b), Celine Vens (a,b)

(a) Itec, imec research group at KU Leuven, Kortrijk, Belgium
(b) KU Leuven, Campus KULAK, Department of Public Health and Primary Care, Kortrijk, Belgium

OHARS'20: Workshop on Online Misinformation- and Harm-Aware Recommender Systems, September 25, 2020, Virtual Event. Email: alireza.gharahighehi@kuleuven.be (A. Gharahighehi); celine.vens@kuleuven.be (C. Vens). © 2020 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org), http://ceur-ws.org, ISSN 1613-0073.

===Abstract===
Recommender systems are widely applied in digital platforms such as news websites to personalize services based on user preferences. On news websites most users are anonymous, and the only available data are the sequences of items in anonymous sessions. As a result, typical collaborative filtering methods, which are widely used in many applications, are not effective for news recommendation. In this context, session-based recommenders can recommend the next items given the sequence of previous items in the active session. The session-based k-nearest-neighbor (SKNN) method has been shown to be highly effective compared to more sophisticated approaches. In this study we propose three scenarios to make SKNN diversity-aware and thereby address the filter bubble phenomenon. The filter bubble phenomenon is a common concern in recommender systems: it occurs when the system narrows the information users are exposed to and deprives them of diverse content. Results on three news datasets show that these diversification scenarios improve a rank- and relevance-sensitive diversity measure for SKNN.

Keywords: News recommendation, session-based recommender system, filter bubble phenomenon

===1. Introduction===
Nowadays recommender systems are applied in almost every digital platform. These platforms try to adapt their services to user needs in order to increase user satisfaction. On news aggregator websites, users are usually anonymous, so their profiles and long-term interaction histories are not available. In this situation the only available information is the sequence of interactions in the current session of the (anonymous) user. Moreover, the news domain is highly dynamic and the set of articles available for recommendation changes rapidly. A news recommender system should therefore account for these characteristics to capture recent trends and anonymous users' short-term preferences [1, 2, 3].

When a user's long-term history is not available, Session-based Recommender Systems (SBRSs) are applied. SBRSs recommend the next items given the sequence of visited items in the current session of an anonymous user. They use the collaborative and sequential information from previous sessions of anonymous users to rank and recommend candidate items for an active session. These methods are applied in many applications, such as news recommendation [4], music recommendation and next-basket prediction in e-commerce [5].

Recommender systems that primarily optimise predictive accuracy can narrow the scope of users' recommendations and tighten the filter bubble around the user. On news aggregator websites, in addition to the filter bubble problem, focusing only on accuracy can boost polarization, radicalization and fragmentation among users [6]. To address these issues, diversity should be considered in the recommendation list, both to avoid recommending redundant items and to broaden users' horizons. In this study we introduce diversity into a session-based news recommender system based on news article metadata.
To the best of our knowledge, most current SBRS methods focus only on providing accurate predictions and ignore the diversity of recommendation lists. We propose three scenarios to make the SKNN method diversity-aware and evaluate them on three news datasets.

===2. Related Work===
The concept of diversity was first introduced in the information retrieval community, where a diversified list is more likely to contain the user's actual search intent [7]. In recommender systems, diversification is applied to provide a wider range of items and thereby to address the filter bubble phenomenon. A common measure of diversity is the average pairwise distance between items in the recommendation list [8]. Since accuracy and ranking also play important roles in recommender systems, Vargas and Castells [9] introduced a rank- and relevance-sensitive intra-list diversity measure that shows to what extent a recommender can diversify the list while keeping the relevant items at high ranks.

Generally there are two diversification approaches in recommender systems: re-ranking and diversity modeling. Re-ranking approaches such as [10, 11, 12] are post-processing methods that reorder the ranked list generated by a baseline recommender. While these methods can increase diversity, they require an additional post-processing step and are normally computationally expensive. Diversity modeling methods such as [13, 14, 15], on the other hand, adapt the main algorithm to make it diversity-aware.

Although diversity has been studied extensively in user-based recommender systems, it has received very limited attention in SBRSs. Previous studies on SBRSs [16, 5, 1] have shown that simple methods such as session-based k-Nearest Neighbor (SKNN) can outperform complex neural network methods in both accuracy and computational cost. The aim of this paper is to make SKNN, a flexible and simple SBRS, diversity-aware using news content.

===3. Methodology===
SKNN is a memory-based SBRS that uses the items in the current session to select the nearest neighbor sessions and to predict the next items of the current session. To score a candidate item, SKNN uses the similarity between the item sets of the neighbor sessions and the item set of the current session, as in Equation 1:

:<math>\hat{r}_{SKNN}(i, s) = \sum_{n \in N_s} w_n(s, n) \times \mathbf{1}_n(i)</math> (1)

where <math>\hat{r}_{SKNN}(i, s)</math> is the predicted score for session <math>s</math> and candidate item <math>i</math>, <math>w_n(s, n)</math> is the item set similarity between sessions <math>s</math> and <math>n</math>, <math>\mathbf{1}_n(i)</math> is an indicator function that equals 1 if item <math>i</math> occurs in session <math>n</math> and 0 otherwise, and <math>N_s</math> is the set of neighbor sessions of session <math>s</math>. The item set similarity between two sessions can be calculated with the Jaccard similarity:

:<math>w_n(s, n) = \frac{|I_s \cap I_n|}{|I_s \cup I_n|}</math> (2)

where <math>I_s</math> and <math>I_n</math> are the item sets of sessions <math>s</math> and <math>n</math>.

To make SKNN diversity-aware, we add two components to Equation 1 based on the news article metadata:

:<math>\hat{r}_{SKNN}(i, s) = w_i(i, s) \sum_{n \in N_s} w_n(s, n) \times d_n \times \mathbf{1}_n(i),</math>
:<math>d_n = \frac{\sum_{i \in n} \sum_{j \in n \setminus \{i\}} \mathrm{dist}_c(i, j)}{|n|(|n| - 1)},</math> (3)
:<math>w_i(i, s) = \frac{\sum_{j \in s} \mathrm{dist}_c(i, j)}{|s|}</math>

In Equation 3, <math>d_n</math> is the diversity of session <math>n</math>, <math>w_i(i, s)</math> is the average content dissimilarity between item <math>i</math> and the items in session <math>s</math>, and <math>\mathrm{dist}_c(i, j)</math> is the content dissimilarity between items <math>i</math> and <math>j</math>. To calculate <math>d_n</math> and <math>w_i(i, s)</math> we need a content representation for news articles and a distance measure (e.g., cosine distance). The next section explains what this content representation looks like for each dataset.

We evaluate three diversity-aware scenarios: first, giving higher weights (<math>d_n</math>) to more diverse neighbor sessions (diverse neighbor); second, giving a higher weight (<math>w_i(i, s)</math>) to candidate items with a higher average dissimilarity to the items of the current session (diverse candidate); and finally, the combination of both scenarios.
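The scoring rules in Equations 1 to 3 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' released code: the neighbors <math>N_s</math> are taken to be the k past sessions with the highest Jaccard similarity that share at least one item, <code>dist</code> is any content-distance function (cosine distance over hypothetical item vectors here), and in the single-factor scenarios (SKNN_D, SKNN_C) the unused factor is assumed to be 1. Function names such as <code>sknn_scores</code> are hypothetical.

```python
import math
from itertools import permutations


def cosine_distance(u, v):
    """Content dissimilarity dist_c(i, j) computed as 1 - cosine similarity."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    # Assumption: an all-zero vector is treated as maximally dissimilar.
    return 1.0 - dot / norm if norm else 1.0


def jaccard(a, b):
    """Item-set similarity w_n(s, n) of Equation 2."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def session_diversity(items, dist):
    """d_n of Equation 3: mean content distance over ordered pairs of distinct items."""
    items = list(items)
    m = len(items)
    if m < 2:
        return 0.0  # assumption: a single-item session has zero diversity
    return sum(dist(i, j) for i, j in permutations(items, 2)) / (m * (m - 1))


def candidate_weight(item, session, dist):
    """w_i(i, s) of Equation 3: mean content distance between item i and session s."""
    return sum(dist(item, j) for j in session) / len(session)


def sknn_scores(session, past_sessions, dist, k=100,
                diverse_neighbor=False, diverse_candidate=False):
    """Score candidate items for the current session (Equation 1, or Equation 3)."""
    current = set(session)
    # Jaccard similarity of every past session to the current one.
    scored = [(jaccard(current, set(n)), set(n)) for n in past_sessions]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    # N_s: the k most similar past sessions that share at least one item.
    neighbors = [(w, n) for w, n in scored[:k] if w > 0]

    scores = {}
    for w, n in neighbors:
        if diverse_neighbor:  # SKNN_D: scale the neighbor weight by d_n
            w *= session_diversity(n, dist)
        for item in n:  # indicator 1_n(i) is 1 for every item in session n
            scores[item] = scores.get(item, 0.0) + w
    if diverse_candidate:  # SKNN_C: scale each candidate score by w_i(i, s)
        scores = {i: sc * candidate_weight(i, session, dist)
                  for i, sc in scores.items()}
    return scores


if __name__ == "__main__":
    # Hypothetical 2-dimensional content vectors for four articles.
    vectors = {"a": [1, 0], "b": [1, 0], "c": [0, 1], "d": [1, 1]}

    def dist(i, j):
        return cosine_distance(vectors[i], vectors[j])

    past = [["a", "b"], ["a", "c"], ["b", "d"]]
    print(sknn_scores(["a"], past, dist))  # plain SKNN
    print(sknn_scores(["a"], past, dist,
                      diverse_neighbor=True, diverse_candidate=True))  # SKNN_DC
```

Setting <code>diverse_neighbor=True</code> corresponds to SKNN_D, <code>diverse_candidate=True</code> to SKNN_C, and both flags to SKNN_DC. In the toy data, article "b" has the same content vector as the session's only article "a", so the candidate weight drives its score to zero, while the dissimilar article "c" keeps its score.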
===4. Dataset and Experimental Set-up===
We used three news datasets, namely Roularta (obtained from Roularta Media Group, a Belgian multimedia group), Globo.com [4] and Adressa [17], which are described in Table 1, to evaluate the performance of the proposed diversity-aware scenarios. To calculate the content dissimilarity explained in the previous section, we need a content representation for news articles. For the Roularta and Globo.com datasets, the content representation is based on article text and tags: the CNN-based deep neural network approach proposed in [4] is used to generate article embeddings from them. Since article texts are not available for the Adressa dataset, we use a multi-hot encoding of article tags to represent article content in this dataset.

We compared the performance of SKNN (k = 100) with the three proposed diversity-aware scenarios, namely diverse neighbor (SKNN_D), diverse candidate (SKNN_C) and their combination (SKNN_DC), using three performance measures: precision (p@k), expected intra-list diversity (d@k) and rank- and relevance-sensitive expected intra-list diversity (rrd@k) [9]. p@k is a standard information retrieval accuracy measure that evaluates how well the model predicts the relevant items in the top-k recommendations. d@k is the average content dissimilarity between pairs of items in the top-k recommendations, and rrd@k is a diversity measure that also takes the ranks and relevance of the top-k recommendations into account.

To form the training and test sets, we use the approach of [16]. In this approach each dataset is split into five partitions of equal duration, and the sessions in the last day of each partition are used as test sessions. In these sessions the last two items are regarded as the test items, and accuracy is calculated from the model's ability to predict these test items.
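The evaluation quantities described above can be sketched in Python under some assumptions: articles are represented by multi-hot tag vectors (the Adressa case; the CNN embeddings used for Roularta and Globo.com are not reproduced), dissimilarity is cosine distance, p@k counts how many of the two held-out items appear in the top-k list and divides by k, and d@k averages the distance over all unordered pairs in the top-k list. rrd@k [9] additionally applies rank discounts and relevance weights and is not sketched here. All function names, tags and lists below are hypothetical.

```python
import math
from itertools import combinations


def multi_hot(tags, vocabulary):
    """Multi-hot tag vector, as used to represent Adressa articles."""
    return [1 if t in tags else 0 for t in vocabulary]


def cosine_distance(u, v):
    """1 - cosine similarity; an all-zero vector is treated as maximally dissimilar."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm if norm else 1.0


def holdout_last_two(session):
    """Split a test session into (history, test items): the last two items are held out."""
    return session[:-2], session[-2:]


def precision_at_k(ranked, relevant, k):
    """p@k: fraction of the top-k recommendations that are held-out test items."""
    return sum(1 for item in ranked[:k] if item in relevant) / k


def intra_list_diversity(ranked, k, dist):
    """d@k: mean content distance over all item pairs in the top-k list."""
    pairs = list(combinations(ranked[:k], 2))
    return sum(dist(i, j) for i, j in pairs) / len(pairs) if pairs else 0.0


if __name__ == "__main__":
    vocab = ["politics", "sports", "economy"]  # hypothetical tag vocabulary
    tags = {"n1": {"politics"}, "n2": {"politics", "economy"},
            "n3": {"sports"}, "n4": {"economy"}}
    vec = {article: multi_hot(t, vocab) for article, t in tags.items()}

    def dist(i, j):
        return cosine_distance(vec[i], vec[j])

    history, test_items = holdout_last_two(["n4", "n1", "n3"])
    ranked = ["n2", "n3", "n1"]  # hypothetical recommendation list for `history`
    print(precision_at_k(ranked, set(test_items), k=3))
    print(intra_list_diversity(ranked, 3, dist))
```

Because cosine distance is symmetric, averaging over unordered pairs gives the same value as averaging over ordered pairs, which mirrors the form of <math>d_n</math> in Equation 3.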
The reported performance in the next section is the average performance of the model over these five partitions.

===5. Results and Discussion===
The results of the proposed scenarios² with regard to p@20, d@20 and rrd@20 are reported in Table 2. For the Roularta dataset, the proposed scenarios enhance d@20 at the cost of reduced p@20, which indicates that increasing the diversity of recommendation lists makes the model's predictions less accurate on this dataset. However, the proposed scenarios also improve rrd@20, which implies that they achieve a better trade-off between diversity and accuracy than the original SKNN. Interestingly, on the Globo.com and Adressa datasets the SKNN_D scenario improves diversity (d@20) without reducing accuracy compared to SKNN. The combined scenario (SKNN_DC) achieves the highest d@20 and rrd@20 of all approaches on all datasets (on Roularta its rrd@20 ties with SKNN_C). Moreover, SKNN_C outperforms SKNN_D on both diversity measures on all datasets.

Recommendation lists are more diverse when more diverse neighbors are used in predictions and when candidate items that are more dissimilar to the current session are recommended. The nearest neighbors convey the collaborative information, and according to the results, using more diverse collaborative information gives a trade-off between diversity and accuracy. The diverse-candidate approach is based on content information and has a greater impact on the diversity of recommendations. To show how diversification addresses the filter bubble phenomenon, we assessed the recommendation lists generated by the diversity-aware model (SKNN_DC) and the original model (SKNN) for a test session in the Adressa dataset. The diversified list contains 43 unique tags, whereas the list generated by the original model covers only 25 tags. This indicates that the diversity-aware model offers a wider range of content to the user.
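The trade-offs discussed above can be quantified directly from the values reported in Table 2. The short Python sketch below uses hypothetical helper names and only the published numbers; it computes the relative change of each metric with respect to SKNN and lists the best method(s) per metric, ties included. For example, on Roularta SKNN_DC raises rrd@20 by roughly 13% while lowering p@20 by roughly 20%.

```python
# Table 2 values: (p@20, d@20, rrd@20) for each dataset and method.
TABLE2 = {
    "Roularta": {"SKNN": (0.0383, 0.1878, 0.0090), "SKNN_C": (0.0315, 0.2378, 0.0102),
                 "SKNN_D": (0.0364, 0.2038, 0.0093), "SKNN_DC": (0.0305, 0.2432, 0.0102)},
    "Globo.com": {"SKNN": (0.0356, 0.3547, 0.0209), "SKNN_C": (0.0341, 0.3711, 0.0230),
                  "SKNN_D": (0.0357, 0.3622, 0.0207), "SKNN_DC": (0.0333, 0.3778, 0.0234)},
    "Adressa": {"SKNN": (0.0590, 0.3739, 0.0316), "SKNN_C": (0.0514, 0.4718, 0.0354),
                "SKNN_D": (0.0590, 0.4132, 0.0341), "SKNN_DC": (0.0512, 0.4783, 0.0355)},
}
METRICS = ("p@20", "d@20", "rrd@20")


def relative_change(dataset, method):
    """Relative change of each metric for `method` with respect to plain SKNN."""
    base = TABLE2[dataset]["SKNN"]
    new = TABLE2[dataset][method]
    return {m: (n - b) / b for m, n, b in zip(METRICS, new, base)}


def best_methods(dataset, metric):
    """All methods that reach the highest value of `metric` (ties included)."""
    col = METRICS.index(metric)
    top = max(row[col] for row in TABLE2[dataset].values())
    return sorted(m for m, row in TABLE2[dataset].items() if row[col] == top)


if __name__ == "__main__":
    for dataset in TABLE2:
        change = relative_change(dataset, "SKNN_DC")
        print(dataset, {m: f"{c:+.1%}" for m, c in change.items()},
              "best rrd@20:", best_methods(dataset, "rrd@20"))
```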
Diversification does not necessarily come with a large loss in accuracy: based on the results, the diversity of a news recommender can be increased with a relatively small accuracy drop, or even none.

² The source code is available at https://itec.kuleuven-kulak.be/supportingmaterial.

Table 1: Dataset descriptions
{| class="wikitable"
! !! Roularta !! Globo.com !! Adressa
|-
| # Sessions (anonymous users) || 446,117 || 1,048,389 || 546,949
|-
| # Items || 37,188 || 45,559 || 13,604
|-
| Timespan || 26 days || 16 days || 7 days
|-
| Language || Dutch and French || Portuguese || Norwegian
|-
| Representation || content embedding || content embedding || tags (multi-hot encoding)
|}

Table 2: Results of the diversity-aware session-based news recommender system
{| class="wikitable"
! rowspan="2" | Method !! colspan="3" | Roularta !! colspan="3" | Globo.com !! colspan="3" | Adressa
|-
! p@20 !! d@20 !! rrd@20 !! p@20 !! d@20 !! rrd@20 !! p@20 !! d@20 !! rrd@20
|-
| SKNN || 0.0383 || 0.1878 || 0.0090 || 0.0356 || 0.3547 || 0.0209 || 0.0590 || 0.3739 || 0.0316
|-
| SKNN_C || 0.0315 || 0.2378 || 0.0102 || 0.0341 || 0.3711 || 0.0230 || 0.0514 || 0.4718 || 0.0354
|-
| SKNN_D || 0.0364 || 0.2038 || 0.0093 || 0.0357 || 0.3622 || 0.0207 || 0.0590 || 0.4132 || 0.0341
|-
| SKNN_DC || 0.0305 || 0.2432 || 0.0102 || 0.0333 || 0.3778 || 0.0234 || 0.0512 || 0.4783 || 0.0355
|}

===6. Conclusion===
The main contribution of this study is to make an SKNN news recommender system diversity-aware. On news aggregator websites, focusing only on the prediction accuracy of the recommender can reinforce the filter bubble and intensify polarization and fragmentation among users. Diversification is a way to address these issues in news recommenders. We proposed three scenarios to diversify the recommendation lists generated by SKNN, a session-based recommender system. According to the results, the combined scenario improves rrd@k, a rank- and relevance-sensitive diversity measure, on all news datasets. This result is notable since, in addition to being rank- and relevance-aware, rrd@k shows that the diversification scenario addresses the filter bubble phenomenon by increasing the diversity of recommendation lists.
For future work, we propose to personalize the diversification level of recommendation lists based on the diversity of the current session. Another interesting direction is to diversify news recommendations based on multiple aspects, such as the text, tags, sentiment and polarity of news articles. Moreover, we will assess the possibility of enhancing the diversity of model-based SBRSs such as Factorizing Personalized Markov Chains (FPMC) [18], GRU4REC [19] and CHAMELEON [4]; for these methods, regularization terms that penalize similar content should be added to the loss functions. Finally, we will apply the proposed scenarios to other domains such as music and e-commerce recommendation, where other types of metadata, such as lyrics, genres, artists, item descriptions or a hierarchy of item categories, should be used to diversify recommendations.

===Acknowledgments===
This work was executed within the imec.icon project NewsButler, a research project bringing together academic researchers (KU Leuven, VUB) and industry partners (Roularta Media Group, Bothrs and ML6). The NewsButler project is co-financed by imec and receives project support from Flanders Innovation & Entrepreneurship (project nr. HBC.2017.0628).

===References===
[1] M. Jugovac, D. Jannach, M. Karimi, StreamingRec: a framework for benchmarking stream-based news recommenders, in: Proceedings of the 12th ACM Conference on Recommender Systems, 2018, pp. 269–273.

[2] M. Karimi, D. Jannach, M. Jugovac, News recommender systems – survey and roads ahead, Information Processing & Management 54 (2018) 1203–1227.

[3] A. Gharahighehi, C. Vens, Extended Bayesian personalized ranking based on consumption behavior, in: Postproceedings of the 31st Benelux Conference on Artificial Intelligence (BNAIC 2019) and the 28th Belgian Dutch Conference on Machine Learning (Benelearn 2019), Springer, 2019 (to appear).

[4] P. M. Gabriel De Souza, D. Jannach, A. M. Da Cunha, Contextual hybrid session-based news recommendation with recurrent neural networks, IEEE Access 7 (2019) 169185–169203.

[5] D. Jannach, M. Ludewig, When recurrent neural networks meet the neighborhood for session-based recommendation, in: Proceedings of the Eleventh ACM Conference on Recommender Systems, 2017, pp. 306–310.

[6] N. Helberger, On the democratic role of news recommenders, Digital Journalism 7 (2019) 993–1012.

[7] M. Kaminskas, D. Bridge, Diversity, serendipity, novelty, and coverage: a survey and empirical analysis of beyond-accuracy objectives in recommender systems, ACM Transactions on Interactive Intelligent Systems (TiiS) 7 (2016) 1–42.

[8] B. Smyth, P. McClave, Similarity vs. diversity, in: International Conference on Case-Based Reasoning, Springer, 2001, pp. 347–361.

[9] S. Vargas, P. Castells, Rank and relevance in novelty and diversity metrics for recommender systems, in: Proceedings of the Fifth ACM Conference on Recommender Systems, 2011, pp. 109–116.

[10] J. P. Kelly, D. Bridge, Enhancing the diversity of conversational collaborative recommendations: a comparison, Artificial Intelligence Review 25 (2006) 79–95.

[11] S. Vargas, P. Castells, D. Vallet, Intent-oriented diversity in recommender systems, in: Proceedings of the 34th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2011, pp. 1211–1212.

[12] T. Jambor, J. Wang, Optimizing multiple objectives in collaborative filtering, in: Proceedings of the Fourth ACM Conference on Recommender Systems, 2010, pp. 55–62.

[13] A. Said, B. Fields, B. J. Jain, S. Albayrak, User-centric evaluation of a k-furthest neighbor collaborative filtering recommender algorithm, in: Proceedings of the 2013 Conference on Computer Supported Cooperative Work, 2013, pp. 1399–1408.

[14] Y. Shi, X. Zhao, J. Wang, M. Larson, A. Hanjalic, Adaptive diversification of recommendation results via latent factor portfolio, in: Proceedings of the 35th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2012, pp. 175–184.

[15] R. Su, L. Yin, K. Chen, Y. Yu, Set-oriented personalized ranking for diversified top-n recommendation, in: Proceedings of the 7th ACM Conference on Recommender Systems, 2013, pp. 415–418.

[16] M. Ludewig, D. Jannach, Evaluation of session-based recommendation algorithms, User Modeling and User-Adapted Interaction 28 (2018) 331–390.

[17] J. A. Gulla, L. Zhang, P. Liu, Ö. Özgöbek, X. Su, The Adressa dataset for news recommendation, in: Proceedings of the International Conference on Web Intelligence, 2017, pp. 1042–1048.

[18] S. Rendle, C. Freudenthaler, L. Schmidt-Thieme, Factorizing personalized Markov chains for next-basket recommendation, in: Proceedings of the 19th International Conference on World Wide Web, 2010, pp. 811–820.

[19] B. Hidasi, A. Karatzoglou, Recurrent neural networks with top-k gains for session-based recommendations, in: Proceedings of the 27th ACM International Conference on Information and Knowledge Management, 2018, pp. 843–852.