=Paper= {{Paper |id=Vol-2758/OHARS-paper5 |storemode=property |title=Making Session-based News Recommenders Diversity-aware |pdfUrl=https://ceur-ws.org/Vol-2758/OHARS-paper5.pdf |volume=Vol-2758 |authors=Alireza Gharahighehi,Celine Vens |dblpUrl=https://dblp.org/rec/conf/recsys/GharahighehiV20 }} ==Making Session-based News Recommenders Diversity-aware== https://ceur-ws.org/Vol-2758/OHARS-paper5.pdf
Making Session-based News Recommenders Diversity-aware
Alireza Gharahighehi (a, b), Celine Vens (a, b)
a   Itec, imec research group at KU Leuven, Kortrijk, Belgium
b   KU Leuven, Campus KULAK, Department of Public Health and Primary Care, Kortrijk, Belgium


Abstract
Recommender systems are widely applied in digital platforms such as news websites to personalize
services based on user preferences. On news websites most users are anonymous, and the only
available data are the sequences of items in anonymous sessions. As a result, typical collaborative
filtering methods, which are widely used in many applications, are not effective in news
recommendation. In this context, session-based recommenders can recommend the next items given
the sequence of previous items in the active session. The session-based k-nearest neighbor method
has been shown to be highly effective compared to more sophisticated approaches. In this study we
propose three scenarios to make the session-based k-nearest neighbor method diversity-aware and to
address the filter bubble phenomenon. The filter bubble phenomenon is a common concern in
recommender systems; it occurs when the system narrows the information presented and deprives
users of diverse information. Experiments on three news datasets show that the proposed
diversification scenarios improve the rank and relevance sensitive diversity measure of the
session-based k-nearest neighbor method.

Keywords
News recommendation, session-based recommender system, filter bubble phenomenon




1. Introduction
Nowadays recommender systems are applied in almost every digital platform. These platforms
try to adapt their services to user needs in order to increase user satisfaction. In news
aggregator websites, users are usually anonymous and therefore their profiles and long-term
interaction histories are not available. In this situation the only available information is the
sequence of interactions in the current session of the (anonymous) user. Moreover, the news domain
is highly dynamic and the set of news articles available for recommendation changes rapidly.
Therefore a news recommender system should focus on these characteristics to capture recent
trends and anonymous users’ short-term preferences [1, 2, 3].
   When the user’s long-term history is not available, Session-based Recommender Systems (SBRSs)
are applied. SBRSs are meant to recommend the next items given the sequence of visited
items in the current session of an anonymous user. SBRSs use the collaborative and sequential
information from previous sessions of anonymous users to rank and recommend candidate
items for an active session. These methods are applied in many applications such as news
recommendation [4], music recommendation and next basket prediction in e-commerce [5].

OHARS’20: Workshop on Online Misinformation- and Harm-Aware Recommender Systems, September 25, 2020, Virtual Event
email: alireza.gharahighehi@kuleuven.be (A. Gharahighehi); celine.vens@kuleuven.be (C. Vens)
© 2020 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org), http://ceur-ws.org, ISSN 1613-0073

   Recommender systems that primarily optimise predictive accuracy can narrow the scope of
users’ recommendations and tighten the filter bubble around the user. In news aggregator
websites, in addition to the filter bubble problem, focusing only on accuracy can boost
polarization, radicalization and fragmentation among users [6]. To address these issues, diversity
should be considered in the recommendation list to avoid recommending redundant items to users
and also to broaden users’ horizons.
   In this study we introduce diversity in a session-based news recommendation system based
on news article metadata. To the best of our knowledge, most current SBRS methods focus only
on providing accurate predictions and ignore the diversity of recommendation lists. We propose
three scenarios to make the session-based k-nearest neighbor (SKNN) method diversity-aware and
evaluate them on three news datasets.


2. Related Work
The concept of diversity was first introduced in the information retrieval community. A
diversified list is more likely to contain the user’s actual search intent [7]. In recommender
systems diversification is applied to provide a wider range of items and thereby to address
the filter bubble phenomenon. A common measure of diversity is the average pair-wise distance
between items in the recommendation list [8]. Since accuracy and ranking also play important
roles in recommender systems, Vargas and Castells [9] introduced a rank and relevance sensitive
intra-list diversity measure that shows to what extent the recommender can diversify the list
while preserving the relevant items at high ranks.
    Generally there are two diversification approaches in recommendation systems: re-ranking
and diversity modeling. Re-ranking approaches such as [10, 11, 12] are post-processing methods
that reorder the ranked list generated by a baseline recommender. While these methods are able
to increase diversity, they need additional post-processing steps and are normally
computationally expensive. Diversity modeling methods such as [13, 14, 15], on the other hand,
adapt the main algorithm to make it diversity-aware.
    Although diversity has been extensively studied in user-based recommendation systems, it has
received very limited attention in SBRSs. Previous studies on SBRSs [16, 5, 1] have shown
that simple SBRS methods such as session-based k-Nearest Neighbor (SKNN) can outperform
complex neural network methods in both accuracy and computational cost. The aim of this
paper is to make SKNN, which is a flexible and simple SBRS, diversity-aware using news content.


3. Methodology
SKNN is a memory-based SBRS that uses the items in the current session to select the nearest
neighbor sessions and to predict the next items in the current session. To predict the score of a
candidate item, SKNN uses the similarity of the item set in the neighbor sessions with the item
set in the current session. This score can be calculated using Equation 1:

$$\hat{r}_{SKNN}(i, s) = \sum_{n \in N_s} w_n(s, n) \times 1_n(i) \tag{1}$$



   In Equation 1, $\hat{r}_{SKNN}(i, s)$ is the predicted score for session $s$ and candidate
item $i$, $w_n(s, n)$ is the item set similarity between session $s$ and session $n$, $1_n(i)$ is
an indicator function that equals 1 if item $i$ occurs in session $n$ and 0 otherwise, and $N_s$
is the set of neighbor sessions of session $s$. To calculate the item set similarity between two
sessions one can use the Jaccard similarity:

$$w_n(s, n) = \frac{|I_s \cap I_n|}{|I_s \cup I_n|} \tag{2}$$

where $I_s$ and $I_n$ are the item sets of sessions $s$ and $n$. To make SKNN diversity-aware we
add two components to Equation 1 based on the news article metadata:

$$\hat{r}_{SKNN}(i, s) = w_i(i, s) \sum_{n \in N_s} w_n(s, n) \times d(n) \times 1_n(i),$$

$$d(n) = \frac{\sum_{i \in n} \sum_{j \in n \setminus \{i\}} dist_c(i, j)}{|n|(|n| - 1)}, \tag{3}$$

$$w_i(i, s) = \frac{\sum_{j \in s} dist_c(i, j)}{|s|}$$

   In Equation 3, $d(n)$ is the diversity of session $n$, $w_i(i, s)$ is the average content
dissimilarity between item $i$ and the items in session $s$, and $dist_c(i, j)$ is the content
dissimilarity between items $i$ and $j$. To calculate $d(n)$ and $w_i(i, s)$ we need a content
representation for news articles and a distance measure (e.g. cosine distance). In the next
section we explain what this content representation looks like for each dataset.
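   The two components of Equation 3 can be sketched in Python as follows. This is a minimal
illustration, not the authors' implementation: the function names, the dictionary `emb` that maps
item ids to content vectors, the use of cosine distance, and the handling of empty or
single-item sessions are all assumptions made for the example.

```python
import numpy as np


def cosine_distance(a, b):
    """Return 1 - cosine similarity (0.0 if either vector is all zeros; an assumption)."""
    norm = np.linalg.norm(a) * np.linalg.norm(b)
    if norm == 0.0:
        return 0.0
    return 1.0 - float(np.dot(a, b) / norm)


def session_diversity(session, emb):
    """d(n): mean content distance over all ordered pairs of distinct items in a session."""
    items = list(dict.fromkeys(session))  # treat the session as an item set
    if len(items) < 2:
        return 0.0  # assumption: a session with fewer than two items has no diversity
    total = sum(cosine_distance(emb[i], emb[j])
                for i in items for j in items if i != j)
    return total / (len(items) * (len(items) - 1))


def candidate_dissimilarity(item, session, emb):
    """w_i(i, s): mean content distance between a candidate item and the session items."""
    items = list(dict.fromkeys(session))
    if not items:
        return 0.0
    return sum(cosine_distance(emb[item], emb[j]) for j in items) / len(items)


# Toy content vectors (hypothetical data).
emb = {"a": np.array([1.0, 0.0]), "b": np.array([0.0, 1.0]), "c": np.array([1.0, 1.0])}
print(session_diversity(["a", "b"], emb))
print(candidate_dissimilarity("c", ["a", "b"], emb))
```

   Any other distance on the content representation could be substituted for `cosine_distance`
without changing the structure of the two functions.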
   We evaluate three diversity-aware scenarios: first, giving higher weights ($d(n)$) to the more
diverse neighbor sessions (diverse neighbor); second, giving a higher weight ($w_i(i, s)$) to
candidate items with higher average dissimilarity to the items of the current session (diverse
candidate); and finally, the combination of both scenarios.
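   The scoring rule and the three scenarios can be sketched as a single function with two
optional hooks. This is a hedged illustration under several assumptions that the text does not
fix: neighbors are chosen as the `k` past sessions with the highest Jaccard similarity, items
already in the current session are not scored, and `neighbor_div` and `cand_dissim` are
hypothetical callables standing in for $d(n)$ and $w_i(i, s)$.

```python
from collections import defaultdict


def jaccard(a, b):
    """Jaccard similarity between the item sets of two sessions (Equation 2)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def sknn_scores(current, past_sessions, k=100, neighbor_div=None, cand_dissim=None):
    """Rank candidate items for the current session.

    neighbor_div: optional callable n -> d(n)          (diverse neighbor, SKNN_D)
    cand_dissim:  optional callable (i, s) -> w_i(i,s) (diverse candidate, SKNN_C)
    Passing both gives the combined scenario (SKNN_DC); passing neither gives plain SKNN.
    """
    sims = [(jaccard(current, n), n) for n in past_sessions]
    neighbors = sorted((p for p in sims if p[0] > 0),
                       key=lambda p: p[0], reverse=True)[:k]

    scores = defaultdict(float)
    for w_n, n in neighbors:
        d_n = neighbor_div(n) if neighbor_div else 1.0
        # Assumption: only items not yet seen in the current session are candidates.
        for item in set(n) - set(current):
            scores[item] += w_n * d_n

    if cand_dissim:
        for item in scores:
            scores[item] *= cand_dissim(item, current)

    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


past = [["a", "b", "c"], ["a", "d"], ["x", "y"]]
print(sknn_scores(["a", "b"], past))
```

   Keeping the two weights as optional hooks makes the relation between the four compared
variants explicit: each scenario only multiplies the baseline score by an extra factor.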


4. Dataset and Experimental Set-up
We used three news datasets, namely Roularta¹, Globo.com [4] and Adressa [17], which are
described in Table 1, to evaluate the performance of the proposed diversity-aware scenarios.
To calculate the content dissimilarity explained in the previous section, we need a content
representation for news articles. For the Roularta and Globo.com datasets the content
representation is based on news article text and tags: the CNN-based deep neural network approach
proposed in [4] is used to generate article embeddings for the news articles of these datasets.
Since the text of the articles is not available for the Adressa dataset, we use a multi-hot
encoding of article tags to represent article content in this dataset.
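   A multi-hot encoding of tags can be sketched as below. The function name, the input format
(a dictionary from article id to a list of tags) and the example tags are assumptions made for
illustration; the text does not specify how the encoding was built.

```python
import numpy as np


def multi_hot(article_tags):
    """Map each article id to a 0/1 vector over the sorted tag vocabulary."""
    vocab = sorted({t for tags in article_tags.values() for t in tags})
    index = {t: pos for pos, t in enumerate(vocab)}
    vectors = {}
    for article, tags in article_tags.items():
        v = np.zeros(len(vocab))
        for t in tags:
            v[index[t]] = 1.0  # one position per tag carried by the article
        vectors[article] = v
    return vocab, vectors


# Hypothetical articles and tags.
vocab, vectors = multi_hot({"a1": ["sport", "football"], "a2": ["politics"]})
print(vocab)
print(vectors["a1"], vectors["a2"])
```

   The resulting vectors can be passed to any vector distance, such as cosine distance, to obtain
a content dissimilarity between two articles.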

    ¹ Dataset obtained from Roularta Media Group, a Belgian multimedia group.

   We compared the performance of SKNN (k=100) with the three proposed diversity-aware
scenarios, namely diverse neighbor (SKNN_D), diverse candidate (SKNN_C) and their combination
(SKNN_DC), based on three performance measures: precision (p@k), expected intra-list diversity
(d@k) and rank and relevance sensitive expected intra-list diversity (rrd@k) [9]. p@k is a
standard information retrieval accuracy measure that evaluates the ability of the model to
predict the relevant items in the top k recommendations. d@k is the average content dissimilarity
between pairs of items in the top k recommendations, and rrd@k is a diversity measure that also
takes the ranks and relevance of the top k recommendations into account.
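   The first two measures can be sketched directly from their descriptions. The function names
and the `dist` callable are assumptions for this example, and `dist` is assumed to be symmetric.
rrd@k is left out of the sketch because its exact formulation is given in [9] rather than in this
paper.

```python
from itertools import combinations


def precision_at_k(recommended, relevant, k=20):
    """p@k: fraction of the top-k recommended items that are relevant."""
    relevant = set(relevant)
    hits = sum(1 for item in recommended[:k] if item in relevant)
    return hits / k


def diversity_at_k(recommended, dist, k=20):
    """d@k: average content dissimilarity over all pairs of items in the top k."""
    pairs = list(combinations(recommended[:k], 2))
    if not pairs:
        return 0.0  # assumption: a list with fewer than two items has zero diversity
    return sum(dist(i, j) for i, j in pairs) / len(pairs)


# Toy example: items are numbers and the distance is their absolute difference.
print(precision_at_k(["a", "b", "c", "d"], relevant=["b", "z"], k=4))
print(diversity_at_k([1, 2, 4], dist=lambda i, j: abs(i - j), k=3))
```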
   To form the train and test sets, we use the approach of [16]. In this approach the datasets
are split into five partitions of equal duration. The sessions in the last day of each partition
are considered as the test sessions, and the last two items of these sessions are regarded as the
test items. The accuracy measure is calculated based on the ability of the model to predict these
test items in the test sessions. The performance reported in the next section is the average
performance of the model over these five partitions.
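   The splitting procedure can be sketched as follows. This is only an illustration of the
description above, not the protocol code of [16]: the session format (start timestamp in seconds
plus an item list), the function name, and the decision to drop last-day sessions with two or
fewer items are assumptions.

```python
DAY = 24 * 60 * 60  # seconds


def split_partitions(sessions, n_parts=5):
    """Split sessions into equal-duration partitions with a last-day test set.

    sessions: list of (start_timestamp_seconds, [item, ...]).
    Returns one (train, test) pair per partition, where train is a list of item
    lists and test is a list of (visible_items, held_out_last_two_items).
    """
    t_min = min(ts for ts, _ in sessions)
    t_max = max(ts for ts, _ in sessions)
    width = (t_max - t_min) / n_parts

    parts = [[] for _ in range(n_parts)]
    for ts, items in sessions:
        if width > 0:
            idx = min(int((ts - t_min) / width), n_parts - 1)
        else:
            idx = 0
        parts[idx].append((ts, items))

    result = []
    for p, part in enumerate(parts):
        cutoff = t_min + (p + 1) * width - DAY  # start of the partition's last day
        train, test = [], []
        for ts, items in part:
            if ts < cutoff:
                train.append(items)
            elif len(items) > 2:  # assumption: need at least one visible item
                test.append((items[:-2], items[-2:]))
        result.append((train, test))
    return result


sessions = [(0, ["a", "b", "c"]), (1.5 * DAY, ["a", "b", "c", "d"]), (10 * DAY, ["x", "y", "z"])]
for train, test in split_partitions(sessions):
    print(len(train), len(test))
```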


5. Results and Discussion
The results of the proposed scenarios² with regard to p@20, d@20 and rrd@20 are reported in
Table 2. For the Roularta dataset the proposed scenarios enhance d@20 at the cost of reduced
p@20, which indicates that enhancing the diversity of recommendation lists makes the model
predictions less accurate on this dataset. Nevertheless, the proposed scenarios improve rrd@20,
which implies that they achieve better trade-offs between diversity and accuracy than the
original SKNN. Interestingly, on the Globo.com and Adressa datasets the SKNN_D scenario improves
d@20 without deteriorating accuracy compared to SKNN. The combined scenario (SKNN_DC) achieves
the best d@20 and rrd@20 on all datasets (on Roularta it ties with SKNN_C on rrd@20). Moreover,
SKNN_C outperforms SKNN_D on both diversity measures on all datasets.
   The recommendation lists are more diverse when more diverse neighbors are given more weight
in the predictions and when candidate items that are more dissimilar to the items of the current
session are recommended. The nearest neighbors convey the collaborative information and,
according to the results, using more diverse collaborative information gives a trade-off between
diversity and accuracy. The diverse candidate approach relies on content-based information and
has a greater impact on the diversity of recommendations. To show how this approach addresses
the filter bubble phenomenon, we compared the recommendation lists generated by the
diversity-aware model (SKNN_DC) and the original model (SKNN) for a test session in the Adressa
dataset. The diversified list contains 43 unique tags, whereas the list generated by the original
model covers only 25 tags. This indicates that the diversity-aware model offers a wider range of
content to the user. Diversification does not necessarily come with a high accuracy loss: based
on the results, one can increase the diversity of a news recommender with a relatively small
drop in accuracy, or even none.
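   The size of these trade-offs can be read off Table 2 by computing the change of each measure
relative to the SKNN baseline. The short script below does this arithmetic with the values
copied from Table 2; the variable and function names are chosen for the example only. For
instance, on Roularta SKNN_DC raises d@20 by about 29.5% while p@20 falls by about 20.4%,
whereas on Adressa SKNN_D leaves p@20 unchanged.

```python
METRICS = ("p@20", "d@20", "rrd@20")

# Values copied from Table 2, in the order given by METRICS.
TABLE_2 = {
    "Roularta": {"SKNN": (0.0383, 0.1878, 0.0090), "SKNN_C": (0.0315, 0.2378, 0.0102),
                 "SKNN_D": (0.0364, 0.2038, 0.0093), "SKNN_DC": (0.0305, 0.2432, 0.0102)},
    "Globo.com": {"SKNN": (0.0356, 0.3547, 0.0209), "SKNN_C": (0.0341, 0.3711, 0.0230),
                  "SKNN_D": (0.0357, 0.3622, 0.0207), "SKNN_DC": (0.0333, 0.3778, 0.0234)},
    "Adressa": {"SKNN": (0.0590, 0.3739, 0.0316), "SKNN_C": (0.0514, 0.4718, 0.0354),
                "SKNN_D": (0.0590, 0.4132, 0.0341), "SKNN_DC": (0.0512, 0.4783, 0.0355)},
}


def relative_change(dataset, model, metric):
    """Change of a metric relative to the SKNN baseline on the same dataset."""
    idx = METRICS.index(metric)
    base = TABLE_2[dataset]["SKNN"][idx]
    return (TABLE_2[dataset][model][idx] - base) / base


for dataset in TABLE_2:
    for model in ("SKNN_C", "SKNN_D", "SKNN_DC"):
        changes = ", ".join(f"{m}: {relative_change(dataset, model, m):+.1%}" for m in METRICS)
        print(f"{dataset:10s} {model:8s} {changes}")
```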


   ² The source code is available on https://itec.kuleuven-kulak.be/supportingmaterial

Table 1
Dataset descriptions
                                 Roularta            Globo.com           Adressa
 # Sessions (anonymous users)    446,117             1,048,389           546,949
 # Items                         37,188              45,559              13,604
 Timespan                        26 days             16 days             7 days
 Language                        Dutch and French    Portuguese          Norwegian
 Representation                  content embedding   content embedding   tags multi-hot encoding

Table 2
Results of the diversity-aware session-based news recommender system
            |         Roularta         |        Globo.com         |         Adressa
            | p@20    d@20    rrd@20   | p@20    d@20    rrd@20   | p@20    d@20    rrd@20
 SKNN       | 0.0383  0.1878  0.0090   | 0.0356  0.3547  0.0209   | 0.0590  0.3739  0.0316
 SKNN_C     | 0.0315  0.2378  0.0102   | 0.0341  0.3711  0.0230   | 0.0514  0.4718  0.0354
 SKNN_D     | 0.0364  0.2038  0.0093   | 0.0357  0.3622  0.0207   | 0.0590  0.4132  0.0341
 SKNN_DC    | 0.0305  0.2432  0.0102   | 0.0333  0.3778  0.0234   | 0.0512  0.4783  0.0355


6. Conclusion
The main contribution of this study is to make an SKNN news recommender system diversity-aware.
In news aggregator websites, focusing only on the prediction accuracy of the recommender can
reinforce the filter bubble phenomenon and intensify polarization and fragmentation among
users. Diversification is a way to address these issues in news recommenders. We proposed
three scenarios to diversify the recommendation lists generated by SKNN, a session-based
recommender system. According to the results, the combined scenario improves rrd@k, a rank and
relevance sensitive diversity measure, on all news datasets. This result is notable because, in
addition to being rank and relevance aware, the measure shows that the diversification scenario
addresses the filter bubble phenomenon by improving the diversity of recommendation lists.
    As future work, we propose to personalize the diversification level of recommendation
lists based on the diversity of the current session. Another interesting direction is to
diversify news recommendations based on multiple aspects such as the text, tags, sentiment and
polarity of news articles. Moreover, we will assess the possibility of enhancing the diversity of
model-based SBRSs such as Factorizing Personalized Markov Chains (FPMC) [18], GRU4REC [19]
and CHAMELEON [4], for example by adding regularization terms that penalize similar contents to
the loss functions of these methods. Finally, we will apply the proposed scenarios to other
domains such as music and e-commerce recommendation. In these domains other types of metadata,
such as lyrics, genres, artists, item descriptions or a hierarchy of item categories, could be
used to diversify recommendations.


Acknowledgments
This work was executed within the imec.icon project NewsButler, a research project bringing
together academic researchers (KU Leuven, VUB) and industry partners (Roularta Media Group,
Bothrs and ML6). The NewsButler project is co-financed by imec and receives project support
from Flanders Innovation & Entrepreneurship (project nr. HBC.2017.0628).


References
 [1] M. Jugovac, D. Jannach, M. Karimi, Streamingrec: a framework for benchmarking stream-
     based news recommenders, in: Proceedings of the 12th ACM Conference on Recommender
     Systems, 2018, pp. 269–273.
 [2] M. Karimi, D. Jannach, M. Jugovac, News recommender systems–survey and roads ahead,
     Information Processing & Management 54 (2018) 1203–1227.
 [3] A. Gharahighehi, C. Vens, Extended bayesian personalized ranking based on consumption
     behavior, in: Postproceedings of the 31st Benelux Conference on Artificial Intelligence
     (BNAIC 2019) and the 28th Belgian Dutch Conference on Machine Learning (Benelearn
     2019), Springer, 2019, p. (to appear).
 [4] P. M. Gabriel De Souza, D. Jannach, A. M. Da Cunha, Contextual hybrid session-based news
     recommendation with recurrent neural networks, IEEE Access 7 (2019) 169185–169203.
 [5] D. Jannach, M. Ludewig, When recurrent neural networks meet the neighborhood for
     session-based recommendation, in: Proceedings of the Eleventh ACM Conference on
     Recommender Systems, 2017, pp. 306–310.
 [6] N. Helberger, On the democratic role of news recommenders, Digital Journalism 7 (2019)
     993–1012.
 [7] M. Kaminskas, D. Bridge, Diversity, serendipity, novelty, and coverage: a survey and
     empirical analysis of beyond-accuracy objectives in recommender systems, ACM Transactions
     on Interactive Intelligent Systems (TiiS) 7 (2016) 1–42.
 [8] B. Smyth, P. McClave, Similarity vs. diversity, in: International conference on case-based
     reasoning, Springer, 2001, pp. 347–361.
 [9] S. Vargas, P. Castells, Rank and relevance in novelty and diversity metrics for recommender
     systems, in: Proceedings of the fifth ACM conference on Recommender systems, 2011, pp.
     109–116.
[10] J. P. Kelly, D. Bridge, Enhancing the diversity of conversational collaborative
     recommendations: a comparison, Artificial Intelligence Review 25 (2006) 79–95.
[11] S. Vargas, P. Castells, D. Vallet, Intent-oriented diversity in recommender systems, in:
     Proceedings of the 34th international ACM SIGIR conference on Research and development
     in Information Retrieval, 2011, pp. 1211–1212.
[12] T. Jambor, J. Wang, Optimizing multiple objectives in collaborative filtering, in:
     Proceedings of the fourth ACM conference on Recommender systems, 2010, pp. 55–62.
[13] A. Said, B. Fields, B. J. Jain, S. Albayrak, User-centric evaluation of a k-furthest neighbor
     collaborative filtering recommender algorithm, in: Proceedings of the 2013 conference on
     Computer supported cooperative work, 2013, pp. 1399–1408.
[14] Y. Shi, X. Zhao, J. Wang, M. Larson, A. Hanjalic, Adaptive diversification of
     recommendation results via latent factor portfolio, in: Proceedings of the 35th international
     ACM SIGIR conference on Research and development in information retrieval, 2012, pp. 175–184.
[15] R. Su, L. Yin, K. Chen, Y. Yu, Set-oriented personalized ranking for diversified top-n
     recommendation, in: Proceedings of the 7th ACM conference on Recommender systems,
     2013, pp. 415–418.
[16] M. Ludewig, D. Jannach, Evaluation of session-based recommendation algorithms, User
     Modeling and User-Adapted Interaction 28 (2018) 331–390.
[17] J. A. Gulla, L. Zhang, P. Liu, Ö. Özgöbek, X. Su, The adressa dataset for news
     recommendation, in: Proceedings of the international conference on web intelligence, 2017,
     pp. 1042–1048.
[18] S. Rendle, C. Freudenthaler, L. Schmidt-Thieme, Factorizing personalized markov chains
     for next-basket recommendation, in: Proceedings of the 19th international conference on
     World wide web, 2010, pp. 811–820.
[19] B. Hidasi, A. Karatzoglou, Recurrent neural networks with top-k gains for session-based
     recommendations, in: Proceedings of the 27th ACM International Conference on
     Information and Knowledge Management, 2018, pp. 843–852.



