=Paper=
{{Paper
|id=Vol-2554/paper3
|storemode=property
|title=On the Importance of News Content Representation in Hybrid Neural Session-based Recommender Systems
|pdfUrl=https://ceur-ws.org/Vol-2554/paper_03.pdf
|volume=Vol-2554
|authors=Gabriel De Souza P. Moreira,Dietmar Jannach,Adilson Marques Da Cunha
|dblpUrl=https://dblp.org/rec/conf/recsys/MoreiraJC19
}}
==On the Importance of News Content Representation in Hybrid Neural Session-based Recommender Systems==
Gabriel de Souza P. Moreira* (CI&T, Campinas, SP, Brazil, gspmoreira@gmail.com), Dietmar Jannach (University of Klagenfurt, Klagenfurt, Austria, dietmar.jannach@aau.at), Adilson Marques da Cunha (Instituto Tecnológico de Aeronáutica, São José dos Campos, SP, Brazil, cunha@ita.br)

* Also with the Brazilian Aeronautics Institute of Technology.

Copyright 2019 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). INRA'19, September 2019, Copenhagen, Denmark.

ABSTRACT
News recommender systems are designed to surface relevant information for online readers by personalizing their user experiences. A particular problem in that context is that online readers are often anonymous, which means that this personalization can only be based on the last few recorded interactions with the user, a setting named session-based recommendation. Another particularity of the news domain is that fresh articles are constantly published, which should be immediately considered for recommendation. To deal with this item cold-start problem, it is important to consider the actual content of items when recommending. Hybrid approaches are therefore often considered as the method of choice in such settings. In this work, we analyze the importance of considering content information in a hybrid neural news recommender system. We contrast content-aware and content-agnostic techniques and also explore the effects of using different content encodings. Experiments on two public datasets confirm the importance of adopting a hybrid approach. Furthermore, we show that the choice of the content encoding can have an impact on the resulting performance.

CCS CONCEPTS
• Information systems → Recommender systems; • Computing methodologies → Neural networks.

KEYWORDS
Recommender Systems; Hybrid Systems; News Recommendation; Session-Based Recommendation; Recurrent Neural Networks

1 INTRODUCTION & BACKGROUND
Many of today's major media and news aggregator websites, including The New York Times [38], The Washington Post [9], Google News [5], and Yahoo! News [39], provide automated reading recommendations for their users. News recommendation, while being one of the earliest application fields of recommenders, is often still considered a challenging problem for many reasons [16].

Among them, there are two types of cold-start problems. First, there is the permanent item cold-start problem. In the news domain, we have to deal with a constant stream of possibly thousands of new articles published each day [38]. At the same time, these articles become outdated very quickly [5]. Second, on many news sites, we have to deal with user cold-start, when users are anonymous or not logged in [7, 22, 25], which means that personalization has to be based on a few observed interactions (e.g., clicks) of the user.

In many application domains of recommenders, collaborative filtering techniques, which only rely on observed preference patterns in a user community, have proven to be highly effective in the past. However, in the particular domain of news recommendation, the use of hybrid techniques, which also consider the actual content of a news item, has often been shown to be preferable for dealing with item cold-start, see e.g., [2, 8, 22, 23, 25, 26, 37, 39].

Likewise, to deal with user cold-start issues, session-based recommendation techniques have received more research interest in recent years. In these approaches, the provided recommendations are not based on long-term preference profiles, but solely on adapting recommendations according to the most recent observed interactions of the current user.

Technically, a number of algorithmic approaches can be applied for this problem, from rule-learning techniques, over nearest-neighbor schemes, to more complex sequence learning methods and deep learning approaches; for an overview see [34]. Among the neural methods, Recurrent Neural Networks (RNN) are a natural choice for learning sequential models [12, 21]. Attention mechanisms have also been used for session-based recommendation [27].

The goal of this work is to investigate two aspects of hybrid session-based news recommendation using neural networks.
Our first goal is to understand the value of considering content information in a hybrid system. Second, we aim to investigate to what extent the choice of the mechanism for encoding the articles' textual content matters. To that purpose, we made experiments with various encoding mechanisms, including unsupervised ones (such as Latent Semantic Analysis and doc2vec) and supervised ones. Our experiments were made using a realistic streaming-based evaluation protocol. The outcomes of our studies, which were based on two public datasets, confirm the usefulness of considering content information. However, the quality and detail of the content representation matters, which means that these aspects should be taken care of in practical settings. Furthermore, we found that the specific document encoding can make a difference in recommendation quality, but sometimes those differences are small. Finally, we found that content-agnostic nearest-neighbor methods, which are considered highly competitive with RNN-based techniques in other scenarios [14, 28], were falling behind on different performance measures compared to the used neural approach.

2 METHODOLOGY
To conduct our experiments, we have implemented different instantiations of our deep learning meta-architecture for news recommendation called CHAMELEON [32, 33]. The main component of the architecture is the Next-Article Recommendation (NAR) module, which processes various types of input features, including pre-trained Article Content Embeddings (ACE) and contextual information about users (e.g., time, location, device) and items (e.g., recent popularity, recency). These inputs are provided for all clicks of a user observed in the current session to generate next-item recommendations based on an RNN (e.g., GRU, LSTM).

The ACEs are produced by the Article Content Representation (ACR) module. The input to the module is the article's text, represented as a sequence of word embeddings (e.g., using Word2Vec [31]), pre-trained on a large corpus. These embeddings are further processed by feature extractors, which can be instantiated as Convolutional Neural Networks (CNN) or RNNs. The ACR module's neural network is trained in a supervised manner for a side task: to predict metadata attributes of an article, such as categories or tags.
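The following is a minimal, illustrative sketch (not the authors' implementation) of the core idea behind the NAR module: an RNN (here a GRU) consumes the sequence of clicked-article representations of the current session, and its final state is matched against a set of candidate-article representations to score the next click. All dimensions, layer sizes, and names below are assumptions for illustration; the actual CHAMELEON architecture described in [32, 33] is considerably richer.

```python
# Hypothetical sketch of a GRU-based next-article scorer (not CHAMELEON itself).
import tensorflow as tf

ACE_DIM, CTX_DIM, HIDDEN = 250, 30, 255     # illustrative sizes, not from the paper
MAX_SESSION_LEN, N_CANDIDATES = 20, 51      # 1 positive + 50 sampled negatives

# Sequence of past clicks in the current session (zero-padded),
# each represented by its ACE concatenated with context features.
clicked = tf.keras.Input(shape=(MAX_SESSION_LEN, ACE_DIM + CTX_DIM))
# Representations of the candidate articles to be ranked.
candidates = tf.keras.Input(shape=(N_CANDIDATES, ACE_DIM + CTX_DIM))

masked = tf.keras.layers.Masking(mask_value=0.0)(clicked)   # ignore padded steps
session_state = tf.keras.layers.GRU(HIDDEN)(masked)         # session representation
cand_vec = tf.keras.layers.Dense(HIDDEN)(candidates)        # project candidates

# Relevance score of each candidate = dot product with the session state.
scores = tf.keras.layers.Lambda(
    lambda t: tf.einsum("bch,bh->bc", t[0], t[1]))([cand_vec, session_state])
probs = tf.keras.layers.Softmax()(scores)

model = tf.keras.Model([clicked, candidates], probs)
model.compile(optimizer="adam", loss="categorical_crossentropy")
```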
Figure 1: A simplified overview of CHAMELEON. The components for which we tested different variants are shaded.

Figure 1 illustrates how the Article Content Embeddings are used within CHAMELEON's processing chain to provide next-article recommendations.

In this work, we first analyzed the importance of considering article content information for recommendations. Second, we experimented with different techniques for textual content representation (Footnote 1), and investigated how they might affect recommendation quality. The different variants that were tested are listed in Table 1 (Footnote 2).

For the experiments, CHAMELEON's NAR module took the following features as input, described in more detail in [33] (Footnote 3): (1) Article Content Embeddings (generated by the different techniques presented in Table 1), (2) article metadata (category and author, see Footnote 4), (3) article context (novelty and recency), and (4) user context (city, region, country, device type, operating system, hour of the day, day of the week, referrer).

Footnote 1: As there were some very long articles, the text was truncated after the first 12 sentences and concatenated with the title. Article Content Embeddings (ACE) produced by the selected techniques were L2-normalized to make the feature scales similar, but also to preserve high similarity scores for embeddings from similar articles.
Footnote 2: We also experimented with GRU Sequence Autoencoders (adapted from SA-LSTM [4]) to extract textual features by reconstructing the sequence of input word embeddings, but this technique did not lead to better results than the other unsupervised methods.
Footnote 3: Note that the experiments reported here did not include the trainable Article ID feature used in the experiments from [33], which can lead to a slightly improved accuracy, but possibly reduces the differences observed between the content representations.
Footnote 4: Article author and user city are available only for the Adressa dataset.

Table 1: Alternative content processing techniques.

No-ACE (input: none): In this setting, no content representation is used as input.

Supervised:
- CNN (input: word2vec, see Footnote 5): A 1D-CNN-based model trained to classify the articles' metadata (e.g., category). The architecture combines three CNNs with window sizes of 3, 4, and 5 to model n-grams. The output of an intermediate layer is used as textual representation. For more details see [32, 33].
- GRU (input: word2vec): Similar to the CNN-based version, a GRU layer is trained to classify metadata. The outputs of the GRU layer are max-pooled to generate representations [33].

Unsupervised:
- LSA (input: raw text): Traditional Latent Semantic Analysis (LSA) [6]. We used a variation based on TF-IDF vectors [36] and Truncated SVD [11] (a sketch is shown after this table).
- W2V*TF-IDF (input: word2vec): TF-IDF weighted word embeddings [24], a technique to represent a piece of text as the average of its word embeddings weighted by TF-IDF [36].
- doc2vec (input: raw text): Paragraph Vector (a.k.a. doc2vec) [19] learns fixed-length feature representations from variable-length pieces of text, which are trained via the distributed memory and distributed bag-of-words models.

Footnote 5: Portuguese: a pre-trained Word2Vec [31] skip-gram model (300 dimensions) is available at http://nilc.icmc.usp.br/embeddings; Norwegian: a skip-gram model (100 dimensions) is available at http://vectors.nlpl.eu/repository (model #100).
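To make the LSA variant from Table 1 concrete, the sketch below shows how such Article Content Embeddings could be derived with TF-IDF vectors, Truncated SVD, and L2 normalization (cf. Footnote 1). It is a minimal illustration under assumed settings; the vocabulary size and embedding dimensionality are not the exact values used in the paper.

```python
# Illustrative LSA-based Article Content Embeddings (assumed parameters).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.preprocessing import normalize

def lsa_article_embeddings(article_texts, dims=250):
    """Return one L2-normalized embedding (row) per article text."""
    tfidf = TfidfVectorizer(max_features=50000)        # vocabulary size is an assumption
    X = tfidf.fit_transform(article_texts)             # sparse TF-IDF matrix
    svd = TruncatedSVD(n_components=dims, random_state=42)
    ace = svd.fit_transform(X)                         # dense LSA vectors
    return normalize(ace)                              # L2 norm keeps cosine similarities comparable

# Toy usage example (titles concatenated with the first sentences of the body):
embeddings = lsa_article_embeddings([
    "Election results. The votes were counted overnight in the capital.",
    "Football final. The home team won the championship after extra time.",
    "Storm warning. Heavy rain and wind are expected along the coast.",
], dims=2)
```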
3 EXPERIMENTAL SETUP
We adopt a temporal offline evaluation method as proposed in [32, 33], which simulates a streaming flow of new user interactions (clicks) and articles being published. Since in practical environments it is highly important to quickly react to incoming events [15, 17, 30], the baseline recommender methods are constantly updated over time. CHAMELEON's NAR module also supports online learning. The training process of CHAMELEON emulates a streaming scenario with mini-batches, in which each user session is used for training only once. Such a scalable approach is different from other techniques, like GRU4Rec [12], which require training for some epochs on a larger set of past interactions to reach high accuracy.

3.1 Evaluation Protocol
The evaluation process works as follows:
(1) The recommenders are continuously trained on user sessions ordered by time and grouped by hours. Every five hours, the recommenders are evaluated on sessions from the next hour. With this interval of five hours (not a divisor of 24 hours), we cover different hours of the day for evaluation. After the evaluation of a given hour is done, this hour is also considered for training, until the entire dataset is covered (Footnote 6). Note that CHAMELEON's model is only updated after all events of the test hour are processed. This allows us to emulate a realistic production scenario where the model is trained and deployed once an hour to serve recommendations for the next hour;
(2) For each session in the test set, we incrementally reveal one click after the other to the recommender, as done, e.g., in [12, 35];
(3) For each click to be predicted, we sample a random set containing 50 recommendable articles (the ones that received at least one click by any user in the preceding hour) that were not viewed by the user in their session (negative samples) plus the true next article (positive sample), as done in [3] and [18]. We then evaluate the algorithms for the task of ranking those 51 items; and
(4) Given these rankings, standard information retrieval (top-n) metrics can be computed.

Footnote 6: Our dataset consists of 16 days. We used the first 2 days to learn an initial model for the session-based algorithms and report the averaged measures after this warm-up.

3.2 Metrics
As relevant quality factors from the news domain [16], we considered accuracy, item coverage, and novelty. To determine the metrics, we took measurements at list length 10. As accuracy metrics, we used the Hit Rate (HR@n), which checks whether or not the true next item appears in the top-n ranked items, and the Mean Reciprocal Rank (MRR@n), a ranking metric that is sensitive to the position of the true next item. Both metrics are common when evaluating session-based recommendation algorithms [12, 15, 28].

Since it is sometimes important that a news recommender does not only focus on a small set of items, we also considered Item Coverage (COV@n) as a quality criterion. We computed item coverage as the number of distinct articles that appeared in any top-n list divided by the number of recommendable articles [13]. In our case, the recommendable articles are the ones viewed at least once in the last hour by any user. To measure novelty, we used the ESI-R@n metric [33], adapted from [1, 41, 42]. The metric is based on item popularity and returns higher values when long-tail items are among the top-n recommendations.
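As an illustration of the accuracy metrics, the following minimal Python sketch (our own helper functions, not part of the CHAMELEON code base) computes HR@10 and MRR@10 over the ranked lists of 51 items produced by the protocol above.

```python
# Assumed helper functions illustrating HR@n and MRR@n over sampled candidate rankings.
def hit_rate_at_n(ranked_item_ids, true_item_id, n=10):
    """1.0 if the true next article is within the top-n ranked items, else 0.0."""
    return 1.0 if true_item_id in ranked_item_ids[:n] else 0.0

def mrr_at_n(ranked_item_ids, true_item_id, n=10):
    """Reciprocal of the rank of the true next article, 0.0 if outside the top-n."""
    top_n = ranked_item_ids[:n]
    return 1.0 / (top_n.index(true_item_id) + 1) if true_item_id in top_n else 0.0

def evaluate(predictions, n=10):
    """predictions: iterable of (ranked_item_ids, true_item_id) pairs; returns (HR@n, MRR@n)."""
    hits, rrs = zip(*[(hit_rate_at_n(r, t, n), mrr_at_n(r, t, n))
                      for r, t in predictions])
    return sum(hits) / len(hits), sum(rrs) / len(rrs)

# Example: one prediction where the true article is ranked 3rd among 51 candidates.
print(evaluate([(list(range(51)), 2)]))   # -> (1.0, 0.3333...)
```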
3.3 Datasets
We use two public datasets from news portals:
(1) Globo.com (G1) dataset: Globo.com is the most popular media company in Brazil. The dataset (https://www.kaggle.com/gspmoreira/news-portal-user-interactions-by-globocom) was collected at the G1 news portal, which has more than 80 million unique users and publishes over 100,000 new articles per month; and
(2) SmartMedia Adressa: This dataset contains approximately 20 million page visits from a Norwegian news portal [10]. In our experiments we used its complete version (http://reclab.idi.ntnu.no/dataset), which includes article text and click events of about 2 million users and 13,000 articles.

Both datasets include the textual content of the news articles, article metadata (such as publishing date, category, and author), and logged user interactions (page views) with contextual information. Since we are focusing on session-based news recommendations and short-term user preferences, it is not necessary to train the algorithms on long periods. Therefore, and because articles become outdated very quickly, we selected all available user sessions from the first 16 days of both datasets for our experiments.

In a pre-processing step, like in [8, 28, 40], we organized the data into sessions using a 30-minute threshold of inactivity as an indicator of a new session. Sessions were then sorted by the timestamp of their first click. From each session, we removed repeated clicks on the same article, as we are not focusing on the capability of algorithms to act as reminders as in [20]. Sessions with only one interaction are not suitable for next-click prediction and were discarded. Sessions with more than 20 interactions (stemming from outlier users with an unusual behavior or from bots) were truncated. A simple sketch of this sessionization step is shown after Table 2.

The characteristics of the resulting pre-processed datasets are shown in Table 2. Coincidentally, the datasets are similar in many statistics, except for the total number of published articles, which is much higher for G1 than for the Adressa dataset.

Table 2: Statistics of the datasets used for the experiments.

|                     | Globo.com (G1) | Adressa   |
|---------------------|----------------|-----------|
| Language            | Portuguese     | Norwegian |
| Period (days)       | 16             | 16        |
| # users             | 322,897        | 314,661   |
| # sessions          | 1,048,594      | 982,210   |
| # clicks            | 2,988,181      | 2,648,999 |
| # articles          | 46,033         | 13,820    |
| Avg. session length | 2.84           | 2.70      |
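The sessionization step described above can be summarized by the following sketch. It is an assumed helper, not the authors' pre-processing pipeline, using the 30-minute inactivity threshold, the removal of repeated clicks, and the 20-click truncation mentioned in the text.

```python
# Hypothetical sessionization helper mirroring the described pre-processing rules.
from datetime import timedelta

def build_sessions(clicks, gap=timedelta(minutes=30), max_len=20):
    """clicks: list of (timestamp, article_id) tuples for one user, sorted by time."""
    sessions, current, last_ts = [], [], None
    for ts, article_id in clicks:
        if last_ts is not None and ts - last_ts > gap:   # inactivity -> new session
            sessions.append(current)
            current = []
        if article_id not in [a for _, a in current]:    # drop repeated clicks on the same article
            current.append((ts, article_id))
        last_ts = ts
    sessions.append(current)
    # keep only sessions with at least 2 clicks; truncate outlier sessions at max_len
    return [s[:max_len] for s in sessions if len(s) >= 2]
```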
3.4 Baselines
The baselines used in our experiments are summarized in Table 3. While some baselines appear conceptually simple, recent work has shown that they are often able to outperform very recent neural approaches for session-based recommendation tasks [14, 28, 29]. Unlike neural methods such as GRU4Rec, these methods can be continuously updated over time to take newly published articles into account. A comparison of GRU4Rec with some of our baselines in a streaming scenario is provided in [15], and specifically in the news domain in [32], which is why we do not include GRU4Rec and similar methods here.

Table 3: Baseline recommendation algorithms.

Association rules-based and neighborhood methods:
- Co-Occurrence (CO): Recommends articles commonly viewed together with the last read article in previous user sessions [15, 28].
- Sequential Rules (SR): The method also uses association rules of size two. It however considers the sequence of the items within a session and uses a weighting function when two items do not immediately appear after each other [28] (a simplified sketch is given at the end of this section).
- Item-kNN: Returns the most similar items to the last read article, using the cosine similarity between their vectors of co-occurrence with other items within sessions. This method has been commonly used as a baseline for neural approaches, e.g., in [12] (Footnote 9).

Non-personalized methods:
- Recently Popular (RP): This method recommends the most viewed articles within a defined set of recently observed user interactions on the news portal (e.g., clicks during the last hour). Such a strategy proved to be very effective in the 2017 CLEF NewsREEL Challenge [30].
- Content-Based (CB): For each article read by the user, this method suggests recommendable articles with similar content to the last clicked article, based on the cosine similarity of their Article Content Embeddings (generated by the CNN technique described in Table 1).

Footnote 9: We also made experiments with session-based methods proposed in [28] (e.g., V-SkNN), but they did not lead to results that were better than the SR and CO methods.

Replicability. We publish the data and source code used in our experiments online (https://github.com/gabrielspmoreira/chameleon_recsys), including the code for CHAMELEON, which is implemented using TensorFlow.
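To make the SR baseline from Table 3 more concrete, the following is a simplified sketch of how such sequential rules can be built and applied. The inverse-distance weighting used here is one plausible choice; the exact weighting function of [28] may differ.

```python
# Simplified sketch of a Sequential Rules (SR) style baseline (not the exact method of [28]).
from collections import defaultdict

def fit_sequential_rules(sessions):
    """sessions: list of lists of article ids in click order."""
    rules = defaultdict(lambda: defaultdict(float))
    for session in sessions:
        for i, antecedent in enumerate(session):
            # strengthen rule "antecedent -> consequent", discounted by click distance
            for dist, consequent in enumerate(session[i + 1:], start=1):
                rules[antecedent][consequent] += 1.0 / dist
    return rules

def recommend(rules, last_clicked_article, topn=10):
    """Rank articles by the accumulated rule weight for the last clicked article."""
    scores = rules.get(last_clicked_article, {})
    return sorted(scores, key=scores.get, reverse=True)[:topn]

# Example usage:
rules = fit_sequential_rules([["a", "b", "c"], ["a", "c"], ["b", "c"]])
print(recommend(rules, "a"))   # -> ['c', 'b'] ("c" is supported by two sessions)
```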
4 EXPERIMENTAL RESULTS
The results for the G1 and Adressa datasets after (hyper-)parameter optimization for all methods are presented in Tables 4 and 5 (Footnote 11).

Footnote 11: In the original tables, the highest values for a given metric are highlighted in bold and the best values among the CHAMELEON configurations are printed in italics. If the best results are significantly different (p < 0.001) from all other algorithms, they are marked with *. We used paired Student's t-tests with Bonferroni correction for significance tests.

Table 4: Results for the G1 dataset.

| Recommender | HR@10 | MRR@10 | COV@10 | ESI-R@10 |
|-------------|-------|--------|--------|----------|
| CHAMELEON with ACEs generated differently | | | | |
| No-ACE      | 0.6281 | 0.3066 | 0.6429 | 6.3169 |
| CNN         | 0.6585 | 0.3395 | 0.6493 | 6.2874 |
| GRU         | 0.6585 | 0.3388 | 0.6484 | 6.2674 |
| W2V*TF-IDF  | 0.6575 | 0.3291 | 0.6500 | 6.4187 |
| LSA         | 0.6686* | 0.3423 | 0.6452 | 6.3833 |
| doc2vec     | 0.6368 | 0.3119 | 0.6431 | 6.4345 |
| Baselines   | | | | |
| SR          | 0.5911 | 0.2889 | 0.2757 | 5.9743 |
| Item-kNN    | 0.5707 | 0.2801 | 0.3892 | 6.5898 |
| CO          | 0.5699 | 0.2625 | 0.2496 | 5.5716 |
| RP          | 0.4580 | 0.1994 | 0.0220 | 4.4904 |
| CB          | 0.3703 | 0.1746 | 0.6855* | 8.1683* |

Table 5: Results for the Adressa dataset.

| Recommender | HR@10 | MRR@10 | COV@10 | ESI-R@10 |
|-------------|-------|--------|--------|----------|
| CHAMELEON with ACEs generated differently | | | | |
| No-ACE      | 0.6816 | 0.3252 | 0.8185 | 5.2453 |
| CNN         | 0.6860 | 0.3333 | 0.8103 | 5.2924 |
| GRU         | 0.6856 | 0.3327 | 0.8096 | 5.2861 |
| W2V*TF-IDF  | 0.6913 | 0.3402 | 0.7976 | 5.3273 |
| LSA         | 0.6935 | 0.3403 | 0.8013 | 5.3347 |
| doc2vec     | 0.6898 | 0.3402 | 0.7968 | 5.3417 |
| Baselines   | | | | |
| SR          | 0.6285 | 0.3020 | 0.4597 | 5.4445 |
| Item-kNN    | 0.6136 | 0.2769 | 0.5287 | 5.4668 |
| CO          | 0.6178 | 0.2819 | 0.4198 | 5.0785 |
| RP          | 0.5647 | 0.2481 | 0.0542 | 4.1464 |
| CB          | 0.3273 | 0.1197 | 0.8807* | 7.6534* |

Accuracy results. In general, we can observe that considering content information is in fact highly beneficial in terms of recommendation accuracy. It is also possible to see that the choice of the article representation matters. Surprisingly, the long-established LSA method was the best-performing technique to represent the content for both datasets in terms of accuracy, even when compared to more recent techniques using pre-trained word embeddings, such as the CNN and GRU.

For the G1 dataset, the Hit Rate (HR) was improved by around 7% and the MRR by almost 12% when using the LSA representation instead of the No-ACE setting. For the Adressa dataset, the difference between the No-ACE setting and the hybrid methods leveraging text is less pronounced. The improvement using LSA compared to the No-ACE setting was around 2% for HR and 5% for MRR.

Furthermore, for the Adressa dataset, it is possible to observe that all the unsupervised methods (LSA, W2V*TF-IDF, and doc2vec) for generating ACEs performed better than the supervised ones, differently from the G1 dataset. A possible explanation can be that the supervised methods depend more on the quality and depth of the available article metadata information. While the G1 dataset uses a fine-grained categorization scheme (461 categories), the categorization of the Adressa dataset is much more coarse (41 categories).

Among the baselines, SR leads to the best accuracy results, but does not match the performance of the content-agnostic No-ACE setting for an RNN. This indicates that the hybrid approach of considering additional contextual information, as done by CHAMELEON's NAR module in this condition, is important.

Recommending only based on content information (CB), as expected, does not lead to competitive accuracy results, because the popularity of the items is not taken into account (which SR and neighborhood-based methods implicitly do). Recommending only recently popular articles (RP) works better than CB, but does not match the performance of the other methods.

Coverage and novelty. In terms of coverage (COV@10), the simple Content-Based (CB) method leads to the highest value, as it recommends across the entire spectrum based solely on content similarity, without considering the popularity of the items. It is followed by the various CHAMELEON instantiations, where it turned out that the specifically chosen content representation is not too important in this respect. As expected, the CB method also frequently recommends long-tail items, which also leads to the highest value in terms of novelty (ESI-R@10). The popularity-based method (RP), in contrast, leads to the lowest novelty value. From the other methods, the traditional Item-kNN method, to some surprise, leads to the best novelty results, even though neighborhood-based methods have a certain popularity bias. Looking at the other configurations, using unsupervised methods to represent the text of the articles can help to drive the recommendations a bit away from the popular ones.

5 SUMMARY AND CONCLUSION
The consideration of content information for news recommendation proved to be important in the past, and therefore many hybrid systems were proposed in the literature. In this work, we investigated the relative importance of incorporating content information in both streaming- and session-based recommendation scenarios. Our experiments highlighted the value of content information by showing that it helped to outperform otherwise competitive baselines. Furthermore, the experiments also demonstrated that the choice of the article representation can matter. However, the value of considering additional content information in the process depends on the quality and depth of the available data, especially for supervised methods. From a practical perspective, this indicates that quality assurance and curation of the content information can be essential to obtain better results.

REFERENCES
[1] Pablo Castells, Neil J. Hurley, and Saul Vargas. 2015. Novelty and Diversity in Recommender Systems. In Recommender Systems Handbook, Francesco Ricci, Lior Rokach, and Bracha Shapira (Eds.). Springer US, 881–918.
[2] Wei Chu and Seung-Taek Park. 2009. Personalized recommendation on dynamic content using predictive bilinear models. In Proceedings of the 18th International Conference on World Wide Web (WWW'09). 691–700.
[3] Paolo Cremonesi, Yehuda Koren, and Roberto Turrin. 2010. Performance of recommender algorithms on top-n recommendation tasks. In Proceedings of the Fourth ACM Conference on Recommender Systems (RecSys '10). 39–46.
[4] Andrew M. Dai and Quoc V. Le. 2015. Semi-supervised sequence learning. In Advances in Neural Information Processing Systems. 3079–3087.
[5] Abhinandan S. Das, Mayur Datar, Ashutosh Garg, and Shyam Rajaram. 2007. Google news personalization: scalable online collaborative filtering. In Proceedings of the 16th International Conference on World Wide Web (WWW'07). 271–280.
[6] Scott Deerwester, Susan T. Dumais, George W. Furnas, Thomas K. Landauer, and Richard Harshman. 1990. Indexing by latent semantic analysis. Journal of the American Society for Information Science 41, 6 (1990), 391–407.
[7] Jorge Díez Peláez, David Martínez Rego, Amparo Alonso Betanzos, Óscar Luaces Rodríguez, and Antonio Bahamonde Rionda. 2016. Metrical Representation of Readers and Articles in a Digital Newspaper. In Proceedings of the 10th ACM Conference on Recommender Systems (RecSys 2016).
[8] Elena Viorica Epure, Benjamin Kille, Jon Espen Ingvaldsen, Rebecca Deneckere, Camille Salinesi, and Sahin Albayrak. 2017. Recommending Personalized News in Short User Sessions. In Proceedings of the Eleventh ACM Conference on Recommender Systems (RecSys'17). 121–129.
[9] Ryan Graff. 2015. How the Washington Post used data and natural language processing to get people to read more news. https://knightlab.northwestern.edu/2015/06/03/how-the-washington-posts-clavis-tool-helps-to-make-news-personal/. (June 2015).
[10] Jon Atle Gulla, Lemei Zhang, Peng Liu, Özlem Özgöbek, and Xiaomeng Su. 2017. The Adressa dataset for news recommendation. In Proceedings of the International Conference on Web Intelligence (WI'17). 1042–1048.
[11] Nathan Halko, Per-Gunnar Martinsson, and Joel A. Tropp. 2011. Finding structure with randomness: Probabilistic algorithms for constructing approximate matrix decompositions. SIAM Review 53, 2 (2011), 217–288.
[12] Balázs Hidasi, Alexandros Karatzoglou, Linas Baltrunas, and Domonkos Tikk. 2016. Session-based recommendations with recurrent neural networks. In Proceedings of the Fourth International Conference on Learning Representations (ICLR'16).
[13] Dietmar Jannach, Lukas Lerche, Iman Kamehkhosh, and Michael Jugovac. 2015. What recommenders recommend: an analysis of recommendation biases and possible countermeasures. User Modeling and User-Adapted Interaction 25, 5 (2015), 427–491.
[14] Dietmar Jannach and Malte Ludewig. 2017. When recurrent neural networks meet the neighborhood for session-based recommendation. In Proceedings of the Eleventh ACM Conference on Recommender Systems (RecSys'17). 306–310.
[15] Michael Jugovac, Dietmar Jannach, and Mozhgan Karimi. 2018. StreamingRec: A Framework for Benchmarking Stream-based News Recommenders. In Proceedings of the Twelfth ACM Conference on Recommender Systems (RecSys '18). 306–310.
[16] Mozhgan Karimi, Dietmar Jannach, and Michael Jugovac. 2018. News recommender systems – Survey and roads ahead. Information Processing & Management 54, 6 (2018), 1203–1227.
[17] Benjamin Kille, Andreas Lommatzsch, Frank Hopfgartner, Martha Larson, and Torben Brodt. 2017. CLEF 2017 NewsREEL Overview: Offline and Online Evaluation of Stream-based News Recommender Systems. In Working Notes of CLEF 2017 – Conference and Labs of the Evaluation Forum.
[18] Yehuda Koren, Robert Bell, and Chris Volinsky. 2009. Matrix factorization techniques for recommender systems. IEEE Computer 42, 8 (2009).
[19] Quoc Le and Tomas Mikolov. 2014. Distributed representations of sentences and documents. In Proceedings of the 31st International Conference on Machine Learning (ICML '14). 1188–1196.
[20] Lukas Lerche, Dietmar Jannach, and Malte Ludewig. 2016. On the Value of Reminders within E-Commerce Recommendations. In Proceedings of the 2016 Conference on User Modeling, Adaptation and Personalization (UMAP'16).
[21] Jing Li, Pengjie Ren, Zhumin Chen, Zhaochun Ren, Tao Lian, and Jun Ma. 2017. Neural Attentive Session-based Recommendation. In Proceedings of the 2017 ACM Conference on Information and Knowledge Management (CIKM '17). 1419–1428.
[22] Lei Li, Dingding Wang, Tao Li, Daniel Knox, and Balaji Padmanabhan. 2011. SCENE: a scalable two-stage personalized news recommendation system. In Proceedings of the 34th International Conference on Research and Development in Information Retrieval (SIGIR'11). 125–134.
[23] Lei Li, Li Zheng, Fan Yang, and Tao Li. 2014. Modeling and broadening temporal user interest in personalized news recommendation. Expert Systems with Applications 41, 7 (2014), 3168–3177.
[24] Joseph Lilleberg, Yun Zhu, and Yanqing Zhang. 2015. Support vector machines and word2vec for text classification with semantic features. In 14th International Conference on Cognitive Informatics & Cognitive Computing (ICCI*CC '15). 136–140.
[25] Chen Lin, Runquan Xie, Xinjun Guan, Lei Li, and Tao Li. 2014. Personalized news recommendation via implicit social experts. Information Sciences 254 (2014), 1–18.
[26] Jiahui Liu, Peter Dolan, and Elin Rønby Pedersen. 2010. Personalized news recommendation based on click behavior. In Proceedings of the 15th International Conference on Intelligent User Interfaces (IUI '10). 31–40.
[27] Qiao Liu, Yifu Zeng, Refuoe Mokhosi, and Haibin Zhang. 2018. STAMP: Short-Term Attention/Memory Priority Model for Session-based Recommendation. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (KDD '18). 1831–1839.
[28] Malte Ludewig and Dietmar Jannach. 2018. Evaluation of Session-based Recommendation Algorithms. User Modeling and User-Adapted Interaction 28, 4–5 (2018), 331–390.
[29] Malte Ludewig, Noemi Mauro, Sara Latifi, and Dietmar Jannach. 2019. Performance Comparison of Neural and Non-Neural Approaches to Session-based Recommendation. In Proceedings of the 2019 ACM Conference on Recommender Systems (RecSys 2019).
[30] Cornelius A. Ludmann. 2017. Recommending News Articles in the CLEF News Recommendation Evaluation Lab with the Data Stream Management System Odysseus. In Working Notes of the Conference and Labs of the Evaluation Forum (CLEF'17).
[31] Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S. Corrado, and Jeff Dean. 2013. Distributed representations of words and phrases and their compositionality. In Proceedings of Advances in Neural Information Processing Systems (NIPS '13). 3111–3119.
[32] Gabriel de Souza Pereira Moreira, Felipe Ferreira, and Adilson Marques da Cunha. 2018. News Session-Based Recommendations using Deep Neural Networks. In Proceedings of the 3rd Workshop on Deep Learning for Recommender Systems (DLRS) at ACM RecSys'18. 15–23.
[33] Gabriel de Souza Pereira Moreira, Dietmar Jannach, and Adilson Marques da Cunha. 2019. Contextual Hybrid Session-based News Recommendation with Recurrent Neural Networks. arXiv preprint arXiv:1904.10367 (2019).
[34] Massimo Quadrana, Paolo Cremonesi, and Dietmar Jannach. 2018. Sequence-Aware Recommender Systems. ACM Computing Surveys (CSUR) 51, 4 (2018), 66.
[35] Massimo Quadrana, Alexandros Karatzoglou, Balázs Hidasi, and Paolo Cremonesi. 2017. Personalizing Session-based Recommendations with Hierarchical Recurrent Neural Networks. In Proceedings of the 11th ACM Conference on Recommender Systems (RecSys'17). 130–137.
[36] Juan Ramos. 2003. Using TF-IDF to determine word relevance in document queries. Technical Report, Department of Computer Science, Rutgers University.
[37] Junyang Rao, Aixia Jia, Yansong Feng, and Dongyan Zhao. 2013. Personalized news recommendation using ontologies harvested from the web. In International Conference on Web-Age Information Management. 781–787.
[38] A. Spangher. 2015. Building the Next New York Times Recommendation Engine. https://open.blogs.nytimes.com/2015/08/11/building-the-next-new-york-times-recommendation-engine/. (Aug 2015).
[39] Michele Trevisiol, Luca Maria Aiello, Rossano Schifanella, and Alejandro Jaimes. 2014. Cold-start news recommendation with domain-dependent browse graph. In Proceedings of the 8th ACM Conference on Recommender Systems (RecSys'14). 81–88.
[40] Bartlomiej Twardowski. 2016. Modelling Contextual Information in Session-Aware Recommender Systems with Neural Networks. In Proceedings of the 10th ACM Conference on Recommender Systems (RecSys'16). 273–276.
[41] Saúl Vargas. 2015. Novelty and Diversity Evaluation and Enhancement in Recommender Systems. PhD thesis. Universidad Autónoma de Madrid.
[42] Saúl Vargas and Pablo Castells. 2011. Rank and relevance in novelty and diversity metrics for recommender systems. In Proceedings of the Fifth ACM Conference on Recommender Systems (RecSys'11). 109–116.