=Paper= {{Paper |id=Vol-2554/paper3 |storemode=property |title=On the Importance of News Content Representation in Hybrid Neural Session-based Recommender Systems |pdfUrl=https://ceur-ws.org/Vol-2554/paper_03.pdf |volume=Vol-2554 |authors=Gabriel De Souza P. Moreira,Dietmar Jannach,Adilson Marques Da Cunha |dblpUrl=https://dblp.org/rec/conf/recsys/MoreiraJC19 }}
On the Importance of News Content Representation in Hybrid Neural Session-based Recommender Systems

Gabriel de Souza P. Moreira* (CI&T, Campinas, SP, Brazil), gspmoreira@gmail.com
Dietmar Jannach (University of Klagenfurt, Klagenfurt, Austria), dietmar.jannach@aau.at
Adilson Marques da Cunha (Instituto Tecnológico de Aeronáutica, São José dos Campos, SP, Brazil), cunha@ita.br
ABSTRACT
News recommender systems are designed to surface relevant information for online readers by personalizing their user experiences. A particular problem in that context is that online readers are often anonymous, which means that this personalization can only be based on the last few recorded interactions with the user, a setting named session-based recommendation. Another particularity of the news domain is that fresh articles are constantly published, which should be immediately considered for recommendation. To deal with this item cold-start problem, it is important to consider the actual content of items when recommending. Hybrid approaches are therefore often considered the method of choice in such settings. In this work, we analyze the importance of considering content information in a hybrid neural news recommender system. We contrast content-aware and content-agnostic techniques and also explore the effects of using different content encodings. Experiments on two public datasets confirm the importance of adopting a hybrid approach. Furthermore, we show that the choice of the content encoding can have an impact on the resulting performance.

CCS CONCEPTS
• Information systems → Recommender systems; • Computing methodologies → Neural networks;

KEYWORDS
Recommender Systems; Hybrid Systems; News Recommendation; Session-Based Recommendation; Recurrent Neural Networks

1 INTRODUCTION & BACKGROUND
Many of today's major media and news aggregator websites, including The New York Times [38], The Washington Post [9], Google News [5], and Yahoo! News [39], provide automated reading recommendations for their users. News recommendation, while being one of the earliest application fields of recommenders, is often still considered a challenging problem for many reasons [16].
   Among them, there are two types of cold-start problems. First, there is the permanent item cold-start problem. In the news domain, we have to deal with a constant stream of possibly thousands of new articles published each day [38]. At the same time, these articles become outdated very quickly [5]. Second, on many news sites, we have to deal with user cold-start, when users are anonymous or not logged in [7, 22, 25], which means that personalization has to be based on the few observed interactions (e.g., clicks) of the user.
   In many application domains of recommenders, collaborative filtering techniques, which only rely on observed preference patterns in a user community, have proven to be highly effective in the past. However, in the particular domain of news recommendation, hybrid techniques, which also consider the actual content of a news item, have often been shown to be preferable for dealing with item cold-start, see e.g., [2, 8, 22, 23, 25, 26, 37, 39].
   Likewise, to deal with user cold-start issues, session-based recommendation techniques have received more research interest in recent years. In these approaches, the provided recommendations are not based on long-term preference profiles, but solely on adapting recommendations according to the most recent observed interactions of the current user.
   Technically, a number of algorithmic approaches can be applied to this problem, from rule-learning techniques, over nearest-neighbor schemes, to more complex sequence learning methods and deep learning approaches; for an overview, see [34]. Among the neural methods, Recurrent Neural Networks (RNN) are a natural choice for learning sequential models [12, 21]. Attention mechanisms have also been used for session-based recommendation [27].
   The goal of this work is to investigate two aspects of hybrid session-based news recommendation using neural networks. Our first goal is to understand the value of considering content information in a hybrid system. Second, we aim to investigate to what extent the choice of the mechanism for encoding the articles' textual content matters. To that purpose, we conducted experiments with various encoding mechanisms, both unsupervised (like Latent Semantic Analysis and doc2vec) and supervised ones, using a realistic streaming-based evaluation protocol. The outcomes of our studies, which were based on two public datasets, confirm the usefulness of considering content information. However, the quality and detail of the content representation matter, which means that care should be taken of these aspects in practical settings. Second, we found that the specific document encoding can make a difference in recommendation quality, although those differences are sometimes small. Finally, we found that content-agnostic nearest-neighbor methods, which are considered highly competitive with RNN-based techniques in other scenarios [14, 28], fell behind the neural approach used here on several performance measures.

* Also with Brazilian Aeronautics Institute of Technology.

Copyright © 2019 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
INRA'19, September, 2019, Copenhagen, Denmark
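To make the notion of a document encoding concrete before the methodology is introduced, the following sketch builds a TF-IDF-weighted average of word embeddings and L2-normalizes the result, in the spirit of one of the unsupervised techniques compared in this paper. This is our own illustration, not the authors' code; the toy vocabulary, vectors, and IDF values are invented for the example.

```python
import numpy as np

def tfidf_weighted_embedding(tokens, word_vecs, idf):
    """Encode a document as the TF-IDF-weighted average of its
    word embeddings, followed by L2 normalization."""
    counts = {}
    for t in tokens:
        counts[t] = counts.get(t, 0) + 1
    vecs, weights = [], []
    for t, tf in counts.items():
        if t in word_vecs and t in idf:
            vecs.append(word_vecs[t])
            weights.append(tf * idf[t])  # TF-IDF weight of the term
    if not vecs:
        return None
    emb = np.average(np.array(vecs), axis=0, weights=weights)
    return emb / np.linalg.norm(emb)  # L2-normalize the embedding

# Toy word vectors and IDF values (invented for illustration only).
word_vecs = {"election": np.array([1.0, 0.0]),
             "vote":     np.array([0.8, 0.2]),
             "the":      np.array([0.1, 0.1])}
idf = {"election": 2.0, "vote": 1.5, "the": 0.01}

ace = tfidf_weighted_embedding(["the", "election", "vote", "vote"], word_vecs, idf)
```

The L2 normalization keeps the feature scale of different documents comparable while preserving cosine similarity between related articles, which is the motivation the paper itself gives for normalizing its Article Content Embeddings.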

2 METHODOLOGY
To conduct our experiments, we have implemented different instantiations of our deep learning meta-architecture for news recommendation called CHAMELEON [32, 33]. The main component of the architecture is the Next-Article Recommendation (NAR) module, which processes various types of input features, including pre-trained Article Content Embeddings (ACE) and contextual information about users (e.g., time, location, device) and items (e.g., recent popularity, recency). These inputs are provided for all clicks of a user observed in the current session to generate next-item recommendations based on an RNN (e.g., GRU, LSTM).
   The ACEs are produced by the Article Content Representation (ACR) module. The input to the module is the article's text, represented as a sequence of word embeddings (e.g., using Word2Vec [31]) pre-trained on a large corpus. These embeddings are further processed by feature extractors, which can be instantiated as Convolutional Neural Networks (CNN) or RNNs. The ACR module's neural network is trained in a supervised manner for a side task: to predict metadata attributes of an article, such as categories or tags. Figure 1 illustrates how the Article Content Embeddings are used within CHAMELEON's processing chain to provide next-article recommendations.

Figure 1: A simplified overview of CHAMELEON. The components for which we tested different variants are shaded.

   In this work, we first analyzed the importance of considering article content information for recommendations. Second, we experimented with different techniques for textual content representation^1 and investigated how they might affect recommendation quality. The different variants that were tested^2 are listed in Table 1.
   For the experiments, CHAMELEON's NAR module took the following features as input, described in more detail in [33]^3: (1) Article Content Embeddings (generated by the different techniques presented in Table 1), (2) article metadata (category and author^4), (3) article context (novelty and recency), and (4) user context (city, region, country, device type, operating system, hour of the day, day of the week, referrer).

Table 1: Alternative content processing techniques.

Technique    | Input     | Description
No-ACE       | None      | In this setting, no content representation is used as input.
Supervised
CNN          | word2vec^5 | A 1D-CNN-based model trained to classify the articles' metadata (e.g., category). The architecture combines three CNNs with window sizes of 3, 4, and 5 to model n-grams. The output of an intermediate layer is used as the textual representation. For more details, see [32, 33].
GRU          | word2vec  | Similar to the CNN-based version, a GRU layer is trained to classify metadata. The outputs of the GRU layer are max-pooled to generate representations.
Unsupervised
LSA          | Raw text  | Traditional Latent Semantic Analysis (LSA) [6]. We used a variation based on TF-IDF vectors [36] and Truncated SVD [11].
W2V*TF-IDF   | word2vec  | TF-IDF-weighted word embeddings [24], a technique to represent a piece of text as the average of its word embeddings weighted by TF-IDF [36].
doc2vec      | Raw text  | Paragraph Vector (a.k.a. doc2vec) [19] learns fixed-length feature representations from variable-length pieces of text, trained via the distributed memory and distributed bag-of-words models.

^1 As there were some very long articles, the text was truncated after the first 12 sentences and concatenated with the title. Article Content Embeddings (ACE) produced by the selected techniques were L2-normalized to make the feature scales similar, but also to preserve high similarity scores for embeddings from similar articles.
^2 We also experimented with GRU-based Sequence Autoencoders (adapted from SA-LSTM [4]) to extract textual features by reconstructing the sequence of input word embeddings, but this technique did not lead to better results than the other unsupervised methods.
^3 Note that the experiments reported here did not include the trainable Article ID feature used in the experiments from [33], which can lead to slightly improved accuracy, but possibly reduces the differences observed between the content representations.
^4 Article author and user city are available only for the Adressa dataset.
^5 Portuguese: a pre-trained Word2Vec [31] skip-gram model (300 dimensions) is available at http://nilc.icmc.usp.br/embeddings; Norwegian: a skip-gram model (100 dimensions) is available at http://vectors.nlpl.eu/repository (model #100).

3 EXPERIMENTAL SETUP
We adopt a temporal offline evaluation method as proposed in [32, 33], which simulates a streaming flow of new user interactions (clicks) and articles being published. Since in practical environments it is highly important to quickly react
to incoming events [15, 17, 30], the baseline recommender methods are constantly updated over time. CHAMELEON's NAR module also supports online learning. The training process of CHAMELEON emulates a streaming scenario with mini-batches, in which each user session is used for training only once. Such a scalable approach is different from other techniques, like GRU4Rec [12], which require training for several epochs on a larger set of past interactions to reach high accuracy.

3.1 Evaluation Protocol
The evaluation process works as follows:
(1) The recommenders are continuously trained on user sessions ordered by time and grouped by hours. Every five hours, the recommenders are evaluated on sessions from the next hour. With this interval of five hours (not a divisor of 24 hours), we cover different hours of the day for evaluation. After the evaluation of the next hour is done, this hour is also considered for training, until the entire dataset is covered.^6 Note that CHAMELEON's model is only updated after all events of the test hour are processed. This allows us to emulate a realistic production scenario where the model is trained and deployed once an hour to serve recommendations for the next hour;
(2) For each session in the test set, we incrementally reveal one click after the other to the recommender, as done, e.g., in [12, 35];
(3) For each click to be predicted, we sample a random set containing 50 recommendable articles (the ones that received at least one click by any user in the preceding hour) that were not viewed by the user in their session (negative samples) plus the true next article (positive sample), as done in [3] and [18]. We then evaluate the algorithms on the task of ranking those 51 items; and
(4) Given these rankings, standard information retrieval (top-n) metrics can be computed.

3.2 Metrics
As relevant quality factors for the news domain [16], we considered accuracy, item coverage, and novelty. To determine the metrics, we took measurements at list length 10. As accuracy metrics, we used the Hit Rate (HR@n), which checks whether or not the true next item appears in the top-n ranked items, and the Mean Reciprocal Rank (MRR@n), a ranking metric that is sensitive to the position of the true next item. Both metrics are common when evaluating session-based recommendation algorithms [12, 15, 28].
   Since it is sometimes important that a news recommender not only focuses on a small set of items, we also considered Item Coverage (COV@n) as a quality criterion. We computed item coverage as the number of distinct articles that appeared in any top-n list divided by the number of recommendable articles [13]. In our case, the recommendable articles are the ones viewed at least once in the last hour by any user. To measure novelty, we used the ESI-R@n metric [33], adapted from [1, 41, 42]. The metric is based on item popularity and returns higher values when long-tail items are among the top-n recommendations.

3.3 Datasets
We use two public datasets from news portals:
(1) Globo.com (G1) dataset. Globo.com is the most popular media company in Brazil. The dataset^7 was collected at the G1 news portal, which has more than 80 million unique users and publishes over 100,000 new articles per month; and
(2) SmartMedia Adressa. This dataset contains approximately 20 million page visits from a Norwegian news portal [10]. In our experiments, we used its complete version^8, which includes article text and click events of about 2 million users and 13,000 articles.
   Both datasets include the textual content of the news articles, article metadata (such as publishing date, category, and author), and logged user interactions (page views) with contextual information. Since we are focusing on session-based news recommendation and short-term user preferences, it is not necessary to train the algorithms over long periods. Therefore, and because articles become outdated very quickly, we selected all available user sessions from the first 16 days of both datasets for our experiments.
   In a pre-processing step, as in [8, 28, 40], we organized the data into sessions, using a 30-minute threshold of inactivity as an indicator of a new session. Sessions were then sorted by the timestamp of their first click. From each session, we removed repeated clicks on the same article, as we are not focusing on the capability of algorithms to act as reminders as in [20]. Sessions with only one interaction are not suitable for next-click prediction and were discarded. Sessions with more than 20 interactions (stemming from outlier users with unusual behavior or from bots) were truncated.
   The characteristics of the resulting pre-processed datasets are shown in Table 2. Coincidentally, the datasets are similar in many statistics, except for the total number of published articles, which is much higher for G1 than for the Adressa dataset.

Table 2: Statistics of the datasets used for the experiments.

                        Globo.com (G1)    Adressa
Language                Portuguese        Norwegian
Period (days)           16                16
# users                 322,897           314,661
# sessions              1,048,594         982,210
# clicks                2,988,181         2,648,999
# articles              46,033            13,820
Avg. session length     2.84              2.70

^6 Our dataset consists of 16 days. We used the first 2 days to learn an initial model for the session-based algorithms and report the averaged measures after this warm-up.
^7 https://www.kaggle.com/gspmoreira/news-portal-user-interactions-by-globocom
^8 http://reclab.idi.ntnu.no/dataset
3.4 Baselines
The baselines used in our experiments are summarized in Table 3. While some baselines appear conceptually simple, recent work has shown that they are often able to outperform very recent neural approaches for session-based recommendation tasks [14, 28, 29]. Unlike neural methods such as GRU4REC, these methods can be continuously updated over time to take newly published articles into account. A comparison of GRU4REC with some of our baselines in a streaming scenario is provided in [15], and specifically in the news domain in [32], which is why we do not include GRU4REC and similar methods here.

Table 3: Baseline recommendation algorithms.

Association Rules-based and Neighborhood Methods
Co-Occurrence (CO): Recommends articles commonly viewed together with the last read article in previous user sessions [15, 28].
Sequential Rules (SR): This method also uses association rules of size two. However, it considers the sequence of the items within a session and uses a weighting function when two items do not immediately appear after each other [28].
Item-kNN: Returns the items most similar to the last read article, using the cosine similarity between their vectors of co-occurrence with other items within sessions. This method has been commonly used as a baseline for neural approaches, e.g., in [12].^9

Non-personalized Methods
Recently Popular (RP): This method recommends the most viewed articles within a defined set of recently observed user interactions on the news portal (e.g., clicks during the last hour). Such a strategy proved to be very effective in the 2017 CLEF NewsREEL Challenge [30].
Content-Based (CB): For each article read by the user, this method suggests recommendable articles with content similar to the last clicked article, based on the cosine similarity of their Article Content Embeddings (generated by the CNN technique described in Table 1).

^9 We also made experiments with session-based methods proposed in [28] (e.g., V-SkNN), but they did not lead to results that were better than those of the SR and CO methods.

   Replicability. We publish the data and source code used in our experiments online^10, including the code for CHAMELEON, which is implemented using TensorFlow.

^10 https://github.com/gabrielspmoreira/chameleon_recsys

4 EXPERIMENTAL RESULTS
The results for the G1 and Adressa datasets after (hyper-)parameter optimization for all methods are presented^11 in Tables 4 and 5.

^11 The highest values for a given metric are highlighted in bold. The best values for the CHAMELEON configurations are printed in italics. If the best results are significantly different (p < 0.001) from all other algorithms, they are marked with *. We used paired Student's t-tests with Bonferroni correction for significance tests.

Table 4: Results for the G1 dataset.

Recommender    HR@10     MRR@10    COV@10    ESI-R@10
CHAMELEON with ACEs generated differently
No-ACE         0.6281    0.3066    0.6429    6.3169
CNN            0.6585    0.3395    0.6493    6.2874
GRU            0.6585    0.3388    0.6484    6.2674
W2V*TF-IDF     0.6575    0.3291    0.6500    6.4187
LSA            0.6686*   0.3423    0.6452    6.3833
doc2vec        0.6368    0.3119    0.6431    6.4345
Baselines
SR             0.5911    0.2889    0.2757    5.9743
Item-kNN       0.5707    0.2801    0.3892    6.5898
CO             0.5699    0.2625    0.2496    5.5716
RP             0.4580    0.1994    0.0220    4.4904
CB             0.3703    0.1746    0.6855*   8.1683*

Table 5: Results for the Adressa dataset.

Recommender    HR@10     MRR@10    COV@10    ESI-R@10
CHAMELEON with ACEs generated differently
No-ACE         0.6816    0.3252    0.8185    5.2453
CNN            0.6860    0.3333    0.8103    5.2924
GRU            0.6856    0.3327    0.8096    5.2861
W2V*TF-IDF     0.6913    0.3402    0.7976    5.3273
LSA            0.6935    0.3403    0.8013    5.3347
doc2vec        0.6898    0.3402    0.7968    5.3417
Baselines
SR             0.6285    0.3020    0.4597    5.4445
Item-kNN       0.6136    0.2769    0.5287    5.4668
CO             0.6178    0.2819    0.4198    5.0785
RP             0.5647    0.2481    0.0542    4.1464
CB             0.3273    0.1197    0.8807*   7.6534*

   Accuracy Results. In general, we can observe that considering content information is in fact highly beneficial in terms of recommendation accuracy. It is also possible to see that the choice of the article representation matters. Surprisingly, the long-established LSA method was the best-performing technique for representing the content on both datasets in terms of accuracy, even when compared to more recent techniques using pre-trained word embeddings, such as the CNN and the GRU.
   For the G1 dataset, the Hit Rate (HR) was improved by around 7% and the MRR by almost 12% when using the LSA representation instead of the No-ACE setting. For the Adressa dataset, the differences between the No-ACE setting and the hybrid methods leveraging text are less pronounced. The improvement using LSA compared to the No-ACE setting was around 2% for HR and 5% for MRR.
   Furthermore, for the Adressa dataset, it is possible to observe that all the unsupervised methods for generating ACEs (LSA, W2V*TF-IDF, and doc2vec) performed better than the supervised ones, unlike for the G1 dataset. A possible explanation is that the supervised methods depend more on the quality and depth of the available article metadata. While the G1 dataset uses a fine-grained categorization scheme (461 categories), the categorization of the Adressa dataset is much coarser (41 categories).
   Among the baselines, SR leads to the best accuracy results, but does not match the performance of the content-agnostic No-ACE setting of the RNN. This indicates that the hybrid approach of considering additional contextual information, as done by CHAMELEON's NAR module in this condition, is important.
   Recommending only based on content information (CB), as expected, does not lead to competitive accuracy results, because the popularity of the items is not taken into account
(which SR and neighborhood-based methods implicitly do). Recommending only recently popular articles (RP) works better than CB, but does not match the performance of the other methods.

Coverage and Novelty. In terms of coverage (COV@10), the simple content-based (CB) method leads to the highest value, as it recommends across the entire spectrum based solely on content similarity, without considering the popularity of the items. It is followed by the various CHAMELEON instantiations, where it turned out that the specifically chosen content representation is not too important in this respect.

As expected, the CB method also frequently recommends long-tail items, which leads to the highest value in terms of novelty (ESI-R@10). The popularity-based method (RP), in contrast, leads to the lowest novelty value. Among the other methods, the traditional Item-kNN method, to some surprise, leads to the best novelty results, even though neighborhood-based methods have a certain popularity bias. Looking at the other configurations, using unsupervised methods to represent the text of the articles can help to drive the recommendations a bit away from the popular ones.
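The two beyond-accuracy metrics discussed above can be sketched as follows, in the spirit of the definitions by Vargas [41]. This is a simplified illustration: the toy recommendation lists and click counts are hypothetical, and the exact discount and normalization used in the paper's evaluation may differ.

```python
# Illustrative computation of catalog coverage (COV@n) and expected
# self-information with rank discount (ESI-R@n) for top-n lists.
import math

def coverage_at_n(rec_lists, catalog_size):
    """COV@n: share of the article catalog that appears in at least one
    top-n recommendation list."""
    recommended = set().union(*rec_lists)
    return len(recommended) / catalog_size

def esi_r_at_n(rec_list, popularity, total_clicks):
    """ESI-R@n: rare (long-tail) items contribute -log2 p(i) bits of
    self-information, and lower ranks are discounted logarithmically, so
    popularity-biased lists score low and novel lists score high."""
    score = norm = 0.0
    for rank, item in enumerate(rec_list):
        discount = 1.0 / math.log2(rank + 2)      # rank discount
        p_item = popularity[item] / total_clicks  # empirical click probability
        score += discount * -math.log2(p_item)    # self-information
        norm += discount
    return score / norm

lists = [["a", "b", "c"], ["b", "d", "e"]]
print(coverage_at_n(lists, catalog_size=10))      # 5 distinct items -> 0.5
clicks = {"a": 50, "b": 30, "c": 10, "d": 5, "e": 5}
print(esi_r_at_n(["a", "b", "c"], clicks, total_clicks=100))
```

A list of frequently clicked articles yields a low ESI-R value, while a list drawn from the long tail yields a high one, which matches the CB-versus-RP contrast reported above.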
5 SUMMARY AND CONCLUSION
The consideration of content information for news recommendation has proved important in the past, and many hybrid systems have therefore been proposed in the literature. In this work, we investigated the relative importance of incorporating content information in both streaming- and session-based recommendation scenarios. Our experiments highlighted the value of content information by showing that it helped to outperform otherwise competitive baselines. Furthermore, the experiments demonstrated that the choice of the article representation can matter. However, the value of considering additional content information depends on the quality and depth of the available data, especially for supervised methods. From a practical perspective, this indicates that quality assurance and curation of the content information can be essential for obtaining better results.

REFERENCES
[1] Pablo Castells, Neil J. Hurley, and Saul Vargas. 2015. Novelty and Diversity in Recommender Systems. In Recommender Systems Handbook, Francesco Ricci, Lior Rokach, and Bracha Shapira (Eds.). Springer US, 881–918.
[2] Wei Chu and Seung-Taek Park. 2009. Personalized recommendation on dynamic content using predictive bilinear models. In Proceedings of the 18th International Conference on World Wide Web (WWW'09). 691–700.
[3] Paolo Cremonesi, Yehuda Koren, and Roberto Turrin. 2010. Performance of recommender algorithms on top-n recommendation tasks. In Proceedings of the Fourth ACM Conference on Recommender Systems (RecSys'10). 39–46.
[4] Andrew M. Dai and Quoc V. Le. 2015. Semi-supervised sequence learning. In Advances in Neural Information Processing Systems. 3079–3087.
[5] Abhinandan S. Das, Mayur Datar, Ashutosh Garg, and Shyam Rajaram. 2007. Google news personalization: scalable online collaborative filtering. In Proceedings of the 16th International Conference on World Wide Web (WWW'07). 271–280.
[6] Scott Deerwester, Susan T. Dumais, George W. Furnas, Thomas K. Landauer, and Richard Harshman. 1990. Indexing by latent semantic analysis. Journal of the American Society for Information Science 41, 6 (1990), 391–407.
[7] Jorge Díez Peláez, David Martínez Rego, Amparo Alonso Betanzos, Óscar Luaces Rodríguez, and Antonio Bahamonde Rionda. 2016. Metrical Representation of Readers and Articles in a Digital Newspaper. In Proceedings of the 10th ACM Conference on Recommender Systems (RecSys'16).
[8] Elena Viorica Epure, Benjamin Kille, Jon Espen Ingvaldsen, Rebecca Deneckere, Camille Salinesi, and Sahin Albayrak. 2017. Recommending Personalized News in Short User Sessions. In Proceedings of the Eleventh ACM Conference on Recommender Systems (RecSys'17). 121–129.
[9] Ryan Graff. 2015. How the Washington Post used data and natural language processing to get people to read more news. https://knightlab.northwestern.edu/2015/06/03/how-the-washington-posts-clavis-tool-helps-to-make-news-personal/. (June 2015).
[10] Jon Atle Gulla, Lemei Zhang, Peng Liu, Özlem Özgöbek, and Xiaomeng Su. 2017. The Adressa dataset for news recommendation. In Proceedings of the International Conference on Web Intelligence (WI'17). 1042–1048.
[11] Nathan Halko, Per-Gunnar Martinsson, and Joel A. Tropp. 2011. Finding structure with randomness: Probabilistic algorithms for constructing approximate matrix decompositions. SIAM Rev. 53, 2 (2011), 217–288.
[12] Balázs Hidasi, Alexandros Karatzoglou, Linas Baltrunas, and Domonkos Tikk. 2016. Session-based recommendations with recurrent neural networks. In Proceedings of the Fourth International Conference on Learning Representations (ICLR'16).
[13] Dietmar Jannach, Lukas Lerche, Iman Kamehkhosh, and Michael Jugovac. 2015. What recommenders recommend: an analysis of recommendation biases and possible countermeasures. User Modeling and User-Adapted Interaction 25, 5 (2015), 427–491.
[14] Dietmar Jannach and Malte Ludewig. 2017. When recurrent neural networks meet the neighborhood for session-based recommendation. In Proceedings of the Eleventh ACM Conference on Recommender Systems (RecSys'17). 306–310.
[15] Michael Jugovac, Dietmar Jannach, and Mozhgan Karimi. 2018. StreamingRec: A Framework for Benchmarking Stream-based News Recommenders. In Proceedings of the Twelfth ACM Conference on Recommender Systems (RecSys'18). 306–310.
[16] Mozhgan Karimi, Dietmar Jannach, and Michael Jugovac. 2018. News recommender systems – Survey and roads ahead. Information Processing & Management 54, 6 (2018), 1203–1227.
[17] Benjamin Kille, Andreas Lommatzsch, Frank Hopfgartner, Martha Larson, and Torben Brodt. 2017. CLEF 2017 NewsREEL Overview: Offline and Online Evaluation of Stream-based News Recommender Systems. In Working Notes of CLEF 2017 – Conference and Labs of the Evaluation Forum.
[18] Yehuda Koren, Robert Bell, and Chris Volinsky. 2009. Matrix factorization techniques for recommender systems. IEEE Computer 42, 8 (2009).
[19] Quoc Le and Tomas Mikolov. 2014. Distributed representations of sentences and documents. In Proceedings of the 31st International Conference on Machine Learning (ICML'14). 1188–1196.
[20] Lukas Lerche, Dietmar Jannach, and Malte Ludewig. 2016. On the Value of Reminders within E-Commerce Recommendations. In Proceedings of the 2016 Conference on User Modeling, Adaptation and Personalization (UMAP'16).
[21] Jing Li, Pengjie Ren, Zhumin Chen, Zhaochun Ren, Tao Lian, and Jun Ma. 2017. Neural Attentive Session-based Recommendation. In Proceedings of the 2017 ACM Conference on Information and Knowledge Management (CIKM'17). 1419–1428.
[22] Lei Li, Dingding Wang, Tao Li, Daniel Knox, and Balaji Padmanabhan. 2011. SCENE: a scalable two-stage personalized news recommendation system. In Proceedings of the 34th International Conference on Research and Development in Information Retrieval (SIGIR'11). 125–134.
[23] Lei Li, Li Zheng, Fan Yang, and Tao Li. 2014. Modeling and broadening temporal user interest in personalized news recommendation. Expert Systems with Applications 41, 7 (2014), 3168–3177.
[24] Joseph Lilleberg, Yun Zhu, and Yanqing Zhang. 2015. Support vector machines and word2vec for text classification with semantic features. In 14th International Conference on Cognitive Informatics & Cognitive Computing (ICCI*CC'15). 136–140.
[25] Chen Lin, Runquan Xie, Xinjun Guan, Lei Li, and Tao Li. 2014. Personalized news recommendation via implicit social experts. Information Sciences 254 (2014), 1–18.
[26] Jiahui Liu, Peter Dolan, and Elin Rønby Pedersen. 2010. Personalized news recommendation based on click behavior. In Proceedings of the 15th International Conference on Intelligent User Interfaces (IUI'10). 31–40.
[27] Qiao Liu, Yifu Zeng, Refuoe Mokhosi, and Haibin Zhang. 2018. STAMP: Short-Term Attention/Memory Priority Model for Session-based Recommendation. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (KDD'18). 1831–1839.
[28] Malte Ludewig and Dietmar Jannach. 2018. Evaluation of Session-based Recommendation Algorithms. User Modeling and User-Adapted Interaction 28, 4–5 (2018), 331–390.
[29] Malte Ludewig, Noemi Mauro, Sara Latifi, and Dietmar Jannach. 2019. Performance Comparison of Neural and Non-Neural Approaches to Session-based Recommendation. In Proceedings of the 2019 ACM Conference on Recommender Systems (RecSys'19).
[30] Cornelius A. Ludmann. 2017. Recommending News Articles in the CLEF News Recommendation Evaluation Lab with the Data Stream Management System Odysseus. In Working Notes of the Conference and Labs of the Evaluation Forum (CLEF'17).
[31] Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S. Corrado, and Jeff Dean. 2013. Distributed representations of words and phrases and their compositionality. In Proceedings of Advances in Neural Information Processing Systems (NIPS'13). 3111–3119.
[32] Gabriel de Souza Pereira Moreira, Felipe Ferreira, and Adilson Marques da Cunha. 2018. News Session-Based Recommendations using Deep Neural Networks. In Proceedings of the 3rd Workshop on Deep Learning for Recommender Systems (DLRS) at ACM RecSys'18. 15–23.
[33] Gabriel de Souza Pereira Moreira, Dietmar Jannach, and Adilson Marques da Cunha. 2019. Contextual Hybrid Session-based News Recommendation with Recurrent Neural Networks. arXiv preprint arXiv:1904.10367 (2019).
[34] Massimo Quadrana, Paolo Cremonesi, and Dietmar Jannach. 2018. Sequence-Aware Recommender Systems. ACM Computing Surveys (CSUR) 51, 4 (2018), 66.
[35] Massimo Quadrana, Alexandros Karatzoglou, Balázs Hidasi, and Paolo Cremonesi. 2017. Personalizing Session-based Recommendations with Hierarchical Recurrent Neural Networks. In Proceedings of the 11th ACM Conference on Recommender Systems (RecSys'17). 130–137.
[36] Juan Ramos. 2003. Using TF-IDF to determine word relevance in document queries. Technical Report, Department of Computer Science, Rutgers University.
[37] Junyang Rao, Aixia Jia, Yansong Feng, and Dongyan Zhao. 2013. Personalized news recommendation using ontologies harvested from the web. In International Conference on Web-Age Information Management. 781–787.
[38] A. Spangher. 2015. Building the Next New York Times Recommendation Engine. https://open.blogs.nytimes.com/2015/08/11/building-the-next-new-york-times-recommendation-engine/. (Aug 2015).
[39] Michele Trevisiol, Luca Maria Aiello, Rossano Schifanella, and Alejandro Jaimes. 2014. Cold-start news recommendation with domain-dependent browse graph. In Proceedings of the 8th ACM Conference on Recommender Systems (RecSys'14). 81–88.
[40] Bartlomiej Twardowski. 2016. Modelling Contextual Information in Session-Aware Recommender Systems with Neural Networks. In Proceedings of the 10th ACM Conference on Recommender Systems (RecSys'16). 273–276.
[41] Saúl Vargas. 2015. Novelty and Diversity Evaluation and Enhancement in Recommender Systems. PhD thesis. Universidad Autónoma de Madrid.
[42] Saúl Vargas and Pablo Castells. 2011. Rank and relevance in novelty and diversity metrics for recommender systems. In Proceedings of the Fifth ACM Conference on Recommender Systems (RecSys'11). 109–116.