=Paper=
{{Paper
|id=Vol-2554/paper9
|storemode=property
|title=Enriched Network Embeddings for News Recommendation
|pdfUrl=https://ceur-ws.org/Vol-2554/paper_09.pdf
|volume=Vol-2554
|authors=Janu Verma
|dblpUrl=https://dblp.org/rec/conf/recsys/Verma19
}}
==Enriched Network Embeddings for News Recommendation==
Janu Verma
Hike Messenger
New Delhi, Delhi, India
j.verma5@gmail.com
ABSTRACT
News aggregators collect content from various sources and present it in one website or mobile application for easy access. A key challenge for news applications is to help users discover relevant articles. Both the user experience and the key metrics depend on high-quality personalized recommendations. However, building a news recommendation system presents a set of challenges due to the large number of articles being published every hour, the surge and decline in the popularity of news, and the critical nature of recency. In this paper, we present a graph-based news recommendation model which is deployed on a real-world news application. Our system is a hybrid of collaborative filtering and content-based filtering. We enrich the user-article interaction graph by adding new nodes corresponding to the named entities extracted from the contents of the articles. Random-walk-based graph embeddings are used to learn latent representations for users, articles and named entities in the same space. We evaluate the learned embeddings via a multi-class classification of news articles into high-level categories. We propose a recommendation system based on a binary classification problem which takes as input a combination of the user, item and entity embeddings and computes the probability of the user clicking on the article. We perform experiments to show the superiority of our model over the previous system.

KEYWORDS
News recommendation, network embeddings, NLP, Named Entities, Topic Modeling, Collaborative Filtering

Copyright © 2019 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). INRA'19, September 2019, Copenhagen, Denmark.

1 INTRODUCTION
News reading on mobile devices has become very popular in recent years owing to the availability of a tremendous amount of content coming from millions of sources across the world. As per a 2017 survey by Pew Research [8], about 85% of US adults read news on a mobile device. News applications like Google News, Bing News, Flipboard, Pocket etc. collect news from various sources and provide readers with news from around the world in an aggregated user interface. The sheer volume of available articles can be overwhelming to readers. A key problem that news aggregation applications struggle with is to help users discover the articles that are most interesting to them.

A recommendation system aims to capture users' preferences and interests so that relevant information is shown to them. Domains like online shopping (e.g. Amazon), movies (e.g. Netflix) and music (e.g. Pandora) have seen great success with recommendation systems. Two popular approaches, collaborative filtering and content-based filtering, form the basis of most major recommendation systems. In collaborative filtering, news articles are recommended based on the reading history of users with similar preferences. This is a very popular method which has the advantage of being domain free. A major drawback of collaborative filtering is its inability to handle new content, i.e. a breaking news story that has not yet seen much traction; this is called the item cold-start problem.

A solution to the problem of fresh content is to build user profiles comprising genuine interests and use them to make news recommendations. This is called content-based filtering, where the contents of the articles are analyzed to extract topics of the user's interest. If a user's past reading preferences are known, new articles can be recommended based on their similarity to the previously read ones. Content-based filtering requires sufficient reading history of users to be able to build a strong profile of a user's interests. For users with little history, i.e. user cold-start, this becomes problematic. Collaborative filtering relies on the fact that there are always users who read some news, and these users may serve as a basis to help predict the interests of the long-tail users. Thus, collaborative filtering and content-based filtering are complementary to each other.

User preferences are not straightforward: a user might want to read an article even if she is not interested in the topic but finds the particular story relevant, for example, wanting to read news about the World Cup despite having no general interest in Sports. This requires a carefully designed content-filtering model at a proper level of granularity. Furthermore, not all users are equal to each other, and the collaborative filtering method may not account for the individual variability between users. Highly read topics are recommended to most of the users, even if some of them have no interest in these topics. For example, entertainment and lifestyle articles are the most popular, and they get reflected in the recommendations for a lot of users via other seemingly similar users.

Thus, news recommendation presents challenges that do not exist in other domains. For example, the recency and the popularity of news articles can change drastically with time. Another compounding factor is the influx of a large number of new articles every hour.

Present Work: In this work, we propose a graph-based news recommendation system which combines collaborative filtering and content information. Our model is based on graph embeddings [3], [5], where we map nodes of the graph to vectors in a low-dimensional space such that the structural attributes of the graph translate to geometrical properties in the embedding space. The user-item interaction data can be defined as a bipartite graph with user nodes and item nodes. In collaborative filtering, the adjacency matrix of this bipartite graph is used to learn the similarity of users and items [10]. In the current work, we enrich the user-item graph by adding new nodes corresponding to the named entities (person, place, organization, event) extracted from the contents of the articles. For instance, if an article is about the 2016 Presidential elections, we might add entity nodes like Donald Trump, 2016 Elections, and Republican Party. Also added are edges from the item nodes to the entity nodes. The enriched graph has three types of nodes (user, item, entity) and two types of edges (user-item and item-entity).

To learn node embeddings, truncated random walks (biased or unbiased) of fixed length are generated emanating from each node [5]. These random walk sequences can be thought of as sentences in an artificial language. Using the Skip-gram model [7], with which words in a corpus of sentences in a natural language can be mapped to a low-dimensional space, we obtain dense representations for the nodes. The embeddings aim to capture contextual similarity of the nodes, i.e. nodes co-occurring within a fixed window on a random walk are mapped to nearby points. In collaborative filtering, nodes which co-occur in the adjacency list (direct connections) of a node are mapped to nearby points [10]. Thus, graph embeddings provide an extension of collaborative filtering and have been shown to perform better.

Using graph embeddings, we map all nodes (users, items and entities) to the same space. In this space, we can compute the similarity of a user to an item as well as user-entity similarity. This allows us to suggest news articles based on a user's affinity towards entities, which operate at a much lower level than topical affinity in content-based filtering. This equips us to handle the situation where, for example, a consumer is interested in reading news about elections even if she is not generally interested in politics. The entities are also more interpretable than abstract topics.

We provide an evaluation of the learned representations by studying their efficacy in multi-class classification of the article nodes into 8 pre-defined high-level categories, e.g. Politics, Entertainment, Sports. A simple linear classifier, trained only on the node embeddings of 60% of the article nodes (without using any content information explicitly) labeled with their respective categories, and evaluated on the remaining 40%, achieves 0.901 AUC. We also cluster the embeddings of the entity nodes and qualitatively evaluate the results. Finally, we evaluate the system for article recommendation by computing Precision@k for k ranging from 1 to 5.

Concretely, we make the following contributions:
∙ Provide a news recommendation system based on graph embeddings that is a hybrid of collaborative and content-based filtering.
∙ Learn embeddings for the enriched user-item graph that contains entity nodes capturing the contents of the articles in addition to the user and item nodes.
∙ Evaluate the learned embeddings via multi-class article classification.
∙ Build and evaluate a binary classification model for recommendation.
∙ Study the efficacy of our method for the cold-start problem.

The remainder of the article is organized as follows: In Section 2, we discuss related work on recommendation systems and network embeddings. Next, we explain graph embeddings for the bipartite and enriched graphs and their utilization in the recommendation model in Section 3. Section 4 provides the analysis and evaluation of the proposed model. Finally, we conclude in Section 5.

2 RELATED WORK
In this section, we discuss some of the related work on news recommendation systems.

Collaborative Filtering: Collaborative filtering [10] recommends articles which were clicked by users with similar reading history. Collaborative filtering has been applied to personalized news reading applications, such as GroupLens [11] and the initial version of Google News recommendation [9]. There are two types of collaborative filtering [10]: the neighborhood model and the latent factor model. In the neighborhood model, the click history of users is used to compute the 'neighborhood' of a user, and articles clicked on by their neighbors are then recommended to the user. In the latent factor model, a latent representation of both users and articles is learned in the same space, usually via a matrix factorization of the user-article interaction matrix. The latent representation can be interpreted as describing an article or a user in a 'concept' space which captures factors such as the topic of the article. Collaborative filtering is employed when there is scarce click history available for users.

Content-based Filtering: A content-based recommendation system tries to recommend items similar to those a given user has liked in the past. Thus, it requires a notion of similarity of articles. There has been a lot of work in NLP on computing similarity between text documents, e.g. tf-idf, word embeddings [7] and doc2vec [12]. This relies on sufficient click history to be able to build a genuine profile of a user's interests. Content-based filtering has been applied to personalized news recommendation, e.g. news reading on devices ([13], [15]) and web-based news aggregation services [14].

Hybrid Collaborative and Content-based Filtering: Hybrid models which combine both collaborative filtering and content-based filtering are more robust against the problems of either approach. This is accomplished by using both the user similarity based on historical information and the content similarity to make recommendations. Hybrid methods have seen applications in news recommendation [16], [17].
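To make the random-walk view used above concrete, the following minimal sketch generates uniform truncated walks on a toy user-article bipartite graph. The node names are made up for illustration; in practice the resulting walk "sentences" would be fed to a Skip-gram implementation (e.g. gensim's Word2Vec) to obtain the embeddings.

```python
import random

def bipartite_walks(adj, walk_len, walks_per_node, seed=0):
    """Generate uniform truncated random walks over an undirected
    user-article bipartite graph given as an adjacency dict."""
    rng = random.Random(seed)
    walks = []
    for _ in range(walks_per_node):
        for start in adj:
            walk = [start]
            for _ in range(walk_len - 1):
                # uniform, unbiased step: every neighbor is equally likely
                walk.append(rng.choice(adj[walk[-1]]))
            walks.append(walk)
    return walks

# Toy user-article interaction graph (illustrative names).
adj = {
    "u1": ["a1", "a2"],
    "u2": ["a2"],
    "a1": ["u1"],
    "a2": ["u1", "u2"],
}
walks = bipartite_walks(adj, walk_len=5, walks_per_node=2)
print(len(walks))  # 8 walks: 2 per node over 4 nodes
```

On a bipartite graph every walk necessarily alternates between user and article nodes, which is exactly the User → Article → User path structure discussed later in the paper.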
Graph Embeddings: Our approach presented in this paper is based on graph embeddings. Instead of working with global summary attributes of the graph, there has been a lot of recent work on finding representations of the nodes that incorporate the local structural information [5], [3]. The idea is to learn a mapping from the graph to a low-dimensional real vector space such that the structural attributes of the graph translate to geometrical properties in the embedding space. Random walks of pre-decided length are generated starting at each node in the network to produce "sentences" of nodes, similar to sentences of words in a natural language. The Skip-gram algorithm devised by Mikolov et al. [7] is used to obtain node embeddings from the random walks, which are expected to capture the contextual properties of the network nodes: nodes that occur in the same context have similar vector embeddings. For a survey on graph embeddings, see Cui et al. [6]. Recently, random walks on graphs and embeddings have been used in recommendation tasks, e.g. Pixie [18], GraphSage [19].

3 PROPOSED METHOD
In this section, we describe our model for news recommendation, which uses graph embeddings for feature learning. News aggregation applications gather data from various sources and present the results as a list to the users. The recommendation system attempts to rank the list of articles according to a user's preference. Mathematically, the articles are ranked based on the probability of the user clicking on them. We model this as a binary classification problem which takes a user and an article as input and computes the probability of a click, i.e. 𝑃𝑟𝑜𝑏(𝑐𝑙𝑖𝑐𝑘 = 1 | 𝑢𝑠𝑒𝑟, 𝑖𝑡𝑒𝑚). The input features for this model are a combination of the dense representations of the user and the article, which we learn through graph embeddings.

During their activity on the application, users interact with many articles. More formally, the user-article interactions can be organized as a bipartite graph 𝐺 = (𝑈, 𝐴, 𝐸), where 𝑈 denotes the set of users and 𝐴 denotes the set of articles. The set 𝑉 = 𝑈 ∪ 𝐴 is the set of nodes of 𝐺. There is an edge 𝑒 ∈ 𝐸 between a user 𝑢 ∈ 𝑈 and an article 𝑎 ∈ 𝐴 if 𝑢 clicked on the snippet for 𝑎 to read the story. The set of all nodes connected to a node 𝑛, called the adjacency list of 𝑛, is denoted by 𝐸(𝑛).

Graph embedding for the bipartite graph: Graph representation learning techniques such as DeepWalk [5] and node2vec [3] use random walks to embed a graph into a low-dimensional space, mapping each node to a dense vector. Following DeepWalk [5], we generate short truncated random walks starting from each node. The random walks are generated in a completely uniform and unbiased fashion: each adjacent node has equal probability of being picked. The random walks produce sequences of nodes of pre-decided length. For the bipartite graph, the random walks are generated by repeating the following operations: 1) Given the current node 𝑛, which is initialized at the starting node of the random walk, get its adjacency list 𝐸(𝑛). 2) Sample an edge from 𝐸(𝑛) which links 𝑛 to a node 𝑚. 3) The current node is updated to 𝑚 and the steps repeat. The procedure is described in Algorithm 1.

Algorithm 1 Bipartite Graph Embedding
Input: Bipartite graph 𝐺 = (𝑈, 𝐴, 𝐸), walk length 𝑙, walks per node 𝑟, embedding dimension 𝑑, context window size 𝑤.
1: Initialize 𝑤𝑎𝑙𝑘𝑠 to 𝐸𝑚𝑝𝑡𝑦.
2: for 𝑖 = 0 to 𝑟 do
3:   for 𝑛 ∈ 𝑉 do
4:     Initialize curr walk to [𝑛].
5:     for 𝑠𝑡𝑒𝑝 = 1 to 𝑙 do
6:       curr step = curr walk[−1]
7:       adj list = 𝐸(curr step)
8:       next step = 𝑅𝑎𝑛𝑑𝑜𝑚𝑆𝑎𝑚𝑝𝑙𝑒(adj list)
9:       Append next step to curr walk
10:    Append curr walk to 𝑤𝑎𝑙𝑘𝑠
11: 𝑆𝑘𝑖𝑝𝐺𝑟𝑎𝑚(𝑤𝑎𝑙𝑘𝑠, 𝑤, 𝑑)
Output: Embeddings of every node 𝑛 ∈ 𝑉, 𝑣𝑛 ∈ R𝑑

The random walks on the bipartite graph have paths of the form
𝑈𝑠𝑒𝑟 → 𝐴𝑟𝑡𝑖𝑐𝑙𝑒 → 𝑈𝑠𝑒𝑟
The random walks thus generated are sequences of nodes, which can be thought of as 'sentences' in an artificial language. The SkipGram model [7], developed for word embeddings, measures the probability of two words co-occurring within a fixed window in a sentence. It uses a fixed-size window around every word to extract context and non-context words for the word under consideration. The model employs a single-hidden-layer neural network to learn a mapping from a word to its context words. In graph embeddings, the skip-gram model [7] computes the probability of two nodes co-occurring within a fixed window on a random walk.

This procedure is an extension of the latent factor models [10] of collaborative filtering, where matrix factorization is used to obtain dense representations of users and items in the same space. Graph embeddings have been shown to perform better than matrix factorization for various tasks, e.g. classification, clustering, and link prediction. Mathematically, graph embeddings extend the contextual similarity from the immediate neighborhood to truncated random walks. Graph embedding of the user-item bipartite graph, though effective, suffers from the cold-start problem, i.e. it is unable to handle new content which was not part of the graph.

Content-based filtering has been an alternative to collaborative filtering for handling the problem of fresh, unseen items. We merge collaborative and content filtering in a natural way in our setting by enriching the bipartite graph 𝐺 into a new graph 𝐺′ = (𝑈, 𝐴, 𝐶, 𝐸, 𝐸′), where 𝐶 denotes the set of content nodes and 𝐸′ is the set of edges between the article and content nodes. We use the named entities of the form place, person, organization as the content nodes. The named entities are at a finer level than the topic modeling approach [20] and are more constrained than tf-idf based keyword filtering [21]. The set of topics and keywords is not exhaustive; new articles can introduce topics and keywords, so the models for extracting topics or keywords need to be retrained. Named Entity Recognition (NER), in turn, is article agnostic: it can extract entities in new articles without being retrained explicitly. In fact, there are off-the-shelf tools, e.g. Spacy [22], to extract named entities from any article. Thus, NER is cheaper to achieve than topic modeling and keyword extraction.

Graph embedding for the enriched graph: The enriched graph 𝐺′ contains user-user interaction via shared interests in articles as well as article-article interaction via the co-occurrence of entities in them. In the bipartite setting, the random walks transition between user and article nodes, while in the enriched graph the random walks are more complex. The random walker at an article node can jump either to a user node or to an entity node. The paths in the random walks can be of the following types:
∙ 𝑈𝑠𝑒𝑟 → 𝐴𝑟𝑡𝑖𝑐𝑙𝑒 → 𝑈𝑠𝑒𝑟
∙ 𝑈𝑠𝑒𝑟 → 𝐴𝑟𝑡𝑖𝑐𝑙𝑒 → 𝐸𝑛𝑡𝑖𝑡𝑦 → 𝐴𝑟𝑡𝑖𝑐𝑙𝑒 → 𝑈𝑠𝑒𝑟
∙ 𝐴𝑟𝑡𝑖𝑐𝑙𝑒 → 𝑈𝑠𝑒𝑟 → 𝐴𝑟𝑡𝑖𝑐𝑙𝑒
∙ 𝐴𝑟𝑡𝑖𝑐𝑙𝑒 → 𝐸𝑛𝑡𝑖𝑡𝑦 → 𝐴𝑟𝑡𝑖𝑐𝑙𝑒 → 𝑈𝑠𝑒𝑟
∙ 𝐴𝑟𝑡𝑖𝑐𝑙𝑒 → 𝐸𝑛𝑡𝑖𝑡𝑦 → 𝐴𝑟𝑡𝑖𝑐𝑙𝑒 → 𝐸𝑛𝑡𝑖𝑡𝑦
This provides many more possibilities than the 𝐴𝑟𝑡𝑖𝑐𝑙𝑒 → 𝑈𝑠𝑒𝑟 and 𝑈𝑠𝑒𝑟 → 𝐴𝑟𝑡𝑖𝑐𝑙𝑒 choices in the bipartite graph. Thus, the embeddings learned on these random walks are able to capture relationships between the different types of nodes due to the different ways they can be connected.

The enriched graph 𝐺′ is heterogeneous, with two types of edges, i.e. user-article and article-entity. Graph embedding methods like DeepWalk [5] are restricted to homogeneous graphs and may not be directly applicable to heterogeneous networks. One obvious problem is that with unbiased random walks, we assign equal probability to each edge; this increases the probability of the random walk going through the edge type which has more edges at the current node. For any article node in the enriched graph, there are far more edges to user nodes than to entity nodes. Thus the random walk has higher chances of going to the user nodes, and in some cases completely avoids entity edges. This stems from an imbalance at the global level: we have tens of thousands of entities, whereas there are millions of users. We provide a simple way to resolve the problem of random walks being biased by the dominant edge type. The random walk is generated in two steps: (i) an edge type is chosen randomly from all possible edge types, (ii) an edge is randomly chosen from all edges of the selected edge type. This amounts to biasing the random walks uniformly with equal weights for each edge type. The procedure for learning representations of the nodes of the enriched graph is described in Algorithm 2 and Algorithm 3.

Algorithm 2 Enriched Graph Embedding
Input: Enriched graph 𝐺 = (𝑈, 𝐴, 𝐶, 𝐸, 𝐸′), walk length 𝑙, walks per node 𝑟, embedding dimension 𝑑, context window size 𝑤.
1: Initialize 𝑤𝑎𝑙𝑘𝑠 to 𝐸𝑚𝑝𝑡𝑦.
2: for 𝑖 = 0 to 𝑟 do
3:   for 𝑛 ∈ 𝑉 do
4:     Initialize curr walk to [𝑛].
5:     for 𝑠𝑡𝑒𝑝 = 1 to 𝑙 do
6:       curr step = curr walk[−1]
7:       next step = 𝑁𝑒𝑥𝑡𝑁𝑜𝑑𝑒(𝐺, curr step)
8:       Append next step to curr walk
9:     Append curr walk to 𝑤𝑎𝑙𝑘𝑠
10: 𝑆𝑘𝑖𝑝𝐺𝑟𝑎𝑚(𝑤𝑎𝑙𝑘𝑠, 𝑤, 𝑑)
Output: Embeddings of every node 𝑛 ∈ 𝑉, 𝑣𝑛 ∈ R𝑑

Algorithm 3 NextNode
Input: Enriched graph 𝐺 = (𝑈, 𝐴, 𝐶, 𝐸, 𝐸′), curr node
1: if curr node ∈ 𝑈 then
2:   next step = 𝑅𝑎𝑛𝑑𝑜𝑚𝑆𝑎𝑚𝑝𝑙𝑒(𝐸(curr node))
3: else if curr node ∈ 𝐶 then
4:   next step = 𝑅𝑎𝑛𝑑𝑜𝑚𝑆𝑎𝑚𝑝𝑙𝑒(𝐸′(curr node))
5: else if curr node ∈ 𝐴 then
6:   Generate a random number 𝑡 from 𝑈𝑛𝑖𝑓𝑜𝑟𝑚(0, 1).
7:   if 𝑡 > 0.5 then
8:     next step = 𝑅𝑎𝑛𝑑𝑜𝑚𝑆𝑎𝑚𝑝𝑙𝑒(𝐸′(curr node))
9:   else
10:    next step = 𝑅𝑎𝑛𝑑𝑜𝑚𝑆𝑎𝑚𝑝𝑙𝑒(𝐸(curr node))
Output: Next node in the random walk, next step

In reality, different edge types contribute differently to the random walks and would have unequal weights. However, we do not have an intuitive way to obtain these weights. Some work has been done in this direction, e.g. metapath2vec [2], heterogeneous edge embeddings [23].

Recommendation Model: Having learned the embeddings for the users, items and entities, we now describe the model for recommendation. We define the recommendation problem as a binary classification model which learns the probability of a user clicking on an article. We use a logistic regression model on the embeddings learned on the graph. The input feature vector to the logistic regression model is the Hadamard product of the user feature vector and the article feature vector. If both the user and the article were present in the enriched graph, i.e. we have an embedding for them, then the user and article feature vectors are their respective graph embeddings. The purpose of a news recommendation system is to recommend new, unseen articles, and we may not have an embedding for fresh articles. In this case, we use the average of the embeddings of the entities present in the article as the article feature vector. Thus the input feature to the logistic regression is in the same space as the graph embedding space.

If 𝐹𝑢 ∈ R𝑑 is the vector representation of user 𝑢 and 𝐹𝑎 ∈ R𝑑 is the vector representation of article 𝑎, then the input to the logistic regression model is a vector 𝑥(𝑢,𝑎) = (𝑥1, 𝑥2, . . . , 𝑥𝑑) ∈ R𝑑 defined as:
(𝑥(𝑢,𝑎))𝑖 = (𝐹𝑢)𝑖 * (𝐹𝑎)𝑖    (1)
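The two-step sampling of Algorithm 3 and the Hadamard feature of Eq. (1) can be sketched as below. This is a minimal illustration, not the deployed system: the toy graph, node names and embedding vectors are made up, and a trained logistic regression would consume the resulting feature vector.

```python
import random

def next_node(user_adj, entity_adj, articles, node, rng):
    """Sketch of Algorithm 3: at an article node, first pick the edge type
    (user-article vs. article-entity) with equal probability, then sample an
    edge of that type uniformly; user and entity nodes have one edge type."""
    if node in articles:
        adj = entity_adj if rng.random() > 0.5 else user_adj
        # assumption: fall back to the other edge type if this article
        # happens to have no edges of the chosen type
        neighbors = adj.get(node) or user_adj.get(node) or entity_adj.get(node)
        return rng.choice(neighbors)
    return rng.choice(user_adj.get(node) or entity_adj.get(node))

def hadamard(f_u, f_a):
    """Eq. (1): elementwise product of user and article feature vectors."""
    return [x * y for x, y in zip(f_u, f_a)]

# Toy enriched graph (illustrative names): E = user-article, E' = article-entity.
articles = {"a1", "a2"}
user_adj = {"u1": ["a1", "a2"], "u2": ["a2"], "a1": ["u1"], "a2": ["u1", "u2"]}
entity_adj = {"a1": ["e_trump"], "a2": ["e_trump", "e_gop"],
              "e_trump": ["a1", "a2"], "e_gop": ["a2"]}

rng = random.Random(1)
walk = ["u1"]
for _ in range(9):
    walk.append(next_node(user_adj, entity_adj, articles, walk[-1], rng))
print(len(walk))                          # a 10-node walk
print(hadamard([1.0, 2.0], [3.0, -1.0]))  # [3.0, -2.0]
```

Because the edge type is chosen before the edge, entity edges are followed from articles about half the time regardless of how many user edges the article has, which is exactly the rebalancing the text describes.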
The user feature vector is defined as the user embedding vector 𝑉𝑢 learned via graph embeddings. The article feature vector 𝐹𝑎 is given by the graph embedding 𝑉𝑎 ∈ R𝑑 if 𝑎 has an embedding. Otherwise, if 𝐶(𝑎) is the set of entities in the contents of 𝑎,

𝐹𝑎 = (1/|𝐶(𝑎)|) ∑_{𝑐 ∈ 𝐶(𝑎)} 𝑉𝑐    (2)

The pipeline for the recommendation system described above has three components:
∙ Feature Learning: A large part of the user-article interaction data between time 𝑡0 and 𝑡1 is taken to build the enriched graph, and embeddings are then learned on this graph, e.g. the user activity on the application between Jan 2016 and Jan 2019.
∙ Recommendation Model Training: The logistic regression model for computing the probability of a user clicking on an article is trained on a different sub-graph which comprises the interaction data between time 𝑡1 and 𝑡2, e.g. between Jan 2019 and Jun 2019. A portion of this training data is held out to validate the model.
∙ Evaluation and Deployment: The trained model is then evaluated on unseen data by considering the predictions of the model on usage between June 2019 and July 2019. The model is then deployed to the application.
An alternate interpretation of the recommendation model is as link prediction [24] in a user-article bipartite graph selected at a future time, i.e. one that contains users and new articles.

4 RESULTS AND ANALYSIS
In this section, we discuss the evaluation and analysis of the graph embeddings and the recommendation model.

4.1 Experimental Setup
The data for this experiment was taken from a real-world news aggregation mobile application X¹. The news content on X is presented as a scrollable list which shows the headline and the first couple of lines of each article. The user can read the full article by clicking on the shown snippet for that article. During their activity on the platform, users click on the displayed article snippet if they want to read the full story, or they scroll past it. We consider a user clicking on an article as an indicator of their interest in the article and use the clicks as the positive data for the recommendation model. If a user scrolls past an article and does not click on it, we use that as a negative example. Thus, our model attempts to increase the click-through rate (CTR) as the metric. For learning the graph embeddings, we create an enriched graph using the activity of 500K users during a period of 3 months, adding 40K articles. There are also 6K entity nodes, which we extracted using the freely available Spacy API.

To learn embeddings of all the nodes in the enriched graph, we generate 30 random walks of length 100 for every node. The skip-gram model is trained using stochastic gradient descent (SGD) with a learning rate of 0.01 to minimize the negative sampling loss. Finally, we obtain an embedding of dimension 128 for every node.

¹Name anonymized for the double-blind review.

Table 1: Comparison of Embeddings
Features | Precision | Recall | F-1
word2vec | 0.90 | 0.92 | 0.91
doc2vec | 0.91 | 0.92 | 0.91
node2vec | 0.88 | 0.89 | 0.88

4.2 Multi-class Article Classification
We evaluate the feature representations obtained through graph embeddings on a standard supervised learning task: multi-class classification of the news articles. News publishers often add high-level categories to the articles they publish, e.g. Sports, Entertainment. We train a machine learning model on the set of labeled articles using their graph embeddings as input; the task is to predict the labels for the remaining articles. There are 8 news categories in our data: Business, Entertainment, Sports, Local, Tech, World, Lifestyle, Offbeat. In this experiment, we use a fraction of the labeled article nodes to train a multinomial logistic regression model with L2 regularization. Without explicitly using any content information, the performance of the logistic regression model trained on node embeddings (node2vec) is similar to that of a model trained explicitly on content features, e.g. word2vec [7] or doc2vec [12]. The comparison is presented in Table 1. The detailed per-category results of the model are described in Table 2.

Table 2: Per-category Results of the Evaluation
Category | Precision | Recall | F-1
Business | 0.83 | 0.71 | 0.77
Sports | 0.90 | 0.92 | 0.91
Entertainment | 0.97 | 0.95 | 0.96
Tech | 0.86 | 0.83 | 0.84
Local | 0.88 | 0.95 | 0.91
World | 0.82 | 0.68 | 0.75
Lifestyle | 0.93 | 0.85 | 0.89
Offbeat | 0.87 | 0.62 | 0.72
Average | 0.88 | 0.89 | 0.88

4.3 Recommendation Evaluation
The recommendation model is trained on a set of user-article pairs of positive examples (user clicked) and negative examples (no click) taken from the activity over a period of one month which is different from the data used for building the enriched graph. Logistic regression with L2 regularization is used for binary classification of the pairs as click or no-click. We then evaluate the model to predict whether a
set of user-article pairs were clicked or not. We measure the performance of the model by computing the Area Under the ROC Curve (AUC) and Precision at 5; the results are shown in Table 3. The model shows a significant improvement over the previously used approach, which is a combination of collaborative filtering and word embeddings (Hybrid CF and Content). One major advantage is that our model naturally blends collaborative filtering and content-based filtering by putting the users, items, and content into the same space.

Table 3: Comparison of Recommendation Models
Model | AUC | Precision@5
Hybrid CF and Content | 0.72 | 0.89
Enriched graph embeddings | 0.87 | 0.97

5 CONCLUSION AND FUTURE WORK
In this work, we proposed a graph-based news recommendation system which is a hybrid of collaborative filtering and content information. Our method employs graph embeddings to automatically learn latent representations of users and articles. We extend the user-item bipartite graph to contain named entities from the articles. The entities are expected to capture the user preferences at a finer level. The embedding method brings users, items and entities into the same space. We evaluated the learned latent representations via classification of article nodes into 8 high-level categories. We show that without explicitly using the contents of the articles, we achieve results comparable to the NLP-based features. We also design and evaluate a recommendation model as a binary classification model for computing the likelihood of the user clicking on the article. This model performs better than the hybrid collaborative-filtering and word-embeddings based article similarity model.

Though the model performs satisfactorily, this work has focused on simplicity, since the goal of this paper is to show the efficacy of graph embedding methods for content recommendation. The embedding method has two hyperparameters, walk length and number of walks per node, which we chose based on heuristics and did not learn. There is work using attention mechanisms [27] to learn the hyperparameters in an end-to-end system. The enriched network we used has two types of edges; we resolved the heterogeneity by giving equal importance to each edge type. Graph embeddings for heterogeneous networks are an active area of research, e.g. metapath2vec [2], heterogeneous edge embeddings [23]. The logistic regression model for recommendation was chosen

REFERENCES
[1] … Knowledge Discovery and Data Mining, pp. 119–128. ACM (2015)
[2] Dong, Y., Chawla, N.V., Swami, A.: metapath2vec: Scalable representation learning for heterogeneous networks. In: Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 135–144. KDD '17, ACM, New York, NY, USA (2017)
[3] Grover, A., Leskovec, J.: node2vec: Scalable feature learning for networks. CoRR abs/1607.00653 (2016), http://arxiv.org/abs/1607.00653
[4] Hamilton, W.L., Ying, R., Leskovec, J.: Representation learning on graphs: Methods and applications. CoRR abs/1709.05584 (2017), http://arxiv.org/abs/1709.05584
[5] Perozzi, B., Al-Rfou, R., Skiena, S.: DeepWalk: Online learning of social representations. In: Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 701–710. KDD '14, ACM, New York, NY, USA (2014)
[6] Cui, P., Wang, X., Pei, J., Zhu, W.: A survey on network embedding. IEEE Transactions on Knowledge and Data Engineering (2018)
[7] Mikolov, T., Chen, K., Corrado, G., Dean, J.: Efficient estimation of word representations in vector space. CoRR abs/1301.3781 (2013), http://arxiv.org/abs/1301.3781
[8] Pew Research Center, 2017. https://www.pewresearch.org/fact-tank/2017/06/12/growth-in-mobile-news-use-driven-by-older-adults/
[9] Das, A.S., Datar, M., Garg, A., Rajaram, S.: Google News personalization: scalable online collaborative filtering. In: Proceedings of the 16th International Conference on World Wide Web (2007)
[10] Koren, Y., Bell, R., Volinsky, C.: Matrix factorization techniques for recommender systems. https://datajobs.com/data-science-repo/Recommender-Systems-%5BNetflix%5D.pdf
[11] Konstan, J.A., Miller, B.N., Maltz, D., Herlocker, J.L., Gordon, L.R., Riedl, J.: GroupLens: Applying collaborative filtering to Usenet news. Commun. ACM 40, 77–87 (1997)
[12] Le, Q.V., Mikolov, T.: Distributed representations of sentences and documents. In: Proceedings of the 31st International Conference on Machine Learning (ICML), Beijing, China (2014)
[13] Billsus, D., Pazzani, M.J.: User modeling for adaptive news access. User Modeling and User-Adapted Interaction 10(2-3), 147–180 (2000)
[14] Tan, A., Teo, C.: Learning user profiles for personalized information dissemination. In: Proceedings of the 1998 IEEE International Joint Conference on Neural Networks, pp. 183–188 (1998)
[15] Carreira, R., Crato, J.M., Gonçalves, D., Jorge, J.A.: Evaluating adaptive user profiles for news classification. In: Proceedings of the 9th International Conference on Intelligent User Interfaces (2004)
[16] Billsus, D., Pazzani, M.: A hybrid user model for news story classification. In: Proceedings of the Seventh International Conference on User Modeling (1999)
[17] Claypool, M., Gokhale, A., Miranda, T., Murnikov, P., Netes, D., Sartin, M.: Combining content-based and collaborative filters in an online newspaper. In: Proceedings of the ACM SIGIR Workshop on Recommender Systems (1999)
[18] Eksombatchai, C., Jindal, P., Liu, J.Z., Liu, Y., Sharma, R., Sugnet, C., Ulrich, M., Leskovec, J.: Pixie: A system for recommending 3+ billion items to 200+ million users in real-time.
[19] Hamilton, W.L., Ying, R., Leskovec, J.: Inductive representation learning on large graphs. In: 31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA
[20] Wang, C., Blei, D.M.: Collaborative topic modeling for recommending scientific articles. In: KDD '11, August 21–24, 2011, San Diego, California, USA
[21] Wartena, C., Slakhorst, W., Wibbels, M.: Selecting
its simplicity, a more suitable approach would be a neural keywords for content based recommendation, Proceedings of the
19th ACM international conference on Information and knowledge
network based model which is either trained on Siamese like management (CIKM), Toronto, ON, Canada October 26 - 30,
loss (two input - user and item embeddings) [28] or the triplet 2010.
[22] Spacy - Industrial-Strength Natural Language Processing https:
loss (three input - user prefers item 1 over item 2) [29]. We //spacy.io/
plan to address some of these issues in a future work. [23] Janu Verma, Srishti Gupta, Debdoot Mukherjee, Tanmoy
Chakraborty : Heterogeneous Edge Embeddings for Friend Rec-
ommendation, ECIR Cologne, April 2019.
REFERENCES [24] Liben-Nowell, D., Kleinberg, J.: The link-prediction problem for
[1] Chang, S., Han, W., Tang, J., Qi, G.J., Aggarwal, C.C., Huang, social networks. Journal of the American society for information
T.S.: Heterogeneous network embedding via deep architectures. In: science and technology 58(7), 1019–1031 (2007)
Proceedings of the 21th ACM SIGKDD International Conference
Enriched Network Embeddings for News Recommendation INRA’19, September, 2019, Copenhagen, Denmark
[25] McInnes, Leland, Healy, John, and Astels, Steve: hdbscan: Hierarchical density based clustering. The Journal of Open Source Software, Vol. 2, No. 11, 2017.
[26] L. J. P. van der Maaten and G. E. Hinton: Visualizing High-Dimensional Data Using t-SNE. Journal of Machine Learning Research 9(Nov):2579-2605, 2008.
[27] Sami Abu-El-Haija, Bryan Perozzi, Rami Al-Rfou, Alex Alemi: Watch Your Step: Learning Node Embeddings via Graph Attention. 32nd Conference on Neural Information Processing Systems (NeurIPS 2018), Montréal, Canada.
[28] Yaniv Taigman, Ming Yang, Marc'Aurelio Ranzato, Lior Wolf: DeepFace: Closing the Gap to Human-Level Performance in Face Verification. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2014.
[29] Florian Schroff, Dmitry Kalenichenko, James Philbin: FaceNet: A Unified Embedding for Face Recognition and Clustering. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2015.
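The triplet-loss ranking model proposed as future work in the conclusion can be illustrated with a minimal sketch. This is not the paper's implementation: the embedding dimension, the margin value, and dot-product scoring are assumptions made for illustration.

```python
import numpy as np

def triplet_loss(user, pos_item, neg_item, margin=0.2):
    """Hinge-style triplet loss over embeddings: the user should score
    the preferred article (pos_item) above the non-preferred article
    (neg_item) by at least `margin`. Zero loss once the ranking holds."""
    s_pos = float(np.dot(user, pos_item))  # similarity to preferred article
    s_neg = float(np.dot(user, neg_item))  # similarity to rejected article
    return max(0.0, margin - s_pos + s_neg)

rng = np.random.default_rng(7)
dim = 8                                     # assumed embedding dimension
user = rng.normal(size=dim)
liked = user + 0.1 * rng.normal(size=dim)   # article close to the user's taste
skipped = -user                             # article far from the user's taste

print(triplet_loss(user, liked, skipped))   # 0.0: ranking already satisfied
print(triplet_loss(user, skipped, liked))   # positive: ranking violated
```

In a full system the loss would be minimized by gradient descent over user and article embeddings; the Siamese variant [28] is analogous but takes only a (user, item) pair with a binary click label.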