=Paper=
{{Paper
|id=Vol-2554/paper9
|storemode=property
|title=Enriched Network Embeddings for News Recommendation
|pdfUrl=https://ceur-ws.org/Vol-2554/paper_09.pdf
|volume=Vol-2554
|authors=Janu Verma
|dblpUrl=https://dblp.org/rec/conf/recsys/Verma19
}}
==Enriched Network Embeddings for News Recommendation==
Janu Verma
Hike Messenger
New Delhi, Delhi, India
j.verma5@gmail.com
ABSTRACT
News aggregators collect content from various sources and present it in one website or mobile application for easy access. A key challenge for news applications is to help users discover relevant articles. Both the user experience and the key metrics depend on high-quality personalized recommendations. However, building a news recommendation system presents a set of challenges due to the large number of articles being published every hour, the surge and decline in the popularity of news, and the critical nature of recency. In this paper, we present a graph-based news recommendation model which is deployed on a real-world news application. Our system is a hybrid of collaborative filtering and content-based filtering. We enrich the user-article interaction graph by adding new nodes corresponding to the named entities extracted from the contents of the articles. Random-walk-based graph embeddings are used to learn latent representations for users, articles and named entities in the same space. We evaluate the learned embeddings via a multi-class classification of news articles into high-level categories. We propose a recommendation system based on a binary classification problem which takes as input a combination of the user, item and entity embeddings and computes the probability of the user clicking on the article. We perform experiments to show the superiority of our model over the previous system.

KEYWORDS
News recommendation, network embeddings, NLP, Named Entities, Topic Modeling, Collaborative Filtering

Copyright © 2019 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). INRA'19, September 2019, Copenhagen, Denmark.

1 INTRODUCTION
News reading on mobile devices has become very popular in recent years owing to the availability of a tremendous amount of content coming from millions of sources across the world. As per a 2017 survey by Pew Research [8], about 85% of US adults read news on a mobile device. News applications like Google News, Bing News, Flipboard, Pocket etc. collect news from various sources and provide readers with news from around the world in an aggregated user interface. The sheer volume of available articles can be overwhelming to readers. A key problem that news aggregation applications struggle with is to help users discover the articles that are most interesting to them.

A recommendation system aims to capture users' preferences and interests so that relevant information is shown to them. Domains like online shopping (e.g. Amazon), movies (e.g. Netflix) and music (e.g. Pandora) have seen great success with recommendation systems. Two popular approaches, collaborative filtering and content-based filtering, form the basis of most major recommendation systems. In collaborative filtering, news articles are recommended based on the reading history of users with similar preferences. This is a very popular method which has the advantage of being domain free. A major drawback of collaborative filtering is its inability to handle new content, i.e. a breaking news story that has not yet seen much traction; this is called the item cold-start problem.

A solution to the problem of fresh content is to build user profiles comprising genuine interests and use them to make news recommendations. This is called content-based filtering, where the contents of the articles are analyzed to extract topics of the user's interest. If a user's past reading preferences are known, new articles can be recommended based on their similarity to the previously read ones. Content-based filtering requires sufficient reading history of users to be able to build a strong profile of a user's interests. For users with little history, i.e. user cold-start, this becomes problematic. Collaborative filtering relies on the fact that there are always users who read some news, and these users may serve as a basis to help predict the interests of the long-tail users. Thus, collaborative filtering and content-based filtering are complementary to each other.

User preferences are not straightforward: a user might want to read an article even if she is not interested in the topic but finds the particular story relevant, for example, wanting to read news about the World Cup despite having no general interest in Sports. This requires a carefully designed content-filtering model at a proper level of granularity. Furthermore, not all users are equal to each other, and the collaborative filtering method may not account for the individual variability between users. Highly read topics are recommended to most of the users, even if some of them have no interest in these topics. For example, entertainment and lifestyle articles are the most popular, and they get reflected in the recommendations for a lot of users via other seemingly similar users.

Thus, news recommendation presents challenges that do not exist in other domains. For example, the recency and the popularity of news articles can change drastically with time. Another compounding factor is the influx of a large number of new articles every hour.

Present Work: In this work, we propose a graph-based news recommendation system which combines collaborative filtering and content information. Our model is based on graph embeddings [3], [5], where we map nodes of the graph to vectors in a low-dimensional space such that the structural attributes of the graph translate to geometrical properties in the embedding space. The user-item interaction data can be defined as a bipartite graph with user nodes and item nodes. In collaborative filtering, the adjacency matrix of this bipartite graph is used to learn the similarity of users and items [10]. In the current work, we enrich the user-item graph by adding new nodes corresponding to the named entities (person, place, organization, event) extracted from the contents of the articles. For instance, if an article is about the 2016 Presidential elections, we might add entity nodes like Donald Trump, 2016 Elections, and Republican Party. Also added are edges from the item nodes to the entity nodes. The enriched graph has three types of nodes (user, item, entity) and two types of edges (user-item and item-entity).

To learn node embeddings, truncated random walks (biased or unbiased) of fixed length are generated emanating from each node [5]. These random walk sequences can be thought of as sentences in an artificial language. Using the Skip-gram model [7], with which words in a corpus of sentences in a natural language can be mapped to a low-dimensional space, we obtain dense representations for the nodes. The embeddings aim to capture contextual similarity of the nodes, i.e. nodes co-occurring within a fixed window on a random walk are mapped to nearby points. In collaborative filtering, nodes which co-occur in the adjacency list (direct connections) of a node are mapped to nearby points [10]. Thus, graph embeddings provide an extension of collaborative filtering and have been shown to perform better.

Using graph embeddings, we map all nodes (users, items and entities) to the same space. In this space, we can compute the similarity of a user to an item as well as user-entity similarity. This allows us to suggest news articles based on a user's affinity towards entities, which operate at a much lower level than topical affinity in content-based filtering. This equips us to handle the situation where, for example, a consumer is interested in reading news about elections even if she is not generally interested in politics. The entities are also more interpretable than abstract topics.

We provide an evaluation of the learned representations by studying their efficacy in multi-class classification of the article nodes into 8 pre-defined high-level categories, e.g. Politics, Entertainment, Sports. A simple linear classifier, trained only on the node embeddings of 60% of the article nodes (without using any content information explicitly) labeled with their respective categories, and evaluated on the remaining 40%, achieves 0.901 AUC. We also cluster the embeddings of the entity nodes and qualitatively evaluate the results. Finally, we evaluate the system for article recommendation by computing Precision@k for k ranging from 1 to 5.

Concretely, we make the following contributions:
∙ Provide a news recommendation system based on graph embeddings that is a hybrid of collaborative and content-based filtering.
∙ Learn embeddings for the enriched user-item graph that contains entity nodes capturing the contents of the articles in addition to the user and item nodes.
∙ Evaluate the learned embeddings via multi-class article classification.
∙ Build and evaluate a binary classification model for recommendation.
∙ Study the efficacy of our method for the cold-start problem.

The remainder of the article is organized as follows: In Section 2, we discuss related work on recommendation systems and network embeddings. Next, we explain graph embeddings for the bipartite and enriched graphs and their utilization in the recommendation model in Section 3. Section 4 provides the analysis and evaluation of the proposed model. Finally, we conclude in Section 5.

2 RELATED WORK
In this section, we discuss some of the related work on news recommendation systems.

Collaborative Filtering: Collaborative filtering [10] recommends articles which were clicked by users with similar reading history. Collaborative filtering has been applied to personalized news reading applications, such as GroupLens [11] and the initial version of Google News recommendation [9]. There are two types of collaborative filtering [10]: the neighborhood model and the latent factor model. In the neighborhood model, the click history of users is used to compute the 'neighborhood' of a user, and articles clicked on by their neighbors are then recommended to the user. In the latent factor model, a latent representation of both users and articles is learned in the same space, usually via a matrix factorization of the user-article interaction matrix. The latent representation can be interpreted as describing an article or a user in a 'concept' space which captures factors such as the topic of the article. Collaborative filtering is employed when there is scarce click history available for users.

Content-based Filtering: A content-based recommendation system tries to recommend items similar to those a given user has liked in the past. Thus, it requires a notion of similarity of articles. There has been a lot of work in NLP on computing similarity between text documents, e.g. tf-idf, word embeddings [7] and doc2vec [12]. This relies on sufficient click history to be able to build a genuine profile of a user's interests. Content-based filtering has been applied to personalized news recommendation, e.g. news reading on devices ([13], [15]) and web-based news aggregation services [14].

Hybrid Collaborative and Content-based Filtering: Hybrid models which combine both collaborative filtering and content-based filtering are more robust against the problems of either approach. This is accomplished by using both the user similarity based on historical information and the content similarity to make recommendations. Hybrid methods have seen applications in news recommendation [16], [17].
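To make the random-walk view used above concrete, the following minimal sketch generates uniform truncated walks on a toy user-article bipartite graph. The node names are made up for illustration; in practice the resulting walk "sentences" would be fed to a Skip-gram implementation (e.g. gensim's Word2Vec) to obtain the embeddings.

```python
import random

def bipartite_walks(adj, walk_len, walks_per_node, seed=0):
    """Generate uniform truncated random walks over an undirected
    user-article bipartite graph given as an adjacency dict."""
    rng = random.Random(seed)
    walks = []
    for _ in range(walks_per_node):
        for start in adj:
            walk = [start]
            for _ in range(walk_len - 1):
                # uniform, unbiased step: every neighbor is equally likely
                walk.append(rng.choice(adj[walk[-1]]))
            walks.append(walk)
    return walks

# Toy user-article interaction graph (illustrative names).
adj = {
    "u1": ["a1", "a2"],
    "u2": ["a2"],
    "a1": ["u1"],
    "a2": ["u1", "u2"],
}
walks = bipartite_walks(adj, walk_len=5, walks_per_node=2)
print(len(walks))  # 8 walks: 2 per node over 4 nodes
```

On a bipartite graph every walk necessarily alternates between user and article nodes, which is exactly the User → Article → User path structure discussed later in the paper.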
Graph Embeddings: Our approach presented in this paper is based on graph embeddings. Instead of working with global summary attributes of the graph, there has been a lot of recent work on finding representations of the nodes that incorporate the local structural information [5], [3]. The idea is to learn a mapping from the graph to a low-dimensional real vector space such that the structural attributes of the graph translate to geometrical properties in the embedding space. Random walks of pre-decided length are generated starting at each node in the network to produce "sentences" of nodes, similar to sentences of words in a natural language. The Skip-gram algorithm devised by Mikolov et al. [7] is used to obtain node embeddings from the random walks, which are expected to capture the contextual properties of the network nodes: nodes that occur in the same context have similar vector embeddings. For a survey on graph embeddings, see Cui et al. [6]. Recently, random walks on graphs and embeddings have been used in recommendation tasks, e.g. Pixie [18], GraphSage [19].

3 PROPOSED METHOD
In this section, we describe our model for news recommendation, which uses graph embeddings for feature learning. News aggregation applications gather data from various sources and present the results as a list to the users. The recommendation system attempts to rank the list of articles according to a user's preference. Mathematically, the articles are ranked based on the probability of the user clicking on them. We model this as a binary classification problem which takes a user and an article as input and computes the probability of a click, i.e. 𝑃𝑟𝑜𝑏(𝑐𝑙𝑖𝑐𝑘 = 1 | 𝑢𝑠𝑒𝑟, 𝑖𝑡𝑒𝑚). The input features for this model are a combination of the dense representations of the user and the article, which we learn through graph embeddings.

During their activity on the application, users interact with many articles. More formally, the user-article interactions can be organized as a bipartite graph 𝐺 = (𝑈, 𝐴, 𝐸), where 𝑈 denotes the set of users and 𝐴 denotes the set of articles. The set 𝑉 = 𝑈 ∪ 𝐴 is the set of nodes of 𝐺. There is an edge 𝑒 ∈ 𝐸 between a user 𝑢 ∈ 𝑈 and an article 𝑎 ∈ 𝐴 if 𝑢 clicked on the snippet for 𝑎 to read the story. The set of all nodes connected to a node 𝑛, called the adjacency list of 𝑛, is denoted by 𝐸(𝑛).

Graph embedding for the bipartite graph: Graph representation learning techniques such as DeepWalk [5] and node2vec [3] use random walks to embed a graph into a low-dimensional space, mapping each node to a dense vector. Following DeepWalk [5], we generate short truncated random walks starting from each node. The random walks are generated in a completely uniform and unbiased fashion: each adjacent node has equal probability of being picked. The random walks produce sequences of nodes of pre-decided length. For the bipartite graph, the random walks are generated by repeating the following operations: 1) Given the current node 𝑛, which is initialized at the starting node of the random walk, get its adjacency list 𝐸(𝑛). 2) Sample an edge from 𝐸(𝑛) which links 𝑛 to a node 𝑚. 3) The current node is updated to 𝑚 and the steps repeat. The procedure is described in Algorithm 1.

Algorithm 1 Bipartite Graph Embedding
Input: Bipartite graph 𝐺 = (𝑈, 𝐴, 𝐸), walk length 𝑙, walks per node 𝑟, embedding dimension 𝑑, context window size 𝑤.
1: Initialize 𝑤𝑎𝑙𝑘𝑠 to 𝐸𝑚𝑝𝑡𝑦.
2: for 𝑖 = 0 to 𝑟 do
3:   for 𝑛 ∈ 𝑉 do
4:     Initialize curr walk to [𝑛].
5:     for 𝑠𝑡𝑒𝑝 = 1 to 𝑙 do
6:       curr step = curr walk[−1]
7:       adj list = 𝐸(curr step)
8:       next step = 𝑅𝑎𝑛𝑑𝑜𝑚𝑆𝑎𝑚𝑝𝑙𝑒(adj list)
9:       Append next step to curr walk
10:    Append curr walk to 𝑤𝑎𝑙𝑘𝑠
11: 𝑆𝑘𝑖𝑝𝐺𝑟𝑎𝑚(𝑤𝑎𝑙𝑘𝑠, 𝑤, 𝑑)
Output: Embeddings of every node 𝑛 ∈ 𝑉, 𝑣𝑛 ∈ R𝑑

The random walks on the bipartite graph have paths of the form
𝑈𝑠𝑒𝑟 → 𝐴𝑟𝑡𝑖𝑐𝑙𝑒 → 𝑈𝑠𝑒𝑟
The random walks thus generated are sequences of nodes, which can be thought of as 'sentences' in an artificial language. The SkipGram model [7], developed for word embeddings, measures the probability of two words co-occurring within a fixed window in a sentence. It uses a fixed-size window around every word to extract context and non-context words for the word under consideration. The model employs a single-hidden-layer neural network to learn a mapping from a word to its context words. In graph embeddings, the skip-gram model [7] computes the probability of two nodes co-occurring within a fixed window on a random walk.

This procedure is an extension of the latent factor models [10] of collaborative filtering, where matrix factorization is used to obtain dense representations of users and items in the same space. Graph embeddings have been shown to perform better than matrix factorization for various tasks, e.g. classification, clustering, and link prediction. Mathematically, graph embeddings extend the contextual similarity from the immediate neighborhood to truncated random walks. Graph embedding of the user-item bipartite graph, though effective, suffers from the cold-start problem, i.e. it is unable to handle new content which was not part of the graph.

Content-based filtering has been an alternative to collaborative filtering for handling the problem of fresh, unseen items. We merge collaborative and content filtering in a natural way in our setting by enriching the bipartite graph 𝐺 into a new graph 𝐺′ = (𝑈, 𝐴, 𝐶, 𝐸, 𝐸′), where 𝐶 denotes the set of content nodes and 𝐸′ is the set of edges between the article and content nodes. We use the named entities of the form place, person, organization as the content nodes. The named entities are at a finer level than the topic modeling approach [20] and are more constrained than tf-idf based keyword filtering [21]. The set of topics and keywords is not exhaustive; new articles can introduce topics and keywords, so the models for extracting topics or keywords need to be retrained. Named Entity Recognition (NER), in turn, is article agnostic: it can extract entities in new articles without being retrained explicitly. In fact, there are off-the-shelf tools, e.g. Spacy [22], to extract named entities from any article. Thus, NER is cheaper to achieve than topic modeling and keyword extraction.

Graph embedding for the enriched graph: The enriched graph 𝐺′ contains user-user interaction via shared interests in articles as well as article-article interaction via the co-occurrence of entities in them. In the bipartite setting, the random walks transition between user and article nodes, while in the enriched graph the random walks are more complex. The random walker at an article node can jump either to a user node or to an entity node. The paths in the random walks can be of the following types:
∙ 𝑈𝑠𝑒𝑟 → 𝐴𝑟𝑡𝑖𝑐𝑙𝑒 → 𝑈𝑠𝑒𝑟
∙ 𝑈𝑠𝑒𝑟 → 𝐴𝑟𝑡𝑖𝑐𝑙𝑒 → 𝐸𝑛𝑡𝑖𝑡𝑦 → 𝐴𝑟𝑡𝑖𝑐𝑙𝑒 → 𝑈𝑠𝑒𝑟
∙ 𝐴𝑟𝑡𝑖𝑐𝑙𝑒 → 𝑈𝑠𝑒𝑟 → 𝐴𝑟𝑡𝑖𝑐𝑙𝑒
∙ 𝐴𝑟𝑡𝑖𝑐𝑙𝑒 → 𝐸𝑛𝑡𝑖𝑡𝑦 → 𝐴𝑟𝑡𝑖𝑐𝑙𝑒 → 𝑈𝑠𝑒𝑟
∙ 𝐴𝑟𝑡𝑖𝑐𝑙𝑒 → 𝐸𝑛𝑡𝑖𝑡𝑦 → 𝐴𝑟𝑡𝑖𝑐𝑙𝑒 → 𝐸𝑛𝑡𝑖𝑡𝑦
This provides many more possibilities than the 𝐴𝑟𝑡𝑖𝑐𝑙𝑒 → 𝑈𝑠𝑒𝑟 and 𝑈𝑠𝑒𝑟 → 𝐴𝑟𝑡𝑖𝑐𝑙𝑒 choices in the bipartite graph. Thus, the embeddings learned on these random walks are able to capture relationships between the different types of nodes due to the different ways they can be connected.

The enriched graph 𝐺′ is heterogeneous, with two types of edges, i.e. user-article and article-entity. Graph embedding methods like DeepWalk [5] are restricted to homogeneous graphs and may not be directly applicable to heterogeneous networks. One obvious problem is that with unbiased random walks, we assign equal probability to each edge; this increases the probability of the random walk going through the edge type which has more edges at the current node. For any article node in the enriched graph, there are far more edges to user nodes than to entity nodes. Thus the random walk has higher chances of going to the user nodes, and in some cases completely avoids entity edges. This stems from an imbalance at the global level: we have tens of thousands of entities, whereas there are millions of users. We provide a simple way to resolve the problem of random walks being biased by the dominant edge type. The random walk is generated in two steps: (i) an edge type is chosen randomly from all possible edge types, (ii) an edge is randomly chosen from all edges of the selected edge type. This amounts to biasing the random walks uniformly with equal weights for each edge type. The procedure for learning representations of the nodes of the enriched graph is described in Algorithm 2 and Algorithm 3.

Algorithm 2 Enriched Graph Embedding
Input: Enriched graph 𝐺 = (𝑈, 𝐴, 𝐶, 𝐸, 𝐸′), walk length 𝑙, walks per node 𝑟, embedding dimension 𝑑, context window size 𝑤.
1: Initialize 𝑤𝑎𝑙𝑘𝑠 to 𝐸𝑚𝑝𝑡𝑦.
2: for 𝑖 = 0 to 𝑟 do
3:   for 𝑛 ∈ 𝑉 do
4:     Initialize curr walk to [𝑛].
5:     for 𝑠𝑡𝑒𝑝 = 1 to 𝑙 do
6:       curr step = curr walk[−1]
7:       next step = 𝑁𝑒𝑥𝑡𝑁𝑜𝑑𝑒(𝐺, curr step)
8:       Append next step to curr walk
9:     Append curr walk to 𝑤𝑎𝑙𝑘𝑠
10: 𝑆𝑘𝑖𝑝𝐺𝑟𝑎𝑚(𝑤𝑎𝑙𝑘𝑠, 𝑤, 𝑑)
Output: Embeddings of every node 𝑛 ∈ 𝑉, 𝑣𝑛 ∈ R𝑑

Algorithm 3 NextNode
Input: Enriched graph 𝐺 = (𝑈, 𝐴, 𝐶, 𝐸, 𝐸′), curr node
1: if curr node ∈ 𝑈 then
2:   next step = 𝑅𝑎𝑛𝑑𝑜𝑚𝑆𝑎𝑚𝑝𝑙𝑒(𝐸(curr node))
3: else if curr node ∈ 𝐶 then
4:   next step = 𝑅𝑎𝑛𝑑𝑜𝑚𝑆𝑎𝑚𝑝𝑙𝑒(𝐸′(curr node))
5: else if curr node ∈ 𝐴 then
6:   Generate a random number 𝑡 from 𝑈𝑛𝑖𝑓𝑜𝑟𝑚(0, 1).
7:   if 𝑡 > 0.5 then
8:     next step = 𝑅𝑎𝑛𝑑𝑜𝑚𝑆𝑎𝑚𝑝𝑙𝑒(𝐸′(curr node))
9:   else
10:    next step = 𝑅𝑎𝑛𝑑𝑜𝑚𝑆𝑎𝑚𝑝𝑙𝑒(𝐸(curr node))
Output: Next node in the random walk, next step

In reality, different edge types contribute differently to the random walks and would have unequal weights. However, we do not have an intuitive way to obtain these weights. Some work has been done in this direction, e.g. metapath2vec [2], heterogeneous edge embeddings [23].

Recommendation Model: Having learned the embeddings for the users, items and entities, we now describe the model for recommendation. We define the recommendation problem as a binary classification model which learns the probability of a user clicking on an article. We use a logistic regression model on the embeddings learned on the graph. The input feature vector to the logistic regression model is the Hadamard product of the user feature vector and the article feature vector. If both the user and the article were present in the enriched graph, i.e. we have an embedding for them, then the user and article feature vectors are their respective graph embeddings. The purpose of a news recommendation system is to recommend new, unseen articles, and we may not have an embedding for fresh articles. In this case, we use the average of the embeddings of the entities present in the article as the article feature vector. Thus the input feature to the logistic regression is in the same space as the graph embedding space.

If 𝐹𝑢 ∈ R𝑑 is the vector representation of user 𝑢 and 𝐹𝑎 ∈ R𝑑 is the vector representation of article 𝑎, then the input to the logistic regression model is a vector 𝑥(𝑢,𝑎) = (𝑥1, 𝑥2, . . . , 𝑥𝑑) ∈ R𝑑 defined as:
(𝑥(𝑢,𝑎))𝑖 = (𝐹𝑢)𝑖 * (𝐹𝑎)𝑖    (1)
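The two-step sampling of Algorithm 3 and the Hadamard feature of Eq. (1) can be sketched as below. This is a minimal illustration, not the deployed system: the toy graph, node names and embedding vectors are made up, and a trained logistic regression would consume the resulting feature vector.

```python
import random

def next_node(user_adj, entity_adj, articles, node, rng):
    """Sketch of Algorithm 3: at an article node, first pick the edge type
    (user-article vs. article-entity) with equal probability, then sample an
    edge of that type uniformly; user and entity nodes have one edge type."""
    if node in articles:
        adj = entity_adj if rng.random() > 0.5 else user_adj
        # assumption: fall back to the other edge type if this article
        # happens to have no edges of the chosen type
        neighbors = adj.get(node) or user_adj.get(node) or entity_adj.get(node)
        return rng.choice(neighbors)
    return rng.choice(user_adj.get(node) or entity_adj.get(node))

def hadamard(f_u, f_a):
    """Eq. (1): elementwise product of user and article feature vectors."""
    return [x * y for x, y in zip(f_u, f_a)]

# Toy enriched graph (illustrative names): E = user-article, E' = article-entity.
articles = {"a1", "a2"}
user_adj = {"u1": ["a1", "a2"], "u2": ["a2"], "a1": ["u1"], "a2": ["u1", "u2"]}
entity_adj = {"a1": ["e_trump"], "a2": ["e_trump", "e_gop"],
              "e_trump": ["a1", "a2"], "e_gop": ["a2"]}

rng = random.Random(1)
walk = ["u1"]
for _ in range(9):
    walk.append(next_node(user_adj, entity_adj, articles, walk[-1], rng))
print(len(walk))                          # a 10-node walk
print(hadamard([1.0, 2.0], [3.0, -1.0]))  # [3.0, -2.0]
```

Because the edge type is chosen before the edge, entity edges are followed from articles about half the time regardless of how many user edges the article has, which is exactly the rebalancing the text describes.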
The user feature vector is defined as the user embedding vector 𝑉𝑢 learned via graph embeddings. The article feature vector 𝐹𝑎 is given by the graph embedding 𝑉𝑎 ∈ R𝑑 if 𝑎 has an embedding. Otherwise, if 𝐶(𝑎) is the set of entities in the contents of 𝑎,

𝐹𝑎 = (1/|𝐶(𝑎)|) ∑_{𝑐 ∈ 𝐶(𝑎)} 𝑉𝑐    (2)

The pipeline for the recommendation system described above has three components:
∙ Feature Learning: A large part of the user-article interaction data between time 𝑡0 and 𝑡1 is taken to build the enriched graph, and embeddings are then learned on this graph, e.g. the user activity on the application between Jan 2016 and Jan 2019.
∙ Recommendation Model Training: The logistic regression model for computing the probability of a user clicking on an article is trained on a different sub-graph which comprises the interaction data between time 𝑡1 and 𝑡2, e.g. between Jan 2019 and Jun 2019. A portion of this training data is held out to validate the model.
∙ Evaluation and Deployment: The trained model is then evaluated on unseen data by considering the predictions of the model on usage between June 2019 and July 2019. The model is then deployed to the application.
An alternate interpretation of the recommendation model is as link prediction [24] in a user-article bipartite graph selected at a future time, i.e. one that contains users and new articles.

4 RESULTS AND ANALYSIS
In this section, we discuss the evaluation and analysis of the graph embeddings and the recommendation model.

4.1 Experimental Setup
The data for this experiment was taken from a real-world news aggregation mobile application X¹. The news content on X is presented as a scrollable list which shows the headline and the first couple of lines of each article. The user can read the full article by clicking on the shown snippet for that article. During their activity on the platform, users click on the displayed article snippet if they want to read the full story, or they scroll past it. We consider a user clicking on an article as an indicator of their interest in the article and use the clicks as the positive data for the recommendation model. If a user scrolls past an article and does not click on it, we use that as a negative example. Thus, our model attempts to increase the click-through rate (CTR) as the metric. For learning the graph embeddings, we create an enriched graph using the activity of 500K users during a period of 3 months, adding 40K articles. There are also 6K entity nodes, which we extracted using the freely available Spacy API.

To learn embeddings of all the nodes in the enriched graph, we generate 30 random walks of length 100 for every node. The skip-gram model is trained using stochastic gradient descent (SGD) with a learning rate of 0.01 to minimize the negative sampling loss. Finally, we obtain an embedding of dimension 128 for every node.

¹Name anonymized for the double-blind review.

Table 1: Comparison of Embeddings
Features | Precision | Recall | F-1
word2vec | 0.90 | 0.92 | 0.91
doc2vec | 0.91 | 0.92 | 0.91
node2vec | 0.88 | 0.89 | 0.88

4.2 Multi-class Article Classification
We evaluate the feature representations obtained through graph embeddings on a standard supervised learning task: multi-class classification of the news articles. News publishers often add high-level categories to the articles they publish, e.g. Sports, Entertainment. We train a machine learning model on the set of labeled articles using their graph embeddings as input; the task is to predict the labels for the remaining articles. There are 8 news categories in our data: Business, Entertainment, Sports, Local, Tech, World, Lifestyle, Offbeat. In this experiment, we use a fraction of the labeled article nodes to train a multinomial logistic regression model with L2 regularization. Without explicitly using any content information, the performance of the logistic regression model trained on node embeddings (node2vec) is similar to that of a model trained explicitly on content features, e.g. word2vec [7] or doc2vec [12]. The comparison is presented in Table 1. The detailed per-category results of the model are described in Table 2.

Table 2: Per-category Results of the Evaluation
Category | Precision | Recall | F-1
Business | 0.83 | 0.71 | 0.77
Sports | 0.90 | 0.92 | 0.91
Entertainment | 0.97 | 0.95 | 0.96
Tech | 0.86 | 0.83 | 0.84
Local | 0.88 | 0.95 | 0.91
World | 0.82 | 0.68 | 0.75
Lifestyle | 0.93 | 0.85 | 0.89
Offbeat | 0.87 | 0.62 | 0.72
Average | 0.88 | 0.89 | 0.88

4.3 Recommendation Evaluation
The recommendation model is trained on a set of user-article pairs of positive examples (user clicked) and negative examples (no click) taken from the activity over a period of one month which is different from the data used for building the enriched graph. Logistic regression with L2 regularization is used for binary classification of the pairs as click or no-click. We then evaluate the model to predict whether a
set of user-article pairs were clicked or not. We measure the performance of the model by computing the Area Under the ROC Curve (AUC) and Precision at 5; the results are shown in Table 3. The model shows a significant improvement over the previously used approach, which is a combination of collaborative filtering and word embeddings (Hybrid CF and Content). One major advantage is that our model naturally blends collaborative filtering and content-based filtering by putting the users, items, and content into the same space.

Table 3: Comparison of Recommendation Models
Model | AUC | Precision@5
Hybrid CF and Content | 0.72 | 0.89
Enriched graph embeddings | 0.87 | 0.97

5 CONCLUSION AND FUTURE WORK
In this work, we proposed a graph-based news recommendation system which is a hybrid of collaborative filtering and content information. Our method employs graph embeddings to automatically learn latent representations of users and articles. We extend the user-item bipartite graph to contain named entities from the articles. The entities are expected to capture the user preferences at a finer level. The embedding method brings users, items and entities into the same space. We evaluated the learned latent representations via classification of article nodes into 8 high-level categories. We show that without explicitly using the contents of the articles, we achieve results comparable to the NLP-based features. We also design and evaluate a recommendation model as a binary classification model for computing the likelihood of the user clicking on the article. This model performs better than the hybrid collaborative-filtering and word-embeddings based article similarity model.

Though the model performs satisfactorily, this work has focused on simplicity, since the goal of this paper is to show the efficacy of graph embedding methods for content recommendation. The embedding method has two hyperparameters, walk length and number of walks per node, which we chose based on heuristics and did not learn. There is work using attention mechanisms [27] to learn the hyperparameters in an end-to-end system. The enriched network we used has two types of edges; we resolved the heterogeneity by giving equal importance to each edge type. Graph embeddings for heterogeneous networks are an active area of research, e.g. metapath2vec [2], heterogeneous edge embeddings [23]. The logistic regression model for recommendation was chosen

REFERENCES
[1] … Knowledge Discovery and Data Mining, pp. 119–128. ACM (2015)
[2] Dong, Y., Chawla, N.V., Swami, A.: metapath2vec: Scalable representation learning for heterogeneous networks. In: Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 135–144. KDD '17, ACM, New York, NY, USA (2017)
[3] Grover, A., Leskovec, J.: node2vec: Scalable feature learning for networks. CoRR abs/1607.00653 (2016), http://arxiv.org/abs/1607.00653
[4] Hamilton, W.L., Ying, R., Leskovec, J.: Representation learning on graphs: Methods and applications. CoRR abs/1709.05584 (2017), http://arxiv.org/abs/1709.05584
[5] Perozzi, B., Al-Rfou, R., Skiena, S.: DeepWalk: Online learning of social representations. In: Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 701–710. KDD '14, ACM, New York, NY, USA (2014)
[6] Cui, P., Wang, X., Pei, J., Zhu, W.: A survey on network embedding. IEEE Transactions on Knowledge and Data Engineering (2018)
[7] Mikolov, T., Chen, K., Corrado, G., Dean, J.: Efficient estimation of word representations in vector space. CoRR abs/1301.3781 (2013), http://arxiv.org/abs/1301.3781
[8] Pew Research Center, 2017. https://www.pewresearch.org/fact-tank/2017/06/12/growth-in-mobile-news-use-driven-by-older-adults/
[9] Das, A.S., Datar, M., Garg, A., Rajaram, S.: Google News personalization: scalable online collaborative filtering. In: Proceedings of the 16th International Conference on World Wide Web (2007)
[10] Koren, Y., Bell, R., Volinsky, C.: Matrix factorization techniques for recommender systems. https://datajobs.com/data-science-repo/Recommender-Systems-%5BNetflix%5D.pdf
[11] Konstan, J.A., Miller, B.N., Maltz, D., Herlocker, J.L., Gordon, L.R., Riedl, J.: GroupLens: Applying collaborative filtering to Usenet news. Commun. ACM 40, 77–87 (1997)
[12] Le, Q.V., Mikolov, T.: Distributed representations of sentences and documents. In: Proceedings of the 31st International Conference on Machine Learning (ICML), Beijing, China (2014)
[13] Billsus, D., Pazzani, M.J.: User modeling for adaptive news access. User Modeling and User-Adapted Interaction 10(2-3), 147–180 (2000)
[14] Tan, A., Teo, C.: Learning user profiles for personalized information dissemination. In: Proceedings of the 1998 IEEE International Joint Conference on Neural Networks, pp. 183–188 (1998)
[15] Carreira, R., Crato, J.M., Gonçalves, D., Jorge, J.A.: Evaluating adaptive user profiles for news classification. In: Proceedings of the 9th International Conference on Intelligent User Interfaces (2004)
[16] Billsus, D., Pazzani, M.: A hybrid user model for news story classification. In: Proceedings of the Seventh International Conference on User Modeling (1999)
[17] Claypool, M., Gokhale, A., Miranda, T., Murnikov, P., Netes, D., Sartin, M.: Combining content-based and collaborative filters in an online newspaper. In: Proceedings of the ACM SIGIR Workshop on Recommender Systems (1999)
[18] Eksombatchai, C., Jindal, P., Liu, J.Z., Liu, Y., Sharma, R., Sugnet, C., Ulrich, M., Leskovec, J.: Pixie: A system for recommending 3+ billion items to 200+ million users in real-time.
[19] Hamilton, W.L., Ying, R., Leskovec, J.: Inductive representation learning on large graphs. In: 31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA
[20] Wang, C., Blei, D.M.: Collaborative topic modeling for recommending scientific articles. In: KDD '11, August 21–24, 2011, San Diego, California, USA
[21] Wartena, C., Slakhorst, W., Wibbels, M.: Selecting
its simplicity, a more suitable approach would be a neural keywords for content based recommendation, Proceedings of the
19th ACM international conference on Information and knowledge
network based model which is either trained on Siamese like management (CIKM), Toronto, ON, Canada October 26 - 30,
loss (two input - user and item embeddings) [28] or the triplet 2010.
[22] Spacy - Industrial-Strength Natural Language Processing https:
loss (three input - user prefers item 1 over item 2) [29]. We //spacy.io/
plan to address some of these issues in a future work. [23] Janu Verma, Srishti Gupta, Debdoot Mukherjee, Tanmoy
Chakraborty : Heterogeneous Edge Embeddings for Friend Rec-
ommendation, ECIR Cologne, April 2019.
REFERENCES [24] Liben-Nowell, D., Kleinberg, J.: The link-prediction problem for
[1] Chang, S., Han, W., Tang, J., Qi, G.J., Aggarwal, C.C., Huang, social networks. Journal of the American society for information
T.S.: Heterogeneous network embedding via deep architectures. In: science and technology 58(7), 1019–1031 (2007)
Proceedings of the 21th ACM SIGKDD International Conference
Enriched Network Embeddings for News Recommendation INRA’19, September, 2019, Copenhagen, Denmark
[25] McInnes, Leland, Healy, John, and Astels, Steve: hdbscan: Hierarchical density based clustering. The Journal of Open Source Software, Vol. 2, No. 11, 2017.
[26] L. J. P. van der Maaten and G. E. Hinton: Visualizing High-Dimensional Data Using t-SNE. Journal of Machine Learning Research 9(Nov):2579-2605, 2008.
[27] Sami Abu-El-Haija, Bryan Perozzi, Rami Al-Rfou, Alex Alemi: Watch Your Step: Learning Node Embeddings via Graph Attention. 32nd Conference on Neural Information Processing Systems (NeurIPS 2018), Montréal, Canada.
[28] Yaniv Taigman, Ming Yang, Marc'Aurelio Ranzato, Lior Wolf: DeepFace: Closing the Gap to Human-Level Performance in Face Verification. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2014.
[29] Florian Schroff, Dmitry Kalenichenko, James Philbin: FaceNet: A Unified Embedding for Face Recognition and Clustering. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2015.
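The triplet-loss ranking model proposed as future work in the conclusion can be illustrated with a minimal sketch. This is not the paper's implementation: the embedding dimension, the margin value, and dot-product scoring are assumptions made for illustration.

```python
import numpy as np

def triplet_loss(user, pos_item, neg_item, margin=0.2):
    """Hinge-style triplet loss over embeddings: the user should score
    the preferred article (pos_item) above the non-preferred article
    (neg_item) by at least `margin`. Zero loss once the ranking holds."""
    s_pos = float(np.dot(user, pos_item))  # similarity to preferred article
    s_neg = float(np.dot(user, neg_item))  # similarity to rejected article
    return max(0.0, margin - s_pos + s_neg)

rng = np.random.default_rng(7)
dim = 8                                     # assumed embedding dimension
user = rng.normal(size=dim)
liked = user + 0.1 * rng.normal(size=dim)   # article close to the user's taste
skipped = -user                             # article far from the user's taste

print(triplet_loss(user, liked, skipped))   # 0.0: ranking already satisfied
print(triplet_loss(user, skipped, liked))   # positive: ranking violated
```

In a full system the loss would be minimized by gradient descent over user and article embeddings; the Siamese variant [28] is analogous but takes only a (user, item) pair with a binary click label.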