<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Enriched Network Embeddings for News Recommendation</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author role="corresp">
							<persName><forename type="first">Janu</forename><surname>Verma</surname></persName>
							<email>j.verma5@gmail.com</email>
							<affiliation key="aff0">
								<orgName type="institution">Hike Messenger New Delhi</orgName>
								<address>
									<settlement>Delhi</settlement>
									<country key="IN">India</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Enriched Network Embeddings for News Recommendation</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">81E2C15EEC52F264CBC332375CADD1CC</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-24T05:27+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>News recommendation</term>
					<term>network embeddings</term>
					<term>NLP</term>
					<term>Named Entities</term>
					<term>Topic Modeling</term>
					<term>Collaborative Filtering</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>News aggregators collect content from various sources and present it in one website or mobile application for easy access. A key challenge for news applications is to help users discover relevant articles. Both the user experience and the key metrics depend on high-quality personalized recommendations. However, building a news recommender presents a distinct set of challenges: a large number of articles is published every hour, the popularity of news surges and declines rapidly, and recency is critical. In this paper, we present a graph-based news recommendation model which is deployed on a real-world news application. Our system is a hybrid of collaborative filtering and content-based filtering. We enrich the user-article interaction graph by adding new nodes corresponding to the named entities extracted from the contents of the articles. Random walk based graph embeddings are used to learn latent representations for users, articles and named entities in the same space. We evaluate the learned embeddings via a multi-class classification of news articles into high-level categories. We propose a recommendation system based on a binary classification problem which takes as input a combination of the user, item and entity embeddings and computes the probability of the user clicking on the article. We perform experiments to show the superiority of our model over the previous system.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">INTRODUCTION</head><p>News reading on mobile devices has become very popular in recent years owing to the tremendous amount of content coming from millions of sources across the world. As per a 2017 survey by Pew Research <ref type="bibr">[8]</ref>, about 85% of US adults read news on a mobile device. News applications like Google News, Bing News, Flipboard, Pocket etc. collect news from various sources and provide readers with news from around the world in an aggregated user interface. The sheer volume of available articles can be overwhelming to readers. A key problem that news aggregation applications struggle with is to help users discover the articles that are most interesting to them.</p><p>Recommendation systems aim to capture users' preferences and interests so that relevant information is shown to them. Domains like online shopping (e.g. Amazon), movies (e.g. Netflix) and music (e.g. Pandora) have seen great success with recommendation systems. Two popular approaches, collaborative filtering and content-based filtering, form the basis of most major recommendation systems. In collaborative filtering, news articles are recommended based on the reading history of users with similar preferences. This is a very popular method which has the advantage of being domain free. A major drawback of collaborative filtering is its inability to handle new content, i.e. a breaking story that has not yet seen much traction; this is called the item cold-start problem.</p><p>A solution to the problem of fresh content would be to build user profiles comprising genuine interests and use them to make news recommendations. This is called content-based filtering, where the contents of the articles are analyzed to extract topics of the user's interest. If a user's past reading preferences are known, new articles can be recommended based on their similarity to the previously read ones. Content-based filtering requires sufficient reading history to build a strong profile of a user's interests. For users with little history, i.e. user cold-start, this becomes problematic. Collaborative filtering relies on the fact that there are always users who read some news, and these users may serve as a basis to predict the interests of the long-tail users. Thus, collaborative filtering and content-based filtering are complementary to each other.</p><p>User preferences are not straightforward: a user might want to read an article even if she is not interested in the topic but finds the particular story relevant. For example, she may want to read news about the World Cup even without a general interest in sports. This requires a carefully designed content-filtering model at a proper level of granularity. Furthermore, not all users are alike, and collaborative filtering may not account for the individual variability between users. Highly read topics are recommended to most users, even if some of them have no interest in these topics. For example, entertainment and lifestyle articles are the most popular and they get reflected in the recommendations for a lot of users via other seemingly similar users.</p><p>Thus, news recommendation presents challenges that do not exist in other domains. For example, the recency and the popularity of news articles can change drastically with time. 
Another compounding factor is the influx of a large number of new articles every hour.</p><p>Our approach is based on network embeddings, which map the nodes of a graph to vectors in a low-dimensional space such that the structural attributes of the graph translate to geometrical properties in the embedding space. The user-item interaction data can be defined as a bipartite graph with user nodes and item nodes. In collaborative filtering, the adjacency matrix of this bipartite graph is used to learn the similarity of users and items <ref type="bibr" target="#b8">[10]</ref>. In the current work, we enrich the user-item graph by adding new nodes corresponding to the named entities (person, place, organization, event) extracted from the contents of the articles. For instance, if an article is about the 2016 Presidential elections, we might add entity nodes such as Donald Trump, 2016 Elections, Republican Party etc. We also add edges from the item nodes to the entity nodes. The enriched graph has three types of nodes (user, item, entity) and two types of edges (user-item and item-entity).</p><p>To learn node embeddings, truncated random walks (biased or unbiased) of fixed length are generated emanating from each node <ref type="bibr" target="#b4">[5]</ref>. These random walk sequences can be thought of as sentences in an artificial language. Using the Skip-gram <ref type="bibr" target="#b6">[7]</ref> model, by which words in a corpus of sentences in a natural language can be mapped to a low-dimensional space, we obtain dense representations for nodes. The embeddings aim to capture contextual similarity of the nodes, i.e. nodes co-occurring within a fixed window on a random walk are mapped to nearby points. In collaborative filtering, nodes which co-occur in the adjacency list (direct connections) of a node are mapped to nearby points <ref type="bibr" target="#b8">[10]</ref>. Thus, graph embeddings provide an extension of collaborative filtering and have been shown to perform better.</p><p>Using graph embeddings, we map all nodes (users, items and entities) to the same space. In this space, we can compute the similarity of a user to an item as well as user-entity similarity. This allows us to suggest news articles based on a user's affinity towards entities, which are at a much finer level than topical affinity in content-based filtering. This equips us to handle the situation where, e.g., a consumer is interested in reading news about elections even if she is not generally interested in politics. The entities are also more interpretable than abstract topics.</p><p>We provide an evaluation of the learned representation by studying its efficacy in multi-class classification of the article nodes into 8 pre-defined high-level categories, e.g. Politics, Entertainment, Sports etc. A simple linear classifier trained only on the node embeddings of 60% of the article nodes (without using any content information explicitly) labeled with their respective categories, and evaluated on the remaining 40%, provides 0.901 AUC. We also cluster the embeddings of the entity nodes and qualitatively evaluate the results. Finally, we evaluate the system for article recommendation by computing Precision@k for k values ranging from 1 to 5.</p><p>Concretely, we make the following contributions:</p><p>• Provide a news recommendation system based on graph embeddings that is a hybrid of collaborative and content-based filtering.</p><p>• Learn embeddings for the enriched user-item graph that contains entity nodes capturing the contents of the articles in addition to the user and item nodes. 
• Evaluate the learned embeddings via multi-class article classification. • Build and evaluate a binary classification model for recommendation.</p><p>• Study the efficacy of our method for the cold-start problem.</p><p>The remainder of the article is organized as follows: In Section 2, we provide a discussion of the related work on recommendation systems and network embeddings. Next, we explain graph embeddings for the bipartite and the enriched graphs and their use in the recommendation model in Section 3. Section 4 provides the analysis and evaluation of the proposed model. Finally, we conclude in Section 5.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2">RELATED WORK</head><p>In this section, we discuss some of the related work on news recommendation systems.</p><p>Collaborative Filtering: Collaborative filtering <ref type="bibr" target="#b8">[10]</ref> recommends articles which were clicked on by users with a similar reading history. Collaborative filtering has been applied to personalized news reading applications, such as GroupLens <ref type="bibr" target="#b9">[11]</ref> and the initial version of Google News recommendation <ref type="bibr" target="#b7">[9]</ref>. There are two types of collaborative filtering <ref type="bibr" target="#b8">[10]</ref>: the neighborhood model and the latent factor model. In the neighborhood model, the click history of users is used to compute the 'neighborhood' of a user, and articles clicked on by these neighbors are then recommended to the user. In the latent factor model, a latent representation of both users and articles is learned in the same space, usually via a matrix factorization of the user-article interaction matrix. The latent representation can be interpreted as describing an article or a user in a 'concept' space which captures factors such as the topic of the article. Collaborative filtering can be employed even when there is scarce click history available for individual users.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Content-based Filtering:</head><p>A content-based recommendation system tries to recommend items similar to those a given user has liked in the past. Thus, it requires a notion of similarity between articles. There has been a lot of work in NLP on computing similarity between text documents, e.g. tf-idf, word embeddings <ref type="bibr" target="#b6">[7]</ref> and doc2vec <ref type="bibr" target="#b10">[12]</ref>. This approach relies on sufficient click history to build a genuine profile of a user's interests. Content-based filtering has been applied to personalized news recommendations, e.g. news reading on devices (<ref type="bibr" target="#b11">[13]</ref>, <ref type="bibr" target="#b13">[15]</ref>) and web-based news aggregation services <ref type="bibr" target="#b12">[14]</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Hybrid Collaborative and Content-based Filtering:</head><p>Hybrid models, which combine both collaborative filtering and content-based filtering, are more robust against the problems of either approach. This is accomplished by using both user similarity based on historical information and content similarity to make recommendations. Hybrid methods have seen applications in news recommendation <ref type="bibr" target="#b14">[16]</ref>, <ref type="bibr" target="#b15">[17]</ref>.</p><p>Graph Embeddings: The approach presented in this paper is based on graph embeddings. Instead of working with global summary attributes of the graph, there has been a lot of work recently on finding a representation of the nodes that incorporates the local structural information <ref type="bibr" target="#b4">[5]</ref>, <ref type="bibr" target="#b2">[3]</ref>. The idea is to learn a mapping from the graph to a low-dimensional real vector space such that the structural attributes of the graph translate to geometrical properties in the embedding space. Random walks of pre-decided length are generated starting at each node in the network to produce "sentences" of nodes, similar to sentences of words in a natural language. The Skip-gram algorithm devised by Mikolov et al. <ref type="bibr" target="#b6">[7]</ref> is used to obtain node embeddings from the random walks, which are expected to capture the contextual properties of the network nodes: nodes that occur in the same context have similar vector embeddings. For a survey on graph embeddings, see Cui et al. <ref type="bibr" target="#b5">[6]</ref>. Recently, random walks on graphs and embeddings have been used in recommendation tasks, e.g. Pixie <ref type="bibr" target="#b16">[18]</ref>, GraphSage <ref type="bibr" target="#b17">[19]</ref> etc.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3">PROPOSED METHOD</head><p>In this section, we describe our model for news recommendation, which uses graph embeddings for feature learning. News aggregation applications gather data from various sources and present the results as a list to the users. The recommendation system attempts to rank the list of articles according to a user's preference. Mathematically, the articles are ranked based on the probability of the user clicking on them. We model this as a binary classification problem which takes a user and an article as input and computes the probability of a click, i.e. 𝑃 𝑟𝑜𝑏(𝑐𝑙𝑖𝑐𝑘 = 1|𝑢𝑠𝑒𝑟, 𝑖𝑡𝑒𝑚). The input features for this model are a combination of the dense representations of the user and the article, which we learn through graph embeddings.</p><p>During their activity on the application, users interact with many articles. More formally, the user-article interaction can be organized as a bipartite graph 𝐺 = (𝑈, 𝐴, 𝐸), where 𝑈 denotes the set of users and 𝐴 denotes the set of articles. The set 𝑉 = 𝑈 ∪ 𝐴 is the set of nodes of 𝐺. There is an edge 𝑒 ∈ 𝐸 between a user 𝑢 ∈ 𝑈 and an article 𝑎 ∈ 𝐴 if 𝑢 clicked on the snippet for 𝑎 to read the story. The set of all nodes connected to a node 𝑛, called the adjacency list of 𝑛, is denoted by 𝐸(𝑛).</p><p>Graph embedding for the bipartite graph: Graph representation learning techniques such as DeepWalk <ref type="bibr" target="#b4">[5]</ref> and Node2vec <ref type="bibr" target="#b2">[3]</ref> use random walks to embed a graph into a low-dimensional space, mapping each node to a dense vector. Following DeepWalk <ref type="bibr" target="#b4">[5]</ref>, we generate short truncated random walks starting from each node. The random walks are generated in a completely uniform and unbiased fashion: each adjacent node has an equal probability of being picked. The random walks produce sequences of nodes of pre-decided length. For the bipartite graph, the random walks are generated by repeating the following operations: 1) Given the current node 𝑛, which is initialized at the starting node of the random walk, get its adjacency list 𝐸(𝑛). 2) Sample an edge from 𝐸(𝑛) which links 𝑛 to a node 𝑚. 3) The current node is updated to 𝑚 and the steps repeat. The procedure is described in Algorithm 1 (Bipartite Graph Embedding), which takes as input the bipartite graph 𝐺 = (𝑈, 𝐴, 𝐸), walk length 𝑙, walks per node 𝑟, embedding dimension 𝑑 and context window size 𝑤, and outputs an embedding 𝑣𝑛 ∈ R 𝑑 for every node 𝑛 ∈ 𝑉 . The random walks on the bipartite graph have paths of the form</p><formula xml:id="formula_0">𝑈 𝑠𝑒𝑟 → 𝐴𝑟𝑡𝑖𝑐𝑙𝑒 → 𝑈 𝑠𝑒𝑟</formula><p>The random walks thus generated are sequences of nodes, which can be thought of as 'sentences' in an artificial language. The SkipGram <ref type="bibr" target="#b6">[7]</ref> model, developed for word embeddings, measures the probability of two words co-occurring within a fixed window in a sentence. It uses a fixed-size window around every word to extract context and non-context words for the word under consideration. The model employs a single hidden layer neural network to learn a mapping from a word to its context words.</p>
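<p>As a minimal sketch of Algorithm 1 (not the deployed implementation), the following Python snippet generates unbiased truncated random walks over a toy bipartite adjacency dictionary and feeds them to a Skip-gram model; the use of the gensim library and the window size of 5 are assumptions, and generate_walks and the toy node ids are illustrative only:</p><p>
import random
from gensim.models import Word2Vec

def generate_walks(adjacency, walk_length, walks_per_node):
    # Unbiased truncated random walks; each walk is a list of node ids.
    walks = []
    for _ in range(walks_per_node):
        for start in adjacency:
            walk = [start]
            for _ in range(walk_length - 1):
                neighbors = adjacency[walk[-1]]
                if not neighbors:
                    break
                walk.append(random.choice(neighbors))
            walks.append(walk)
    return walks

# Toy bipartite adjacency E(n): users u*, articles a*.
adjacency = {
    "u1": ["a1", "a2"], "u2": ["a2"],
    "a1": ["u1"], "a2": ["u1", "u2"],
}
walks = generate_walks(adjacency, walk_length=100, walks_per_node=30)
# Skip-gram over the walks; window size w = 5 is an assumed value.
model = Word2Vec(sentences=walks, vector_size=128, window=5, sg=1,
                 negative=5, min_count=1, workers=4)
user_vector = model.wv["u1"]   # 128-dimensional node embedding
</p>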
<p>In graph embeddings, the skip-gram model <ref type="bibr" target="#b6">[7]</ref> computes the probability of two nodes co-occurring within a fixed window on a random walk.</p><p>This procedure is an extension of the latent factor models <ref type="bibr" target="#b8">[10]</ref> of collaborative filtering, where matrix factorization is used to obtain dense representations of users and items in the same space. Graph embeddings have been shown to perform better than matrix factorization for various tasks, e.g. classification, clustering, link prediction etc. Mathematically, graph embeddings extend the contextual similarity from the immediate neighborhood to truncated random walks. Graph embedding for the user-item bipartite graph, though effective, suffers from the cold-start problem, i.e. it is unable to handle new content which was not part of the graph. Content-based filtering has been an alternative to collaborative filtering for handling the problem of fresh, unseen items. We merge collaborative and content filtering in a natural way in our setting by enriching the bipartite graph 𝐺 to a new graph 𝐺 ′ = (𝑈, 𝐴, 𝐶, 𝐸, 𝐸 ′ ), where 𝐶 denotes the set of content nodes and 𝐸 ′ is the set of edges between the article and the content nodes. We use named entities of the form place, person, organization as the content nodes. The named entities are at a finer level than the topic modeling approach <ref type="bibr" target="#b18">[20]</ref> and are more constrained than tf-idf based keyword filtering <ref type="bibr" target="#b19">[21]</ref>. The set of topics and keywords is not exhaustive; new articles can introduce topics and keywords, so the models for extracting topics or keywords need to be retrained. Named Entity Recognition (NER), in turn, is article agnostic: it can extract entities from new articles without being retrained explicitly. In fact, there are off-the-shelf tools, e.g. Spacy <ref type="bibr" target="#b20">[22]</ref>, to extract named entities from any article. Thus, NER is cheaper to achieve than topic modeling and keyword extraction.</p><p>Graph embedding for the enriched graph: The enriched graph 𝐺 ′ contains user-user interaction via shared interest in articles as well as article-article interaction via the co-occurrence of entities in them. In the bipartite setting, the random walks transition between user and article nodes, while in the enriched graph the random walks are more complex: the random walker at an article node can jump either to a user node or to an entity node. The paths in the random walks can be of the following types:</p><formula xml:id="formula_1">• 𝑈 𝑠𝑒𝑟 → 𝐴𝑟𝑡𝑖𝑐𝑙𝑒 → 𝑈 𝑠𝑒𝑟 • 𝑈 𝑠𝑒𝑟 → 𝐴𝑟𝑡𝑖𝑐𝑙𝑒 → 𝐸𝑛𝑡𝑖𝑡𝑦 → 𝐴𝑟𝑡𝑖𝑐𝑙𝑒 → 𝑈 𝑠𝑒𝑟 • 𝐴𝑟𝑡𝑖𝑐𝑙𝑒 → 𝑈 𝑠𝑒𝑟 → 𝐴𝑟𝑡𝑖𝑐𝑙𝑒 • 𝐴𝑟𝑡𝑖𝑐𝑙𝑒 → 𝐸𝑛𝑡𝑖𝑡𝑦 → 𝐴𝑟𝑡𝑖𝑐𝑙𝑒 → 𝑈 𝑠𝑒𝑟 • 𝐴𝑟𝑡𝑖𝑐𝑙𝑒 → 𝐸𝑛𝑡𝑖𝑡𝑦 → 𝐴𝑟𝑡𝑖𝑐𝑙𝑒 → 𝐸𝑛𝑡𝑖𝑡𝑦</formula><p>This provides many more possibilities than the 𝐴𝑟𝑡𝑖𝑐𝑙𝑒 → 𝑈 𝑠𝑒𝑟 and 𝑈 𝑠𝑒𝑟 → 𝐴𝑟𝑡𝑖𝑐𝑙𝑒 transitions in the bipartite graph. Thus, the embeddings learned on these random walks are able to capture relationships between different types of nodes due to the different ways they can be connected.</p><p>The enriched graph 𝐺 ′ is heterogeneous with two types of edges, i.e. user-article and article-entity. Graph embedding methods like DeepWalk <ref type="bibr" target="#b4">[5]</ref> are restricted to homogeneous graphs and may not be directly applicable to heterogeneous networks. One obvious problem is that unbiased random walks assign equal probability to each edge. 
This increases the probability of the random walk going through an edge type which has many edges from the current node. For any article node in the enriched graph, there are far more edges to user nodes than to entity nodes. Thus the random walk has a higher chance of going to the user nodes, and in some cases completely avoids entity edges. This stems from an imbalance at the global level: we have tens of thousands of entities, whereas there are millions of users. We provide a simple way to resolve the problem of random walks being biased by the dominant edge type. Each step of the random walk is generated in two stages: (i) an edge type is chosen randomly from all possible edge types, (ii) an edge is randomly chosen from all edges of the selected edge type. This amounts to biasing the random walks with equal weights for each edge type. The procedure for learning representations of the nodes of the enriched graph is described in Algorithm 2 and Algorithm 3.</p><p>In reality, different edge types contribute differently to the random walks and would have unequal weights. However, we do not have an intuitive way to obtain these weights. Some work has been done in this direction, e.g. Metapath2vec <ref type="bibr" target="#b1">[2]</ref>, heterogeneous edge embeddings <ref type="bibr" target="#b21">[23]</ref> etc.</p></div>
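<div xmlns="http://www.tei-c.org/ns/1.0"><p>A minimal sketch of the two-stage, edge-type-balanced sampling (the NextNode procedure of Algorithm 3), assuming the enriched graph is stored as two adjacency dictionaries for 𝐸 and 𝐸 ′ ; the dictionary names and toy nodes are illustrative, not the production data structures:</p><p>
import random

# Enriched graph kept as two adjacency maps: E (click edges, both directions)
# and E' (article-entity edges, both directions). Toy example:
click_adj = {"u1": ["a1"], "u2": ["a1"], "a1": ["u1", "u2"]}
entity_adj = {"a1": ["Donald Trump", "Republican Party"],
              "Donald Trump": ["a1"], "Republican Party": ["a1"]}
users = {"u1", "u2"}
entities = {"Donald Trump", "Republican Party"}

def next_node(curr):
    # One step of the edge-type-balanced walk (sketch of Algorithm 3).
    if curr in users:
        return random.choice(click_adj[curr])       # user to article
    if curr in entities:
        return random.choice(entity_adj[curr])      # entity to article
    # Article node: pick the edge type uniformly, then an edge of that type.
    if random.random() > 0.5 and entity_adj.get(curr):
        return random.choice(entity_adj[curr])      # article to entity
    return random.choice(click_adj[curr])           # article to user

walk = ["u1"]
for _ in range(9):                                   # a length-10 walk
    walk.append(next_node(walk[-1]))
</p></div>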
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Algorithm 2 Enriched Graph Embedding</head><p>Input: Enriched graph 𝐺 = (𝑈, 𝐴, 𝐶, 𝐸, 𝐸 ′ ), walk length 𝑙, walks per node 𝑟, embedding dimension 𝑑, context window size 𝑤.  Generate a random number 𝑡 from 𝑈 𝑛𝑖𝑓 𝑜𝑟𝑚(0, 1).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>7:</head><p>If 𝑡 &gt; 0.5 8:</p><formula xml:id="formula_2">next step = 𝑅𝑎𝑛𝑑𝑜𝑚𝑆𝑎𝑚𝑝𝑙𝑒(𝐸 ′ (curr node)) 9:</formula><p>if 𝑡 ≤ 0.5 10: next step = 𝑅𝑎𝑛𝑑𝑜𝑚𝑆𝑎𝑚𝑝𝑙𝑒(𝐸(curr node)) Output: Next node in the random walk, next step.</p><p>Recommendation Model: Having learned the embeddings for the users, items and entities, we now describe the model for recommendation. We define the recommendation problem as a binary classification model which learns the probability of a user clicking on an article. We use a logistic regression model on the embeddings learned from the graph. The input feature vector to the logistic regression model is the Hadamard product of the user feature vector and the article feature vector. If both the user and the article were present in the enriched graph, i.e. we have embeddings for them, then the user and article feature vectors are their respective graph embeddings. The purpose of a news recommendation system, however, is to recommend new, unseen articles, and we may not have an embedding for fresh articles. In this case, we use the average of the embeddings of the entities present in the article as the article feature vector. Thus the input features to the logistic regression lie in the same space as the graph embeddings.</p><p>If 𝐹𝑢 ∈ R 𝑑 is the vector representation of user 𝑢 and 𝐹𝑎 ∈ R 𝑑 is the vector representation of article 𝑎, then the input to the logistic regression model is a vector 𝑥 (𝑢,𝑎) = (𝑥1, 𝑥2, . . . 𝑥 𝑑 ) ∈ R 𝑑 defined as:</p><formula xml:id="formula_3">(𝑥 (𝑢,𝑎) )𝑖 = (𝐹𝑢)𝑖 * (𝐹𝑎)𝑖<label>(1)</label></formula><p>The user feature vector is defined as the user embedding vector 𝑉𝑢 learned via graph embeddings. The article feature vector 𝐹𝑎 is given by the graph embedding 𝑉𝑎 ∈ R 𝑑 if 𝑎 has an embedding. Else, if 𝐶(𝑎) is the set of entities in the contents of 𝑎,</p><formula xml:id="formula_4">𝐹𝑎 = 1 |𝐶(𝑎)| ∑ 𝑐∈𝐶(𝑎) 𝑉𝑐<label>(2)</label></formula><p>The pipeline for the recommendation system described above has three components:</p><p>• Feature Learning: A large part of the user-article interaction data between time 𝑡0 and 𝑡1 is taken to build the enriched graph, and embeddings are then learned on this graph, e.g. the user activity on the application from Jan 2016 to Jan 2019.</p><p>• Recommendation Model Training: The logistic regression model for computing the probability of a user clicking on an article is trained on a different sub-graph which comprises interaction data between time 𝑡1 and 𝑡2, e.g. between Jan 2019 and Jun 2019. A portion of this training data is held out to validate the model.</p><p>• Evaluation and Deployment: The trained model is then evaluated on unseen data by considering the predictions of the model on usage between June 2019 and July 2019. The model is then deployed to the application.</p><p>An alternate interpretation of the recommendation model is as link prediction <ref type="bibr" target="#b22">[24]</ref> in a user-article bipartite graph at a future time, i.e. one that contains the users and new articles.</p></div>
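<div xmlns="http://www.tei-c.org/ns/1.0"><p>A compact sketch of the scoring step, assuming numpy and scikit-learn; the toy embeddings, the helpers article_feature and pair_feature, and the two-example training set are illustrative stand-ins for Eq. (1) and Eq. (2), not the production pipeline:</p><p>
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
node_vec = {n: rng.normal(size=128) for n in ["u1", "a1", "Donald Trump", "2016 Elections"]}
article_entities = {"a_new": ["Donald Trump", "2016 Elections"]}   # fresh article, no embedding

def article_feature(a):
    # Graph embedding if available, else mean of its entity embeddings (Eq. 2).
    if a in node_vec:
        return node_vec[a]
    ents = [e for e in article_entities.get(a, []) if e in node_vec]
    return np.mean([node_vec[e] for e in ents], axis=0) if ents else np.zeros(128)

def pair_feature(u, a):
    # Hadamard product of user and article vectors (Eq. 1).
    return node_vec[u] * article_feature(a)

# Train on historical (user, article, clicked) pairs, then score candidate pairs.
X = np.stack([pair_feature("u1", "a1"), pair_feature("u1", "a_new")])
y = np.array([1, 0])                                    # toy click labels
clf = LogisticRegression(penalty="l2", C=1.0, max_iter=1000).fit(X, y)
p_click = clf.predict_proba(X)[:, 1]                    # Prob(click = 1 | user, item)
</p></div>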
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4">RESULTS AND ANALYSIS</head><p>In this section, we provide a discussion of the evaluation and analysis of the graph embeddings and the recommendation model.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1">Experimental Setup</head><p>The data for this experiment was taken from a real-world news aggregation mobile application X 1 . The news content on X is presented as a scrollable list which shows the headline and the first couple of lines of each article. The user can read the full article by clicking on the shown snippet for that article.</p><p>During their activity on the platform, users click on the displayed article snippet if they want to read the full story, or they scroll past it. We consider a user clicking on an article as an indicator of their interest in the article and use the clicks as the positive data for the recommendation model. If a user scrolls past an article and does not click on it, we use that as a negative example. Thus, our model attempts to increase the click-through rate (CTR) as the metric. For learning the graph embeddings, we create an enriched graph using the activity of 500K users over a period of 3 months, covering 40K articles. There are also 6K entity nodes, which we extracted using the freely available Spacy API.</p><p>1 Name anonymized for the double-blind review. To learn embeddings of all the nodes in the enriched graph, we generate 30 random walks of length 100 for every node. The skip-gram model is trained using stochastic gradient descent (SGD) with a learning rate of 0.01 to minimize the negative sampling loss. Finally, we obtain a 128-dimensional embedding for every node.</p></div>
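<div xmlns="http://www.tei-c.org/ns/1.0"><p>A minimal sketch of the entity extraction step with spaCy; the en_core_web_sm model and the restriction to person, organization, place and event labels are assumptions consistent with the entity types mentioned above, not the exact production configuration:</p><p>
import spacy

nlp = spacy.load("en_core_web_sm")                      # small English pipeline with NER

KEEP_LABELS = {"PERSON", "ORG", "GPE", "LOC", "EVENT"}  # person, organization, place, event

def extract_entities(article_text):
    # Named entities used as entity nodes for one article.
    doc = nlp(article_text)
    return {ent.text.strip() for ent in doc.ents if ent.label_ in KEEP_LABELS}

# Example: entities for one article, used to create article-entity edges.
entities = extract_entities("Donald Trump addressed the Republican Party before the 2016 elections.")
</p></div>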
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2">Multi-class Article Classification</head><p>We evaluate the feature representations obtained through graph embeddings on a standard supervised learning task: multi-class classification of the news articles. News publishers often add high-level categories to the articles they publish, e.g. Sports, Entertainment etc. We train a machine learning model on a set of labeled articles using their graph embeddings as input. The task is to predict the labels for the remaining articles. There are 8 news categories in our data: Business, Entertainment, Sports, Local, Tech, World, Lifestyle, Offbeat. In this experiment, we use a fraction of the labeled article nodes to train a multinomial logistic regression model with L2 regularization. Without explicitly using any content information, the performance of the logistic regression model trained on node embeddings (node2vec) is similar to that of a model trained explicitly on content features, e.g. word2vec <ref type="bibr" target="#b6">[7]</ref> or doc2vec <ref type="bibr" target="#b10">[12]</ref>. The comparison is presented in Table <ref type="table" target="#tab_1">1</ref>. The detailed per-category results of the model are shown in Table <ref type="table" target="#tab_2">2</ref>.</p></div>
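<div xmlns="http://www.tei-c.org/ns/1.0"><p>A small sketch of this evaluation, assuming scikit-learn; the randomly generated X and y stand in for the labeled article-node embeddings and their categories, and the 60/40 split mirrors the setup described earlier:</p><p>
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 128))        # stand-in for article node embeddings
y = rng.integers(0, 8, size=1000)       # stand-in for the 8 category labels

# 60/40 train/test split; multinomial logistic regression with L2 penalty.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, stratify=y, random_state=0)
clf = LogisticRegression(multi_class="multinomial", penalty="l2",
                         C=1.0, max_iter=1000).fit(X_train, y_train)
print(classification_report(y_test, clf.predict(X_test)))
</p></div>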
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.3">Recommendation Evaluation</head><p>The recommendation model is trained on a set of user-article pairs of positive examples (user clicked) and negative examples (no click) taken from the activity over a period of one month, which is different from the data used for building the enriched graph. Logistic regression with L2 regularization is used for binary classification of the pairs as click or no-click. We then evaluate the model by predicting whether a set of user-article pairs were clicked on or not. We measure the performance of the model by computing the Area Under the ROC curve (AUC) and Precision at 5. As shown in Table <ref type="table" target="#tab_3">3</ref>, the model shows a significant improvement over the previously used approach, which is a combination of collaborative filtering and word embeddings (Hybrid CF and Content). One major advantage of our model is that it naturally blends collaborative filtering and content-based filtering by putting the user, item and content into the same space.</p></div>
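<div xmlns="http://www.tei-c.org/ns/1.0"><p>A sketch of the offline evaluation, assuming scikit-learn for AUC; the precision_at_k helper and the synthetic labels and scores are illustrative only:</p><p>
import numpy as np
from sklearn.metrics import roc_auc_score

def precision_at_k(labels_ranked_by_score, k=5):
    # Fraction of the top-k ranked articles that were actually clicked.
    return float(np.mean(labels_ranked_by_score[:k]))

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=1000)                  # 1 = clicked, 0 = no click
y_score = 0.6 * y_true + rng.uniform(0, 0.7, size=1000) # stand-in model scores

print("AUC:", roc_auc_score(y_true, y_score))
order = np.argsort(-y_score)                            # one user's list, ranked by score
print("Precision@5:", precision_at_k(y_true[order], k=5))
</p></div>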
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5">CONCLUSION AND FUTURE WORK</head><p>In this work, we proposed a graph-based news recommendation system which is a hybrid of collaborative filtering and content-based filtering. Our method employs graph embeddings to automatically learn latent representations of users and articles. We extend the user-item bipartite graph to contain named entities from the articles. The entities are expected to capture user preferences at a finer level. The embedding method brings users, items and entities into the same space. We evaluated the learned latent representations via classification of article nodes into 8 high-level categories. We show that without explicitly using the contents of the articles, we achieve results comparable to NLP-based features. We also design and evaluate a recommendation model as a binary classification model for computing the likelihood of a user clicking on an article. This model performs better than the hybrid collaborative-filtering and word-embeddings-based article similarity model. Though the model performs satisfactorily, this work has focused on simplicity, since the goal of this paper is to show the efficacy of graph embedding methods for content recommendation. The embedding method has two hyperparameters, walk length and number of walks per node, which we chose based on heuristics and did not learn. There is work using attention mechanisms <ref type="bibr" target="#b25">[27]</ref> to learn these hyperparameters in an end-to-end system. The enriched network we used has two types of edges; we resolved the heterogeneity by giving equal importance to each edge type. Graph embedding for heterogeneous networks is an active area of research, e.g. metapath2vec <ref type="bibr" target="#b1">[2]</ref>, heterogeneous edge embeddings <ref type="bibr" target="#b21">[23]</ref> etc. The logistic regression model for recommendation was chosen for its simplicity; a more suitable approach would be a neural network based model which is trained either on a Siamese-like loss (two inputs: user and item embeddings) <ref type="bibr" target="#b26">[28]</ref> or on a triplet loss (three inputs: user prefers item 1 over item 2) <ref type="bibr" target="#b27">[29]</ref>. We plan to address some of these issues in future work.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>1 : 4 :</head><label>14</label><figDesc>1: Initialize 𝑤𝑎𝑙𝑘𝑠 to 𝐸𝑚𝑝𝑡𝑦. 2: for 𝑖 = 0 to 𝑟 do 3: for 𝑛 ∈ 𝑉 do 4: Initialize curr walk to [𝑛].</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>5 : 7 : 9 :</head><label>579</label><figDesc>5: for 𝑠𝑡𝑒𝑝 = 1 to 𝑙 do 6: curr step = curr walk[−1] 7: adj list = 𝐸(curr step) 8: next step = 𝑅𝑎𝑛𝑑𝑜𝑚𝑆𝑎𝑚𝑝𝑙𝑒(adj list) 9: Append next step to curr walk 10: Append curr walk to 𝑤𝑎𝑙𝑘𝑠 11: 𝑆𝑘𝑖𝑝𝐺𝑟𝑎𝑚(𝑤𝑎𝑙𝑘𝑠, 𝑤, 𝑑)</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_2"><head>1 : 4 :</head><label>14</label><figDesc>1: Initialize 𝑤𝑎𝑙𝑘𝑠 to 𝐸𝑚𝑝𝑡𝑦. 2: for 𝑖 = 0 to 𝑟 do 3: for 𝑛 ∈ 𝑉 do 4: Initialize curr walk to [𝑛].</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_3"><head>5 : 7 : 9 :</head><label>579</label><figDesc>5: for 𝑠𝑡𝑒𝑝 = 1 to 𝑙 do 6: curr step = curr walk[−1] 7: next step = 𝑁 𝑒𝑥𝑡𝑁 𝑜𝑑𝑒(𝐺, curr step) 8: Append next step to curr walk 9: Append curr walk to 𝑤𝑎𝑙𝑘𝑠 10: 𝑆𝑘𝑖𝑝𝐺𝑟𝑎𝑚(𝑤𝑎𝑙𝑘𝑠, 𝑤, 𝑑) Output: Embeddings of every node 𝑛 ∈ 𝑉 , 𝑣𝑛 ∈ R 𝑑. Algorithm 3 NextNode. Input: Enriched graph 𝐺 = (𝑈, 𝐴, 𝐶, 𝐸, 𝐸 ′ ), curr node. 1: If curr node ∈ 𝑈 2: next step = 𝑅𝑎𝑛𝑑𝑜𝑚𝑆𝑎𝑚𝑝𝑙𝑒(𝐸(curr node)) 3: else if curr node ∈ 𝐶 4: next step = 𝑅𝑎𝑛𝑑𝑜𝑚𝑆𝑎𝑚𝑝𝑙𝑒(𝐸 ′ (curr node)) 5: else if curr node ∈ 𝐴 6: Generate a random number 𝑡 from 𝑈 𝑛𝑖𝑓 𝑜𝑟𝑚(0, 1)</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head>Table 1 :</head><label>1</label><figDesc>Comparison of Embeddings</figDesc><table><row><cell>Features</cell><cell>Precision</cell><cell>Recall</cell><cell>F-1</cell></row><row><cell>word2vec</cell><cell>0.90</cell><cell>0.92</cell><cell>0.91</cell></row><row><cell>doc2vec</cell><cell>0.91</cell><cell>0.92</cell><cell>0.91</cell></row><row><cell>node2vec</cell><cell>0.88</cell><cell>0.89</cell><cell>0.88</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_2"><head>Table 2 :</head><label>2</label><figDesc>Per-category Precision, Recall and F-1 of the Article Classification</figDesc><table><row><cell>Category</cell><cell>Precision</cell><cell>Recall</cell><cell>F-1</cell></row><row><cell>Business</cell><cell>0.83</cell><cell>0.71</cell><cell>0.77</cell></row><row><cell>Sports</cell><cell>0.90</cell><cell>0.92</cell><cell>0.91</cell></row><row><cell>Entertainment</cell><cell>0.97</cell><cell>0.95</cell><cell>0.96</cell></row><row><cell>Tech</cell><cell>0.86</cell><cell>0.83</cell><cell>0.84</cell></row><row><cell>Local</cell><cell>0.88</cell><cell>0.95</cell><cell>0.91</cell></row><row><cell>World</cell><cell>0.82</cell><cell>0.68</cell><cell>0.75</cell></row><row><cell>Lifestyle</cell><cell>0.93</cell><cell>0.85</cell><cell>0.89</cell></row><row><cell>Offbeat</cell><cell>0.87</cell><cell>0.62</cell><cell>0.72</cell></row><row><cell>Average</cell><cell>0.88</cell><cell>0.89</cell><cell>0.88</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_3"><head>Table 3 :</head><label>3</label><figDesc>Comparison of Recommendation Models</figDesc><table><row><cell>Model</cell><cell>AUC</cell><cell>Precision@5</cell></row><row><cell>Hybrid CF and Content</cell><cell>0.72</cell><cell>0.89</cell></row><row><cell>Enriched graph embeddings</cell><cell>0.87</cell><cell>0.97</cell></row></table></figure>
		</body>
		<back>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Heterogeneous network embedding via deep architectures</title>
		<author>
			<persName><forename type="first">S</forename><surname>Chang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Han</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Tang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><forename type="middle">J</forename><surname>Qi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">C</forename><surname>Aggarwal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><forename type="middle">S</forename><surname>Huang</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining</title>
				<meeting>the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining</meeting>
		<imprint>
			<publisher>ACM</publisher>
			<date type="published" when="2015">2015</date>
			<biblScope unit="page" from="119" to="128" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Metapath2vec: Scalable representation learning for heterogeneous networks</title>
		<author>
			<persName><forename type="first">Y</forename><surname>Dong</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><forename type="middle">V</forename><surname>Chawla</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Swami</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining</title>
				<meeting>the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining<address><addrLine>New York, NY, USA</addrLine></address></meeting>
		<imprint>
			<publisher>ACM</publisher>
			<date type="published" when="2017">2017</date>
			<biblScope unit="page" from="135" to="144" />
		</imprint>
	</monogr>
	<note>KDD &apos;17</note>
</biblStruct>

<biblStruct xml:id="b2">
	<monogr>
		<title level="m" type="main">node2vec: Scalable feature learning for networks</title>
		<author>
			<persName><forename type="first">A</forename><surname>Grover</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Leskovec</surname></persName>
		</author>
		<idno>CoRR abs/1607.00653</idno>
		<ptr target="http://arxiv.org/abs/1607.00653" />
		<imprint>
			<date type="published" when="2016">2016</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<monogr>
		<title level="m" type="main">Representation learning on graphs: Methods and applications</title>
		<author>
			<persName><forename type="first">W</forename><forename type="middle">L</forename><surname>Hamilton</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Ying</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Leskovec</surname></persName>
		</author>
		<idno>CoRR abs/1709.05584</idno>
		<ptr target="http://arxiv.org/abs/1709.05584" />
		<imprint>
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Deepwalk: Online learning of social representations</title>
		<author>
			<persName><forename type="first">B</forename><surname>Perozzi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Al-Rfou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Skiena</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining</title>
				<meeting>the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining<address><addrLine>New York, NY, USA</addrLine></address></meeting>
		<imprint>
			<publisher>ACM</publisher>
			<date type="published" when="2014">2014</date>
			<biblScope unit="page" from="701" to="710" />
		</imprint>
	</monogr>
	<note>KDD &apos;14</note>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">A survey on network embedding</title>
		<author>
			<persName><forename type="first">P</forename><surname>Cui</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Pei</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Zhu</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">IEEE Transactions on Knowledge and Data Engineering</title>
		<imprint>
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<monogr>
		<title level="m" type="main">Efficient estimation of word representations in vector space</title>
		<author>
			<persName><forename type="first">T</forename><surname>Mikolov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Corrado</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Dean</surname></persName>
		</author>
		<idno>CoRR abs/1301.3781</idno>
		<ptr target="http://arxiv.org/abs/1301.3781" />
		<imprint>
			<date type="published" when="2013">2013</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">Google news personalization: scalable online collaborative filtering</title>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">S</forename><surname>Das</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Datar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Garg</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Rajaram</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 16th international conference on World Wide Web</title>
				<meeting>the 16th international conference on World Wide Web</meeting>
		<imprint>
			<date type="published" when="2007">2007</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<monogr>
		<title level="m" type="main">Matrix Factorization Techniques for Recommender Systems</title>
		<author>
			<persName><forename type="first">Yehuda</forename><surname>Koren</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Robert</forename><surname>Bell</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Chris</forename><surname>Volinsky</surname></persName>
		</author>
		<ptr target="https://datajobs.com/data-science-repo/Recommender-Systems-%5BNetflix%5D.pdf" />
		<imprint/>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">GroupLens: Applying collaborative filtering to Usenet news</title>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">A</forename><surname>Konstan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><forename type="middle">N</forename><surname>Miller</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Maltz</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">L</forename><surname>Herlocker</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><forename type="middle">R</forename><surname>Gordon</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Riedl</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Commun. ACM</title>
		<imprint>
			<biblScope unit="volume">40</biblScope>
			<biblScope unit="page" from="77" to="87" />
			<date type="published" when="1997">1997</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">Distributed Representations of Sentences and Documents</title>
		<author>
			<persName><forename type="first">Q</forename><forename type="middle">V</forename><surname>Le</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Mikolov</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 31 st International Conference on Machine Learning (ICML)</title>
				<meeting>the 31 st International Conference on Machine Learning (ICML)<address><addrLine>Beijing, China</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2014">2014</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<monogr>
		<title level="m" type="main">User Modeling for Adaptive News Access, User Modeling and User-Adapted Interaction</title>
		<author>
			<persName><forename type="first">D</forename><surname>Billsus</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">J</forename><surname>Pazzani</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2000">2000</date>
			<biblScope unit="volume">10</biblScope>
			<biblScope unit="page" from="147" to="180" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">Learning User Profiles for Personalized Information Dissemination</title>
		<author>
			<persName><forename type="first">A</forename><surname>Tan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Tee</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of 1998 IEEE International Joint conference on Neural Networks</title>
				<meeting>1998 IEEE International Joint conference on Neural Networks</meeting>
		<imprint>
			<date type="published" when="1998-05">May 1998</date>
			<biblScope unit="page" from="183" to="188" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<analytic>
		<title level="a" type="main">Evaluating adaptive user profiles for news classification</title>
		<author>
			<persName><forename type="first">R</forename><surname>Carreira</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">M</forename><surname>Crato</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Gonçalves</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">A</forename><surname>Jorge</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 9th international conference on Intelligent user interfaces</title>
				<meeting>the 9th international conference on Intelligent user interfaces</meeting>
		<imprint>
			<date type="published" when="2004">2004</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<analytic>
		<title level="a" type="main">A hybrid user model for news story classification</title>
		<author>
			<persName><forename type="first">D</forename><surname>Billsus</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Pazzani</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the Seventh International Conference on User Modeling</title>
				<meeting>the Seventh International Conference on User Modeling</meeting>
		<imprint>
			<date type="published" when="1999">1999</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<analytic>
		<title level="a" type="main">Combining Content-Based and Collaborative Filters in an Online Newspaper</title>
		<author>
			<persName><forename type="first">M</forename><surname>Claypool</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Gokhale</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Miranda</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Murnikov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Netes</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Sartin</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of ACM SIGIR Workshop on Recommender Systems</title>
				<meeting>ACM SIGIR Workshop on Recommender Systems</meeting>
		<imprint>
			<date type="published" when="1999">1999</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b16">
	<monogr>
		<title level="m" type="main">Pixie: A System for Recommending 3+ Billion Items to 200+ Million Users in Real-Time</title>
		<author>
			<persName><forename type="first">Chantat</forename><surname>Eksombatchai</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Pranav</forename><surname>Jindal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Jerry</forename><forename type="middle">Zitao</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Yuchen</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Rahul</forename><surname>Sharma</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Charles</forename><surname>Sugnet</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Mark</forename><surname>Ulrich</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Jure</forename><surname>Leskovec</surname></persName>
		</author>
		<imprint/>
	</monogr>
</biblStruct>

<biblStruct xml:id="b17">
	<analytic>
		<title level="a" type="main">Inductive Representation Learning on Large Graphs</title>
		<author>
			<persName><forename type="first">William</forename><forename type="middle">L</forename><surname>Hamilton</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Rex</forename><surname>Ying</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Jure</forename><surname>Leskovec</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">31st Conference on Neural Information Processing Systems (NIPS 2017)</title>
				<meeting><address><addrLine>Long Beach, CA, USA</addrLine></address></meeting>
		<imprint/>
	</monogr>
</biblStruct>

<biblStruct xml:id="b18">
	<monogr>
		<title level="m" type="main">Collaborative Topic Modeling for Recommending Scientific Articles, KDD &apos;11</title>
		<author>
			<persName><forename type="first">Chong</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">David</forename><forename type="middle">M</forename><surname>Blei</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2011">August 21-24, 2011</date>
			<pubPlace>San Diego, California, USA</pubPlace>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b19">
	<analytic>
		<title level="a" type="main">Selecting keywords for content based recommendation</title>
		<author>
			<persName><forename type="first">Christian</forename><surname>Wartena</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Wout</forename><surname>Slakhorst</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Martin</forename><surname>Wibbels</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 19th ACM international conference on Information and knowledge management (CIKM)</title>
				<meeting>the 19th ACM international conference on Information and knowledge management (CIKM)<address><addrLine>Toronto, ON, Canada</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2010">October 26 -30, 2010</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b20">
	<monogr>
		<ptr target="https://spacy.io/" />
		<title level="m">Spacy -Industrial-Strength Natural Language Processing</title>
				<imprint/>
	</monogr>
</biblStruct>

<biblStruct xml:id="b21">
	<monogr>
		<title level="m" type="main">Heterogeneous Edge Embeddings for Friend Recommendation</title>
		<author>
			<persName><forename type="first">Janu</forename><surname>Verma</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Srishti</forename><surname>Gupta</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Debdoot</forename><surname>Mukherjee</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Tanmoy</forename><surname>Chakraborty</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2019-04">April 2019</date>
			<pubPlace>ECIR Cologne</pubPlace>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b22">
	<analytic>
		<title level="a" type="main">The link-prediction problem for social networks</title>
		<author>
			<persName><forename type="first">D</forename><surname>Liben-Nowell</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Kleinberg</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of the American society for information science and technology</title>
		<imprint>
			<biblScope unit="volume">58</biblScope>
			<biblScope unit="issue">7</biblScope>
			<biblScope unit="page" from="1019" to="1031" />
			<date type="published" when="2007">2007</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b23">
	<analytic>
		<title level="a" type="main">hdbscan: Hierarchical density based clustering</title>
		<author>
			<persName><forename type="first">Leland</forename><surname>Mcinnes</surname></persName>
		</author>
		<author>
			<persName><forename type="first">John</forename><surname>Healy</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Steve</forename><surname>Astels</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">The Journal of Open Source Software</title>
		<imprint>
			<biblScope unit="volume">2</biblScope>
			<biblScope unit="issue">11</biblScope>
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b24">
	<analytic>
		<title level="a" type="main">Visualizing High-Dimensional Data Using t-SNE</title>
		<author>
			<persName><forename type="first">L</forename><forename type="middle">J P</forename><surname>Van Der Maaten</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><forename type="middle">E</forename><surname>Hinton</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of Machine Learning Research</title>
		<imprint>
			<biblScope unit="volume">9</biblScope>
			<biblScope unit="page" from="2579" to="2605" />
			<date type="published" when="2008-11">Nov. 2008</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b25">
	<analytic>
		<title level="a" type="main">Watch Your Step: Learning Node Embeddings via Graph Attention</title>
		<author>
			<persName><forename type="first">Sami</forename><surname>Abu-El-Haija</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Bryan</forename><surname>Perozzi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Rami</forename><surname>Al-Rfou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Alex</forename><surname>Alemi</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">32nd Conference on Neural Information Processing Systems (NeurIPS 2018)</title>
				<meeting><address><addrLine>Montréal, Canada</addrLine></address></meeting>
		<imprint/>
	</monogr>
</biblStruct>

<biblStruct xml:id="b26">
	<analytic>
		<title level="a" type="main">DeepFace: Closing the Gap to Human-Level Performance in Face Verification</title>
		<author>
			<persName><forename type="first">Yaniv</forename><surname>Taigman</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Ming</forename><surname>Yang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Marc&apos;Aurelio</forename><surname>Ranzato</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Lior</forename><surname>Wolf</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition</title>
				<meeting>the IEEE Computer Society Conference on Computer Vision and Pattern Recognition</meeting>
		<imprint>
			<date type="published" when="2014">2014</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b27">
	<analytic>
		<title level="a" type="main">FaceNet: A Unified Embedding for Face Recognition and Clustering</title>
		<author>
			<persName><forename type="first">Florian</forename><surname>Schroff</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Dmitry</forename><surname>Kalenichenko</surname></persName>
		</author>
		<author>
			<persName><forename type="first">James</forename><surname>Philbin</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition</title>
				<meeting>the IEEE Computer Society Conference on Computer Vision and Pattern Recognition</meeting>
		<imprint>
			<date type="published" when="2015">2015</date>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
