Trend-responsive User Segmentation Enabling Traceable Publishing Insights. A Case Study of a Real-world Large-scale News Recommendation System Joanna Misztal-Radecka Dominik Rusiecki Ringier Axel Springer Polska Ringier Axel Springer Polska AGH University of Science and Technology dominik.rusiecki@ringieraxelspringer.pl joanna.misztal-radecka@ringieraxelspringer.pl Michal Żmuda Artur Bujak Ringier Axel Springer Polska Ringier Axel Springer Polska michal.zmuda@ringieraxelspringer.pl artur.bujak@ringieraxelspringer.pl ABSTRACT for current real-world systems for several reasons discussed The traditional offline approaches are no longer sufficient for below. building modern recommender systems in domains such as For instance, as observed by [16], classic matrix factoriza- online news services, mainly due to the high dynamics of tion is no longer sufficient for many modern recommendation environment changes and necessity to operate on a large scale scenarios. In particular, aspects such as “scarce feedback, dy- with high data sparsity. The ability to balance exploration namic catalogue and time-sensitivity”, including popularity with exploitation makes the multi-armed bandits an efficient trends and interests changes, have been mentioned as factors alternative to the conventional methods, and a robust user that require “continuous and fast learning” not sufficiently segmentation plays a crucial role in providing the context addressed by these traditional approaches. As further noticed for such online recommendation algorithms. In this work, we by [18], such offline recommender systems are particularly present an unsupervised and trend-responsive method for seg- unsuitable for generating recommendations in domains such menting users according to their semantic interests, which has as news services due to the need of real-time processing at a been integrated with a real-world system for large-scale news large scale and dynamic changes in recency and popularity of recommendations. The results of an online A/B test show items. The inability to follow popularity trends is particularly significant improvements compared to a global-optimization troublesome, as it has been found that user preferences are algorithm on several services with different characteristics. not constant but are influenced by temporal factors such as Based on the experimental results as well as the exploration time of day, day of the week or the season. A large-scale of segments descriptions and trend dynamics, we propose study on Polish Internet users [22] found that users browse extensions to this approach that address particular real-world more items related to culture and entertainment during their challenges for different use-cases. Moreover, we describe a work time than at home. Some seasonal holidays also have an method of generating traceable publishing insights facilitat- impact on the type of consumed items. For instance, shopping ing the creation of content that serves the diversity of all offers and inspirations are more popular before Black Friday users needs. and Christmas while photo galleries are preferred during summer holidays. Moreover, news topics have a significant KEYWORDS impact on the popularity of items—events such as political elections or Olympic Games significantly influence the users’ user segmentation, topic modeling, trend responsiveness, rec- interests. ommender system, news recommendation, contextual multi- Another critical issue which is not addressed by the stan- armed bandit, large scale, case study dard techniques is data sparsity. Online services provide a vast number of items, but only a few are read by particular 1 INTRODUCTION visitors. Hence, generating recommendation lists for users Why user segmentation? with a short or, in a cold-start scenario, no browsing history In order to understand why user segmentation is a crucial becomes a critical problem. Having noticed the insufficiency component of a modern real-world recommender system, it of offline collaborative filtering approaches, one could con- is essential to review the context of the recommendation sider popularity-based social recommender systems. They problem as a whole. It could be argued that it is not neces- are designed to address the trend-responsiveness requirement sary to consider any user segmentation in a recommender and are capable of generating recommendations for less ac- system, and such an approach is applied in many traditional tive users by making use of the crowd wisdom. However, as recommendation methods. However, this claim may not hold noted by [6], over-exploitation in such scenarios may lead to Copyright © 2019 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). the information bubble effect as users become homogenized INRA’19, September, 2019, Copenhagen, Denmark J. Misztal-Radecka et al. according to their interests so that similar preferences groups are constantly provided with the same types of items. Another approach which has recently attracted much at- tention is based on the multi-armed bandit algorithm. Chiefly, the ability to balance exploration with exploitation makes 1. Request multi-armed bandits a promising solution for this type of (with placement & user ID) problems — they provide stable recommendation quality due to the exploitation component while responding to changing popularity trends in the exploration phase. Due to their high Recommendations API efficiency and scalability, the bandit algorithms have been successfully applied to large-scale real-world recommender sce- narios ([17] [10] [16] [20]). However, as further noted by [25], 2. Resolve segments 3. Resolve 4. Resolve global optimization approaches may introduce the tyranny of reward function value recommendation the majority effect and thus cannot serve the diversity of all users. Hence for bandit approaches the recommendations are Segments often performed for groups of users with similar behavioral API patterns. Additionally, other contextual factors such as type Performance API of website influence the user behavior, hence the approach to Algorithm representing their preferences should be suited to particular Periodically updates Combines simple Service assignment of users to use cases. In our solution, we have adapted the contextual the segments stored measures (often based Selected bandit approach [17] to generating recommendations for dy- on browser events) to on Hadoop recommendation calculate reward value namically adjusted interests segments. based on business' KPIs algorithm implementation (for instance 1.1 Contributions e-Greedy bandit) The long-term objective of our research is to build a scal- Hadoop able recommender system that can be applied in a dynamic domain such as a news service. Towards this goal, we focus here on presenting an unsupervised method for clustering Figure 1: Selected elements of the recommendation users based on their semantic interests (Section 3) which resolution flow. was successfully integrated with a real-time recommenda- tion system described in Section 2. The proposed solution has been used to personalize the largest Polish news service Onet1 , with over 10 million real users2 and nearly 500 million 2 RECOMMENDER SYSTEM pageviews on the main page monthly [12], and has proven to CONTEXT be: In this section, we present an overview of the news recom- ∙ trend-responsive in terms of dynamic adaptation to mendations system. First, we present a simplified flow of currently popular topics, the recommendations generation process. Next, we describe ∙ scalable in terms of number of users and generated the multi-armed bandit algorithm, which has been applied recommendations, to provide recommendations lists for user segments in our ∙ and effective, as it vastly improves the performance experiments (Section 4.1). of the news services. To prove the relevance of our method, the evaluation of 2.1 Recommendation flow overview proposed solution is performed with online A/B tests on sev- The general flow of news recommendations may be simpli- eral news sections with different characteristics as described fied by describing the following three major steps that are in Section 4.1. Based on experimental results (Section 4.3) as performed for each request, as presented in Figure 1: well as exploration of segments descriptions and trends dy- ∙ First, the user segments, stored on Hadoop, are fetched namics, in Section 5 we propose extensions to this approach by a dedicated service that provides an online view that address particular real-world challenges. Most notably, of the user-segment assignments (Figure 1,2.). This we explain how our solution is used to generate traceable information, combined with the recommendation place- and actionable publishing insights for enhancing article ment provided in the request, constitutes a complete diversity. We compare our solution to the current state of recommendation context for the bandit algorithm. the art in Section 6 and discuss how this approach may be ∙ Next, the reward for each item is calculated accord- extended to further improve the service quality in Section 7. ing to a business-defined Key Performance Indicator 1 (KPI) formula (Figure 1,3.) processed in real-time and www.onet.pl 2 Real users are different from cookies (which are often used to estimate updated with sub-second latency. The reward function a number of users) as each user may have several cookies. calculation is context-aware so that the item rewards Trend-responsive User Segmentation Enabling Traceable Publishing Insights INRA’19, September, 2019, Copenhagen, Denmark are computed for each segment and placement inde- and the recommendation placement 𝑑 (the destination pendently. section). ∙ Finally, the recommendation algorithm configuration In the context of news recommendation, the item pool is determined for a given context, and a final list of 𝐴𝑡 in trial 𝑡 is represented by the set of available articles. recommendations is generated (Figure 1,4.). The con- The algorithm aims at providing a list of articles which is figuration may define the appropriate algorithm as well the most suitable for a given context, in order to maximize as other parameters such as exploration-exploitation the reward 𝑝𝑡 . Since the item pool changes dynamically, the ratio, as described in Section 2.3. knowledge also needs to be constantly updated in order to For simplicity, we focus here on the most important stages estimate the rewards for new items. Moreover, we adapt the of the recommendation generation process and omit some trend-sensitivity of the algorithm by adjusting the knowledge additional engineering challenges which had to be considered window 𝑙. in the real-world system implementation. 3 USER SEGMENTATION 2.2 Scalability concerns ALGORITHM To reduce latency, the flow presented in Section 2.1 is executed As described in Section 2.3, we use a contextual multi-armed only as often as necessary and the generated recommendations bandit approach in which the context is represented by the are cached for a short period of time for every recommenda- recommendation placement as well as the user segment. In tion context. The cache is refreshed asynchronously so that this section, we describe the algorithm for building clusters the system is resilient to temporal break-downs. To achieve of users for which the recommendations are generated. minimal latency, fresh recommendations populate the cache Our solution is based on the machine learning pipeline in the background (in the meantime stale recommendations concept, by extending PySpark ML python API3 , that en- are returned). ables chaining multiple data transforming operations into Since the recommendations are served from an in-memory one. Such a modular system may be easily extended with dif- cache, extremely low latency is guaranteed (retrieval from ferent encapsulated components within a common interface cache takes less than ten milliseconds) and scalability is and combined with other processes. We build data process- achieved as recommendations are generated only once per ing and modeling pipelines by incorporating available ML context, each of which is shared by thousands of users. transformers and custom data preparation steps for filtering user events and processing article texts. Since our solution is integrated with a distributed environment of Hadoop cluster 2.3 Recommendation algorithm and due to large-scale computations, we use Apache Spark The goal of a multi-armed bandit algorithm [3] is to maxi- for data processing. The main stages of the proposed solution mize the total payoff 𝑃 which is a sum of single payoffs 𝑝𝑡 are described below. achieved in each of the trials 𝑡 ∈ {1, . . . , 𝑇 }. In the context of recommendation problem, in every trial 𝑡 a list of recommen- 3.1 Article topic model dations is chosen from the set of available items 𝐴𝑡 based on the knowledge about the payoffs of articles in 𝐴𝑡 from We are primarily interested in retrieving universal user in- previous trials, where the knowledge window 𝑙 defines how terest profiles that are independent of website structure and many previous trials 𝑡−𝑙, ..., 𝑡−1 are considered. We consider language characteristics, considering latent semantic interest an additional variable which represents the context of the features. Hence in the first stage, we aim at discovering ab- recommendation—this approach is known as the contextual stract topics within the collection of all articles texts from bandit algorithm [17]. Thus, the contextual multi-armed ban- the database. We apply Latent Dirichlet Allocation (LDA) [4] dit for the recommendation problem may be defined by the which is a generative statistical model of a corpus, that de- following components: fines the representation of 𝑀 documents as a mixture of 𝑁 abstract latent topics: 𝜃𝑚𝑛 , 𝑚 ∈ {1, . . . , 𝑀 }, 𝑛 ∈ {1, . . . , 𝑁 }. (1) Exploration-exploitation policy—balancing between the Each of the topics is characterized by a distribution over 𝑉 ob- choice of items from 𝐴𝑡 to maximize a single payoff served words (assuming Dirichlet priors): 𝜑𝑛𝑣 , 𝑣 ∈ {1, . . . , 𝑉 }. (based on the gained knowledge) and exploring new We define a topic description 𝑑𝑒𝑠𝑐(𝜑, 𝑛) as a list of top 4 candidates with high potential. We use the 𝜖 − 𝑔𝑟𝑒𝑒𝑑𝑦 words from 𝜑𝑛𝑣 sorted in descending order. variant of the bandit algorithm in which the item with As a preprocessing step, stopwords based on a predefined the highest payoff estimate 𝑝𝑡 is selected with probabil- list as well as words that appear in less than 10 texts or ity 1−𝜖, and a random item is selected with probability more than 10% of all documents are removed (the thresholds 𝜖. are selected arbitrarily). Next, the text is normalized to (2) Reward function 𝑃 —the reward may be defined as lowercase and words shorter than three characters and with a custom business objective metric, depending on a non-alphabetic characters are filtered out. We preprocess and particular use case. lemmatize tokens with SpaCy Python library extended to (3) Context 𝐶—in our case the recommendation context is defined by the segment of users 𝑠𝑘 , 𝑘 ∈ {1, . . . , 𝐾} 3 https://spark.apache.org/docs/latest/ml-pipeline.html INRA’19, September, 2019, Copenhagen, Denmark J. Misztal-Radecka et al. support Polish language 4 . For words with ambiguous base Axel Springer Polska, including the news service Onet and forms, the first form in alphabetic order is returned. other websites, from anonymous users who accepted our cookie policy and terms of use. The data is stored on a 3.2 User interest profiles Hadoop cluster. Each record in the history table represents We construct user behavioral profiles by averaging the vec- an interaction between a user (represented by a cookie) and tors of the articles in their browsing history. Thus, the an item (when an article was viewed by a user). The user resulting user profile describes the user’s average interest profiles are calculated daily from 14-day browsing histories in each of the latent topics from the LDA representation: to represent medium-length user interests. Only users who 𝜃𝑢𝑖 = 𝐴𝑣𝑔(𝜃𝑚 ), 𝑚 ∈ 𝐴𝑢𝑖 , where 𝐴𝑢𝑖 are indices of articles viewed at least two articles during this period are considered, in the user’s 𝑢𝑖 history. We used the average of the vectors resulting in approximately 13 million users scored daily and rather than the sum, as it provides feature normalization over 30 000 items in their browsing history. Article texts are in the context of user activity (the vectors represent user in Polish and cover a wide range of topics (such as news, interests independently of how many articles they read). To sports, business, and entertainment) and content types (such avoid dominance of popular topics in the user profile repre- as long texts, videos, and gallery descriptions). sentation and to extract their unique interest characteristics, we additionally apply vector standardization to ensure unit 4.1 Experiment setup standard deviation and zero mean. To perform A/B testing, the traffic is randomly split be- tween experiment variants. Each variant is defined by a tuple 3.3 User segments (𝑑, 𝑆, 𝑅), where 𝑑 is the destination section, 𝑆 is the set In order to produce user segments in an unsupervised way, of user segments, and 𝑅 is the recommendation algorithm we apply the bisecting k-Means algorithm to their profiles configuration. In this experiment, we compare the following described in Section 3.2. The algorithm is a hierarchical recommendation configurations: variant of the popular k-Means clustering [15] with a divisive approach: it starts with a single cluster and performs bisecting (1) Destination section 𝑑 is one of the 6 selected website splits recursively until the desired number of groups is reached. thematic sections: general, news, sports, travel, auto- As shown in [24], the bisecting k-Means algorithm generally motive and entertainment. outperforms other clustering techniques in terms of clusters (2) Recommendation algorithm 𝑅 is one of the following: quality and run time, while it tends to produce segments of ∙ Random—a baseline variant that returns items in a relatively uniform size. We use a variant of this algorithm random order, where larger clusters get higher priority during the split. Only ∙ 𝜖−𝑔𝑟𝑒𝑒𝑑𝑦—contextual multi-armed bandit algorithm users with at least five pageviews during the analyzed period described in Section 2.3. The context of the bandit are considered for model training. algorithm is defined by the tuple (𝑑, 𝑠𝑘 ), where 𝑑 One of the advantages of using topic modeling technique is is the destination section and 𝑠𝑘 , 𝑘 ∈ {1, . . . , 𝐾} is the ability to generate interpretable topics descriptions (as de- the user segment. Additionally, since segments are scribed in Section 3.1). For each segment 𝑠𝑘 , 𝑘 ∈ {1, . . . , 𝐾}, assigned only for users who were active recently, we we define its representation 𝜃𝑘𝑛 as the average topic dis- define an extra segment 𝑠0 for new users without tribution of included users: 𝜃𝑘𝑛 = 𝐴𝑣𝑔(𝜃𝑢𝑖 ), 𝑢𝑖 ∈ 𝑠𝑘 . We known history. further use this distribution to provide characteristics of (3) Segments set 𝑆 is a set of user segments 𝑠𝑘 , 𝑘 ∈ {0, . . . , 𝐾} resulting user segments by retrieving the descriptions of top- for the general segmentation (described in Section 3) ics above global average for each cluster center: 𝑑𝑒𝑠𝑐(𝑠𝑘 ) = or an empty set (global optimization for all users). {𝑑𝑒𝑠𝑐(𝜑, 𝑛)}, 𝜃𝑘𝑛 > 0. Thus, we aim to compare the performance of contextual 𝜖 − 𝑔𝑟𝑒𝑒𝑑𝑦, a non-contextualized 𝜖 − 𝑔𝑟𝑒𝑒𝑑𝑦 (which is a global 4 INITIAL EVALUATION popularity-based baseline), and a referential random algo- In this section, we describe the process of online experiments rithm on the six sections mentioned above. In the following involving a real-world recommendation system. We have section we describe the parameter choice for recommendation decided to perform online A/B tests which are capable of algorithms 𝑅 and segmentation 𝑆. representing the dynamic nature of news recommendation scenarios (such as trend-responsiveness). We believe that this 4.2 Parameters selection method is more appropriate than offline tests for an end-to- We perform a two-fold parameter selection procedure by end system evaluation. The goal of this test is to evaluate the selecting the configuration for the contextual multi-armed general approach to user interests segmentation and indicate bandit as well as the user segmentation algorithm. further improvement directions based on particular use-case analysis. 4.2.1 Recommendation algorithm configuration. We use an For building the user segments, we use a private database offline experiment simulation to efficiently select the contex- of articles and events from multiple publisher sites of Ringier tual multi-armed bandit algorithm (described in Section 2.3) 4 https://spacy.io/ hyperparameters for different recommendation scenarios. Trend-responsive User Segmentation Enabling Traceable Publishing Insights INRA’19, September, 2019, Copenhagen, Denmark 25,0% 19,3% 20,0% 8,1 15,0% 13,8% 10,4% 10,0% 10,1% 8,05 9,4% 9,4% 10,0% 7,8% 7,4% log perplexity 8 5,0% 2,4% 0,0% 7,95 1 2 3 4 5 6 7 8 9 10 % of users in segment 7,9 7,85 Figure 3: Exemplary distribution of users in 10 seg- 10 20 30 40 50 60 70 80 90 100 Number of topics ments, based on 14-day interest profiles. results in some high-quality topics that are responsive to Figure 2: Perplexity of a held-out documents dataset dynamic publishing trends. as function of topics count for LDA model trained To avoid the information bubble effect and to provide with 0.7M articles. The algorithm converges at ap- sufficient statistics for per-segment contextual bandits (Sec- proximately 50 topics. tion 2.3) in real time, we selected a relatively small number of 𝐾 = 10 clusters which provides a satisfactory level of interest consistency within segments while ensuring efficient training of the real-time recommendation algorithm. Moreover, larger First, we estimate the initial parameters for calculating segments support recommendation diversity (which reduces the algorithm payoffs 𝑝𝑡 in given recommendation scenario, risk of information bubble) as the optimization is performed including traffic size (estimated number of views of items in for a wider interest group. The distribution of users in these given time slot), item pool 𝐴𝑡 size (number of items available segments is shown in Figure 3. in each trial), article lifetime (number of trials in which it is available in the pool) and a distribution of the reward 4.3 Results function 𝑃 . The results of a 30-day online experiment for six different Next, the simulation procedure is performed by running sections of Onet home page are presented in Table 1. We com- multiple iterations for different algorithm configurations. As pare the performance of analyzed algorithms to the random a result, the optimal configuration for each of the recommen- baseline. Each of the experiments uses a custom business- dation settings is returned, including the knowledge window defined KPI based on user engagement metrics (incorporating 𝑙 and the 𝜖 value for the 𝜖 − 𝑔𝑟𝑒𝑒𝑑𝑦 bandit algorithm. pageviews, time spent on the website and bounce rate pe- nalization), the details of which cannot be disclosed as it 4.2.2 Segmentation algorithm configuration . For configuring concerns business secrets. the segmentation algorithm, first, we need to select an optimal The experiment results show that 𝜖 − 𝑔𝑟𝑒𝑒𝑑𝑦 recommen- number of topics 𝑁 for the LDA algorithm. We use perplexity dations consistently outperform randomly generated lists for to measure how well the word probability distribution of all the experimental settings during the whole test period. LDA model predicts a sample of held-out documents [14] for We observed some fluctuations in the performance of all the different topic dimensionalities. The model is trained with a variants that may be caused by publishing trends affecting random sample of 0.7M articles from the database. Figure 2 user behavior (as shown in Figure 4). Additionally, for all presents log perplexity for the LDA model with varying the experiments the contextual bandit approach substan- number of topics. Based on this analysis, we concluded that tially outperforms the globally optimized version. The most the algorithm converges at approximately 50 topics. considerable difference is achieved for the general thematic However as shown by [7], the predictive likelihood eval- section (+15.2 pp. vs. global optimization, measured as a uation of topic models is often not correlated with human relative improvement over the random baseline) as well as for judgment, thus besides measuring perplexity, we addition- domain-specific sections with longer article lifetime (enter- ally perform a qualitative analysis of the resulting topic tainment, travel, automotive). The semantic context has the interpretability. In particular, we are interested in learn- smallest impact on highly time-sensitive sections (+1.9pp. ing how well the topic model reflects content fluctuations difference to global optimization for news section and +6.2pp. and changing publishing trends by analyzing daily topic dis- for sports). Additionally, further analysis of segment per- tribution changes for selected news topics. Figure 4 shows formance showed that groups of users whose interests were the daily changes for 3 out of 50 selected topics with high underrepresented in the articles pool for a given section, on time-sensitivity. An analysis of their descriptions leads to average performed worse than others. We also noted that the the conclusion that the proposed topic model configuration traffic size has a significant impact on algorithm performance. INRA’19, September, 2019, Copenhagen, Denmark J. Misztal-Radecka et al. 0,6 (a) 0,4 0,2 0 -0,2 air smog pollution particulates norm 0,8 (b) 0,6 0,4 0,2 0 -0,2 movie role Oscar prize love 0,8 (c) 0,6 0,4 0,2 0 -0,2 -0,4 goal minute penalty ball Figure 4: Daily changes in publishing trends for three exemplary topics selected from a 50-topic LDA model for articles published between Feb 15th and March 15th 2019, calculated from standardized mean topic values for each day in the month. (a) topic related to air pollution has peaks on days with high smog rates in Poland, (b) a topic related to Oscar Awards with a peak on the day after the Oscar Gala, (c) topic related to football with highest values during the weekends when popular matches are transmitted. Table 1: A 30-day online experiment results for dif- hot topics (as shown in Figure 4) more than individual ferent sections of the Onet home page. Results pre- preferences. sented as a percentage increase in average daily op- (3) Need for fine-grained interests representation: timized metric over a random recommender. Interests in domain-specific thematic sections such as sports or technology do not necessarily match general Section characteristics Contextual E-Greedy Global E-Greedy user preference groups. Based on the analysis of the General 44.1% 28.9% general segment descriptions (Table 2 left), we observe News 23.1% 21.2% that for instance all users interested in entertainment Sports 23.6% 17.4% are in the same segment, hence their more fine-grained Travel 72.1% 59.5% interests in this domain cannot be recognized. Automotive 36.3% 23.6% Entertainment 74.8% 59.7% 5 ADDRESSING REAL-WORLD CHALLENGES As observed in Section 4.3, due to the variety of recommen- This may be caused by the fact that for a larger sample, dation scenarios, a general-purpose user segmentation cannot the exploration may be performed faster and the algorithm serve the diversity of all business cases. Hence, in the follow- converges more quickly. ing sections, we propose some extensions to our method that To summarize, the analysis of the experiment results leads are designed for particular use cases and address these real- us to the following observations: world challenges. First, we present two alternative variants of (1) Need for diversity: The general user segmentation the segmentation algorithm which are aimed at representing gives the highest improvement for recommendations domain-specific and time-sensitive user interests. Next, we within a wide thematic range (as shown in the ex- propose an approach to address the lack of diversity in the periment results for the general section). Moreover, item pool by indicating missing thematic areas. diversity of the item pool has an impact on the per- formance of particular segments — our analysis has 5.1 Segmentation variants revealed that users who cannot find articles relevant As discussed, in order to obtain a more satisfying level of to their preferences become less active than others. performance for some of the challenging sections, we had to (2) Need for time-sensitivity: Semantic interests have a adopt different algorithms depending on the use-case. This lower impact on dynamical and time-sensitive sections could be achieved by leveraging the modular, extendable ar- such as news or sports feeds (as shown in Table 1). User chitecture described in Section 3 and resulted in three general behavior in such services may be influenced by short- types of user segmentation which are shown in Figure 5 and term interest patterns caused by popularity trends and described below. Trend-responsive User Segmentation Enabling Traceable Publishing Insights INRA’19, September, 2019, Copenhagen, Denmark Article data Event data Article data Event data Article data Event data Join Filter Topic modeling Join Topic modeling Join Topic modeling Segmentation Segmentation Segmentation Save Save Save (a) (b) (c) Figure 5: Flow diagrams of the described segmentation variants: general (a), news-specific (b) and site-specific (c) The interchangeability of these algorithms enables us to 5.1.3 Domain-specific recommendations (Figure 5c). As shown easily adjust the segmentation to different circumstances, e.g. in Table 1, besides the long and short-term topics, for some by employing the general approach when launching personal- sections, there is also a need to accurately represent subcate- ization on the entire page and the site-specific method for a gories in readers’ interests. The idea is to consider only the small, thematic section. traffic and articles on a specific section of the page which leads the reader to a set of topically related articles. To 5.1.1 General long-term users interests (Figure 5a). This al- achieve this, we applied a slight modification to the original, gorithm is described in Section 3 and tested in the first general-topic architecture, which filters the articles based on experiment. Its idea is the following: create topics for the a section in which they appeared. This leads to a topic space entire set of articles but cluster the users in the topic vector and segments that capture the smaller sub-topics within a space according to their activity in a recent limited period. broader category. An example of this variant for the sports This, on the one hand, ensures that the topics obtained section is shown in Table 3. are general (such as sports, politics, news, and others) but on the other hand produces clusters which “follow” a user’s interests as they change with time. The balance between news- 5.2 Insights generation responsiveness and general topic representation is achieved Recommendation algorithms tend to improve user satisfaction by adjusting the period to which user activity is limited. by providing the most suitable items according to their inter- ests. However, to provide personalized recommendations lists, 5.1.2 Hot topics interests (Figure 5b). In our analyses we the diversity of the item pool should be sufficient to match par- have discovered that sudden, popular events tend to attract ticular user needs. Since for the news domain the article fresh- the attention of users regardless of their general interests (see ness is required, it is essential to provide meaningful and trace- Figure 4). This seems to be confirmed by the insignificant able insights about the types of content that are currently improvement in performance for the news section seen in missing, so that these shortages may be addressed by the Table 1. In order to measure this phenomenon more closely content provider. In the simplest scenario, we assume that if a and quantify its impact on the quality of recommendations group of users becomes less active, it may be caused by an in- as well as enable more meaningful segment descriptions, we sufficient number of articles relevant to their interests. Hence have created an alternative version of the segmentation. Its we address this issue by indicating segments that perform main difference to the first, general approach is that the worse than the global average during each day and providing topics are computed only after the articles are joined with their descriptions as the topics that are missing in the avail- (and effectively filtered by) the user activity data. This in able article set along with titles of articles that they liked in effect produces a topic vector space which more accurately the past. First, we calculate the average performance (regard- captures these transient trends (see Table 2) and is capable ing the objective metric 𝑃 ) for each of the user segments 𝑠𝑘 : of representing short-term user interests. Additionally, this ∑︀𝑀 𝑚=1 𝑃𝑢𝑚 𝑃 𝑠𝑘 = 𝑀 , 𝑢𝑚 ∈ 𝑠𝑘 , 𝑘 ∈ {1, . . . , 𝐾}, 𝑚 ∈ {1, . . . , 𝑀 }. method is more efficient due to a much smaller set of articles ∑︀𝐾 𝑘=1 𝑃𝑠𝑘 for which the topic space is computed. Next, we calculate the average 𝑃 𝑆 = 𝐾 , 𝑠𝑘 ∈ 𝑆 and INRA’19, September, 2019, Copenhagen, Denmark J. Misztal-Radecka et al. Table 2: Comparison of the most popular topic words for two variants of segmentation: general (left) and hot. The segments have been grouped by their general subject to increase readability. Explanation of some of the terms follows below. The words for hot-topics segmentation relate to specific, recent events (such as The Oscars) and contain more proper names. Subject General topics Hot topics olympic, Olympics, medal, competition — win, tournament, match, coach, player, team — breast, photo, Fabia2 , Sport team, final Skoda2 race, rally, season, player — club, player, coach, footballer driver, Kubica4 , Williams4 , car — match, coach, player, team season, footballer, club, player — Legia6 , goal, coach, foot- baller Russia, Ukraine, Russian, USA — Tusk, Prime Minister, gas, court, police, death, President — castle, city, hotel, age Politics Donald President, PiS, Kaczyński1 , deputy — court, prosecution, Olszewski1 , Jan1 , rape, police — court, police, death, accuse, sentence President city, urban, resident, street — tourist, water, city, island Kaczyński1 , PiS, President, monument — church, Israel, Trump, summit star, actress, picture, look — role, musician, record, song star, love, beautiful, album — star, dance, Joanna5 , jour- Entertainment nalist Oscar, role, actress, award — Oscar, Biedronka3 , nomi- nate Automotive car, auto, engine, model — company, customer, shop, price breast, photo, Fabia2 , Skoda2 — match, coach, player, team photo, do, look, al — water, eat, product, coal hair, skin, color, face — organism, disease, vitamin, contain Other child, woman, family, home — man, perc, publish, photo Warsaw, network, arrange, TV set — perc, retirement, bank, amount 1 Jan Olszewski, Jaroslaw Kaczyński - Polish politicians; 2 Skoda Fabia - car model; 3 Biedronka - Polish supermarket; 4 Kubica, Williams - Formula One competitors; 5 Joanna Mazur - runner, participant of the Polish edition of Dancing with the Stars; 6 Legia - Polish football club Table 3: Examples of domain-specific segments de- scriptions for sports section. Some more specific Hey! Users in the segment most interested in season, footballer, club, player | Legia, goal, coach, footballer interests are visible such as automotive (segment are less active than others. 1), ski jumping (segment 2), general sports and Consider writing more articles on these subjects. Estimated users affected: 1.2M. Olympics (segment 3), football — national (segment Inspiration article: Copa America 2019: results and transmission 4) and international (segment 5). 1 driver, race, Kubica, rally — mln, Euro, milion, company Figure 6: Exemplary insight generated by the sys- 2 jump, competition, contest, Malysz — Stoch, Kamil, competition, tem, jump based on the description in Section 5.2. 3 Olympic Games, medal, child — competition, sportsman, prize, accident 4 penalty kick, Borussia, host — Legia, Lech, Jagiellonia, Warsaw 6 DISCUSSION AND RELATED WORK 5 Barcelona, Real Madrid — Manchester, United, City, League As noticed by [18], the main challenges of news recommenda- tions are large-scale computations, high dynamics and popu- larity trends of news articles. The content changes in three the standard deviation 𝑆𝑡𝑑(𝑃𝑆 ) of the performance for all major news publishers have been analyzed by [5], showing segments and we define that a segment 𝑠𝑘 is “unsatisfied” that the publishing patterns are characterized by tempo- if 𝑃 𝑠𝑘 < 𝑃 𝑆 − 𝑆𝑡𝑑(𝑃𝑆 ). Finally, we retrieve the articles that ral dimensions such as days and hours. For this reason, Li were particularly interesting for this segment in the past by et al. [18] proposes a scalable two-stage solution to news calculating a performance metric 𝑃𝑎,𝑠𝑘 of each article 𝑎 in recommendations. First, clusters of newly published articles a given segment 𝑠𝑘 and we standardize this metric for all are constructed, and then personalized recommendations are segments. The titles of articles with the highest score for each generated by retrieving items most relevant to the user’s “unsatisfied” segment along with the segment descriptions interest profile. This approach provides high efficiency for and their cardinalities are presented in the form of textual news recommendations; however, content-based recommen- insights (as shown in Figure 6). dations are not capable of responding to changing popularity Trend-responsive User Segmentation Enabling Traceable Publishing Insights INRA’19, September, 2019, Copenhagen, Denmark trends. To address this problem, in [17] the authors propose A/B test on several news sections with different characteris- a contextual multi-armed bandit approach which is capable tics. Based on the analysis of the results for particular use of representing popularity trends as well as group interests cases as well as qualitative analysis of segment descriptions and apply it for large-scale news recommendations for Yahoo and trend dynamics, we proposed further extensions of the News module. They note that this approach outperformed segmentation algorithm that address these real-world issues. a standard context-free bandit algorithm by 12.5% in click In future work, we plan to incorporate into the model other ratio. types of behavioral features and content metadata. Moreover, To fully exploit the potential of contextual bandits, it is we aim to improve the recommendation quality by exploring essential to apply a suitable method for user segmentation. different segmentation and recommendation techniques to In [8], clusters of Yahoo users were built based on over a address other real-world challenges such as the user cold-start thousand categorical features describing their demographics problem. and behavioral patterns. However, such an arbitrary choice of features is limited and does not represent the relations REFERENCES among distinct features. The unsupervised clustering tech- [1] Jae-wook Ahn, Peter Brusilovsky, Jonathan Grady, Daqing He, nique has been identified in [21] as the most flexible method and Sue Yeon Syn. 2007. Open User Profiles for Adaptive News Systems: Help or Harm?. In Proceedings of the 16th International for automatic detection of underlying behavioral patterns. Conference on World Wide Web (WWW ’07). ACM, New York, In [23] the authors scale up the user neighborhood formation NY, USA, 11–20. https://doi.org/10.1145/1242572.1242575 [2] Xiao Bai, B. Barla Cambazoglu, Francesco Gullo, Amin Mantrach, process through the use of bisecting k-means clustering for an and Fabrizio Silvestri. 2017. Exploiting Search History of Users e-commerce application. In [9], a large-scale collaborative fil- for News Personalization. Inf. Sci. 385, C (April 2017), 125–137. tering recommender system for Google News personalization https://doi.org/10.1016/j.ins.2016.12.038 [3] D. A. Berry and B. Fristedt. 1985. Bandit Problems: Sequential was built by applying several clustering techniques, and the Allocation of Experiments. Chapman and Hall, London. authors demonstrate efficacy and scalability of their system [4] David M. Blei, Andrew Y. Ng, and Michael I. Jordan. 2003. Latent with a real-world experiment on millions of users. A prob- Dirichlet Allocation. J. Mach. Learn. Res. 3 (March 2003), 993– 1022. http://dl.acm.org/citation.cfm?id=944919.944937 abilistic latent semantic analysis topic modeling technique [5] Maria Carla Calzarossa and Daniele Tessera. 2015. Modeling and for building clusters of users for online advertising has been predicting temporal patterns of web content changes. Journal of Network and Computer Applications 56 (2015), 115 – 123. presented in [13]. However, no additional content metadata https://doi.org/10.1016/j.jnca.2015.06.008 has been incorporated, and in contrast to our approach, the [6] Allison J. B. Chaney, Brandon M. Stewart, and Barbara E. En- interpretability of resulting segments is low. gelhardt. 2018. How Algorithmic Confounding in Recommenda- tion Systems Increases Homogeneity and Decreases Utility. In A summary of approaches to user modeling in Internet Proceedings of the 12th ACM Conference on Recommender applications has been presented by Gauch et al. [11]. In Systems (RecSys ’18). ACM, New York, NY, USA, 224–232. recent approaches, the profile is usually inferred from user https://doi.org/10.1145/3240323.3240370 [7] Jonathan Chang, Jordan Boyd-Graber, Sean Gerrish, Chong behavior (such as content clicks [19] or web searches [2]), and Wang, and David M. Blei. 2009. Reading Tea Leaves: How the preferences are defined by the type of content read by Humans Interpret Topic Models. In Proceedings of the 22Nd International Conference on Neural Information Processing them. In [26], the authors introduced a Collaborative Topic Systems (NIPS’09). Curran Associates Inc., USA, 288–296. Modeling technique that combines the collaborating filtering http://dl.acm.org/citation.cfm?id=2984093.2984126 approach with the content-based features extracted by topic [8] Wei Chu, Seung-Taek Park, Todd Beaupre, Nitin Motgi, Amit Phadke, Seinjuti Chakraborty, and Joe Zachariah. 2009. A Case modeling. A user study has been presented in [1], showing Study of Behavior-driven Conjoint Analysis on Yahoo!: Front that profile transparency is an essential aspect of personalized Page Today Module. In Proceedings of the 15th ACM SIGKDD news systems. In our approach, we represent a user profile by International Conference on Knowledge Discovery and Data Mining (KDD ’09). ACM, New York, NY, USA, 1097–1104. a distribution over topics, which enables generating textual https://doi.org/10.1145/1557019.1557138 descriptions of their interest segments. [9] Abhinandan S. Das, Mayur Datar, Ashutosh Garg, and Shyam Rajaram. 2007. Google News Personalization: Scalable Online Collaborative Filtering. In Proceedings of the 16th International Conference on World Wide Web (WWW ’07). ACM, New York, 7 CONCLUSIONS AND FUTURE NY, USA, 271–280. https://doi.org/10.1145/1242572.1242610 [10] Deepak Agarwal. 2013. Recommending Items to Users: An Ex- WORK plore Exploit Perspective. http://www.ueo-workshop.com/wp- content/uploads/2013/10/UEO-Deepak.pdf. We described a universal method for segmenting users ac- [11] Susan Gauch, Mirco Speretta, Aravind Chandramouli, and cording to their semantic interests. Our solution is based Alessandro Micarelli. 2007. User Profiles for Personalized In- on an unsupervised bisecting k-means clustering algorithm formation Access. In The Adaptive Web: Methods and Strategies of Web Personalization. Springer Berlin Heidelberg, Berlin, Hei- and is therefore capable of representing changing popularity delberg, 54–89. https://doi.org/10.1007/978-3-540-72079-9 2 trends. Moreover, using the topic modeling technique enables [12] Gemius Polska. 2019. Wyniki badania Gemius/PBI za styczeń 2019. https://www.wirtualnemedia.pl/artykul/strony-glowne- us to generate high-quality textual descriptions of users seg- onet-i-wp-pl-leb-w-leb-w-dol-interia-o2-pl-i-tvn24-pl. ments characteristics, which can provide traceable publishing [13] Sahin Cem Geyik, Ali Dasdan, and Kuang-Chih Lee. 2015. User insights for enhancing article diversity. This solution has Clustering in Online Advertising via Topic Models. CoRR abs/1501.06595 (2015). arXiv:1501.06595 http://arxiv.org/abs/ been integrated with a large-scale news recommender system 1501.06595 for personalizing the largest Polish news service Onet. The [14] Matthew D. Hoffman, David M. Blei, and Francis Bach. 2010. efficacy of our proposed system was evaluated in an online Online Learning for Latent Dirichlet Allocation. In Proceedings of the 23rd International Conference on Neural Information INRA’19, September, 2019, Copenhagen, Denmark J. Misztal-Radecka et al. Processing Systems - Volume 1 (NIPS’10). Curran Associates 2018. Explore, Exploit, and Explain: Personalizing Explainable Inc., USA, 856–864. http://dl.acm.org/citation.cfm?id=2997189. Recommendations with Bandits. In Proceedings of the 12th ACM 2997285 Conference on Recommender Systems (RecSys ’18). ACM, New [15] Anil K. Jain and Richard C. Dubes. 1988. Algorithms for Clus- York, NY, USA, 31–39. https://doi.org/10.1145/3240323.3240354 tering Data. Prentice-Hall, Inc., Upper Saddle River, NJ, USA. [21] Juni Nurma Sari, Lukito Nugroho, Ridi Ferdiana, and Paulus [16] Jaya Kawale, Fernando Amat. 2018. A Multi-Armed Santosa. 2016. Review on Customer Segmentation Technique on Bandit Framework For Recommendations at Netflix. Ecommerce. Advanced Science Letters 22 (10 2016), 3018–3022. https://www.slideshare.net/JayaKawale/a-multiarmed-bandit- https://doi.org/10.1166/asl.2016.7985 framework-for-recommendations-at-netflix. [22] PBI. 2018. Polskie Badania Internetu. http://pbi.org.pl/raporty [17] Lihong Li, Wei Chu, John Langford, and Robert E. Schapire. (July-December 2018). 2010. A Contextual-bandit Approach to Personalized News Arti- [23] Badrul M. Sarwar, George Karypis, Joseph Konstan, and John cle Recommendation. In Proceedings of the 19th International Reidl. 2002. Recommender Systems for Large-Scale E-Commerce: Conference on World Wide Web (WWW ’10). ACM, New York, Scalable Neighborhood Formation Using Clustering. In Proceed- NY, USA, 661–670. https://doi.org/10.1145/1772690.1772758 ings of the 5th International Conference on Computer and [18] Lei Li, Dingding Wang, Tao Li, Daniel Knox, and Balaji Padman- Information Technology (ICCIT). abhan. 2011. SCENE: A Scalable Two-stage Personalized News [24] Michael Steinbach, George Karypis, and Vipin Kumar. 2000. A Recommendation System. In Proceedings of the 34th Interna- comparison of document clustering techniques. In In KDD Work- tional ACM SIGIR Conference on Research and Development shop on Text Mining. in Information Retrieval (SIGIR ’11). ACM, New York, NY, [25] Maurits van der Goes. 2018. Takeaways from Netflix’s Personaliza- USA, 125–134. https://doi.org/10.1145/2009916.2009937 tion Workshop 2018. https://medium.com/rtl-tech/my-takeaways- [19] Jiahui Liu, Peter Dolan, and Elin Rønby Pedersen. 2010. Per- from-netflixs-personalization-workshop-2018-f564a19437b6. sonalized News Recommendation Based on Click Behavior. In [26] Chong Wang and David M. Blei. 2011. Collaborative Topic Proceedings of the 15th International Conference on Intelligent Modeling for Recommending Scientific Articles. In Proceedings of User Interfaces (IUI ’10). ACM, New York, NY, USA, 31–40. the 17th ACM SIGKDD International Conference on Knowledge https://doi.org/10.1145/1719970.1719976 Discovery and Data Mining (KDD ’11). ACM, New York, NY, [20] James McInerney, Benjamin Lacker, Samantha Hansen, Karl USA, 448–456. https://doi.org/10.1145/2020408.2020480 Higley, Hugues Bouchard, Alois Gruson, and Rishabh Mehrotra.