Trend-responsive User Segmentation Enabling Traceable
    Publishing Insights. A Case Study of a Real-world Large-scale
                    News Recommendation System
                  Joanna Misztal-Radecka                                               Dominik Rusiecki
                Ringier Axel Springer Polska                                     Ringier Axel Springer Polska
         AGH University of Science and Technology                            dominik.rusiecki@ringieraxelspringer.pl
        joanna.misztal-radecka@ringieraxelspringer.pl

                         Michal Żmuda                                                     Artur Bujak
                 Ringier Axel Springer Polska                                     Ringier Axel Springer Polska
              michal.zmuda@ringieraxelspringer.pl                              artur.bujak@ringieraxelspringer.pl
ABSTRACT                                                              for current real-world systems for several reasons discussed
The traditional offline approaches are no longer sufficient for       below.
building modern recommender systems in domains such as                   For instance, as observed by [16], classic matrix factoriza-
online news services, mainly due to the high dynamics of              tion is no longer sufficient for many modern recommendation
environment changes and necessity to operate on a large scale         scenarios. In particular, aspects such as “scarce feedback, dy-
with high data sparsity. The ability to balance exploration           namic catalogue and time-sensitivity”, including popularity
with exploitation makes the multi-armed bandits an efficient          trends and interests changes, have been mentioned as factors
alternative to the conventional methods, and a robust user            that require “continuous and fast learning” not sufficiently
segmentation plays a crucial role in providing the context            addressed by these traditional approaches. As further noticed
for such online recommendation algorithms. In this work, we           by [18], such offline recommender systems are particularly
present an unsupervised and trend-responsive method for seg-          unsuitable for generating recommendations in domains such
menting users according to their semantic interests, which has        as news services due to the need of real-time processing at a
been integrated with a real-world system for large-scale news         large scale and dynamic changes in recency and popularity of
recommendations. The results of an online A/B test show               items. The inability to follow popularity trends is particularly
significant improvements compared to a global-optimization            troublesome, as it has been found that user preferences are
algorithm on several services with different characteristics.         not constant but are influenced by temporal factors such as
Based on the experimental results as well as the exploration          time of day, day of the week or the season. A large-scale
of segments descriptions and trend dynamics, we propose               study on Polish Internet users [22] found that users browse
extensions to this approach that address particular real-world        more items related to culture and entertainment during their
challenges for different use-cases. Moreover, we describe a           work time than at home. Some seasonal holidays also have an
method of generating traceable publishing insights facilitat-         impact on the type of consumed items. For instance, shopping
ing the creation of content that serves the diversity of all          offers and inspirations are more popular before Black Friday
users needs.                                                          and Christmas while photo galleries are preferred during
                                                                      summer holidays. Moreover, news topics have a significant
KEYWORDS                                                              impact on the popularity of items—events such as political
                                                                      elections or Olympic Games significantly influence the users’
user segmentation, topic modeling, trend responsiveness, rec-
                                                                      interests.
ommender system, news recommendation, contextual multi-
                                                                         Another critical issue which is not addressed by the stan-
armed bandit, large scale, case study
                                                                      dard techniques is data sparsity. Online services provide a
                                                                      vast number of items, but only a few are read by particular
1    INTRODUCTION                                                     visitors. Hence, generating recommendation lists for users
Why user segmentation?                                                with a short or, in a cold-start scenario, no browsing history
In order to understand why user segmentation is a crucial             becomes a critical problem. Having noticed the insufficiency
component of a modern real-world recommender system, it               of offline collaborative filtering approaches, one could con-
is essential to review the context of the recommendation              sider popularity-based social recommender systems. They
problem as a whole. It could be argued that it is not neces-          are designed to address the trend-responsiveness requirement
sary to consider any user segmentation in a recommender               and are capable of generating recommendations for less ac-
system, and such an approach is applied in many traditional           tive users by making use of the crowd wisdom. However, as
recommendation methods. However, this claim may not hold              noted by [6], over-exploitation in such scenarios may lead to
Copyright © 2019 for this paper by its authors. Use permitted under
Creative Commons License Attribution 4.0 International (CC BY 4.0).
                                                                      the information bubble effect as users become homogenized
INRA’19, September, 2019, Copenhagen, Denmark                                                                            J. Misztal-Radecka et al.


according to their interests so that similar preferences groups
are constantly provided with the same types of items.
   Another approach which has recently attracted much at-
tention is based on the multi-armed bandit algorithm. Chiefly,
the ability to balance exploration with exploitation makes                                                      1. Request
multi-armed bandits a promising solution for this type of                                               (with placement & user ID)

problems — they provide stable recommendation quality due
to the exploitation component while responding to changing
popularity trends in the exploration phase. Due to their high                                            Recommendations API
efficiency and scalability, the bandit algorithms have been
successfully applied to large-scale real-world recommender sce-
narios ([17] [10] [16] [20]). However, as further noted by [25],            2. Resolve segments               3. Resolve                4. Resolve
global optimization approaches may introduce the tyranny of                                               reward function value      recommendation
the majority effect and thus cannot serve the diversity of all
users. Hence for bandit approaches the recommendations are                            Segments
often performed for groups of users with similar behavioral                              API
patterns. Additionally, other contextual factors such as type                                                  Performance
                                                                                                                    API
of website influence the user behavior, hence the approach to                                                                            Algorithm
representing their preferences should be suited to particular                   Periodically updates
                                                                                                           Combines simple                Service
                                                                               assignment of users to
use cases. In our solution, we have adapted the contextual                      the segments stored
                                                                                                         measures (often based
                                                                                                                                         Selected
bandit approach [17] to generating recommendations for dy-                                               on browser events) to
                                                                                     on Hadoop                                       recommendation
                                                                                                         calculate reward value
namically adjusted interests segments.                                                                  based on business' KPIs
                                                                                                                                         algorithm
                                                                                                                                      implementation
                                                                                                                                       (for instance
1.1     Contributions                                                                                                                e-Greedy bandit)
The long-term objective of our research is to build a scal-
                                                                                        Hadoop
able recommender system that can be applied in a dynamic
domain such as a news service. Towards this goal, we focus
here on presenting an unsupervised method for clustering                   Figure 1: Selected elements of the recommendation
users based on their semantic interests (Section 3) which                  resolution flow.
was successfully integrated with a real-time recommenda-
tion system described in Section 2. The proposed solution
has been used to personalize the largest Polish news service
Onet1 , with over 10 million real users2 and nearly 500 million
                                                                           2       RECOMMENDER SYSTEM
pageviews on the main page monthly [12], and has proven to                         CONTEXT
be:                                                                        In this section, we present an overview of the news recom-
    ∙ trend-responsive in terms of dynamic adaptation to                   mendations system. First, we present a simplified flow of
      currently popular topics,                                            the recommendations generation process. Next, we describe
    ∙ scalable in terms of number of users and generated                   the multi-armed bandit algorithm, which has been applied
      recommendations,                                                     to provide recommendations lists for user segments in our
    ∙ and effective, as it vastly improves the performance                 experiments (Section 4.1).
      of the news services.
   To prove the relevance of our method, the evaluation of                 2.1        Recommendation flow overview
proposed solution is performed with online A/B tests on sev-               The general flow of news recommendations may be simpli-
eral news sections with different characteristics as described             fied by describing the following three major steps that are
in Section 4.1. Based on experimental results (Section 4.3) as             performed for each request, as presented in Figure 1:
well as exploration of segments descriptions and trends dy-                       ∙ First, the user segments, stored on Hadoop, are fetched
namics, in Section 5 we propose extensions to this approach                         by a dedicated service that provides an online view
that address particular real-world challenges. Most notably,                        of the user-segment assignments (Figure 1,2.). This
we explain how our solution is used to generate traceable                           information, combined with the recommendation place-
and actionable publishing insights for enhancing article                            ment provided in the request, constitutes a complete
diversity. We compare our solution to the current state of                          recommendation context for the bandit algorithm.
the art in Section 6 and discuss how this approach may be                         ∙ Next, the reward for each item is calculated accord-
extended to further improve the service quality in Section 7.                       ing to a business-defined Key Performance Indicator
1
                                                                                    (KPI) formula (Figure 1,3.) processed in real-time and
 www.onet.pl
2
 Real users are different from cookies (which are often used to estimate            updated with sub-second latency. The reward function
a number of users) as each user may have several cookies.                           calculation is context-aware so that the item rewards
Trend-responsive User Segmentation Enabling Traceable Publishing Insights              INRA’19, September, 2019, Copenhagen, Denmark


      are computed for each segment and placement inde-                          and the recommendation placement 𝑑 (the destination
      pendently.                                                                 section).
    ∙ Finally, the recommendation algorithm configuration                  In the context of news recommendation, the item pool
      is determined for a given context, and a final list of            𝐴𝑡 in trial 𝑡 is represented by the set of available articles.
      recommendations is generated (Figure 1,4.). The con-              The algorithm aims at providing a list of articles which is
      figuration may define the appropriate algorithm as well           the most suitable for a given context, in order to maximize
      as other parameters such as exploration-exploitation              the reward 𝑝𝑡 . Since the item pool changes dynamically, the
      ratio, as described in Section 2.3.                               knowledge also needs to be constantly updated in order to
   For simplicity, we focus here on the most important stages           estimate the rewards for new items. Moreover, we adapt the
of the recommendation generation process and omit some                  trend-sensitivity of the algorithm by adjusting the knowledge
additional engineering challenges which had to be considered            window 𝑙.
in the real-world system implementation.
                                                                        3       USER SEGMENTATION
2.2     Scalability concerns                                                    ALGORITHM
To reduce latency, the flow presented in Section 2.1 is executed        As described in Section 2.3, we use a contextual multi-armed
only as often as necessary and the generated recommendations            bandit approach in which the context is represented by the
are cached for a short period of time for every recommenda-             recommendation placement as well as the user segment. In
tion context. The cache is refreshed asynchronously so that             this section, we describe the algorithm for building clusters
the system is resilient to temporal break-downs. To achieve             of users for which the recommendations are generated.
minimal latency, fresh recommendations populate the cache                  Our solution is based on the machine learning pipeline
in the background (in the meantime stale recommendations                concept, by extending PySpark ML python API3 , that en-
are returned).                                                          ables chaining multiple data transforming operations into
   Since the recommendations are served from an in-memory               one. Such a modular system may be easily extended with dif-
cache, extremely low latency is guaranteed (retrieval from              ferent encapsulated components within a common interface
cache takes less than ten milliseconds) and scalability is              and combined with other processes. We build data process-
achieved as recommendations are generated only once per                 ing and modeling pipelines by incorporating available ML
context, each of which is shared by thousands of users.                 transformers and custom data preparation steps for filtering
                                                                        user events and processing article texts. Since our solution is
                                                                        integrated with a distributed environment of Hadoop cluster
2.3     Recommendation algorithm
                                                                        and due to large-scale computations, we use Apache Spark
The goal of a multi-armed bandit algorithm [3] is to maxi-              for data processing. The main stages of the proposed solution
mize the total payoff 𝑃 which is a sum of single payoffs 𝑝𝑡             are described below.
achieved in each of the trials 𝑡 ∈ {1, . . . , 𝑇 }. In the context of
recommendation problem, in every trial 𝑡 a list of recommen-            3.1       Article topic model
dations is chosen from the set of available items 𝐴𝑡 based
on the knowledge about the payoffs of articles in 𝐴𝑡 from               We are primarily interested in retrieving universal user in-
previous trials, where the knowledge window 𝑙 defines how               terest profiles that are independent of website structure and
many previous trials 𝑡−𝑙, ..., 𝑡−1 are considered. We consider          language characteristics, considering latent semantic interest
an additional variable which represents the context of the              features. Hence in the first stage, we aim at discovering ab-
recommendation—this approach is known as the contextual                 stract topics within the collection of all articles texts from
bandit algorithm [17]. Thus, the contextual multi-armed ban-            the database. We apply Latent Dirichlet Allocation (LDA) [4]
dit for the recommendation problem may be defined by the                which is a generative statistical model of a corpus, that de-
following components:                                                   fines the representation of 𝑀 documents as a mixture of 𝑁
                                                                        abstract latent topics: 𝜃𝑚𝑛 , 𝑚 ∈ {1, . . . , 𝑀 }, 𝑛 ∈ {1, . . . , 𝑁 }.
  (1) Exploration-exploitation policy—balancing between the             Each of the topics is characterized by a distribution over 𝑉 ob-
      choice of items from 𝐴𝑡 to maximize a single payoff               served words (assuming Dirichlet priors): 𝜑𝑛𝑣 , 𝑣 ∈ {1, . . . , 𝑉 }.
      (based on the gained knowledge) and exploring new                 We define a topic description 𝑑𝑒𝑠𝑐(𝜑, 𝑛) as a list of top 4
      candidates with high potential. We use the 𝜖 − 𝑔𝑟𝑒𝑒𝑑𝑦             words from 𝜑𝑛𝑣 sorted in descending order.
      variant of the bandit algorithm in which the item with               As a preprocessing step, stopwords based on a predefined
      the highest payoff estimate 𝑝𝑡 is selected with probabil-         list as well as words that appear in less than 10 texts or
      ity 1−𝜖, and a random item is selected with probability           more than 10% of all documents are removed (the thresholds
      𝜖.                                                                are selected arbitrarily). Next, the text is normalized to
  (2) Reward function 𝑃 —the reward may be defined as                   lowercase and words shorter than three characters and with
      a custom business objective metric, depending on a                non-alphabetic characters are filtered out. We preprocess and
      particular use case.                                              lemmatize tokens with SpaCy Python library extended to
  (3) Context 𝐶—in our case the recommendation context
      is defined by the segment of users 𝑠𝑘 , 𝑘 ∈ {1, . . . , 𝐾}        3
                                                                            https://spark.apache.org/docs/latest/ml-pipeline.html
INRA’19, September, 2019, Copenhagen, Denmark                                                                J. Misztal-Radecka et al.


support Polish language 4 . For words with ambiguous base            Axel Springer Polska, including the news service Onet and
forms, the first form in alphabetic order is returned.               other websites, from anonymous users who accepted our
                                                                     cookie policy and terms of use. The data is stored on a
3.2        User interest profiles                                    Hadoop cluster. Each record in the history table represents
We construct user behavioral profiles by averaging the vec-          an interaction between a user (represented by a cookie) and
tors of the articles in their browsing history. Thus, the            an item (when an article was viewed by a user). The user
resulting user profile describes the user’s average interest         profiles are calculated daily from 14-day browsing histories
in each of the latent topics from the LDA representation:            to represent medium-length user interests. Only users who
𝜃𝑢𝑖 = 𝐴𝑣𝑔(𝜃𝑚 ), 𝑚 ∈ 𝐴𝑢𝑖 , where 𝐴𝑢𝑖 are indices of articles          viewed at least two articles during this period are considered,
in the user’s 𝑢𝑖 history. We used the average of the vectors         resulting in approximately 13 million users scored daily and
rather than the sum, as it provides feature normalization            over 30 000 items in their browsing history. Article texts are
in the context of user activity (the vectors represent user          in Polish and cover a wide range of topics (such as news,
interests independently of how many articles they read). To          sports, business, and entertainment) and content types (such
avoid dominance of popular topics in the user profile repre-         as long texts, videos, and gallery descriptions).
sentation and to extract their unique interest characteristics,
we additionally apply vector standardization to ensure unit          4.1    Experiment setup
standard deviation and zero mean.                                    To perform A/B testing, the traffic is randomly split be-
                                                                     tween experiment variants. Each variant is defined by a tuple
3.3        User segments                                             (𝑑, 𝑆, 𝑅), where 𝑑 is the destination section, 𝑆 is the set
In order to produce user segments in an unsupervised way,            of user segments, and 𝑅 is the recommendation algorithm
we apply the bisecting k-Means algorithm to their profiles           configuration. In this experiment, we compare the following
described in Section 3.2. The algorithm is a hierarchical            recommendation configurations:
variant of the popular k-Means clustering [15] with a divisive
approach: it starts with a single cluster and performs bisecting       (1) Destination section 𝑑 is one of the 6 selected website
splits recursively until the desired number of groups is reached.          thematic sections: general, news, sports, travel, auto-
As shown in [24], the bisecting k-Means algorithm generally                motive and entertainment.
outperforms other clustering techniques in terms of clusters           (2) Recommendation algorithm 𝑅 is one of the following:
quality and run time, while it tends to produce segments of                ∙ Random—a baseline variant that returns items in a
relatively uniform size. We use a variant of this algorithm                  random order,
where larger clusters get higher priority during the split. Only           ∙ 𝜖−𝑔𝑟𝑒𝑒𝑑𝑦—contextual multi-armed bandit algorithm
users with at least five pageviews during the analyzed period                described in Section 2.3. The context of the bandit
are considered for model training.                                           algorithm is defined by the tuple (𝑑, 𝑠𝑘 ), where 𝑑
   One of the advantages of using topic modeling technique is                is the destination section and 𝑠𝑘 , 𝑘 ∈ {1, . . . , 𝐾} is
the ability to generate interpretable topics descriptions (as de-            the user segment. Additionally, since segments are
scribed in Section 3.1). For each segment 𝑠𝑘 , 𝑘 ∈ {1, . . . , 𝐾},           assigned only for users who were active recently, we
we define its representation 𝜃𝑘𝑛 as the average topic dis-                   define an extra segment 𝑠0 for new users without
tribution of included users: 𝜃𝑘𝑛 = 𝐴𝑣𝑔(𝜃𝑢𝑖 ), 𝑢𝑖 ∈ 𝑠𝑘 . We                   known history.
further use this distribution to provide characteristics of            (3) Segments set 𝑆 is a set of user segments 𝑠𝑘 , 𝑘 ∈ {0, . . . , 𝐾}
resulting user segments by retrieving the descriptions of top-             for the general segmentation (described in Section 3)
ics above global average for each cluster center: 𝑑𝑒𝑠𝑐(𝑠𝑘 ) =              or an empty set (global optimization for all users).
{𝑑𝑒𝑠𝑐(𝜑, 𝑛)}, 𝜃𝑘𝑛 > 0.                                               Thus, we aim to compare the performance of contextual
                                                                     𝜖 − 𝑔𝑟𝑒𝑒𝑑𝑦, a non-contextualized 𝜖 − 𝑔𝑟𝑒𝑒𝑑𝑦 (which is a global
4       INITIAL EVALUATION                                           popularity-based baseline), and a referential random algo-
In this section, we describe the process of online experiments       rithm on the six sections mentioned above. In the following
involving a real-world recommendation system. We have                section we describe the parameter choice for recommendation
decided to perform online A/B tests which are capable of             algorithms 𝑅 and segmentation 𝑆.
representing the dynamic nature of news recommendation
scenarios (such as trend-responsiveness). We believe that this       4.2    Parameters selection
method is more appropriate than offline tests for an end-to-         We perform a two-fold parameter selection procedure by
end system evaluation. The goal of this test is to evaluate the      selecting the configuration for the contextual multi-armed
general approach to user interests segmentation and indicate         bandit as well as the user segmentation algorithm.
further improvement directions based on particular use-case
analysis.                                                            4.2.1 Recommendation algorithm configuration. We use an
   For building the user segments, we use a private database         offline experiment simulation to efficiently select the contex-
of articles and events from multiple publisher sites of Ringier      tual multi-armed bandit algorithm (described in Section 2.3)
4
    https://spacy.io/                                                hyperparameters for different recommendation scenarios.
Trend-responsive User Segmentation Enabling Traceable Publishing Insights                                 INRA’19, September, 2019, Copenhagen, Denmark

                                                                                           25,0%

                                                                                                                         19,3%
                                                                                           20,0%
                        8,1
                                                                                           15,0%                                                                             13,8%

                                                                                                                                            10,4%            10,0%   10,1%
                       8,05                                                                                       9,4%             9,4%
                                                                                           10,0%                                                      7,8%
                                                                                                   7,4%
      log perplexity


                         8                                                                  5,0%
                                                                                                           2,4%

                                                                                            0,0%
                       7,95                                                                         1       2      3      4          5        6        7      8       9       10

                                                                                                                              % of users in segment
                        7,9


                       7,85                                                               Figure 3: Exemplary distribution of users in 10 seg-
                              10   20   30   40      50     60       70   80   90   100
                                                  Number of topics
                                                                                          ments, based on 14-day interest profiles.


                                                                                          results in some high-quality topics that are responsive to
Figure 2: Perplexity of a held-out documents dataset                                      dynamic publishing trends.
as function of topics count for LDA model trained                                            To avoid the information bubble effect and to provide
with 0.7M articles. The algorithm converges at ap-                                        sufficient statistics for per-segment contextual bandits (Sec-
proximately 50 topics.                                                                    tion 2.3) in real time, we selected a relatively small number of
                                                                                          𝐾 = 10 clusters which provides a satisfactory level of interest
                                                                                          consistency within segments while ensuring efficient training
                                                                                          of the real-time recommendation algorithm. Moreover, larger
   First, we estimate the initial parameters for calculating                              segments support recommendation diversity (which reduces
the algorithm payoffs 𝑝𝑡 in given recommendation scenario,                                risk of information bubble) as the optimization is performed
including traffic size (estimated number of views of items in                             for a wider interest group. The distribution of users in these
given time slot), item pool 𝐴𝑡 size (number of items available                            segments is shown in Figure 3.
in each trial), article lifetime (number of trials in which it
is available in the pool) and a distribution of the reward                                4.3      Results
function 𝑃 .                                                                              The results of a 30-day online experiment for six different
   Next, the simulation procedure is performed by running                                 sections of Onet home page are presented in Table 1. We com-
multiple iterations for different algorithm configurations. As                            pare the performance of analyzed algorithms to the random
a result, the optimal configuration for each of the recommen-                             baseline. Each of the experiments uses a custom business-
dation settings is returned, including the knowledge window                               defined KPI based on user engagement metrics (incorporating
𝑙 and the 𝜖 value for the 𝜖 − 𝑔𝑟𝑒𝑒𝑑𝑦 bandit algorithm.                                    pageviews, time spent on the website and bounce rate pe-
                                                                                          nalization), the details of which cannot be disclosed as it
4.2.2 Segmentation algorithm configuration . For configuring                              concerns business secrets.
the segmentation algorithm, first, we need to select an optimal                              The experiment results show that 𝜖 − 𝑔𝑟𝑒𝑒𝑑𝑦 recommen-
number of topics 𝑁 for the LDA algorithm. We use perplexity                               dations consistently outperform randomly generated lists for
to measure how well the word probability distribution of                                  all the experimental settings during the whole test period.
LDA model predicts a sample of held-out documents [14] for                                We observed some fluctuations in the performance of all the
different topic dimensionalities. The model is trained with a                             variants that may be caused by publishing trends affecting
random sample of 0.7M articles from the database. Figure 2                                user behavior (as shown in Figure 4). Additionally, for all
presents log perplexity for the LDA model with varying                                    the experiments the contextual bandit approach substan-
number of topics. Based on this analysis, we concluded that                               tially outperforms the globally optimized version. The most
the algorithm converges at approximately 50 topics.                                       considerable difference is achieved for the general thematic
   However as shown by [7], the predictive likelihood eval-                               section (+15.2 pp. vs. global optimization, measured as a
uation of topic models is often not correlated with human                                 relative improvement over the random baseline) as well as for
judgment, thus besides measuring perplexity, we addition-                                 domain-specific sections with longer article lifetime (enter-
ally perform a qualitative analysis of the resulting topic                                tainment, travel, automotive). The semantic context has the
interpretability. In particular, we are interested in learn-                              smallest impact on highly time-sensitive sections (+1.9pp.
ing how well the topic model reflects content fluctuations                                difference to global optimization for news section and +6.2pp.
and changing publishing trends by analyzing daily topic dis-                              for sports). Additionally, further analysis of segment per-
tribution changes for selected news topics. Figure 4 shows                                formance showed that groups of users whose interests were
the daily changes for 3 out of 50 selected topics with high                               underrepresented in the articles pool for a given section, on
time-sensitivity. An analysis of their descriptions leads to                              average performed worse than others. We also noted that the
the conclusion that the proposed topic model configuration                                traffic size has a significant impact on algorithm performance.
INRA’19, September, 2019, Copenhagen, Denmark                                                                     J. Misztal-Radecka et al.

                  0,6
                                                                         (a)
                  0,4

                  0,2

                    0

                  -0,2
                                                                                       air smog pollution particulates norm
                   0,8
                                                                         (b)
                   0,6
                   0,4
                   0,2
                     0
                  -0,2

                                                                                                movie role Oscar prize love
                   0,8
                                                                         (c)
                   0,6
                   0,4
                   0,2
                     0
                  -0,2
                  -0,4                                                                             goal minute penalty ball


Figure 4: Daily changes in publishing trends for three exemplary topics selected from a 50-topic LDA model
for articles published between Feb 15th and March 15th 2019, calculated from standardized mean topic values
for each day in the month. (a) topic related to air pollution has peaks on days with high smog rates in Poland,
(b) a topic related to Oscar Awards with a peak on the day after the Oscar Gala, (c) topic related to football
with highest values during the weekends when popular matches are transmitted.

Table 1: A 30-day online experiment results for dif-                       hot topics (as shown in Figure 4) more than individual
ferent sections of the Onet home page. Results pre-                        preferences.
sented as a percentage increase in average daily op-                   (3) Need for fine-grained interests representation:
timized metric over a random recommender.                                  Interests in domain-specific thematic sections such as
                                                                           sports or technology do not necessarily match general
 Section characteristics   Contextual E-Greedy   Global E-Greedy           user preference groups. Based on the analysis of the
 General                                44.1%             28.9%            general segment descriptions (Table 2 left), we observe
 News                                   23.1%             21.2%            that for instance all users interested in entertainment
 Sports                                 23.6%             17.4%            are in the same segment, hence their more fine-grained
 Travel                                 72.1%             59.5%            interests in this domain cannot be recognized.
 Automotive                             36.3%             23.6%
 Entertainment                          74.8%             59.7%
                                                                   5     ADDRESSING REAL-WORLD
                                                                         CHALLENGES
                                                                   As observed in Section 4.3, due to the variety of recommen-
This may be caused by the fact that for a larger sample,           dation scenarios, a general-purpose user segmentation cannot
the exploration may be performed faster and the algorithm          serve the diversity of all business cases. Hence, in the follow-
converges more quickly.                                            ing sections, we propose some extensions to our method that
  To summarize, the analysis of the experiment results leads       are designed for particular use cases and address these real-
us to the following observations:                                  world challenges. First, we present two alternative variants of
  (1) Need for diversity: The general user segmentation            the segmentation algorithm which are aimed at representing
      gives the highest improvement for recommendations            domain-specific and time-sensitive user interests. Next, we
      within a wide thematic range (as shown in the ex-            propose an approach to address the lack of diversity in the
      periment results for the general section). Moreover,         item pool by indicating missing thematic areas.
      diversity of the item pool has an impact on the per-
      formance of particular segments — our analysis has           5.1         Segmentation variants
      revealed that users who cannot find articles relevant        As discussed, in order to obtain a more satisfying level of
      to their preferences become less active than others.         performance for some of the challenging sections, we had to
  (2) Need for time-sensitivity: Semantic interests have a         adopt different algorithms depending on the use-case. This
      lower impact on dynamical and time-sensitive sections        could be achieved by leveraging the modular, extendable ar-
      such as news or sports feeds (as shown in Table 1). User     chitecture described in Section 3 and resulted in three general
      behavior in such services may be influenced by short-        types of user segmentation which are shown in Figure 5 and
      term interest patterns caused by popularity trends and       described below.
Trend-responsive User Segmentation Enabling Traceable Publishing Insights                               INRA’19, September, 2019, Copenhagen, Denmark


               Article data                  Event data   Article data                     Event data           Article data                  Event data


                                                                             Join

                                                                                                                  Filter

             Topic modeling

                                  Join
                                                                                                              Topic modeling
                                                                                                                                   Join
                                                                         Topic modeling


                              Segmentation                               Segmentation                                          Segmentation


                                 Save                                        Save                                                 Save


                                  (a)                                         (b)                                                   (c)


Figure 5: Flow diagrams of the described segmentation variants: general (a), news-specific (b) and site-specific
(c)


   The interchangeability of these algorithms enables us to                          5.1.3 Domain-specific recommendations (Figure 5c). As shown
easily adjust the segmentation to different circumstances, e.g.                      in Table 1, besides the long and short-term topics, for some
by employing the general approach when launching personal-                           sections, there is also a need to accurately represent subcate-
ization on the entire page and the site-specific method for a                        gories in readers’ interests. The idea is to consider only the
small, thematic section.                                                             traffic and articles on a specific section of the page which
                                                                                     leads the reader to a set of topically related articles. To
5.1.1 General long-term users interests (Figure 5a). This al-
                                                                                     achieve this, we applied a slight modification to the original,
gorithm is described in Section 3 and tested in the first
                                                                                     general-topic architecture, which filters the articles based on
experiment. Its idea is the following: create topics for the
                                                                                     a section in which they appeared. This leads to a topic space
entire set of articles but cluster the users in the topic vector
                                                                                     and segments that capture the smaller sub-topics within a
space according to their activity in a recent limited period.
                                                                                     broader category. An example of this variant for the sports
This, on the one hand, ensures that the topics obtained
                                                                                     section is shown in Table 3.
are general (such as sports, politics, news, and others) but
on the other hand produces clusters which “follow” a user’s
interests as they change with time. The balance between news-
                                                                                     5.2       Insights generation
responsiveness and general topic representation is achieved                         Recommendation algorithms tend to improve user satisfaction
by adjusting the period to which user activity is limited.                          by providing the most suitable items according to their inter-
                                                                                    ests. However, to provide personalized recommendations lists,
5.1.2 Hot topics interests (Figure 5b). In our analyses we                          the diversity of the item pool should be sufficient to match par-
have discovered that sudden, popular events tend to attract                         ticular user needs. Since for the news domain the article fresh-
the attention of users regardless of their general interests (see                   ness is required, it is essential to provide meaningful and trace-
Figure 4). This seems to be confirmed by the insignificant                          able insights about the types of content that are currently
improvement in performance for the news section seen in                             missing, so that these shortages may be addressed by the
Table 1. In order to measure this phenomenon more closely                           content provider. In the simplest scenario, we assume that if a
and quantify its impact on the quality of recommendations                           group of users becomes less active, it may be caused by an in-
as well as enable more meaningful segment descriptions, we                          sufficient number of articles relevant to their interests. Hence
have created an alternative version of the segmentation. Its                        we address this issue by indicating segments that perform
main difference to the first, general approach is that the                          worse than the global average during each day and providing
topics are computed only after the articles are joined with                         their descriptions as the topics that are missing in the avail-
(and effectively filtered by) the user activity data. This in                       able article set along with titles of articles that they liked in
effect produces a topic vector space which more accurately                          the past. First, we calculate the average performance (regard-
captures these transient trends (see Table 2) and is capable                        ing the objective metric 𝑃 ) for each of the user segments 𝑠𝑘 :
of representing short-term user interests. Additionally, this                                   ∑︀𝑀
                                                                                                   𝑚=1 𝑃𝑢𝑚
                                                                                     𝑃 𝑠𝑘 =             𝑀
                                                                                                              , 𝑢𝑚 ∈ 𝑠𝑘 , 𝑘 ∈ {1, . . . , 𝐾}, 𝑚 ∈ {1, . . . , 𝑀 }.
method is more efficient due to a much smaller set of articles                                                                                ∑︀𝐾
                                                                                                                                               𝑘=1 𝑃𝑠𝑘
for which the topic space is computed.                                               Next, we calculate the average 𝑃 𝑆 =                           𝐾
                                                                                                                                                           , 𝑠𝑘 ∈ 𝑆 and
INRA’19, September, 2019, Copenhagen, Denmark                                                                           J. Misztal-Radecka et al.

Table 2: Comparison of the most popular topic words for two variants of segmentation: general (left) and
hot. The segments have been grouped by their general subject to increase readability. Explanation of some
of the terms follows below. The words for hot-topics segmentation relate to specific, recent events (such as
The Oscars) and contain more proper names.

 Subject        General topics                                                    Hot topics
                olympic, Olympics, medal, competition — win, tournament,          match, coach, player, team — breast, photo, Fabia2 ,
 Sport          team, final                                                       Skoda2
                race, rally, season, player — club, player, coach, footballer     driver, Kubica4 , Williams4 , car — match, coach, player,
                                                                                  team
                season, footballer, club, player — Legia6 , goal, coach, foot-
                baller
                Russia, Ukraine, Russian, USA — Tusk, Prime Minister, gas,        court, police, death, President — castle, city, hotel, age
 Politics       Donald
                President, PiS, Kaczyński1 , deputy — court, prosecution,       Olszewski1 , Jan1 , rape, police — court, police, death,
                accuse, sentence                                                 President
                city, urban, resident, street — tourist, water, city, island     Kaczyński1 , PiS, President, monument — church, Israel,
                                                                                 Trump, summit
              star, actress, picture, look — role, musician, record, song         star, love, beautiful, album — star, dance, Joanna5 , jour-
 Entertainment
                                                                                  nalist
                                                                                  Oscar, role, actress, award — Oscar, Biedronka3 , nomi-
                                                                                  nate
 Automotive     car, auto, engine, model — company, customer, shop, price         breast, photo, Fabia2 , Skoda2 — match, coach, player,
                                                                                  team
                photo, do, look, al — water, eat, product, coal                  hair, skin, color, face — organism, disease, vitamin, contain
 Other
                child, woman, family, home — man, perc, publish, photo           Warsaw, network, arrange, TV set — perc, retirement, bank,
                                                                                 amount
1 Jan Olszewski, Jaroslaw Kaczyński - Polish politicians;    2 Skoda Fabia - car model;         3 Biedronka - Polish supermarket;
4 Kubica, Williams - Formula One competitors;       5 Joanna Mazur - runner, participant of the Polish edition of Dancing with the Stars;
6 Legia - Polish football club


Table 3: Examples of domain-specific segments de-
scriptions for sports section. Some more specific                                        Hey! Users in the segment most interested in
                                                                                         season, footballer, club, player | Legia, goal, coach, footballer
interests are visible such as automotive (segment                                        are less active than others.
1), ski jumping (segment 2), general sports and                                          Consider writing more articles on these subjects.
                                                                                         Estimated users affected: 1.2M.
Olympics (segment 3), football — national (segment                                       Inspiration article: Copa America 2019: results and transmission
4) and international (segment 5).

 1   driver, race, Kubica, rally — mln, Euro, milion, company           Figure 6: Exemplary insight generated by the sys-
 2   jump, competition, contest, Malysz — Stoch, Kamil, competition, tem,
                                                                      jump based on the description in Section 5.2.
 3   Olympic Games, medal, child — competition, sportsman, prize, accident
 4   penalty kick, Borussia, host — Legia, Lech, Jagiellonia, Warsaw    6        DISCUSSION AND RELATED WORK
 5   Barcelona, Real Madrid — Manchester, United, City, League          As noticed by [18], the main challenges of news recommenda-
                                                                        tions are large-scale computations, high dynamics and popu-
                                                                        larity trends of news articles. The content changes in three
the standard deviation 𝑆𝑡𝑑(𝑃𝑆 ) of the performance for all              major news publishers have been analyzed by [5], showing
segments and we define that a segment 𝑠𝑘 is “unsatisfied”               that the publishing patterns are characterized by tempo-
 if 𝑃 𝑠𝑘 < 𝑃 𝑆 − 𝑆𝑡𝑑(𝑃𝑆 ). Finally, we retrieve the articles that       ral dimensions such as days and hours. For this reason, Li
were particularly interesting for this segment in the past by           et al. [18] proposes a scalable two-stage solution to news
calculating a performance metric 𝑃𝑎,𝑠𝑘 of each article 𝑎 in             recommendations. First, clusters of newly published articles
a given segment 𝑠𝑘 and we standardize this metric for all               are constructed, and then personalized recommendations are
segments. The titles of articles with the highest score for each        generated by retrieving items most relevant to the user’s
“unsatisfied” segment along with the segment descriptions               interest profile. This approach provides high efficiency for
and their cardinalities are presented in the form of textual            news recommendations; however, content-based recommen-
insights (as shown in Figure 6).                                        dations are not capable of responding to changing popularity
Trend-responsive User Segmentation Enabling Traceable Publishing Insights      INRA’19, September, 2019, Copenhagen, Denmark


trends. To address this problem, in [17] the authors propose       A/B test on several news sections with different characteris-
a contextual multi-armed bandit approach which is capable          tics. Based on the analysis of the results for particular use
of representing popularity trends as well as group interests       cases as well as qualitative analysis of segment descriptions
and apply it for large-scale news recommendations for Yahoo        and trend dynamics, we proposed further extensions of the
News module. They note that this approach outperformed             segmentation algorithm that address these real-world issues.
a standard context-free bandit algorithm by 12.5% in click            In future work, we plan to incorporate into the model other
ratio.                                                             types of behavioral features and content metadata. Moreover,
   To fully exploit the potential of contextual bandits, it is     we aim to improve the recommendation quality by exploring
essential to apply a suitable method for user segmentation.        different segmentation and recommendation techniques to
In [8], clusters of Yahoo users were built based on over a         address other real-world challenges such as the user cold-start
thousand categorical features describing their demographics        problem.
and behavioral patterns. However, such an arbitrary choice
of features is limited and does not represent the relations        REFERENCES
among distinct features. The unsupervised clustering tech-          [1] Jae-wook Ahn, Peter Brusilovsky, Jonathan Grady, Daqing He,
nique has been identified in [21] as the most flexible method           and Sue Yeon Syn. 2007. Open User Profiles for Adaptive News
                                                                        Systems: Help or Harm?. In Proceedings of the 16th International
for automatic detection of underlying behavioral patterns.              Conference on World Wide Web (WWW ’07). ACM, New York,
In [23] the authors scale up the user neighborhood formation            NY, USA, 11–20. https://doi.org/10.1145/1242572.1242575
                                                                    [2] Xiao Bai, B. Barla Cambazoglu, Francesco Gullo, Amin Mantrach,
process through the use of bisecting k-means clustering for an          and Fabrizio Silvestri. 2017. Exploiting Search History of Users
e-commerce application. In [9], a large-scale collaborative fil-        for News Personalization. Inf. Sci. 385, C (April 2017), 125–137.
tering recommender system for Google News personalization               https://doi.org/10.1016/j.ins.2016.12.038
                                                                    [3] D. A. Berry and B. Fristedt. 1985. Bandit Problems: Sequential
was built by applying several clustering techniques, and the            Allocation of Experiments. Chapman and Hall, London.
authors demonstrate efficacy and scalability of their system        [4] David M. Blei, Andrew Y. Ng, and Michael I. Jordan. 2003. Latent
with a real-world experiment on millions of users. A prob-              Dirichlet Allocation. J. Mach. Learn. Res. 3 (March 2003), 993–
                                                                        1022. http://dl.acm.org/citation.cfm?id=944919.944937
abilistic latent semantic analysis topic modeling technique         [5] Maria Carla Calzarossa and Daniele Tessera. 2015. Modeling and
for building clusters of users for online advertising has been          predicting temporal patterns of web content changes. Journal
                                                                        of Network and Computer Applications 56 (2015), 115 – 123.
presented in [13]. However, no additional content metadata              https://doi.org/10.1016/j.jnca.2015.06.008
has been incorporated, and in contrast to our approach, the         [6] Allison J. B. Chaney, Brandon M. Stewart, and Barbara E. En-
interpretability of resulting segments is low.                          gelhardt. 2018. How Algorithmic Confounding in Recommenda-
                                                                        tion Systems Increases Homogeneity and Decreases Utility. In
   A summary of approaches to user modeling in Internet                 Proceedings of the 12th ACM Conference on Recommender
applications has been presented by Gauch et al. [11]. In                Systems (RecSys ’18). ACM, New York, NY, USA, 224–232.
recent approaches, the profile is usually inferred from user            https://doi.org/10.1145/3240323.3240370
                                                                    [7] Jonathan Chang, Jordan Boyd-Graber, Sean Gerrish, Chong
behavior (such as content clicks [19] or web searches [2]), and         Wang, and David M. Blei. 2009. Reading Tea Leaves: How
the preferences are defined by the type of content read by              Humans Interpret Topic Models. In Proceedings of the 22Nd
                                                                        International Conference on Neural Information Processing
them. In [26], the authors introduced a Collaborative Topic             Systems (NIPS’09). Curran Associates Inc., USA, 288–296.
Modeling technique that combines the collaborating filtering            http://dl.acm.org/citation.cfm?id=2984093.2984126
approach with the content-based features extracted by topic         [8] Wei Chu, Seung-Taek Park, Todd Beaupre, Nitin Motgi, Amit
                                                                        Phadke, Seinjuti Chakraborty, and Joe Zachariah. 2009. A Case
modeling. A user study has been presented in [1], showing               Study of Behavior-driven Conjoint Analysis on Yahoo!: Front
that profile transparency is an essential aspect of personalized        Page Today Module. In Proceedings of the 15th ACM SIGKDD
news systems. In our approach, we represent a user profile by           International Conference on Knowledge Discovery and Data
                                                                        Mining (KDD ’09). ACM, New York, NY, USA, 1097–1104.
a distribution over topics, which enables generating textual            https://doi.org/10.1145/1557019.1557138
descriptions of their interest segments.                            [9] Abhinandan S. Das, Mayur Datar, Ashutosh Garg, and Shyam
                                                                        Rajaram. 2007. Google News Personalization: Scalable Online
                                                                        Collaborative Filtering. In Proceedings of the 16th International
                                                                        Conference on World Wide Web (WWW ’07). ACM, New York,
7   CONCLUSIONS AND FUTURE                                              NY, USA, 271–280. https://doi.org/10.1145/1242572.1242610
                                                                   [10] Deepak Agarwal. 2013. Recommending Items to Users: An Ex-
    WORK                                                                plore Exploit Perspective. http://www.ueo-workshop.com/wp-
                                                                        content/uploads/2013/10/UEO-Deepak.pdf.
We described a universal method for segmenting users ac-           [11] Susan Gauch, Mirco Speretta, Aravind Chandramouli, and
cording to their semantic interests. Our solution is based              Alessandro Micarelli. 2007. User Profiles for Personalized In-
on an unsupervised bisecting k-means clustering algorithm               formation Access. In The Adaptive Web: Methods and Strategies
                                                                        of Web Personalization. Springer Berlin Heidelberg, Berlin, Hei-
and is therefore capable of representing changing popularity            delberg, 54–89. https://doi.org/10.1007/978-3-540-72079-9 2
trends. Moreover, using the topic modeling technique enables       [12] Gemius Polska. 2019. Wyniki badania Gemius/PBI za styczeń
                                                                        2019. https://www.wirtualnemedia.pl/artykul/strony-glowne-
us to generate high-quality textual descriptions of users seg-          onet-i-wp-pl-leb-w-leb-w-dol-interia-o2-pl-i-tvn24-pl.
ments characteristics, which can provide traceable publishing      [13] Sahin Cem Geyik, Ali Dasdan, and Kuang-Chih Lee. 2015. User
insights for enhancing article diversity. This solution has             Clustering in Online Advertising via Topic Models. CoRR
                                                                        abs/1501.06595 (2015). arXiv:1501.06595 http://arxiv.org/abs/
been integrated with a large-scale news recommender system              1501.06595
for personalizing the largest Polish news service Onet. The        [14] Matthew D. Hoffman, David M. Blei, and Francis Bach. 2010.
efficacy of our proposed system was evaluated in an online              Online Learning for Latent Dirichlet Allocation. In Proceedings
                                                                        of the 23rd International Conference on Neural Information
INRA’19, September, 2019, Copenhagen, Denmark                                                                   J. Misztal-Radecka et al.


     Processing Systems - Volume 1 (NIPS’10). Curran Associates             2018. Explore, Exploit, and Explain: Personalizing Explainable
     Inc., USA, 856–864. http://dl.acm.org/citation.cfm?id=2997189.         Recommendations with Bandits. In Proceedings of the 12th ACM
     2997285                                                                Conference on Recommender Systems (RecSys ’18). ACM, New
[15] Anil K. Jain and Richard C. Dubes. 1988. Algorithms for Clus-          York, NY, USA, 31–39. https://doi.org/10.1145/3240323.3240354
     tering Data. Prentice-Hall, Inc., Upper Saddle River, NJ, USA.    [21] Juni Nurma Sari, Lukito Nugroho, Ridi Ferdiana, and Paulus
[16] Jaya Kawale, Fernando Amat. 2018.              A Multi-Armed           Santosa. 2016. Review on Customer Segmentation Technique on
     Bandit Framework For           Recommendations at Netflix.             Ecommerce. Advanced Science Letters 22 (10 2016), 3018–3022.
     https://www.slideshare.net/JayaKawale/a-multiarmed-bandit-             https://doi.org/10.1166/asl.2016.7985
     framework-for-recommendations-at-netflix.                         [22] PBI. 2018. Polskie Badania Internetu. http://pbi.org.pl/raporty
[17] Lihong Li, Wei Chu, John Langford, and Robert E. Schapire.             (July-December 2018).
     2010. A Contextual-bandit Approach to Personalized News Arti-     [23] Badrul M. Sarwar, George Karypis, Joseph Konstan, and John
     cle Recommendation. In Proceedings of the 19th International           Reidl. 2002. Recommender Systems for Large-Scale E-Commerce:
     Conference on World Wide Web (WWW ’10). ACM, New York,                 Scalable Neighborhood Formation Using Clustering. In Proceed-
     NY, USA, 661–670. https://doi.org/10.1145/1772690.1772758              ings of the 5th International Conference on Computer and
[18] Lei Li, Dingding Wang, Tao Li, Daniel Knox, and Balaji Padman-         Information Technology (ICCIT).
     abhan. 2011. SCENE: A Scalable Two-stage Personalized News        [24] Michael Steinbach, George Karypis, and Vipin Kumar. 2000. A
     Recommendation System. In Proceedings of the 34th Interna-             comparison of document clustering techniques. In In KDD Work-
     tional ACM SIGIR Conference on Research and Development                shop on Text Mining.
     in Information Retrieval (SIGIR ’11). ACM, New York, NY,          [25] Maurits van der Goes. 2018. Takeaways from Netflix’s Personaliza-
     USA, 125–134. https://doi.org/10.1145/2009916.2009937                  tion Workshop 2018. https://medium.com/rtl-tech/my-takeaways-
[19] Jiahui Liu, Peter Dolan, and Elin Rønby Pedersen. 2010. Per-           from-netflixs-personalization-workshop-2018-f564a19437b6.
     sonalized News Recommendation Based on Click Behavior. In         [26] Chong Wang and David M. Blei. 2011. Collaborative Topic
     Proceedings of the 15th International Conference on Intelligent        Modeling for Recommending Scientific Articles. In Proceedings of
     User Interfaces (IUI ’10). ACM, New York, NY, USA, 31–40.              the 17th ACM SIGKDD International Conference on Knowledge
     https://doi.org/10.1145/1719970.1719976                                Discovery and Data Mining (KDD ’11). ACM, New York, NY,
[20] James McInerney, Benjamin Lacker, Samantha Hansen, Karl                USA, 448–456. https://doi.org/10.1145/2020408.2020480
     Higley, Hugues Bouchard, Alois Gruson, and Rishabh Mehrotra.