<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>March</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Temporal Evolution of Behavioral User Personas via Latent Variable Mixture Models</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Nadia Fawaz</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sunnyvale</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Ajith Pudhiyaveetil Technicolor</institution>
          ,
          <addr-line>Palo Alto, CA</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Department of Statistics, University of Michigan</institution>
          ,
          <addr-line>Ann Arbor, MI</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Snigdha Panigrahi</institution>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2019</year>
      </pub-date>
      <volume>20</volume>
      <issue>2019</issue>
      <abstract>
        <p>This work<sup>1</sup> characterizes the users of a VoD streaming service through user personas based on a tenure timeline and temporal behavioral features, in the absence of explicit user profiles. The combination of a tenure timeline and temporal characteristics caters to the business need of understanding the evolution and phases of user behavior as accounts age. The personas, constructed via latent variable mixture models, represent both dominant and niche characterizations while providing insight into the maturation of user behavior in the system. As new users can enter the system at any time point, existing user profiles are updated in our temporally evolving approach. The two major highlights of our personas are clear interpretability of user labels and stability along tenure timelines at the population level, even as interesting migrations between labels occur at the individual level. Finally, we show a trade-off among an indispensable trio of guarantees, relevance, scalability and interpretability, by using summary information from personas in a CTR (Click Through Rate) predictive model. The proposed method of uncovering latent personas, the resulting insights, and the application of persona information to predictive models are broadly applicable to other streaming-based products.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>CCS CONCEPTS</title>
      <p>• Computing methodologies → Modeling and simulation; • Computing methodologies → Machine learning.</p>
      <p>Keywords: user personas; temporal labels; personalization; CTR prediction; mixture model.</p>
      <p><sup>1</sup>This work was performed while all three authors were with Technicolor Research, CA, USA.</p>
      <p>IUI Workshops’19, March 20, 2019, Los Angeles, USA.
Copyright © 2019 for the individual papers by the papers’ authors. Copying
permitted for private and academic purposes. This volume is published and copyrighted
by its editors.</p>
    </sec>
    <sec id="sec-2">
      <title>INTRODUCTION</title>
      <p>
        User segmentation, the idea of dividing a market into homogeneous segments and targeting each group with a distinct product or message, is a basic tool for modeling similar consumers. It has been explored in diverse sectors such as finance [
        <xref ref-type="bibr" rid="ref24">24</xref>
        ], health [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ], telecommunications
[
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] etc., with a focus on different behavioral aspects [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ],
[
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. The current work adopts a latent parametric mixture model approach to construct segments of homogeneous consumers, called user personas, for VoD services from raw transactional logs, using a tenure timeline and temporal behavioral features. Examples of such services in the VoD space include iTunes, Google Play, Vudu and FandangoNOW, where users pay per piece of content they watch. This is in contrast with subscription-based services such as Netflix, Hulu Plus and Amazon Video, where users pay a monthly subscription. The work provides explicit user characterizations based on spending behavior, content preference and transactional habits, with the main contributions as follows:
        <list list-type="bullet">
          <list-item>
            <p>Align user transaction timelines on a tenure basis at a monthly granularity, a novel choice of timeline for comparison in place of the conventional calendar timeline.</p>
          </list-item>
          <list-item>
            <p>Construct temporal behavioral feature vectors from transaction logs as aggregates of transactions over a month along the tenure timeline; such features represent the evolving behavioral traits of consumers.</p>
          </list-item>
          <list-item>
            <p>Capture both dominant and niche segments of the population and provide highly interpretable user labels.</p>
          </list-item>
          <list-item>
            <p>Capture stable latent structure at the population level, even as individual profiles keep transforming with age. The derived user personas maintain a consistent clustering over time while accurately explaining the changes at the individual level.</p>
          </list-item>
          <list-item>
            <p>Represent insights on the inter-relations between behavioral characteristics as layers within user profiles.</p>
          </list-item>
        </list>
      </p>
      <p>
        Such a construction of temporally evolving personas with new
insights into behavioral characteristics is the first of its kind in
the streaming space, to the best of our knowledge. The extended
detailed version of this work is available in [
        <xref ref-type="bibr" rid="ref25">25</xref>
        ].
      </p>
      <p>
        A line of prior works [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], [
        <xref ref-type="bibr" rid="ref33">33</xref>
        ], [
        <xref ref-type="bibr" rid="ref32">32</xref>
        ] has explored
characterization of consumers; another independent set of works has
contributed to methods on personalized recommendations [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], [
        <xref ref-type="bibr" rid="ref27">27</xref>
        ],
[
        <xref ref-type="bibr" rid="ref28">28</xref>
        ], [
        <xref ref-type="bibr" rid="ref26">26</xref>
        ], [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ], [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ], [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ]. Our work concludes with a unification of these two important goals to demonstrate the utility of user personas. In particular, we illustrate an application of persona-based features in CTR (Click Through Rate) prediction. We show that a model based on the constructed personas achieves a three-way trade-off among relevance, scalability and interpretability when compared against models that do not include persona information. We show a substantial reduction in computational cost through the use of lower-dimensional persona features in the form of soft or hard clustering information. This gain retains clarity in the interpretation of the feature space (as opposed to random projections onto lower-dimensional spaces) and does not compromise predictive ability. The CTR model we describe is interesting in its own right, as we use persona features in a logistic model trained per item to capture item-specific variability. The use of persona information can also aid in preserving the anonymity of individual users as well as of individual transactions. We supplement the CTR model with a discussion of other commonly used collaborative filtering models that can potentially achieve a similar trade-off.
      </p>
      <p>Our methods are by no means limited to the VoD space. They can be extended to lend similar insights and achieve similar benefits for other product-based services. Modeling latent structure from raw transactional data can overcome the curse of dimensionality through an efficient reduction in regression size, while maintaining predictive power and interpretability of the feature space.</p>
    </sec>
    <sec id="sec-3">
      <title>Related works</title>
      <p>
        Consumer segmentation is driven by the intuition that predictive
models of customer behavior based on groups of similar customers
outperform a single aggregate model, see [
        <xref ref-type="bibr" rid="ref29">29</xref>
        ]. A segmented
predictive model [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] can be refined further to an individual level, with a model trained per customer. In doing so, we reduce bias by creating increasingly homogeneous customer groups, at the cost of increased variance in estimation as we consider progressively more refined segments containing fewer customers. Thus, there is a classic bias-variance trade-off, which is effectively dealt with by integrating customer segmentation into such predictive models, termed segmented models; see [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ]. In this work, we advocate the use of features based on user personas not only to improve predictive power but also as a meaningful, lower-dimensional summary space that can be used to achieve scalability in regression models and to facilitate storage for future debugging.
      </p>
      <p>
        Various techniques of segmenting consumers include neural
net models [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], latent probabilistic models [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ], and combinatorial optimization based grouping models [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ]. In this work we offer a multinomial latent mixture model analysis with both soft and hard clustering values as outputs, employing the classic Expectation-Maximization (EM) algorithm [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] to estimate the mixing proportions and distribution parameters for building user personas. Most of the raw data logs consist of count features, for which a multinomial model is a natural choice. The exception is the spending amounts, for which we use K-means clustering, which gives results similar to those of the more commonly used parametric Gaussian mixture model [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ]. In comparison to prior art, our
goal here goes beyond discovering latent representations. That is,
we want labels that can directly render business insights as opposed
to non-interpretable clusters.
      </p>
      <p>
        One of the key features of our personas is that they exhibit
stability on a population level even as migrations on an individual
level are constantly taking place along the chosen time granularity.
[
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] explores clusters not shifting dramatically from one time-step
to the next and [
        <xref ref-type="bibr" rid="ref23">23</xref>
        ] establishes equilibrium of average network
properties, a concept resonating with the stability of clusters.
      </p>
    </sec>
    <sec id="sec-4">
      <title>VoD Dataset</title>
      <p>
        The dataset considered in this work consists of transaction logs of a subset of 730,000 anonymous users from a large-scale streaming VoD service across a time span of 16 months, from January 2014 to April 2015, with over 2 million transactions. Each record in the transaction logs consists of a unique user-id, a unique time-stamp, a unique content-id, the type of transaction (rental or purchase), a net price giving the cost of the transaction, and content meta-data such as genres, release year and MPAA rating. [
        <xref ref-type="bibr" rid="ref30">30</xref>
        ] analyzes a processed user-interactional part of
this data set, consisting of 3488 users and 26404 viewing sessions,
to model binge watching behavior for VoD services; we consider a
larger set of users in our analysis and focus on the transactional
data instead.
      </p>
      <p>We present summary statistics based on the transactional data. These preliminary statistics and observations suggest that there is latent structure in the users’ consumption patterns, and they guide the pre-processing stage that constructs features from the raw transaction logs. The characterizations of user behavior discovered as latent structure in this work can be viewed as more precise and refined versions of these summaries. Transactions are of two types, rentals (88%) and purchases (12%). Rental prices range from $0 to $5, with the higher price categories falling in the $3–5 range. Purchases range as high as $25, mainly for new movies and TV series; purchases above $10 are considered higher-end transactions. Most transactions of both types occur in the lower price categories, with only 10% of consumers transacting in the higher price ranges. From a transactional perspective, the content catalogue splits into 15% TV shows and 85% movies, with the movie Frozen being the most consumed content in the catalogue. The dominant genres in the transactions are Drama (18%), Comedy (10%), Action (10%), Family (9%), Animation (7%), Thriller (6%), Biography (5%), Sci-fi (4%) and Crime (4%). A crucial observation is that some users (20%) tend to prefer family-friendly content (Family, Animation, Super-hero), while the remaining users (80%) consume genres such as drama, horror and comedy. Transactions mostly take place in the evening and at night, and are evenly split between weekdays (Monday-Friday) and weekends. As part of the pre-processing of raw logs, barely active users (who spend less than $1 in a given month of activity) and one-time deal hunters (who transact only once and never return) were filtered out to prevent cluster centers from being pulled to 0. Summarized information is subsequently uncovered from the data as cluster centers and cluster sizes, which preserves the anonymity of individual users and reveals no information about any particular transaction.</p>
    </sec>
    <sec id="sec-5">
      <title>CONSTRUCTION OF PERSONAS</title>
      <p>We construct personas on the spending traits, content preferences and transactional habits of users; interest in these characterizations stems from domain knowledge, product intuition and business goals. We discuss the timeline, the granularity of comparison, and the behavioral features that are aggregated over 1-month windows of transactions; these play a consequential role in excavating meaningful latent structure from raw data.</p>
    </sec>
    <sec id="sec-6">
      <title>Timeline of comparison</title>
      <p>Transaction logs consist of time-series data. We make a careful choice as to how the timelines of different users are compared, with regard to the following two aspects.</p>
      <p>Temporal alignment of user timelines: User timelines can be aligned on a calendar basis or on a tenure basis. On a calendar basis, transactions of different users happening on the same calendar dates, for instance in January 2014, are compared against each other. Aligning timelines on a calendar basis makes it possible to detect seasonality (holidays, end-of-year movie releases) and the effects of specific events happening on a particular date (the release of a new TV-show episode or season, or the end of a season). On a tenure basis, on the other hand, the first transaction of a given user defines the birth of the user timeline, and transactions of different users are compared when they happen at the same age of the user in the system. For instance, if user A made their first transaction on January 15th, 2014 and user B made their first transaction on April 10th, 2014, comparisons would be drawn for their first month of transactions, between January 15th and February 14th, 2014 for user A and between April 10th and May 9th, 2014 for user B. Aligning timelines on a tenure basis allows observations on how users age in the system, and helps in understanding behavioral phases and in predicting churn.</p>
      <p>Temporal granularity: Timestamps in transaction logs can be specified up to seconds or even milliseconds. When building features based on time series, the question arises as to the granularity at which events should be grouped to devise the desired features. Transactions can be aggregated at a monthly, weekly, daily or hourly granularity. For instance, to compute a count feature at the monthly granularity, transactions happening within the same 30-day period are aggregated. The granularity level affects the detection of behavioral patterns and cycles.</p>
      <p>In this work, user timelines are aligned on a tenure basis, and events are considered at a monthly (30-day) granularity. The first transaction of a user marks the beginning of that user’s timeline, and the user’s transaction history is divided into successive periods of 30 days each. Our choice of a monthly granularity is guided by an elementary analysis of the transaction logs, which shows unstable structures at a weekly granularity (a week is too short a period to capture behavioral patterns) and a flat structure at a quarterly granularity (a quarter is too long to capture the dynamism in user labels, owing to an over-accumulation of data). Our choice of a tenure basis is motivated by the business need to understand the evolution and phases of user behavior along transaction histories; this helps to model users’ dynamic behavior and to predict loss of interest in the system, user lifetimes, etc.</p>
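      <p>As an illustration only, the tenure alignment described above can be sketched in a few lines of Python. The function name <monospace>tenure_period</monospace>, the 1-based period index and the treatment of the boundary day (day offsets 0 to 29 form the first period) are assumptions of this sketch, not the implementation used in this work.</p>
      <preformat><![CDATA[
```python
from datetime import date

PERIOD_DAYS = 30  # monthly granularity described in the text


def tenure_period(first_txn: date, txn: date, period_days: int = PERIOD_DAYS) -> int:
    """Return the 1-based tenure period that a transaction falls into.

    Day offsets 0..29 after the first transaction map to period 1,
    offsets 30..59 to period 2, and so on (an assumed convention).
    """
    age_days = (txn - first_txn).days
    if age_days < 0:
        raise ValueError("transaction precedes the user's first transaction")
    return age_days // period_days + 1


# Users A and B from the example: different calendar dates, same tenure period.
user_a_first = date(2014, 1, 15)
user_b_first = date(2014, 4, 10)
period_a = tenure_period(user_a_first, date(2014, 2, 1))
period_b = tenure_period(user_b_first, date(2014, 4, 27))
print(period_a, period_b)
```
]]></preformat>
      <p>Both transactions land in the first tenure period even though they are almost three calendar months apart, which is the comparison the tenure basis is meant to enable.</p>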
    </sec>
    <sec id="sec-7">
      <title>Aggregate feature space</title>
      <p>The features used in the construction of personas are aggregates of transactions at a monthly granularity, binned into categories. The choices of binning, arising from a combination of summary knowledge of the data and domain information, lead to the features below.</p>
      <p>Monthly Expenditure (ME) characterizes spending behavior: Each feature is the total net amount spent by a user in one month in a given transaction type (rental or purchase) and a given price category (5 categories for rentals, 8 for purchases).</p>
      <p>Transaction frequency (TF) characterizes economic behavior:
Features are transaction counts binned into 2 price categories in
rentals and 4 categories in purchases.</p>
      <p>Dominant genres (DG) indicates content preference: Features are monthly counts of transactions in the 15 most popular genres (Drama, Comedy, Action, Family, Animation, Thriller, Biography, Sci-Fi, Crime, Super Hero, Comedy-Drama, Fantasy, Horror, Romance, Kids) plus a Miscellaneous category, giving 16 genre features in total.</p>
      <p>Content recency (CR) indicates freshness preference: Features are counts binned into ranges of content release year: Old (&lt; 1990), Nostalgia (1990–2000), Not New (2000–2010), Recent (2010–2013) and Latest (2014–2015).</p>
      <p>Time &amp; day of transaction (TDT) gives transacting habits: Timestamps of transactions are processed to generate the day of the week and time of transaction in the geographic region of the user. Counts are then binned into weekdays or weekends and into 4 time slots, including 10 AM-5 PM (office hours), 5 PM-10 PM (evening and night) and 10 PM-5 AM (late night).</p>
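      <p>To make the construction of a binned count feature concrete, the following Python sketch builds the content recency (CR) vector for one user in one tenure period. It is a hypothetical illustration: the helper names are invented here, and because the release-year ranges listed above share their end points, the assignment of boundary years (for example 2000 or 2010) is an assumption of the sketch.</p>
      <preformat><![CDATA[
```python
from collections import Counter

# Bin labels follow the release-year ranges listed in the text.
RECENCY_BINS = ["Old", "Nostalgia", "Not New", "Recent", "Latest"]


def recency_bin(release_year: int) -> str:
    """Map a release year to a recency label (boundary handling is assumed)."""
    if release_year < 1990:
        return "Old"
    if release_year < 2000:
        return "Nostalgia"
    if release_year < 2010:
        return "Not New"
    if release_year < 2014:
        return "Recent"
    return "Latest"


def recency_counts(release_years):
    """Count one user's transactions in one tenure period per recency bin."""
    counts = Counter(recency_bin(y) for y in release_years)
    return [counts[b] for b in RECENCY_BINS]


# Five transactions in one 30-day period, identified by content release year.
example = recency_counts([1985, 2013, 2014, 2014, 2005])
print(dict(zip(RECENCY_BINS, example)))
```
]]></preformat>
      <p>Stacking one such vector per user yields the count matrix on which the mixture model of the next section operates; the other count features (TF, DG, TDT) can be built in the same way with their own bins.</p>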
    </sec>
    <sec id="sec-8">
      <title>A mixture model for latent characteristics</title>
      <p>
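        To fix notation for this section, we have an <inline-formula><tex-math>n \times d</tex-math></inline-formula> feature matrix <inline-formula><tex-math>X^{T} = (x_1, x_2, \dots, x_n)</tex-math></inline-formula>, with <inline-formula><tex-math>x_i \in \mathbb{R}^d</tex-math></inline-formula> representing the feature vector of user <inline-formula><tex-math>i</tex-math></inline-formula> in a sample of <inline-formula><tex-math>n</tex-math></inline-formula> users, and <inline-formula><tex-math>d</tex-math></inline-formula> representing the dimension of the feature space. We propose a parametric approach, a mixed multinomial model (MMM) [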
        <xref ref-type="bibr" rid="ref13">13</xref>
        ],[
        <xref ref-type="bibr" rid="ref31">31</xref>
        ],[
        <xref ref-type="bibr" rid="ref17">17</xref>
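        ], to describe user labels based on count data. The multinomial distribution is a natural model for count feature vectors. The iterative EM algorithm, applied to estimate the mixing proportions and the parameters of the mixed multinomial distribution, is in itself a very powerful mechanism, one of its many merits being the ability to deal with missing features. An MMM assumes that the rows of <inline-formula><tex-math>X</tex-math></inline-formula> are independent draws from a multinomial model, that is, <inline-formula><tex-math>x_i \sim \mathrm{MN}(d, \theta_{Z_i})</tex-math></inline-formula>, where <inline-formula><tex-math>Z_i</tex-math></inline-formula> is a latent variable from the categorical distribution taking values <inline-formula><tex-math>j \in [K]</tex-math></inline-formula> and <inline-formula><tex-math>K</tex-math></inline-formula> is the number of clusters. We have a hierarchically structured model:
        <list list-type="bullet">
          <list-item>
            <p><inline-formula><tex-math>Z_i \overset{\mathrm{iid}}{\sim} \mathrm{MN}(1, \pi)</tex-math></inline-formula>, with <inline-formula><tex-math>\pi = (\pi_1, \dots, \pi_K)</tex-math></inline-formula> representing the mixing probabilities for the <inline-formula><tex-math>K</tex-math></inline-formula> clusters;</p>
          </list-item>
          <list-item>
            <p><inline-formula><tex-math>X_i \mid (Z_i = j) \overset{\mathrm{ind}}{\sim} \mathrm{MN}(d, \theta_j)</tex-math></inline-formula> for <inline-formula><tex-math>j \in [K]</tex-math></inline-formula>, where the vector <inline-formula><tex-math>\theta_j = (\theta_{j,1}, \dots, \theta_{j,d})</tex-math></inline-formula> represents the parameters of the multinomial density given the latent factor <inline-formula><tex-math>Z = j</tex-math></inline-formula>.</p>
          </list-item>
        </list>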
      </p>
      <p>
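        The mixing probabilities <inline-formula><tex-math>\pi</tex-math></inline-formula> and the parameters <inline-formula><tex-math>\theta_j</tex-math></inline-formula>, <inline-formula><tex-math>j \in [K]</tex-math></inline-formula>, of the mixture model are estimated using an EM algorithm as proposed in [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. We outline the E and M steps for the <inline-formula><tex-math>t</tex-math></inline-formula>-th iteration of the algorithm for the MMM, based on the iterates <inline-formula><tex-math>\pi^{(t)}</tex-math></inline-formula> and <inline-formula><tex-math>\theta^{(t)}</tex-math></inline-formula>.
      </p>
      <p>E-step: computes the posterior probabilities given the parameter estimates <inline-formula><tex-math>\pi^{(t)}</tex-math></inline-formula> and <inline-formula><tex-math>\theta^{(t)}</tex-math></inline-formula> of the <inline-formula><tex-math>t</tex-math></inline-formula>-th iteration, that is,
        <disp-formula><tex-math>\tau_{i,z}^{(t)} = P(Z_i = z \mid x_i; \pi^{(t)}, \theta^{(t)}) = \frac{P(X = x_i \mid Z_i = z; \theta^{(t)})\, \pi_z^{(t)}}{\sum_{j=1}^{K} P(X = x_i \mid Z_i = j; \theta^{(t)})\, \pi_j^{(t)}},</tex-math></disp-formula>
        where
        <disp-formula><tex-math>P(X = x_i \mid Z_i = j; \theta^{(t)}) \propto \prod_{v=1}^{d} \bigl(\theta_{j,v}^{(t)}\bigr)^{x_{i,v}}.</tex-math></disp-formula>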
      </p>
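      <p>M-step: maximizes the Expected Complete Log Likelihood (ECLL) to refine the estimates of the parameters <inline-formula><tex-math>\pi</tex-math></inline-formula> and <inline-formula><tex-math>\theta_j</tex-math></inline-formula> for <inline-formula><tex-math>j \in [K]</tex-math></inline-formula>:
        <disp-formula><tex-math>(\theta^{(t+1)}, \pi^{(t+1)}) = \arg\max_{\theta, \pi} E(L(\theta, \pi; X)),</tex-math></disp-formula>
        where the ECLL is
        <disp-formula><tex-math>E(L(\theta, \pi; X)) = \sum_{i=1}^{n} \sum_{j=1}^{K} \tau_{i,j}^{(t)} \log\bigl(\pi_j \cdot P(X = x_i \mid Z_i = j; \theta)\bigr)</tex-math></disp-formula>
        <disp-formula><tex-math>= \sum_{i=1}^{n} \sum_{j=1}^{K} \tau_{i,j}^{(t)} \log \pi_j + \sum_{i=1}^{n} \sum_{j=1}^{K} \sum_{v=1}^{d} x_{i,v}\, \tau_{i,j}^{(t)} \log \theta_{j,v} + \text{constant}</tex-math></disp-formula>
        (where the constant does not depend on <inline-formula><tex-math>\pi</tex-math></inline-formula> and <inline-formula><tex-math>\theta</tex-math></inline-formula>), yielding the estimates
        <disp-formula><tex-math>\pi_j^{(t+1)} = \frac{1}{n} \sum_{i=1}^{n} \tau_{i,j}^{(t)}, \qquad \theta_{j,v}^{(t+1)} = \frac{\sum_{i=1}^{n} x_{i,v}\, \tau_{i,j}^{(t)}}{d \sum_{i=1}^{n} \tau_{i,j}^{(t)}}.</tex-math></disp-formula>
      </p>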
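      <p>Hard cluster assignments are obtained by calculating
        <disp-formula><tex-math>S_{x_i} = \arg\max_{z} P(Z_i = z \mid x_i; \pi, \theta),</tex-math></disp-formula>
        with <inline-formula><tex-math>S_{x_i} \in [K]</tex-math></inline-formula> for <inline-formula><tex-math>i \in [n]</tex-math></inline-formula>.</p>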
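      <p>The E-step, M-step and hard assignment above can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions rather than the implementation used in this work: the names <monospace>fit_mmm</monospace> and <monospace>hard_labels</monospace>, the random initialisation, the fixed number of iterations and the small smoothing constant <monospace>eps</monospace> are all choices made for the sketch. The update of each parameter vector is normalised by the total responsibility-weighted count, which coincides with the closed-form estimate above when every row of the feature matrix sums to d.</p>
      <preformat><![CDATA[
```python
import math
import random


def fit_mmm(X, K, n_iter=50, seed=0, eps=1e-9):
    """Fit a K-component multinomial mixture to count rows X by EM.

    Returns (pi, theta, tau): mixing proportions, per-cluster category
    probabilities, and the n x K responsibilities from the last E-step.
    """
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    pi = [1.0 / K] * K
    # Random positive initialisation, normalised so each theta_j sums to 1.
    theta = []
    for _ in range(K):
        row = [rng.random() + 0.5 for _ in range(d)]
        total = sum(row)
        theta.append([v / total for v in row])

    tau = [[0.0] * K for _ in range(n)]
    for _ in range(n_iter):
        # E-step: tau[i][j] is proportional to pi_j * prod_v theta[j][v] ** x[v],
        # evaluated in log space for numerical stability.
        for i, x in enumerate(X):
            logp = [
                math.log(pi[j]) + sum(x[v] * math.log(theta[j][v]) for v in range(d))
                for j in range(K)
            ]
            top = max(logp)
            weights = [math.exp(lp - top) for lp in logp]
            norm = sum(weights)
            tau[i] = [w / norm for w in weights]

        # M-step: closed-form updates; eps keeps every parameter strictly positive.
        for j in range(K):
            n_j = sum(tau[i][j] for i in range(n))
            pi[j] = (n_j + eps) / (n + K * eps)
            weighted = [
                sum(tau[i][j] * X[i][v] for i in range(n)) + eps for v in range(d)
            ]
            total = sum(weighted)
            theta[j] = [w / total for w in weighted]
    return pi, theta, tau


def hard_labels(tau):
    """Assign each row to the cluster with the largest posterior probability."""
    return [max(range(len(row)), key=row.__getitem__) for row in tau]


# Toy count data with two visibly different usage patterns.
X = [[9, 1, 0], [8, 2, 0], [10, 0, 0], [0, 1, 9], [0, 2, 8], [1, 0, 9]]
pi, theta, tau = fit_mmm(X, K=2)
labels = hard_labels(tau)
print(labels)
```
]]></preformat>
      <p>The rows of <monospace>tau</monospace> are the soft clustering values, and <monospace>labels</monospace> are the corresponding hard cluster assignments; which integer a given cluster receives depends on the initialisation.</p>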
    </sec>
    <sec id="sec-9">
      <title>PERSONAS AND INSIGHTS</title>
      <p>Having described our methods of constructing personas, we present
the summaries of personas based on preferential and behavioral
patterns. The significant highlights of these persona labels are clear
characterizations of users in each persona label. We supplement
the persona labels with interesting insights that can lead to future
business actions to understand evolving patterns of both dominant
and niche behavioral traits.</p>
    </sec>
    <sec id="sec-10">
      <title>User persona labels for behavioral characterizations</title>
      <p>We give interpretable persona labels based on the latent structure
excavated from the aggregate features described above. Below, we
list the labels with the cluster sizes reported in percentages (beside
the label) and for each label, we give a brief explanation of user
behavior in that bucket.</p>
      <p>Monthly Expenditure: Cluster centers represent the monthly expense in each of the 13 price categories (5 rental and 8 purchase price categories). The user persona labels are:
        <list list-type="bullet">
          <list-item>
            <p>Economic Renters (71%): $10 spent in a month of activity, including smaller amounts of $2–3 in the higher rental price categories.</p>
          </list-item>
          <list-item>
            <p>Heavy Renters (21%): $17 in total, including $13 spent in the $3–5 rental price category.</p>
          </list-item>
          <list-item>
            <p>Movie Buyers (4.5%): $32 in total, with one purchase on average in the $16–20 price category and one quarter of monthly expenses in higher-priced rentals and lower-priced purchases.</p>
          </list-item>
          <list-item>
            <p>Movie Buffs (2.5%): $60 in total, with around 3 purchases in the $10–16 price category and around $7 in the $16–20 price category.</p>
          </list-item>
        </list>
      </p>
      <p>Frequency of Transaction: Cluster centers denote transaction counts in 6 price ranges. The persona labels uncovered are:
        <list list-type="bullet">
          <list-item>
            <p>Frequent High-End Renters (61%): over 85% of transactions in rentals above $3.</p>
          </list-item>
          <list-item>
            <p>Frequent Low-End Renters (21%): over 60% and 30% of transactions in rentals below and above $3, respectively.</p>
          </list-item>
          <list-item>
            <p>Frequent Movie Buyers &amp; Sporadic Renters (12%): 45% of transactions in purchases in the $8–16 price category, as well as 35% of transactions in rentals.</p>
          </list-item>
          <list-item>
            <p>Frequent Low-End Purchasers (6%): 80% of transactions, mostly in the $0–8 purchase price category.</p>
          </list-item>
        </list>
      </p>
      <p>Dominant Genre of Content Consumed: The three prime clusters recovered, with cluster centers being the percentages of monthly transactions in 16 genres, are:
        <list list-type="bullet">
          <list-item>
            <p>Happy Family (23%): content qualifying as family watch, with family (28%) being the most consumed genre, followed by animation (20%) and comedy (13%), but no or almost no crime, horror, romance or thriller.</p>
          </list-item>
          <list-item>
            <p>Drama-Comedy (40%): content with the dominant genre drama (28%), followed by biography (10%), comedy (10%) and a bit of romance, but little or almost no family, horror, action or crime content compared to the other clusters.</p>
          </list-item>
          <list-item>
            <p>Action-Horror-Thrill (37%): the dominant genre is action (20%), followed by drama (15%), thriller (12%), sci-fi (8%), comedy (6%) and horror (5%), but little or almost no family, comedy-drama or fantasy content compared to the other clusters.</p>
          </list-item>
        </list>
      </p>
      <p>Recency of Content Consumed: We obtain 3 clusters based on the count matrices binned by release year of content, which characterize preferences for the recency of content:
        <list list-type="bullet">
          <list-item>
            <p>Latest (40%): 85% of transactions with release year 2014-15.</p>
          </list-item>
          <list-item>
            <p>Recent (30%): 85% of transactions with release year 2010-13.</p>
          </list-item>
          <list-item>
            <p>Nostalgic (30%): about 30% of transactions with release year 2000-09, followed by recent and latest content in the remaining 65% of transactions.</p>
          </list-item>
        </list>
      </p>
      <p>Time &amp; Day of Transaction: Based on habits or preferences to transact at certain times and on certain days of the week, the clusters, with centers representing counts in each time category of weekday or weekend, are:
        <list list-type="bullet">
          <list-item>
            <p>Weekend Evening &amp; Night (24%): 65% of transactions on weekend nights, followed by 25% in the evening.</p>
          </list-item>
          <list-item>
            <p>Weekday Evening &amp; Night (24%): 70% of transactions on weekday nights, followed by 20% in the evening.</p>
          </list-item>
          <list-item>
            <p>Weekend &amp; Weekday Night (42%): 45% and 35% of transactions on nights of weekdays and weekends.</p>
          </list-item>
          <list-item>
            <p>Weekend Day &amp; Night (10%): 25% of transactions during the weekend daytime and 60% on weekend nights.</p>
          </list-item>
        </list>
      </p>
    </sec>
    <sec id="sec-11">
      <title>Insights into user persona labels</title>
      <p>Temporal nature of labels: stability of macro characteristics. A highlight of the derived user personas is that the uncovered clusters stay stable in size and composition at the population level. This consistency allows us to use the clusters to model the temporal evolution of tenure timelines at the population level. At the same time, the personas also explain individual dynamism: migrations do happen at the user level, and individual user labels are not static. Our results show that these migrations are rarely drastic; most occur between neighboring clusters, although we observe a few interesting migrations into far-off labels. The migrations can be explained by dominant characterizations, which reflect spending capacity, content preferences and habits, staying stable over time, while niche characterizations are more prone to change. As specific examples, the dominant segment of users transacting in lower-priced categories stays stable in its respective labels over time. The niche segment of higher-end purchasers, however, keeps migrating to lower-end categories and migrates back to the niche labels only when new products of interest become available. Another niche segment is a proportion of the users who buy content in the Happy Family label; over their tenure, they move to other genre labels to buy content for individual consumption that differs from the content consumed in the family context. By contrast, the other two genre-preference labels together represent the dominant population and show stability along tenure timelines.</p>
      <p>Natural hierarchical structure of clusters: We observe that the user personas exhibit a natural, divisive, hierarchical structure (not imposed through the algorithm) as we increase the number of clusters. This lends interesting interpretations of the sub-populations of users within broad segments. For example, upon clustering users based on monthly expenditure into two clusters, the cluster centers represent renters and purchasers, the two main segments of users. When the number of clusters is increased to 3, renters break up into economic and heavy renters; with 4 clusters, purchasers mostly decompose into two niche clusters, movie buyers and movie buffs.</p>
      <p>Upon clustering the count data representing dominant genres consumed by users into two clusters, we see one segment that prefers family content and another that consumes content not qualifying as family watch. With three clusters, the non-family content consumers decompose into two buckets, one that consumes drama, comedy, etc. and another that prefers thrill-inducing content.</p>
      <p>Layered structure of clusters: We explore the inter-relations between the various user persona characterizations by performing a layered clustering using the mixture model technique, that is, by assigning labels for one characterization, such as genre preference, within the clusters of another, such as spending behavior. For instance, we observe that the clusters based on genre preference derived within the clusters characterizing economic behavior are similar across all economic clusters. Similarly, the clusters for spending behavior are similar across different genres. This observation statistically validates that the genre preferences of consumers are independent of their economic budget. A similar observation holds for recency and economic behavior. On the other hand, we see different clustering results for recency of content when clustered within the genre clusters: the category preferring family content shows more inclination towards classic content than the drama-based or thrill-inducing categories, which prefer more recent content.</p>
    </sec>
    <sec id="sec-13">
      <title>INTEGRATION OF PERSONAS IN PERSONALIZATION</title>
      <p>We demonstrate the usefulness of user personas through an application to response prediction, by effectively integrating the information from personas into personalization. Specifically, we focus on a CTR predictive model where the goal is to predict <inline-formula><tex-math>p_{u,i}</tex-math></inline-formula>, the probability that user <inline-formula><tex-math>u</tex-math></inline-formula> transacts on item <inline-formula><tex-math>i</tex-math></inline-formula>. The scope for utilizing persona information extends to other popular models in collaborative filtering. We conclude the paper by discussing such possibilities, where one can integrate personas into other commonly used models and expect to attain a relevance-scalability-interpretability trade-off.</p>
    </sec>
    <sec id="sec-14">
      <title>CTR: relevance-scalability-interpretability balance</title>
      <p>
        We model the CTR problem of predicting transaction probabilities
through an ℓ<sub>1</sub>-penalized logistic regression model that is trained
per item. Such a fine-grained model at the item level captures the
item-specific interests of users, leading to more accurate predictions
[
        <xref ref-type="bibr" rid="ref34">34</xref>
        ]. The challenge in such models, however, is the sparsity of the
transactional data, with about 1% of users transacting on any given
item. To overcome this imbalance and avoid bias towards the
outcome of not transacting at all, for every positive sample (a user who
transacted), we draw 5 negative samples (users who did not
transact). The gain from summarized persona information can be
described as a balance between scalability of the training model,
interpretability of the feature space and relevance of predictions.
      </p>
      <p>Relevance: deliver relevant recommendations to users, quantified
by the quality of the predicted transaction probabilities. To fix
notation, we denote by F the evaluation metric that assesses the
performance of the predictive model on a test set. With the trained
model M* giving predicted labels label<sub>M*</sub>, the predictive ability is
given by F(label<sub>M*</sub>, label<sub>test</sub>). Here, F is the mean AUC over the
100 most popular items in the content catalogue.
      </p>
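      <p>As a rough illustration of the sampling scheme and the evaluation metric described above, the following Python sketch draws 5 negatives per positive for a single item and computes a mean of per-item AUC values. The function names, the data layout and the rank-based AUC computation are assumptions made for illustration; the per-item penalized logistic regression itself is not shown.</p>
      <preformat>
```python
import random


def sample_training_set(positives, negatives, ratio=5, seed=0):
    """Keep every positive user and draw `ratio` negatives per positive."""
    rng = random.Random(seed)
    n_neg = min(len(negatives), ratio * len(positives))
    sampled = rng.sample(list(negatives), n_neg)
    return [(u, 1) for u in positives] + [(u, 0) for u in sampled]


def auc(labels, scores):
    """Rank-based AUC: chance that a random positive outscores a random negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


def mean_auc(per_item):
    """Mean AUC over items; `per_item` maps item -> (labels, scores)."""
    values = [auc(labels, scores) for labels, scores in per_item.values()]
    return sum(values) / len(values)
```
      </preformat>
      <p>In this sketch, the dictionary passed to mean_auc would hold one entry per evaluated item, for example the 100 most popular items.</p>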
      <p>Scalability: reduce the size of the input feature and sample space (which
reduces the regression size) by using lower-dimensional persona
features. Information from personas can be encoded as soft
clustering features or incorporated as hard clusters via a model trained per
cluster. This brings a significant reduction in regression dimensions,
which in turn facilitates storage and future use of these feature
vectors in the same or other predictive models.</p>
      <p>Interpretability: retain the intuitive meaning of the feature space,
as opposed to random lower-dimensional projections, which seldom
lend business insights. With a meaningful feature set, we can
reuse the same features in a host of predictive tasks and use them to
debug models easily. While relevance and scalability can be
quantified, there is no measure of interpretability.</p>
      <p>The trade-off among the above criteria arises because we could instead use a baseline
model whose regressors are the count features that were used to recover the latent user
labels. However, there is a significant computational
cost associated with the larger regression size of this baseline, which is built
on the aggregate features without using any knowledge of
personas. Integrating persona information gives a clear reduction in regression size and the associated
complexity, at the cost of
losing only about 2% of predictive ability (Figure 2). The trade-off achieved in
CTR prediction with persona information is thus a smaller
regression with predictive power comparable to the baseline
model and a feature space that retains a clear meaning.
The takeaway is that persona features can be used to construct
interpretable, lower-dimensional regressors that preserve predictive
power. An added advantage of incorporating these summary
features in a model with sub-sampled users is that using
summaries over a random set of users helps preserve the privacy
of individual users and of individual transactions.</p>
      <p>To describe our model and results, we use X<sub>u</sub> to denote the
feature vector corresponding to user u. This feature vector can be
based on 3 characterizations: ME (monthly expenses), DG
(dominant genre) and CR (content recency). Information from personas can
be incorporated into X<sub>u</sub> in different ways, yielding different models.</p>
      <p>In particular, we construct feature vectors using the personas on
ME, DG and CR in four forms, denoted by (c), (s), (h)
and (-). Form (c) is used in the baseline model with count
features based on a particular characterization; (s) and (h) integrate
soft and hard clustering information based on characterizations; (-)
uses neither count nor persona information, and we call this the null
model. These are summarized below:</p>
      <p>(c) a feature vector with the distribution of ME over price categories
and/or count vectors for DG/CR in feature bins (directly using
the constructed features). It uses aggregate count features, but no
additional knowledge from latent personas.</p>
      <p>(s) a feature vector of soft clustering values in the form of
distances of count features from their respective cluster centers.</p>
      <p>(h) incorporates hard clustering information for a
characterization by training a model cluster-wise.</p>
      <p>(-) does not include any information from a characterization at
all.</p>
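      <p>The soft form (s) and the hard form (h) can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the distance is taken to be Euclidean, the cluster centers are hypothetical values, and the hard label is approximated by the nearest center rather than by the label from the fitted mixture model.</p>
      <preformat>
```python
import math


def soft_features(counts, centers):
    """Form (s): distance of a user's count vector from each cluster center."""
    return [math.dist(counts, center) for center in centers]


def hard_label(counts, centers):
    """Form (h), simplified: index of the nearest center.

    The index would select which per-cluster model is trained on this user.
    """
    distances = soft_features(counts, centers)
    return distances.index(min(distances))


# Hypothetical example: counts over three genre bins and two cluster centers.
user_counts = [3, 0, 1]
genre_centers = [[3, 0, 0], [0, 2, 2]]
features = soft_features(user_counts, genre_centers)
label = hard_label(user_counts, genre_centers)
```
      </preformat>
      <p>Here the soft feature vector has one entry per cluster, so its length is the number of clusters rather than the number of count bins.</p>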
      <p>Below, we describe the different CTR models and discuss results
on the trade-off among the three criteria.</p>
      <p>We achieve a gain in relevance with information from each added
characterization, whether in the form of soft clustering, hard
clustering or count features. Figure 1 highlights the relevance of each
characterization in the CTR model. CR (recency) is seen to be the most
informative characterization, adding the most to AUC.</p>
      <p>Denote by n<sub>i</sub> the number of samples per item, by n<sub>i,c</sub> the number of samples per item
per cluster, by p the number of predictive features, and by O the complexity
of regularized logistic regression as a function of sample size and regression dimension.
Table 1 below compares different models, illustrating how our
proposed integration of user personas into personalized
recommendation achieves a trade-off between relevance and scalability. We note
that interpretability comes alongside using summary information
from personas. The baseline model, with all count features (c), is shown in the first row of the
table. We see a
significant reduction in predictive power when we do not
incorporate any information from the recency feature; this is shown
in the fourth row of the table. When we train a model per
recency cluster using (h), we lose 1% of predictive power, but reduce the
sample size for the training model on each cluster as well as the
feature space, leading to an overall reduction in complexity. We see
a similar predictive power when we use the soft clustering recency
feature (s), with a significant reduction in the size of the feature space.</p>
      <p>While we do not evaluate all 64 combinations of (c), (s), (h) and (-),
we see that using soft clustering features for all 3
characterizations leads to a loss of only 2% AUC. This is represented in the
last row of the table below. The computational gain, however, is seen
to be significant even in a simple regression model that scales in
complexity as p<sup>2</sup> with the size of the feature space. Figure 2 shows this
as n<sub>i</sub> and n<sub>i,c</sub> = [n/3] vary per item. Interpretability is inherent
in these models due to the clear meaning of soft clustering features
that represent distances from cluster centers, or of hard-coded cluster
memberships in models trained on similar users.</p>
      <p>Finally, we discuss a few models based on popular collaborative
filtering techniques that can incorporate information from personas
to retain predictive power while gaining in scalability for practical
implementations.</p>
      <p>User-based nearest-neighbor similarity: This approach uses
a similarity metric sim(u, v) (examples include Jaccard and cosine similarity)
to predict a weighted average rating based on the similarity
between users who transacted on the same items. Denoting by U(i)
the set of users who transacted on item i, the amount of
money r<sub>u,i</sub> that user u is willing to spend on item i can be predicted
as
<disp-formula><tex-math>r_{u,i} = \sum_{v \in U(i)} \mathrm{sim}(u, v)\, r_{v,i} \Big/ \sum_{v \in U(i)} \mathrm{sim}(u, v),</tex-math></disp-formula>
and the probability that user u transacts on item i as
<disp-formula><tex-math>p_{u,i} = \sum_{v \in U(i)} \mathrm{sim}(u, v) \Big/ \sum_{v} \mathrm{sim}(u, v).</tex-math></disp-formula>
Similarity approaches have scaling issues, with a high computational
cost associated with searching through the set of users, or even the
top K similar users, in the set U(i). Persona information can bring
a gain in prediction accuracy while also offering better scalability by
limiting the search for the top K neighbors to already formed personas.</p>
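      <p>A minimal Python sketch of the similarity-weighted spend prediction, with an optional restriction of the neighbor search to users sharing a persona label, might look as follows. The data layout (dictionaries keyed by user), the function names and the choice of Jaccard similarity over transacted item sets are illustrative assumptions, not the implementation used in this work.</p>
      <preformat>
```python
def jaccard(a, b):
    """Jaccard similarity between two sets of transacted items."""
    union = a.union(b)
    return len(a.intersection(b)) / len(union) if union else 0.0


def predict_spend(u, item, history, spend, persona=None):
    """Similarity-weighted average of r_{v,i} over users v who bought `item`.

    history: user -> set of items transacted.
    spend:   (user, item) -> amount spent.
    persona: optional user -> cluster label; when given, only users with
             the same label as u are searched.
    """
    num = 0.0
    den = 0.0
    for v, items in history.items():
        if v == u or item not in items:
            continue  # v must belong to U(i)
        if persona is not None and persona[v] != persona[u]:
            continue  # restrict the search to u's persona
        s = jaccard(history[u], items)
        num += s * spend[(v, item)]
        den += s
    return num / den if den > 0 else None


# Hypothetical toy data.
history = {"u": {"a", "b"}, "v": {"a", "b", "i"}, "w": {"a", "i"}, "x": {"z", "i"}}
spend = {("v", "i"): 4.0, ("w", "i"): 1.0, ("x", "i"): 10.0}
persona = {"u": 0, "v": 0, "w": 1, "x": 1}
all_users = predict_spend("u", "i", history, spend)
same_persona = predict_spend("u", "i", history, spend, persona)
```
      </preformat>
      <p>With the persona restriction, the loop skips users outside the cluster of u before any similarity is computed, which is where the scalability gain would come from in this sketch.</p>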
      <p>We could use clusters from the most representative time point of
activity for predictions. Alternatively, we can use temporal persona
information for prediction, weighting time points differently
through a weighted similarity prediction along
the tenure timeline. Denoting time points of transaction history
(months of the tenure timeline) by t, with weights w<sub>t</sub> (that can be tuned)
and features u<sub>t</sub> for user u, U(t, i) as the set of users who transacted
on item i and C(t, u) as the set of users present in the same
cluster as user u at time t, ratings at a time point T that leverage the
temporal history up to time T can be predicted as</p>
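      <p>Table 2: Ratings in CF: Clustering buckets C(u)
<disp-formula><tex-math>r_{u,i}(T) = \frac{\sum_{t \le T} \sum_{v \in U(t,i) \cap C(t,u)} w_t\, \mathrm{sim}(u_t, v_t)\, r_{v,i}}{\sum_{t \le T} \sum_{v \in U(t,i) \cap C(t,u)} w_t\, \mathrm{sim}(u_t, v_t)}</tex-math></disp-formula>
<disp-formula><tex-math>p_{u,i}(T) = \frac{\sum_{t \le T} \sum_{v \in U(t,i) \cap C(t,u)} w_t\, \mathrm{sim}(u_t, v_t)}{\sum_{t \le T} \sum_{v \in C(t,u)} w_t\, \mathrm{sim}(u_t, v_t)}</tex-math></disp-formula></p>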
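      <p>The temporally weighted spend prediction can be sketched in Python as below. All names and the data layout are hypothetical: months are indexed 1 to T, neighbors at month t are the users who transacted on the item and share the cluster of u at that month, and cosine similarity on monthly feature vectors stands in for sim.</p>
      <preformat>
```python
import math


def cosine(x, y):
    """Cosine similarity between two feature vectors."""
    nx = math.sqrt(sum(a * a for a in x))
    ny = math.sqrt(sum(b * b for b in y))
    if nx > 0 and ny > 0:
        return sum(a * b for a, b in zip(x, y)) / (nx * ny)
    return 0.0


def predict_spend_over_tenure(u, item, T, weights, features, buyers, cluster, spend):
    """Weighted prediction of r_{u,i}(T) over months 1..T.

    weights:  month -> w_t
    features: (month, user) -> feature vector
    buyers:   (month, item) -> users who transacted on the item
    cluster:  (month, user) -> cluster label at that month
    spend:    (user, item) -> amount spent
    """
    num = 0.0
    den = 0.0
    for t in range(1, T + 1):
        for v in buyers.get((t, item), ()):
            if cluster[(t, v)] != cluster[(t, u)]:
                continue  # keep only neighbors in u's cluster at month t
            s = weights[t] * cosine(features[(t, u)], features[(t, v)])
            num += s * spend[(v, item)]
            den += s
    return num / den if den > 0 else None


# Hypothetical toy data for two months of tenure.
weights = {1: 1.0, 2: 2.0}
features = {(1, "u"): [1, 0], (1, "v"): [1, 0], (2, "u"): [0, 1], (2, "w"): [0, 2]}
buyers = {(1, "i"): ["v"], (2, "i"): ["w", "x"]}
cluster = {(1, "u"): 0, (1, "v"): 0, (2, "u"): 1, (2, "w"): 1, (2, "x"): 0}
spend = {("v", "i"): 3.0, ("w", "i"): 6.0, ("x", "i"): 100.0}
r_two_months = predict_spend_over_tenure("u", "i", 2, weights, features, buyers, cluster, spend)
r_one_month = predict_spend_over_tenure("u", "i", 1, weights, features, buyers, cluster, spend)
```
      </preformat>
      <p>Larger weights on later months let recent behavior dominate the prediction; the weights here are arbitrary and would need tuning.</p>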
      <p>
        Latent factor model: Without clustering information, the vanilla
model with latent factors q<sub>i</sub> for item i and p<sub>u</sub> for user u is r̂<sub>u,i</sub> =
µ + b<sub>i</sub> + b<sub>u</sub> + q<sub>i</sub><sup>T</sup>p<sub>u</sub>, solved either through stochastic gradient descent
or alternating least squares [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ]. Letting A be a set of attributes
and a a cluster for A, user persona information can be incorporated
into the above model by:
      </p>
      <p>(1) adjusting for biases per cluster;</p>
      <p>(2) enhancing the user representation with latent factors
for cluster memberships, learning a latent factor y<sub>a</sub>
for each cluster a in the set of characterizations A [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ];</p>
      <p>(3) hard-wiring clustering information as features, in the form
of an enhanced user feature with a latent component p<sub>u</sub>
concatenated with known added features p̃<sub>u</sub>;</p>
      <p>(4) training a latent factor model per cluster, with c<sub>a</sub> being the clusters
corresponding to some attribute a; I<sub>u∈c<sub>a</sub></sub> equals 1 if user u
is in cluster c<sub>a</sub>, and 0 otherwise.
      </p>
      <p>Table 3 below describes the enhanced rating models for each case
described above.</p>
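      <p>Table 3: Ratings in CF: Adding Persona Clustering to Vanilla
<disp-formula><tex-math>\text{1.}\quad \hat{r}_{u,i} = \mu + b_i + b_u + \sum_{a \in A(u)} b_a + q_i^{T} p_u</tex-math></disp-formula>
<disp-formula><tex-math>\text{2.}\quad \hat{r}_{u,i} = \mu + b_i + b_u + q_i^{T} \Big( p_u + \sum_{a \in A(u)} y_a \Big)</tex-math></disp-formula>
<disp-formula><tex-math>\text{3.}\quad \hat{r}_{u,i} = \mu + b_i + b_u + \tilde{q}_i^{T} (p_u : \tilde{p}_u)</tex-math></disp-formula>
<disp-formula><tex-math>\text{4.}\quad \hat{r}^{c_a}_{u,i} = \mu^{c_a} + b_i + I_{u \in c_a}\, b_{c_a} + I_{u \in c_a}\, q_i^{T} p_{c_a}</tex-math></disp-formula></p>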
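      <p>To make the first three rating models concrete, the following Python sketch evaluates each prediction for given parameters. It only computes predictions; fitting the parameters by stochastic gradient descent or alternating least squares is not shown, and the function names and numeric values are hypothetical.</p>
      <preformat>
```python
def dot(x, y):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(x, y))


def rate_vanilla(mu, b_i, b_u, q_i, p_u):
    """Vanilla model: mu + b_i + b_u + q_i^T p_u."""
    return mu + b_i + b_u + dot(q_i, p_u)


def rate_cluster_bias(mu, b_i, b_u, q_i, p_u, cluster_biases):
    """Row 1: add a bias b_a for every cluster a the user belongs to."""
    return rate_vanilla(mu, b_i, b_u, q_i, p_u) + sum(cluster_biases)


def rate_cluster_factors(mu, b_i, b_u, q_i, p_u, cluster_factors):
    """Row 2: shift the user factor by the sum of cluster factors y_a."""
    enriched = list(p_u)
    for y_a in cluster_factors:
        enriched = [e + y for e, y in zip(enriched, y_a)]
    return mu + b_i + b_u + dot(q_i, enriched)


def rate_concatenated(mu, b_i, b_u, q_tilde_i, p_u, p_tilde_u):
    """Row 3: item factor applied to the concatenation of p_u and known features."""
    return mu + b_i + b_u + dot(q_tilde_i, list(p_u) + list(p_tilde_u))


# Hypothetical parameter values.
mu, b_i, b_u = 3.0, 0.5, -0.25
q_i, p_u = [1.0, 2.0], [0.5, 0.25]
vanilla = rate_vanilla(mu, b_i, b_u, q_i, p_u)
with_bias = rate_cluster_bias(mu, b_i, b_u, q_i, p_u, [0.25, -0.5])
with_factors = rate_cluster_factors(mu, b_i, b_u, q_i, p_u, [[0.5, 0.0], [0.0, 0.25]])
concatenated = rate_concatenated(mu, b_i, b_u, [1.0, 2.0, 4.0], p_u, [0.25])
```
      </preformat>
      <p>Row 4 would amount to fitting and evaluating the vanilla form separately on the users of each cluster, so it needs no separate prediction function in this sketch.</p>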
    </sec>
    <sec id="sec-15">
      <title>CONCLUDING REMARKS</title>
      <p>
        This work offers temporally evolving personas that lend new
perspectives and actionable insights into the behavioral patterns of VoD
users as they age in the system. As highlighted, our personas
are stable at the macro level while still being able to
represent dynamic niche characterizations. Our
mixture approach, together with the choices of granularity and
timeline of comparison and the engineered features, gives rise to a
consistent and robust latent model. That is, insights derived from
a study of user personas at any time point are also likely to apply
to future clusters and to models built using these clusters.
Information from efficiently built personas can achieve a practical
and vital relevance-scalability-interpretability trade-off in
recommendations, as highlighted in this work with predictive models that are
trained and tested on VoD data. An untapped area of application
is churn analysis (see [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ]), which aims to improve user retention and
interest. One can create user buckets based on longevity in the system,
or use existing personas, to predict when users slip into a state of
inactivity. A potential future direction is to study a
possible trade-off between privacy and predictive power in models
based on persona features. Finally, the methods, guarantees and
perspectives from this work can be extended to other domains of
personalization and can be realized in a host of other predictive
tasks.
      </p>
    </sec>
    <sec id="sec-16">
      <title>ACKNOWLEDGMENTS</title>
      <p>This work was performed while all three authors were
with Technicolor Research, CA, USA.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>Rakesh</given-names>
            <surname>Agrawal</surname>
          </string-name>
          , Manish Mehta, John C Shafer, Ramakrishnan Srikant, Andreas Arning, and
          <string-name>
            <given-names>Toni</given-names>
            <surname>Bollinger</surname>
          </string-name>
          .
          <year>1996</year>
          .
          <article-title>The Quest Data Mining System</article-title>
          .
          <source>In KDD</source>
          , Vol.
          <volume>96</volume>
          .
          <fpage>244</fpage>
          -
          <lpage>249</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Greg M</given-names>
            <surname>Allenby</surname>
          </string-name>
          and
          <string-name>
            <given-names>Peter E</given-names>
            <surname>Rossi</surname>
          </string-name>
          .
          <year>1998</year>
          .
          <article-title>Marketing models of consumer heterogeneity</article-title>
          .
          <source>Journal of econometrics 89</source>
          ,
          <issue>1</issue>
          (
          <year>1998</year>
          ),
          <fpage>57</fpage>
          -
          <lpage>78</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>Asim</given-names>
            <surname>Ansari</surname>
          </string-name>
          , Skander Essegaier, and
          <string-name>
            <given-names>Rajeev</given-names>
            <surname>Kohli</surname>
          </string-name>
          .
          <year>2000</year>
          .
          <article-title>Internet recommendation systems</article-title>
          .
          <source>Journal of Marketing research 37</source>
          ,
          <issue>3</issue>
          (
          <year>2000</year>
          ),
          <fpage>363</fpage>
          -
          <lpage>375</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>Chidanand</given-names>
            <surname>Apte</surname>
          </string-name>
          , Bing Liu,
          Edwin PD Pednault, and Padhraic Smyth
          .
          <year>2002</year>
          .
          <article-title>Business applications of data mining</article-title>
          .
          <source>Commun. ACM 45</source>
          ,
          <issue>8</issue>
          (
          <year>2002</year>
          ),
          <fpage>49</fpage>
          -
          <lpage>53</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>Judy</given-names>
            <surname>Bayer</surname>
          </string-name>
          .
          <year>2010</year>
          .
          <article-title>Customer segmentation in the telecommunications industry</article-title>
          .
          <source>Journal of Database Marketing &amp; Customer Strategy Management</source>
          <volume>17</volume>
          ,
          <fpage>3</fpage>
          -
          <lpage>4</lpage>
          (
          <year>2010</year>
          ),
          <fpage>247</fpage>
          -
          <lpage>256</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>David</given-names>
            <surname>Besanko</surname>
          </string-name>
          ,
          <string-name>
            <surname>Jean-Pierre Dubé</surname>
            , and
            <given-names>Sachin</given-names>
          </string-name>
          <string-name>
            <surname>Gupta</surname>
          </string-name>
          .
          <year>2003</year>
          .
          <article-title>Competitive price discrimination strategies in a vertical channel using aggregate retail data</article-title>
          .
          <source>Management Science</source>
          <volume>49</volume>
          ,
          <issue>9</issue>
          (
          <year>2003</year>
          ),
          <fpage>1121</fpage>
          -
          <lpage>1138</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>Amit</given-names>
            <surname>Bhatnagar</surname>
          </string-name>
          and
          <string-name>
            <given-names>Sanjoy</given-names>
            <surname>Ghose</surname>
          </string-name>
          .
          <year>2004</year>
          .
          <article-title>A latent class segmentation analysis of e-shoppers</article-title>
          .
          <source>Journal of Business Research</source>
          <volume>57</volume>
          ,
          <issue>7</issue>
          (
          <year>2004</year>
          ),
          <fpage>758</fpage>
          -
          <lpage>767</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <surname>Derrick</surname>
            <given-names>S Boone</given-names>
          </string-name>
          and
          <string-name>
            <given-names>Michelle</given-names>
            <surname>Roehm</surname>
          </string-name>
          .
          <year>2002</year>
          .
          <article-title>Retail segmentation using artificial neural networks</article-title>
          .
          <source>International journal of research in marketing 19</source>
          ,
          <issue>3</issue>
          (
          <year>2002</year>
          ),
          <fpage>287</fpage>
          -
          <lpage>301</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>Deepayan</given-names>
            <surname>Chakrabarti</surname>
          </string-name>
          , Ravi Kumar, and
          <string-name>
            <given-names>Andrew</given-names>
            <surname>Tomkins</surname>
          </string-name>
          .
          <year>2006</year>
          .
          <article-title>Evolutionary clustering</article-title>
          .
          <source>In Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM</source>
          ,
          <fpage>554</fpage>
          -
          <lpage>560</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <surname>Mu-Chen</surname>
            <given-names>Chen</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ai-Lun Chiu</surname>
          </string-name>
          , and
          <string-name>
            <surname>Hsu-Hwa Chang</surname>
          </string-name>
          .
          <year>2005</year>
          .
          <article-title>Mining changes in customer behavior in retail marketing</article-title>
          .
          <source>Expert Systems with Applications 28</source>
          ,
          <issue>4</issue>
          (
          <year>2005</year>
          ),
          <fpage>773</fpage>
          -
          <lpage>781</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <surname>Arthur</surname>
            <given-names>P Dempster</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nan M Laird</surname>
          </string-name>
          , and Donald B Rubin.
          <year>1977</year>
          .
          <article-title>Maximum likelihood from incomplete data via the EM algorithm</article-title>
          .
          <source>Journal of the royal statistical society. Series B (methodological)</source>
          (
          <year>1977</year>
          ),
          <fpage>1</fpage>
          -
          <lpage>38</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <surname>José</surname>
            <given-names>G Dias</given-names>
          </string-name>
          and
          Jeroen K Vermunt.
          <year>2007</year>
          .
          <article-title>Latent class modeling of website users' search patterns: Implications for online market segmentation</article-title>
          .
          <source>Journal of Retailing and Consumer Services</source>
          <volume>14</volume>
          ,
          <issue>6</issue>
          (
          <year>2007</year>
          ),
          <fpage>359</fpage>
          -
          <lpage>368</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <surname>David</surname>
            <given-names>B</given-names>
          </string-name>
          <string-name>
            <surname>Dunson</surname>
          </string-name>
          .
          <year>2000</year>
          .
          <article-title>Bayesian latent variable models for clustered mixed outcomes</article-title>
          .
          <source>Journal of the Royal Statistical Society: Series B (Statistical Methodology) 62</source>
          ,
          <issue>2</issue>
          (
          <year>2000</year>
          ),
          <fpage>355</fpage>
          -
          <lpage>366</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <surname>Michael</surname>
            <given-names>D</given-names>
          </string-name>
          <string-name>
            <surname>Ekstrand</surname>
          </string-name>
          , John T Riedl, and Joseph A Konstan.
          <year>2011</year>
          .
          <article-title>Collaborative filtering recommender systems</article-title>
          .
          <source>Foundations and Trends in Human-Computer Interaction 4</source>
          ,
          <issue>2</issue>
          (
          <year>2011</year>
          ),
          <fpage>81</fpage>
          -
          <lpage>173</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>Nir</given-names>
            <surname>Friedman</surname>
          </string-name>
          and
          <string-name>
            <given-names>Stuart</given-names>
            <surname>Russell</surname>
          </string-name>
          .
          <year>1997</year>
          .
          <article-title>Image segmentation in video sequences: A probabilistic approach</article-title>
          .
          <source>In Proceedings of the Thirteenth conference on Uncertainty in artificial intelligence</source>
          . Morgan Kaufmann Publishers Inc.,
          <fpage>175</fpage>
          -
          <lpage>181</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <surname>Johanna</surname>
            <given-names>Gummerus</given-names>
          </string-name>
          , Veronica Liljander, Minna Pura, and Allard Van Riel.
          <year>2004</year>
          .
          <article-title>Customer loyalty to content-based web sites: the case of an online health-care service</article-title>
          .
          <source>Journal of services Marketing</source>
          <volume>18</volume>
          ,
          <issue>3</issue>
          (
          <year>2004</year>
          ),
          <fpage>175</fpage>
          -
          <lpage>186</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>Donald</given-names>
            <surname>Hedeker</surname>
          </string-name>
          .
          <year>2003</year>
          .
          <article-title>A mixed-effects multinomial logistic regression model</article-title>
          .
          <source>Statistics in medicine 22</source>
          ,
          <issue>9</issue>
          (
          <year>2003</year>
          ),
          <fpage>1433</fpage>
          -
          <lpage>1446</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <surname>Yifan</surname>
            <given-names>Hu</given-names>
          </string-name>
          , Yehuda Koren, and
          <string-name>
            <given-names>Chris</given-names>
            <surname>Volinsky</surname>
          </string-name>
          .
          <year>2008</year>
          .
          <article-title>Collaborative filtering for implicit feedback datasets</article-title>
          .
          <source>In 2008 Eighth IEEE International Conference on Data Mining. Ieee</source>
          ,
          <fpage>263</fpage>
          -
          <lpage>272</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>Tianyi</given-names>
            <surname>Jiang</surname>
          </string-name>
          and
          <string-name>
            <given-names>Alexander</given-names>
            <surname>Tuzhilin</surname>
          </string-name>
          .
          <year>2006</year>
          .
          <article-title>Segmenting customers from population to individuals: Does 1-to-1 keep your customers forever</article-title>
          ?
          <source>IEEE Transactions on Knowledge and Data Engineering</source>
          <volume>18</volume>
          ,
          <issue>10</issue>
          (
          <year>2006</year>
          ),
          <fpage>1297</fpage>
          -
          <lpage>1311</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>Tianyi</given-names>
            <surname>Jiang</surname>
          </string-name>
          and
          <string-name>
            <given-names>Alexander</given-names>
            <surname>Tuzhilin</surname>
          </string-name>
          .
          <year>2009</year>
          .
          <article-title>Improving personalization solutions through optimal segmentation of customer bases</article-title>
          .
          <source>IEEE transactions on knowledge and data engineering 21</source>
          ,
          <issue>3</issue>
          (
          <year>2009</year>
          ),
          <fpage>305</fpage>
          -
          <lpage>320</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <surname>Komal</surname>
            <given-names>Kapoor</given-names>
          </string-name>
          , Mingxuan Sun, Jaideep Srivastava, and
          <string-name>
            <given-names>Tao</given-names>
            <surname>Ye</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>A hazard based approach to user return time prediction</article-title>
          .
          <source>In Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM</source>
          ,
          <fpage>1719</fpage>
          -
          <lpage>1728</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <surname>Yehuda</surname>
            <given-names>Koren</given-names>
          </string-name>
          , Robert Bell,
          <string-name>
            <given-names>Chris</given-names>
            <surname>Volinsky</surname>
          </string-name>
          , et al.
          <year>2009</year>
          .
          <article-title>Matrix factorization techniques for recommender systems</article-title>
          .
          <source>Computer 42</source>
          ,
          <issue>8</issue>
          (
          <year>2009</year>
          ),
          <fpage>30</fpage>
          -
          <lpage>37</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>Gueorgi</given-names>
            <surname>Kossinets and Duncan J Watts</surname>
          </string-name>
          .
          <year>2006</year>
          .
          <article-title>Empirical analysis of an evolving social network</article-title>
          .
          <source>science 311</source>
          ,
          <issue>5757</issue>
          (
          <year>2006</year>
          ),
          <fpage>88</fpage>
          -
          <lpage>90</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <surname>Wei</surname>
            <given-names>Li</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Xuemei Wu</surname>
          </string-name>
          , Yayun Sun,
          <string-name>
            <given-names>and Quanju</given-names>
            <surname>Zhang</surname>
          </string-name>
          .
          <year>2010</year>
          .
          <article-title>Credit card customer segmentation and target marketing based on data mining</article-title>
          .
          <source>In Computational Intelligence and Security (CIS)</source>
          ,
          <source>2010 International Conference on. IEEE</source>
          ,
          <fpage>73</fpage>
          -
          <lpage>76</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [25]
          <string-name>
            <surname>Snigdha</surname>
            <given-names>Panigrahi</given-names>
          </string-name>
          , Nadia Fawaz, and
          <string-name>
            <given-names>Ajith</given-names>
            <surname>Pudhiyaveetil</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>Temporal Evolution of Behavioral User Personas via Latent Variable Mixture Models</article-title>
          . https://arxiv.org/abs/1704.07554. arXiv preprint arXiv:1704.07554 (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [26]
          <string-name>
            <given-names>Michael J</given-names>
            <surname>Pazzani</surname>
          </string-name>
          and
          <string-name>
            <given-names>Daniel</given-names>
            <surname>Billsus</surname>
          </string-name>
          .
          <year>2007</year>
          .
          <article-title>Content-based recommendation systems</article-title>
          .
          In
          <source>The adaptive web</source>
          . Springer,
          <fpage>325</fpage>
          -
          <lpage>341</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          [27]
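          <string-name>
            <given-names>Badrul</given-names>
            <surname>Sarwar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>George</given-names>
            <surname>Karypis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Joseph</given-names>
            <surname>Konstan</surname>
          </string-name>
          , and
          <string-name>
            <given-names>John</given-names>
            <surname>Riedl</surname>
          </string-name>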
          .
          <year>2001</year>
          .
          <article-title>Item-based collaborative filtering recommendation algorithms</article-title>
          .
          In
          <source>Proceedings of the 10th international conference on World Wide Web</source>
          . ACM,
          <fpage>285</fpage>
          -
          <lpage>295</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          [28]
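          <string-name>
            <given-names>J Ben</given-names>
            <surname>Schafer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Dan</given-names>
            <surname>Frankowski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Jon</given-names>
            <surname>Herlocker</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Shilad</given-names>
            <surname>Sen</surname>
          </string-name>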
          .
          <year>2007</year>
          .
          <article-title>Collaborative filtering recommender systems</article-title>
          .
          In
          <source>The adaptive web</source>
          . Springer,
          <fpage>291</fpage>
          -
          <lpage>324</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          [29]
          <string-name>
            <given-names>Wendell R</given-names>
            <surname>Smith</surname>
          </string-name>
          .
          <year>1956</year>
          .
          <article-title>Product differentiation and market segmentation as alternative marketing strategies</article-title>
          .
          <source>Journal of marketing</source>
          <volume>21</volume>
          ,
          <issue>1</issue>
          (
          <year>1956</year>
          ),
          <fpage>3</fpage>
          -
          <lpage>8</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          [30]
          <string-name>
            <given-names>William</given-names>
            <surname>Trouleau</surname>
          </string-name>
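          ,
          <string-name>
            <given-names>Azin</given-names>
            <surname>Ashkan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Weicong</given-names>
            <surname>Ding</surname>
          </string-name>
          , and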
          <string-name>
            <given-names>Brian</given-names>
            <surname>Eriksson</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>Just one more: Modeling binge watching behavior</article-title>
          .
          In
          <source>Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining</source>
          . ACM,
          <fpage>1215</fpage>
          -
          <lpage>1224</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>
          [31]
          <string-name>
            <given-names>Jeroen K</given-names>
            <surname>Vermunt</surname>
          </string-name>
          and
          <string-name>
            <given-names>Jay</given-names>
            <surname>Magidson</surname>
          </string-name>
          .
          <year>2002</year>
          .
          <article-title>Latent class cluster analysis</article-title>
          .
          <source>Applied latent class analysis</source>
          <volume>11</volume>
          (
          <year>2002</year>
          ),
          <fpage>89</fpage>
          -
          <lpage>106</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref32">
        <mixed-citation>
          [32]
          <string-name>
            <given-names>Michel</given-names>
            <surname>Wedel</surname>
          </string-name>
          and
          <string-name>
            <given-names>Wagner A</given-names>
            <surname>Kamakura</surname>
          </string-name>
          .
          <year>2012</year>
          .
          <source>Market segmentation: Conceptual and methodological foundations</source>
          .
          Vol.
          <volume>8</volume>
          . Springer Science &amp; Business Media.
        </mixed-citation>
      </ref>
      <ref id="ref33">
        <mixed-citation>
          [33]
          <string-name>
            <given-names>Jing</given-names>
            <surname>Wu</surname>
          </string-name>
          and
          <string-name>
            <given-names>Zheng</given-names>
            <surname>Lin</surname>
          </string-name>
          .
          <year>2005</year>
          .
          <article-title>Research on customer segmentation model by clustering</article-title>
          .
          In
          <source>Proceedings of the 7th international conference on Electronic commerce</source>
          . ACM,
          <fpage>316</fpage>
          -
          <lpage>318</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref34">
        <mixed-citation>
          [34]
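          <string-name>
            <given-names>XianXing</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Yitong</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Yiming</given-names>
            <surname>Ma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Bee-Chung</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Liang</given-names>
            <surname>Zhang</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Deepak</given-names>
            <surname>Agarwal</surname>
          </string-name>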
          .
          <year>2016</year>
          .
          <article-title>GLMix: Generalized Linear Mixed Models For Large-Scale Response Prediction</article-title>
          .
          In
          <source>Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining</source>
          . ACM,
          <fpage>363</fpage>
          -
          <lpage>372</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>