<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>How Diverse Is Your Audience? Exploring Consumer Diversity in Recommender Systems</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Jacek Wasilewski</string-name>
          <email>jacek.wasilewski@insight-centre.org</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Neil Hurley</string-name>
          <email>neil.hurley@insight-centre.org</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Insight Centre for Data Analytics, University College Dublin</institution>
          ,
          <addr-line>Dublin</addr-line>
          ,
          <country country="IE">Ireland</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2017</year>
      </pub-date>
      <abstract>
        <p>On-line recommender systems face a number of challenges in providing content to users. One of these is the potential for isolating users from a diverse set of items by recommending very narrow content. In this paper we propose an item-centric view of a recommender system that looks at the exposure of items to groups of consumers, and at how diverse those groups are, to identify whether items are recommended to narrower groups of consumers. This contrasts with current practice, where the diversity of content is typically analysed. Preliminary results on the MovieLens 20M dataset show that recommender systems expose items to narrower groups of consumers, and that these groups are less diverse.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>INTRODUCTION</title>
      <p>Recommender systems have become ubiquitous in the interfaces to
product catalogues provided by on-line retailers. From the user’s
perspective, recommender algorithms are used to filter a large set of
possible selections into a much smaller set of items that the user is
likely to be interested in. From the business point of view, on the other hand,
the utilisation of products across the catalogue is as important as users
receiving engaging recommendations.</p>
      <p>Increasing sales, or redistributing them across the whole catalogue of
items, might not be the only business goal to be addressed by a
recommender system. In some sense, recommender systems are
marketing tools that identify customers and target them with
personalised items. Several questions arise: are we exposing items to users
who have shown an interest in them before? Are we promoting items to reach
new groups of customers? How diverse are these groups? From a
market development perspective, a recommender system should help
us achieve all of these business goals. To measure and control
for this, we need a picture of how items are exposed to different
groups of people, and of whether that exposure is diverse.</p>
      <p>In this paper we tackle the problem of item exposure to
understand who consumes items and whether potential consumers are reached
by recommendations. We measure the diversity of the people receiving
recommendations for an item, using approaches from
ecology, such as the species diversity of a habitat. This differs from the
content diversity of recommendations that has typically been
considered in the context of diversity in recommender systems. The main
goal of this paper is to answer the following question: do</p>
    </sec>
    <sec id="sec-2">
      <title>CONSUMER DIVERSITY</title>
      <p>Recommender systems have to deal with the long tail of items
that are rarely recommended. This includes niche items that are
rarely liked, but also items that have not yet penetrated the market. To
identify and promote these items, we argue that it is not enough to ask
how many users have rated each item in the past; we must also ask which
users have rated it, since this defines the item's exposure.</p>
      <p>An item’s user profile, <inline-formula><tex-math>U_i</tex-math></inline-formula>, contains the set of users who rated
the item in the past. A diversity measure over these users gives
insight into the extent to which the item has been exposed to a wide
range of different user types. Similarly, the set of users to whom
the item is recommended, <inline-formula><tex-math>R_i</tex-math></inline-formula>, can be analysed to reveal the extent
to which recommendations extend the exposure of an item. If an
item is recommended to diverse consumers, it may be able to
reach a wider potential market.</p>
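      <p>As an illustration only, the two audiences of an item can be assembled from raw data roughly as follows. This is a minimal sketch that assumes past ratings are available as (user, item) pairs and the recommender output as a mapping from each user to their top-N list; the helper name item_audiences and this data layout are hypothetical, not taken from the paper or from RankSys.</p>

```python
from collections import defaultdict


def item_audiences(ratings, top_n):
    """Hypothetical helper: build U_i and R_i for every item i.

    ratings: iterable of (user, item) pairs observed in the past.
    top_n:   dict mapping each user to the list of items recommended to them.
    Returns (U, R), two dicts mapping item -> set of users: U[i] holds the
    users who rated item i, R[i] the users to whom item i is recommended.
    """
    U = defaultdict(set)
    for user, item in ratings:
        U[item].add(user)

    # Invert the per-user recommendation lists into per-item audiences.
    R = defaultdict(set)
    for user, items in top_n.items():
        for item in items:
            R[item].add(user)
    return dict(U), dict(R)


# Toy usage: two users rated "matrix"; it is recommended to one new user.
ratings = [("u1", "matrix"), ("u2", "matrix"), ("u3", "heat")]
top_n = {"u1": ["heat"], "u2": ["heat"], "u4": ["matrix", "heat"]}
U, R = item_audiences(ratings, top_n)
```

      <p>Keeping both audiences as sets means that either one can be passed unchanged to a consumer diversity measure.</p>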
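      <p>As it is common practice for marketers to model their
customer base through customer segmentation, we find it useful to
measure diversity in terms of the spread across different consumer
segments. Given a partition <inline-formula><tex-math>P_c</tex-math></inline-formula> of U into k consumer segments,
<inline-formula><tex-math>U = C_1 \cup C_2 \cup \dots \cup C_k</tex-math></inline-formula>, where <inline-formula><tex-math>C_j</tex-math></inline-formula> is the jth consumer segment, we
define the consumer diversity of a set of consumers S as a function of
<inline-formula><tex-math>(p_1, \dots, p_k)</tex-math></inline-formula>, where <inline-formula><tex-math>p_j = |S \cap C_j| / |S|</tex-math></inline-formula> is the proportion of the set S that
belongs to consumer segment <inline-formula><tex-math>C_j</tex-math></inline-formula>.</p>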
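      <p>A direct transcription of the proportions p<sub>j</sub> into Python might look as follows. It is a sketch under the assumption that the partition is given as a dictionary from each user to a 0-based segment index (so segment C<sub>j</sub> corresponds to index j - 1); the name segment_proportions is hypothetical.</p>

```python
from collections import Counter


def segment_proportions(S, segment_of, k):
    """Return [p_1, ..., p_k], where p_j = |S intersect C_j| / |S|.

    S:          non-empty set of users, e.g. an item's raters U_i or the
                users R_i to whom it is recommended.
    segment_of: dict mapping every user to a segment index 0..k-1; this
                encodes the partition U = C_1 u ... u C_k (0-based here).
    """
    if not S:
        raise ValueError("S must contain at least one user")
    counts = Counter(segment_of[u] for u in S)
    return [counts[j] / len(S) for j in range(k)]


segment_of = {"u1": 0, "u2": 0, "u3": 1, "u4": 2}
p = segment_proportions({"u1", "u2", "u3", "u4"}, segment_of, k=3)
# p == [0.5, 0.25, 0.25]
```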
      <p>
        A similar problem is considered in ecology, where a habitat can
be quantified in terms of species diversity [
        <xref ref-type="bibr" rid="ref5 ref6">5, 6</xref>
        ], which measures
diversity in terms of the proportional abundance of each species
in a sample, assigning a high diversity value when the sample is
evenly spread across the different species. In biodiversity, different
measures such as species richness, Shannon entropy and Simpson
concentration can be generalised through the Hill number [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], or diversity
of order q, defined as
<disp-formula><tex-math>{}^{q}D \triangleq \left( \sum_{j=1}^{k} p_j^{q} \right)^{1/(1-q)},</tex-math></disp-formula>
and <inline-formula><tex-math>{}^{1}D = \lim_{q \to 1} {}^{q}D = \exp(H(p))</tex-math></inline-formula>, where H(p) is the Shannon entropy of the proportions p. In biodiversity these are called
true diversities or effective numbers of species [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. Setting q = 0 yields
richness, q = 1 the true diversity of Shannon entropy, and
q = 2 the inverse Simpson index. Entropy increases as both richness
and evenness increase, whereas the Simpson index measures dominance
and is less sensitive to richness. In our context, each consumer
segment corresponds to a “species”. True diversity then lets us
evaluate the diversity of a habitat, which is an item
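in our case. We can use true diversity to compare the exposure
of different items to the consumer segments, or to compare the
exposure of a single item under different conditions.
      </p>
      <table-wrap id="tab1">
        <label>Table 1</label>
        <caption>
          <p>True diversity indices of the recommendations generated by each algorithm, averaged over all items.</p>
        </caption>
        <table>
          <thead>
            <tr>
              <th>Algorithm</th>
              <th>Richness (q = 0)</th>
              <th>Shannon (q = 1)</th>
              <th>Simpson (q = 2)</th>
            </tr>
          </thead>
          <tbody>
            <tr><td>UB</td><td>5.11</td><td>2.84</td><td>2.34</td></tr>
            <tr><td>IB</td><td>3.73</td><td>2.99</td><td>1.95</td></tr>
            <tr><td>MF</td><td>6.43</td><td>3.53</td><td>2.84</td></tr>
          </tbody>
        </table>
      </table-wrap>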
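      <p>The true diversity of order q can be computed from the segment proportions in a few lines of Python. This is a sketch, not the authors' code: it assumes the proportions already sum to 1, drops empty segments (so that q = 0 counts only the segments actually reached), and treats q = 1 as the limit exp(H(p)) using the natural logarithm. The name true_diversity is hypothetical.</p>

```python
import math


def true_diversity(p, q):
    """Hill number (true diversity) of order q for proportions p.

    Zero proportions are dropped, so q = 0 gives richness (the number of
    segments present), q = 1 gives exp of the Shannon entropy (natural log),
    and q = 2 gives the inverse Simpson index.
    """
    p = [x for x in p if x > 0]
    if q == 1:
        # Limit of the general formula as q -> 1.
        return math.exp(-sum(x * math.log(x) for x in p))
    return sum(x ** q for x in p) ** (1.0 / (1.0 - q))


p = [0.5, 0.25, 0.25, 0.0]
richness = true_diversity(p, 0)  # 3.0
shannon = true_diversity(p, 1)   # 2 ** 1.5, about 2.83
simpson = true_diversity(p, 2)   # 1 / 0.375, about 2.67
```

      <p>Handling q = 1 as a separate case avoids the division by zero in the exponent 1/(1 - q).</p>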
    </sec>
    <sec id="sec-2a">
      <title>ANALYSIS OF CONSUMER DIVERSITY</title>
      <p>
We investigate consumer diversity on the MovieLens 20M dataset
[
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. This requires a partition of users into consumer segments, which we
create as behavioural segments based on past interactions: the X-means
[
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] clustering algorithm, applied to the interaction data, yields k = 15
segments. The result of such clustering depends on the initialisation
parameters, which is a limitation, but it still enables a comparison of
diversity. We analyse recommendation lists of N = 20 items generated by
the collaborative filtering algorithms available in the RankSys framework
(http://ranksys.org): user-based (UB) and item-based (IB) kNN, and matrix
factorisation (MF).
      </p>
      <p>We ask whether recommender systems might suffer not only from
narrowing the content served to users, but also from exposing items
to narrow audiences. To illustrate this, we take one movie (The
Matrix) and show the distribution of its consumers over segments
(Figure 1). One segment (no. 4) is over-represented
almost 4 times in the recommendations. We measured its true
diversities: richness, Shannon and Simpson indices. Richness decreases
from 15 to 13, which means that 2 segments are not reached, and the Shannon
and Simpson indices also drop, from 9.80 to 6.14 and from 8.85 to 4.43
respectively. The recommendations therefore reach a less diverse set of
consumers than the movie has reached so far. True diversities are also easy to
interpret: they give the effective number of species, i.e. the number of equally
abundant species that would produce the same diversity. In our case, the
recommendations are about 1.6 times less diverse under the Shannon index, and
2 times less diverse under the Simpson index.</p>
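      <p>As a worked check of the effective-number reading, consider an audience spread evenly over m segments, so that p<sub>j</sub> = 1/m for each of them. Every true diversity then equals m, and the reductions quoted for The Matrix are simply ratios of the reported values:
<disp-formula><tex-math>{}^{q}D = \left( \sum_{j=1}^{m} m^{-q} \right)^{1/(1-q)} = \left( m^{1-q} \right)^{1/(1-q)} = m \quad (q \neq 1), \qquad {}^{1}D = \exp(\ln m) = m,</tex-math></disp-formula>
<disp-formula><tex-math>\frac{9.80}{6.14} \approx 1.60, \qquad \frac{8.85}{4.43} \approx 2.00.</tex-math></disp-formula>
      </p>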
      <p>Table 1 contains the values of the considered true diversity indices,
averaged over all items. Richness shows that, on average, items are
consumed by users from 13 of the 15 segments, but are recommended only
to 3 to 6 segments. When concentration is taken into account, the Shannon and
Simpson indices drop, indicating that items are 2 to 3 times less diverse.
Paired t-tests show that the differences are significant (p &lt; 0.001).</p>
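      <p>The significance test can be reproduced in outline as below. This is a minimal sketch using only the Python standard library and assuming one (dataset, recommendation) pair of diversity values per item; it returns the paired t statistic and its degrees of freedom. Turning these into a p-value requires the t distribution (for example via scipy.stats.ttest_rel), which the standard library does not provide. The paper does not say which tool was used, and the helper name paired_t_statistic is hypothetical.</p>

```python
import math
import statistics


def paired_t_statistic(before, after):
    """Paired t statistic for per-item diversity values (hypothetical helper).

    before[i]: diversity of item i over its past consumers U_i.
    after[i]:  diversity of item i over its recommendees R_i.
    Returns (t, degrees_of_freedom). statistics.stdev raises
    StatisticsError for fewer than two items; identical differences
    (zero spread) raise ZeroDivisionError.
    """
    if len(before) != len(after):
        raise ValueError("before and after must have the same length")
    diffs = [b - a for b, a in zip(before, after)]
    n = len(diffs)
    standard_error = statistics.stdev(diffs) / math.sqrt(n)
    return statistics.mean(diffs) / standard_error, n - 1


t, df = paired_t_statistic([3.0, 5.0, 7.0], [1.0, 2.0, 3.0])
# differences are [2, 3, 4]: mean 3, sample sd 1, so t = 3 * sqrt(3), df = 2
```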
      <p>As an item’s popularity can affect collaborative filtering methods,
we ask whether the lower diversity is due to low item popularity. To
examine this, we split the items into a head of the most popular items
(accounting for 80% of interactions) and a tail containing the rest, and plot
the histogram of Shannon diversity for each group (Figure 2). In the dataset,
head items have higher diversity, while tail items tend to obtain lower values.
Recommendations do not follow this pattern: for both groups of items the
distribution of diversity is skewed towards 0. This suggests that even popular
items, which receive more interactions, are isolated from a wide and diverse
set of consumers.</p>
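      <p>One way to form the head and tail groups is sketched below, assuming interactions are (user, item) pairs. The paper only states that the head covers 80% of interactions; the exact cut-off rule used here (items are added in decreasing order of popularity until the head covers at least that share) and the helper name split_head_tail are assumptions.</p>

```python
from collections import Counter


def split_head_tail(interactions, head_share=0.8):
    """Split items into a popular head and a long tail (hypothetical helper).

    interactions: iterable of (user, item) pairs.
    Items are visited in decreasing order of interaction count and added to
    the head until it accounts for at least head_share of all interactions;
    the remaining items form the tail.
    """
    counts = Counter(item for _, item in interactions)
    total = sum(counts.values())
    head, covered = set(), 0
    for item, count in counts.most_common():
        if covered >= head_share * total:
            break
        head.add(item)
        covered += count
    tail = set(counts) - head
    return head, tail


interactions = (
    [(f"u{n}", "a") for n in range(5)]
    + [(f"u{n}", "b") for n in range(3)]
    + [("u0", "c"), ("u1", "d")]
)
head, tail = split_head_tail(interactions)
# head == {"a", "b"} (8 of 10 interactions), tail == {"c", "d"}
```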
    </sec>
    <sec id="sec-3">
      <title>RELATED WORK</title>
      <p>
        Diversity is commonly studied in the context of the items that are
recommended to users, where it can help mitigate the problem
of users being exposed to a narrow spectrum of item types. A
number of frameworks have been proposed to measure and increase
diversity, such as Intra-List Diversity [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ].
      </p>
      <p>
        Sales diversity [
        <xref ref-type="bibr" rid="ref1 ref3">1, 3</xref>
        ] is a notion of diversity which attempts to
capture how items perform, e.g. how evenly they are consumed.
It tackles the long-tail problem, in which the most popular items drive
the recommendations. Aggregate Diversity [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], Gini index [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ],
and Shannon entropy [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] are some of the measures of sales performance
over items. In [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] an item-centric evaluation is conducted to
detect pathologies hindering novel recommendations. These methods,
however, analyse the impact on items globally rather than individually,
and do not consider different groups of consumers.
      </p>
      <p>
        In information retrieval, the concept of profile diversity [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] has
been proposed, where a profile contains information about the
user’s community, and queries should retrieve documents that
different communities find useful. However, the framework does
not analyse the consumers reached by these documents.
      </p>
    </sec>
    <sec id="sec-4">
      <title>CONCLUSIONS</title>
      <p>In this paper we identified and explored the problem of consumer
diversity, which measures how diverse each item is in terms of
consumer segments. Our analysis shows that popular recommendation
techniques expose items to much narrower and less diverse groups of
consumers. Although the overall quality of recommendations might be
good, items are hidden from certain groups of people who have expressed
an interest in them in the past.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>G.</given-names>
            <surname>Adomavicius</surname>
          </string-name>
          and
          <string-name>
            <given-names>Y.</given-names>
            <surname>Kwon</surname>
          </string-name>
          .
          <year>2012</year>
          .
          <article-title>Improving Aggregate Recommendation Diversity Using Ranking-Based Techniques</article-title>
          .
          <source>IEEE TKDE 24</source>
          ,
          <issue>5</issue>
          (
          <year>2012</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Ò.</given-names>
            <surname>Celma</surname>
          </string-name>
          and
          <string-name>
            <given-names>P.</given-names>
            <surname>Herrera</surname>
          </string-name>
          .
          <year>2008</year>
          .
          <article-title>A New Approach to Evaluating Novel Recommendations (RecSys '08).</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>D.</given-names>
            <surname>Fleder</surname>
          </string-name>
          and
          <string-name>
            <given-names>K.</given-names>
            <surname>Hosanagar</surname>
          </string-name>
          .
          <year>2009</year>
          .
          <article-title>Blockbuster Culture's Next Rise or Fall: The Impact of Recommender Systems on Sales Diversity</article-title>
          .
          <source>Manage. Sci. 55</source>
          ,
          <issue>5</issue>
          (
          <year>2009</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>F. M.</given-names>
            <surname>Harper</surname>
          </string-name>
          and
          <string-name>
            <given-names>J. A.</given-names>
            <surname>Konstan</surname>
          </string-name>
          .
          <year>2015</year>
          .
          <article-title>The MovieLens Datasets: History and Context</article-title>
          .
          <source>ACM Trans. Interact. Intell. Syst. 5</source>
          ,
          <issue>4</issue>
          (
          <year>2015</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>M. O.</given-names>
            <surname>Hill</surname>
          </string-name>
          .
          <year>1973</year>
          .
          <article-title>Diversity and evenness: a unifying notation and its consequences</article-title>
          .
          <source>Ecology</source>
          <volume>54</volume>
          ,
          <issue>2</issue>
          (
          <year>1973</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>L.</given-names>
            <surname>Jost</surname>
          </string-name>
          .
          <year>2006</year>
          .
          <article-title>Entropy and diversity</article-title>
          .
          <source>Oikos</source>
          <volume>113</volume>
          ,
          <issue>2</issue>
          (
          <year>2006</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>D.</given-names>
            <surname>Pelleg</surname>
          </string-name>
          and
          <string-name>
            <given-names>A. W.</given-names>
            <surname>Moore</surname>
          </string-name>
          .
          <year>2000</year>
          .
          <article-title>X-means: Extending K-means with Efficient Estimation of the Number of Clusters (ICML '00).</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>M.</given-names>
            <surname>Servajean</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Pacitti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Amer-Yahia</surname>
          </string-name>
          , and
          <string-name>
            <given-names>P.</given-names>
            <surname>Neveu</surname>
          </string-name>
          .
          <year>2013</year>
          .
          <article-title>Profile Diversity in Search and Recommendation (WWW '13 Companion)</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Szlavik</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.J.</given-names>
            <surname>Kowalczyk</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.C.</given-names>
            <surname>Schut</surname>
          </string-name>
          .
          <year>2011</year>
          .
          <article-title>Diversity measurement of recommender systems under different user choice models (ICWSM'11).</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>C.</given-names>
            <surname>Ziegler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. M.</given-names>
            <surname>McNee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. A.</given-names>
            <surname>Konstan</surname>
          </string-name>
          , and
          <string-name>
            <given-names>G.</given-names>
            <surname>Lausen</surname>
          </string-name>
          .
          <year>2005</year>
          .
          <article-title>Improving Recommendation Lists Through Topic Diversification (WWW '05).</article-title>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>