<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Similar Product Clustering for Long-Tail Cross-Sell Recommendations</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Vladislav Grozin</string-name>
          <email>vlad.grozin@diginetica.com</email>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alla Levina</string-name>
          <email>alla.levina@diginetica.com</email>
        </contrib>
      </contrib-group>
      <abstract>
        <p>One of the main reasons for the rapid growth and development of e-commerce is the ability of on-line stores to provide a much larger variety of products than their off-line counterparts, which are confined by stock space. This also means that on-line stores carry products that individually have few purchases but together form a large share of revenue. They are known as "long-tail" products. Long-tail data sparsity makes it challenging to apply recommender algorithms. In this paper, we consider cross-sell recommendations generated by association rule mining. We apply clustering techniques to tackle the sparsity problem, and compare different clustering and distance calculation methods. Behavioral data, such as product views, and content data, such as the category tree and product names, are used to calculate product similarity. We also develop a cross-validation method that allows stable metric calculation for algorithms that focus on long-tail products. We show that product clustering using session-based distances improves cross-sell recommendations for long-tail items.</p>
      </abstract>
      <kwd-group>
        <kwd>recommender systems</kwd>
        <kwd>cross-sell recommendations</kwd>
        <kwd>association rules</kwd>
        <kwd>long-tail</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Nowadays, recommender systems are omnipresent in on-line retail, and they have developed greatly over the last decade. The growth of the Internet and of e-commerce sites such as Amazon.com, Netflix, and the iTunes Music Store has opened the door to so-called "infinite-inventory" retailers, which offer a large variety of products. This distinguishes them in a positive way from off-line stores, which focus more on top-sellers and hits due to the physical limitations of shops and costly stock space [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. However, the large variety of products also means that many items are purchased and viewed only a few times. Indeed, the majority of products in site catalogs belong to the "long tail": items with far fewer purchases than items from the "short tail" (hits). This long tail accounts for a large share of revenue, so businesses cannot ignore long-tail products.
      </p>
      <p>
        Recommendations are called "cross-sell" when the recommended items are "complementary" to the item being viewed (external batteries and cases for phones; cables and stands for TVs) [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. Unlike "similar items" or "items similar to ones the user liked in the past", such recommendations guide the user through the site catalog, making navigation easier, while generating additional sales of high-margin supplementary products [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. This makes cross-sell important both for site functioning and for user satisfaction [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. A common way of generating cross-sell recommendations is association rule mining [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. Such recommendations are often labeled "commonly purchased together" or "do not forget to buy".
      </p>
      <p>However, data sparsity makes it challenging to apply recommender algorithms. In this paper, we investigate item clustering approaches that group similar items together, and use this dense representation of the data to improve the quality of cross-sell recommendations generated by the association rule mining algorithm.</p>
      <p>Below, we review related research, describe our approach to the problem, list the clustering methods and the distance measures between items, describe the experiments, and discuss the results.</p>
    </sec>
    <sec id="sec-2">
      <title>Related research</title>
      <p>
        Association rule mining (ARM) is a technique that extracts rules from a dataset. These rules are often written as X ⇒ Y, where X is the condition (antecedent, driver) and Y is the consequent. Both X and Y are sets of items: X represents the set of items that a user has already purchased, and Y is the set of items that the user is likely to purchase next. This technique was formalized in [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
      </p>
      <p>For a given rule X ⇒ Y, support is defined as</p>
      <p>support(X ⇒ Y) = P(X ∩ Y)   (1)</p>
      <p>where P(T) is the probability of finding the transaction T in the dataset. Support shows how often the specified items appear in the dataset. Confidence of the rule X ⇒ Y is the ratio of the number of transactions that contain both X and Y to the number of transactions containing X:</p>
      <p>confidence(X ⇒ Y) = P(X ∩ Y) / P(X)   (2)</p>
      <p>Lift of the rule X ⇒ Y is the ratio of the observed support to that expected if X and Y were independent:</p>
      <p>lift(X ⇒ Y) = P(X ∩ Y) / (P(X) · P(Y))   (3)</p>
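      <p>As a concrete illustration of these three quantities, the following sketch computes them over a toy list of transactions (the transactions and item names are invented for the example, not taken from our dataset):</p>

```python
# Toy illustration of support, confidence, and lift for a rule X => Y.
# Each transaction is the set of items in one purchase order.

def support(transactions, items):
    """Fraction of transactions that contain all of the given items."""
    return sum(items <= t for t in transactions) / len(transactions)

def confidence(transactions, x, y):
    """P(X and Y) / P(X): how often Y appears when X does."""
    return support(transactions, x | y) / support(transactions, x)

def lift(transactions, x, y):
    """P(X and Y) / (P(X) * P(Y)): observed vs. expected support."""
    return support(transactions, x | y) / (
        support(transactions, x) * support(transactions, y)
    )

transactions = [
    {"phone", "case"}, {"phone", "case", "charger"},
    {"phone"}, {"tv", "cable"}, {"tv"}, {"case"},
]
# "phone" and "case" co-occur in 2 of 6 transactions:
pair_support = support(transactions, {"phone", "case"})   # 1/3
rule_lift = lift(transactions, {"phone"}, {"case"})       # 4/3 > 1
```

      <p>A lift above 1, as for the phone/case pair here, indicates that the two itemsets co-occur more often than independence would predict, which is what makes lift a useful ranking value for cross-sell candidates.</p>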
      <p>
        ARM is an effective method for generating cross-sell recommendations. The authors of [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] consider a recommender system that, for a given user purchase history, picks recommended items using plain confidence, lift, or profit-oriented metrics. It is shown that the lift and profit-oriented metrics provide better results than confidence does.
      </p>
      <p>
        In practice, recommendation systems face a long-tail problem when many items in the dataset are purchased and viewed only a few times. It was shown in [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]
that ARM performs poorly on the long tail due to the low support and confidence of the mined rules for unpopular products.
      </p>
      <p>
        The long-tail effect is well known, and many authors have tried to improve recommendations for the long tail. The authors of [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] adopted random walks on the bipartite user-item graph to make personalized recommendations for the long tail. By using products as the starting points of random walks and measuring the average number of steps to the user of interest, the authors promoted recommendations of unpopular items. However, this algorithm finds items similar to the ones the user has interacted with, which makes it not applicable to cross-sell.
      </p>
      <p>
        Park in [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] employed product clustering to create a dense data representation. He applied adaptive clustering to the task and showed that it performed better than conventional k-nearest-neighbors clustering. The Euclidean distance between vectors of basic product features was used as the distance between products. This distance did not take into account item category or genre similarity, or how many users were interested in both items. For example, if two movies of different genres have similar average ratings and unbiased average ratings, these movies would be considered similar. The author ran a linear model to predict user preferences. We solve a different recommender problem; however, this paper gives insight into how sparsity problems can be addressed by clustering items together.
      </p>
    </sec>
    <sec id="sec-3">
      <title>Approach description</title>
      <p>
        Lift is used to determine which items are the best to recommend for a given user's history [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. We consider only single-item antecedents of the mined rules, because an increase in condition complexity decreases support, which is already low in the long tail [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. So, in this research, when a user purchases or adds to a cart some item, recommendations are generated using only this particular item. Also, we focus on recommendations for long-tail items.
      </p>
      <p>
        Item clustering is used to reinforce individual item data. For the item of interest, we pick several nearest neighbors using some distance measure and treat this set as a single item [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. Let C be the set of N nearest neighbors, with individual items Ci picked by the clustering algorithm. If we treat this set of items as a single item, we can extend the lift definition:
      </p>
      <p>Lift(C, Y) = P((C1 ∪ C2 ∪ … ∪ CN) ∩ Y) / (P(C1 ∪ C2 ∪ … ∪ CN) · P(Y))   (4)</p>
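      <p>The extended lift of Eq. (4) can be sketched as follows, under the interpretation that the cluster "occurs" in a transaction whenever the transaction contains at least one of its items (function and variable names are ours, for illustration only):</p>

```python
# Sketch of Eq. (4): the neighbor cluster C = {C1, ..., CN} is treated as a
# single pseudo-item, so its occurrences across transactions are pooled.

def cluster_lift(transactions, cluster, candidate):
    """Lift(C, Y) = P(C and Y) / (P(C) * P(Y)) for cluster C and item Y."""
    n = len(transactions)
    p_cluster = sum(bool(cluster & t) for t in transactions) / n
    p_candidate = sum(candidate in t for t in transactions) / n
    p_both = sum(bool(cluster & t) and candidate in t for t in transactions) / n
    return p_both / (p_cluster * p_candidate)

# Two rare items "a" and "b" pooled into one cluster give a denser signal:
transactions = [{"a", "x"}, {"b", "x"}, {"c"}, {"x"}]
value = cluster_lift(transactions, {"a", "b"}, "x")   # 4/3
```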
      <p>We also have to employ a distance between products that captures semantic similarity. For example, if many users have viewed a specific pair of products together, these products are likely to be similar.</p>
      <sec id="sec-3-1">
        <title>Methods of clustering</title>
        <p>
          We considered two methods of clustering: total clustering and adaptive clustering [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ]. Note that the neighbor relationship in these approaches is not symmetric: if item A has item B as a cluster neighbor, it is not guaranteed that B will have A as a cluster neighbor.
        </p>
        <p>Total clustering. For a specified item, this method picks into the cluster the K nearest neighbors of the item of interest, according to some distance measure.</p>
        <p>Adaptive clustering. The premise of this method is that we want bigger clusters if the products have few records available, and smaller clusters if the products have enough data. So, we define a threshold N and pick the nearest neighbors one by one until the total sum of the picked items' records exceeds the threshold. In our case, we count purchase orders. Note that different items will have neighborhoods of different sizes. For instance, if an item is popular, its cluster may contain only the item itself.</p>
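        <p>The two neighbor-picking schemes can be sketched as below; the distance function and the per-item purchase counts are assumed to be given (all names are ours):</p>

```python
# Sketch of total vs. adaptive clustering. `dist(a, b)` is any of the
# distance measures discussed below; `purchases[item]` counts orders.

def total_cluster(item, items, dist, k):
    """Always pick the K nearest neighbors of `item` (itself included,
    since dist(item, item) == 0 puts it first)."""
    return sorted(items, key=lambda other: dist(item, other))[:k]

def adaptive_cluster(item, items, dist, purchases, threshold):
    """Add nearest neighbors one by one until the accumulated purchase
    records reach the threshold; a popular item may form a cluster alone."""
    cluster, records = [], 0
    for other in sorted(items, key=lambda other: dist(item, other)):
        cluster.append(other)
        records += purchases[other]
        if records >= threshold:
            break
    return cluster
```

        <p>For instance, with a threshold of 10 purchases, an item that already has 100 orders stops after itself, while a long-tail item keeps absorbing neighbors until enough records are pooled.</p>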
      </sec>
      <sec id="sec-3-2">
        <title>Metrics of distances</title>
        <p>
          Category distance. A hierarchical category structure is commonly found in e-commerce resources. A few top-level categories have successor categories, those successors have their own successors, and so on, until we come to the leaves of this structure: the items. The category distance between two items is the length of the shortest path between these items in the tree [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ].
        </p>
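        <p>With the category tree stored as parent pointers, the shortest path between two leaves passes through their lowest common ancestor; a minimal sketch (the `parent` mapping is a hypothetical representation of the tree):</p>

```python
# Sketch of category distance: length of the shortest path between two
# items in the category tree, found via the lowest common ancestor.

def path_to_root(node, parent):
    """List of nodes from `node` up to the root of the tree."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path

def category_distance(a, b, parent):
    steps_up_b = {node: i for i, node in enumerate(path_to_root(b, parent))}
    for steps_up_a, node in enumerate(path_to_root(a, parent)):
        if node in steps_up_b:            # lowest common ancestor reached
            return steps_up_a + steps_up_b[node]
    return float("inf")                   # nodes from unrelated trees
```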
        <p>
          Session distance. We build the bipartite user-item graph [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ] and use link prediction techniques [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ] to assess item similarity. One popular and simple similarity metric is the Jaccard measure. In our case, we use it to measure the similarity of the sets of sessions in which two items were viewed. For example, if the majority of users viewed both items within one session, we can assume that those items are similar.
        </p>
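        <p>A minimal sketch of this measure, given the sets of session identifiers in which each of the two items was viewed:</p>

```python
# Sketch of session distance: 1 minus the Jaccard similarity of the
# session sets in which the two items were viewed.

def session_distance(sessions_a, sessions_b):
    union = sessions_a | sessions_b
    if not union:                 # neither item was viewed at all
        return 1.0
    return 1.0 - len(sessions_a & sessions_b) / len(union)

# Items seen together in 2 of 4 distinct sessions are at distance 0.5:
d = session_distance({"s1", "s2", "s3"}, {"s2", "s3", "s4"})   # 0.5
```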
        <p>
          Prod2Vec distance. The Prod2Vec model was proposed in [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ]. The premise of this approach is that we can consider user sessions as "sentences" and individual items as "words". After applying the conventional Word2Vec model to our dataset, we get an embedding for each item, and we can thus measure item similarity as the scalar product of the items' vector representations.
        </p>
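        <p>Once the embeddings are trained (e.g. with any off-the-shelf Word2Vec implementation run over sessions as sentences), the similarity itself is just a dot product; in this sketch the embedding table is filled with made-up vectors for illustration:</p>

```python
# Hypothetical trained Prod2Vec embeddings (values invented for the example).
embeddings = {
    "phone": [0.9, 0.1, 0.0],
    "case":  [0.8, 0.2, 0.1],
    "tv":    [0.0, 0.1, 0.9],
}

def prod2vec_similarity(a, b):
    """Scalar product of the two items' vector representations."""
    return sum(x * y for x, y in zip(embeddings[a], embeddings[b]))
```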
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Experiments</title>
      <p>
        We have to evaluate how well recommendations are given to users for a given clustering algorithm and distance measure. NDCG [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] is a common and widely used metric of recommendation quality. Our baseline is cross-sell recommendation using ARM with lift as the recommendation value [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]; this baseline picks the items with the best lift for the given user purchase history.
      </p>
      <p>In order to determine the best way of computing item similarity and the best clustering method, all combinations of the clustering methods and distance formulas are run. We also have to pick the best clustering hyperparameters; therefore, each clustering-distance pair is evaluated several times, each time with a different cluster size. The number of items per cluster is varied from 2 to 10 with a step of 1 for total clustering, and the number of purchases per cluster is varied from 15 to 300 with a step of 15 for adaptive clustering.</p>
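      <p>The resulting search space is small enough to enumerate exhaustively; as a sketch (the labels are ours, not identifiers from our code):</p>

```python
from itertools import product

# Sketch of the evaluation grid: every distance is paired with every
# clustering method and swept over that method's size parameter.
distances = ["category", "session", "prod2vec"]
grid = (
    [("total", d, k) for d, k in product(distances, range(2, 11))]          # 2..10 items, step 1
  + [("adaptive", d, n) for d, n in product(distances, range(15, 301, 15))] # 15..300 purchases, step 15
)
# 3 distances * (9 + 20) settings = 87 runs in total
```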
      <sec id="sec-4-1">
        <title>Data sources</title>
        <p>We use data provided by an e-commerce resource that agreed to conduct experiments. The date ranges of the datasets are not disclosed; the datasets contain anonymized sampled data. Page views other than item page views are not used in this research. We have 2.3M unique sessions, 8.5M item view events, 50k products in the catalog, 1k categories, and 434k purchase orders. The most commonly purchased product has 130 purchases, and the top 1% of most commonly purchased items accounts for 78% of revenue.</p>
      </sec>
      <sec id="sec-4-2">
        <title>Cross-validation</title>
        <p>One of the challenges of working with the long tail is cross-validation. If we split the data with a conventional k-fold split, we end up with a dozen samples in the train and test sets, and some products may have no samples in the test set at all. Therefore, we have designed the cross-validation procedure so that it stratifies the data by products, guaranteeing data entries in both the train and test sets for each item. Also, we used highly purchased products in order to have enough data to work with, and emulated long-tail sparsity by moving the majority of their records to the test set. To ensure reproducibility, the random seed was fixed.</p>
        <p>The full evaluation procedure is described below:
1. Take all top-sellers (the top 1% of items, ordered by sold quantity). Randomly assign half of them to the "test items" set.
2. For every item in the test items set:
   (a) Mark a random 10% of the sessions that have viewed this item.
3. Put all events from sessions that have been marked, or that have not viewed items from the test item set, into the train set, and move all other events to the test set. This leaves only a small portion (10%..20% of the original record count of the test items) of data in the train set, effectively emulating an environment with sparse data. However, we keep a lot of records (80%..90%) in the test set to accurately assess the algorithm's performance.
4. For each session in the test set:
   (a) For each item from the test item set that was purchased within this session:
      i. Build recommendations for this particular purchase event, item, and session:
         A. Run clustering with the specified distance and settings to determine the nearest neighbors of the item being purchased.
         B. Fetch all orders that contain at least one item from the previous step.
         C. Iterate over all items in these orders and calculate Lift(C, Y), where C is the purchased item's neighbor cluster and Y is a candidate item.
         D. Exclude all the items selected at stage A from the result obtained at stage C.
         E. Take the 20 items with the highest lift value from the list obtained at stage D, and recommend these items to the user.
      ii. Assess the recommendation quality for this particular purchase event, item, and session.</p>
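        <p>Steps 1-3 of the procedure can be sketched as follows; the data structures are simplified stand-ins (a session is reduced to the set of items it viewed), and the per-session 10% marking is approximated with a fixed-seed coin flip:</p>

```python
import random

def split_sessions(sessions, sales_count, seed=42):
    """Route whole sessions to train or test around a set of "test items"."""
    rng = random.Random(seed)                      # fixed seed for reproducibility
    ranked = sorted(sales_count, key=sales_count.get, reverse=True)
    top = ranked[: max(2, len(ranked) // 100)]     # top-sellers (top-1% of items)
    rng.shuffle(top)
    test_items = set(top[: len(top) // 2])         # half of them become test items

    train, test = [], []
    for session_id, viewed in sessions.items():
        marked = rng.random() < 0.1                # ~10% of their sessions stay in train
        if not (viewed & test_items) or marked:
            train.append(session_id)               # untouched or marked sessions
        else:
            test.append(session_id)                # the rest go to the test set
    return test_items, train, test
```

        <p>Because whole sessions are moved rather than single events, sequence-based models such as Prod2Vec still see intact "sentences" in the train set.</p>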
        <p>A conventional stratified k-fold split would hide a portion (1/k) of the data for each product. We are interested in the long tail, so we have to pick products with few records in the train set. This also means that we would have even fewer records in the test set for metric measurement. If we simply put most of the data in the test set, it would put the models in an unrealistic situation where all items in the train set have few records. For example, the models would not be able to reinforce the data by picking the nearest popular products to the long-tail ones, because all products would have few records left. This is solved by hiding the majority of records only for some products and leaving the other products unaffected.</p>
        <p>
          Also, user sessions can have a complex structure. This is important for models that use the sequence of events, for example Prod2Vec [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ]. Therefore, we should not hide single events; we should put entire sessions into the test set.
        </p>
        <p>Thus, our algorithm takes a portion of popular products and puts the majority of the sessions that have viewed these products into the test set, leaving only a few sessions in the train set. After that, we measure how well the algorithm makes recommendations for these products. Such a cross-validation procedure forces the algorithm to make recommendations using sparse train data for these products, while we still have a large amount of records for metric measurement. This way we can properly assess recommender performance for long-tail products.</p>
      </sec>
      <sec id="sec-4-3">
        <title>Quality measurement</title>
        <p>
          We use the well-known and widely used NDCG@20 to assess recommendation quality [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ]:
        </p>
        <p>NDCG = DCG / IDCG   (5)</p>
        <p>DCG = Σ_{i=1}^{n} rel_i / log(i + 1),   IDCG = Σ_{i=1}^{|orderedRel|} orderedRel_i / log(i + 1)   (6)</p>
        <p>where rel_i is the relevance of the i-th recommended item, and orderedRel is the list of relevances sorted in descending order. IDCG is thus a normalization coefficient that ensures that NDCG lies between 0 and 1.</p>
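        <p>For binary relevances, the NDCG computation reduces to the short sketch below (truncation at 20 matches NDCG@20; the all-zero case corresponds to the instances we skip):</p>

```python
import math

def dcg(relevances):
    """Sum of rel_i / log(i + 1) with 1-based positions i."""
    return sum(rel / math.log(i + 1) for i, rel in enumerate(relevances, start=1))

def ndcg(relevances, k=20):
    """DCG over the top-k recommendations, normalized by the ideal ordering."""
    rels = list(relevances)[:k]
    if not any(rels):
        return 0.0                      # no relevant item among recommendations
    return dcg(rels) / dcg(sorted(rels, reverse=True))
```

        <p>A perfectly ordered list such as [1, 0, 0] scores 1.0, while pushing the relevant item lower in the list moves the score toward 0.</p>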
        <p>The NDCG metric is calculated at step 4(a)ii of our algorithm for each instance when a test user buys a test item; after that, the values are averaged. Our goal is to guess what the user is going to purchase in the future. Therefore, we set the relevance of a recommended item to 1 if the user is going to purchase the recommended item in the future, and to 0 otherwise. We skip instances in which the user will not purchase any items in the future.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Results</title>
      <p>We can see that the session-based distance works best for both types of clustering. The category-based distance provides low performance; we attribute this to the fact that categories often contain diverse sets of products, so clustering that uses the category-based distance may include unrelated items in a cluster. The Prod2Vec distance has also not beaten the baseline, which can be explained by the fact that long-tail products do not have enough data for this model.</p>
    </sec>
    <sec id="sec-6">
      <title>Conclusion and future work</title>
      <p>We have used association rule mining to generate recommendations. We have shown that, by using the Jaccard measure as a distance between products, we can cluster similar items together, and that such clustering improves cross-sell quality for the long tail. Adaptive clustering, which increases the cluster size when there is a lack of data, works better than total clustering, which always picks a constant number of neighbors.</p>
      <p>In future work, we plan to extend our approach. For example, we can combine several distance measures into one that works better. Also, we currently mine only rules with a single cluster in the antecedent and a single item in the consequent; we should consider mining rules with multiple clusters in antecedents, and clusters (instead of items) in consequents.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Agrawal</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Srikant</surname>
          </string-name>
          , R.:
          <article-title>Fast algorithms for mining association rules</article-title>
          .
          <source>In: 20th VLDB Conf. (Sep</source>
          <year>1994</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Grbovic</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Radosavljevic</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Djuric</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bhamidipati</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Savla</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bhagwan</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sharp</surname>
          </string-name>
          , D.:
          <article-title>E-commerce in your inbox: Product recommendations at scale</article-title>
          . In: Cao,
          <string-name>
            <given-names>L.</given-names>
            ,
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            ,
            <surname>Joachims</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            ,
            <surname>Webb</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.I.</given-names>
            ,
            <surname>Margineantu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.D.</given-names>
            ,
            <surname>Williams</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
          </string-name>
          . (eds.) KDD. pp.
          <fpage>1809</fpage>
          -
          <lpage>1818</lpage>
          . ACM (
          <year>2015</year>
          ), http://dblp.uni-trier.de/db/conf/kdd/kdd2015.html#GrbovicRDBSBS15
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Järvelin</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kekäläinen</surname>
          </string-name>
          , J.:
          <article-title>Cumulated gain-based evaluation of ir techniques</article-title>
          .
          <source>ACM Transactions on Information Systems (TOIS) 20(4)</source>
          ,
          <fpage>422</fpage>
          -
          <lpage>446</lpage>
          (
          <year>2002</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Kitts</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Freed</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vrieze</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Cross-sell: a fast promotion-tunable customeritem recommendation method based on conditionally independent probabilities</article-title>
          . In: Ramakrishnan,
          <string-name>
            <given-names>R.</given-names>
            ,
            <surname>Stolfo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.J.</given-names>
            ,
            <surname>Bayardo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.J.</given-names>
            ,
            <surname>Parsa</surname>
          </string-name>
          , I. (eds.) KDD. pp.
          <fpage>437</fpage>
          -
          <lpage>446</lpage>
          . ACM (
          <year>2000</year>
          ), http://dblp.uni-trier.de/db/conf/kdd/kdd2000.html#KittsFV00
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Liben-Nowell</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kleinberg</surname>
            ,
            <given-names>J.:</given-names>
          </string-name>
          <article-title>The link-prediction problem for social networks</article-title>
          .
          <source>Journal of the American Society for Information Science and Technology</source>
          <volume>58</volume>
          (
          <issue>7</issue>
          ),
          <fpage>1019</fpage>
          -
          <lpage>1031</lpage>
          (
          <year>2007</year>
          ), http://dx.doi.org/10.1002/asi.20591
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Oktar</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>Recommendation systems: Increasing profit by long tail</article-title>
          . (
          <year>2009</year>
          ), http://en.webrazzi.com/2009/09/18
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Park</surname>
            ,
            <given-names>Y.J.:</given-names>
          </string-name>
          <article-title>The adaptive clustering method for the long tail problem of recommender systems</article-title>
          .
          <source>IEEE Trans. Knowl. Data Eng</source>
          .
          <volume>25</volume>
          (
          <issue>8</issue>
          ),
          <fpage>1904</fpage>
          -
          <lpage>1915</lpage>
          (
          <year>2013</year>
          ), http://dblp.uni-trier.de/db/journals/tkde/tkde25.html#Park13
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Riaz</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Arooj</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hassan</surname>
            ,
            <given-names>M.T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kim</surname>
            ,
            <given-names>J.B.</given-names>
          </string-name>
          :
          <article-title>Clustering based association rule mining on online stores for optimized cross product recommendation</article-title>
          .
          <source>In: ICCAIS</source>
          . pp.
          <fpage>176</fpage>
          -
          <lpage>181</lpage>
          . IEEE (
          <year>2014</year>
          ), http://dblp.uni-trier.de/db/conf/iccais/iccais2014.html#RiazAHK14
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Schafer</surname>
            ,
            <given-names>J.B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Konstan</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Riedl</surname>
          </string-name>
          , J.:
          <article-title>Recommender systems in e-commerce</article-title>
          .
          <source>In: Proceedings of the ACM Conference on Electronic Commerce</source>
          (
          <year>1999</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Yin</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cui</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yao</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>Challenging the long tail recommendation</article-title>
          .
          <source>PVLDB</source>
          <volume>5</volume>
          (
          <issue>9</issue>
          ),
          <fpage>896</fpage>
          -
          <lpage>907</lpage>
          (
          <year>2012</year>
          ), http://dblp.uni-trier.de/db/journals/pvldb/pvldb5.html#YinCLYC12
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>