Clustering Enhancement for a Token-Based Recommender

                                                    Ville Ollikainen
                                        VTT Technical Research Centre of Finland
                                                02150 Espoo, Finland
                                                ville.ollikainen@vtt.fi


                                                                 values, “tokens”, in user-item transactions. This enables
                                                                 distributed recommendations in a multi-player
                         Abstract                                environment, based on bilateral communications between
                                                                 a user and a service. As such, while also providing easy
    This paper introduces a clustering enhancement               scalability, the approach relates to multi-domain
    to an established token-based collaborative                  collaborative filtering proposed by [Zha12] and cross-
    recommendation method (“upcv”). The method                   domain recommendations proposed by [Gao13],
    creates privacy-protecting abstractions for users            overcoming sparsity problems that are often experienced
    and items by exchanging and collecting                       in single domain collaborative recommenders. Privacy
    randomly generated N-bit values, “tokens”, in                properties of the method have been presented in [Oll16].
    user-item transactions. The novel enhancement                    The upcv recommender has been in public use in
    considers users’ random value spaces as                      Helsinki Metropolitan area libraries since 2014.
    hyperspaces in which the tokens are N-                       Available online, it has currently 600,000 patrons and
    dimensionally clustered. Instead of selecting                actively covers 300,000 book titles.
    exchanged tokens at random, as in the baseline                   Some approaches, such as [Hua15], acknowledge that
    upcv, tokens are now selected from a cluster,                users can hold multiple interests and items may belong to
    which has the best match with item’s token                   multiple categories. The same topic from context point of
    collection. Recommendation quality is evaluated              view is addressed in [Sap16], suggesting that it is also
    with the same 3.5% density data set as in a                  important to incorporate the contextual information into
    previous publication. The quantitative analysis              the recommendation process. [Bin12] in turn introduced
    indicates      overall     improvement         in            multi-class co-clustering for the purpose. Indeed, human
    recommendation quality while learning time                   life is versatile, and recommenders should respect this:
    decreased without exception, up to one-third.                Although one person may be interested in cooking and
    There was improvement even when the number                   motorcycles and another interested in cooking and
    of exchanged tokes was exactly one, instead of               gardening, the recommender should not end up
    over 100 in the baseline upcv. The performance               associating motorcycles with gardening.
    improvement may be explained by the clustering                   The rest of this paper is structured as follows: Section
    enhancement inherently recognizing versatility               2 provides preliminaries about the token-based
    of each individuals’ interests. The paper also               recommender. In Section 3, we introduce unsupervised
    presents a study with news data set, where the               N-dimensional clustering of tokens. Results are presented
    improvement was in coverage.                                 in Section 4 and concluded with discussions in Section 5.

1     Introduction                                               2     Preliminaries
This paper is based on a collaborative token-based               2.1     A Token-Based Recommender (“UPCV”)
recommendation method[Oll13, Oll17], which creates               The method associates both users and items with
privacy-protecting abstractions for users and items by           collections of tokens, each token carrying a random value.
exchanging and collecting randomly generated N-bit
                                                                 Interaction between user and item triggers selected
———————————————                                                  tokens to be copied from the token collection of the user
Copyright © CIKM 2018 for the individual papers by the papers'
                                                                 to the token collection of the item, and vice versa. In
authors. Copyright © CIKM 2018 for the volume as a collection
                                                                 previous papers[Oll13, Oll16], the maximum size of
by its editors. This volume and its papers are published under
the Creative Commons License Attribution 4.0 International (CC
BY 4.0).
collections has been 1024 tokens, from which up to 15%           In step b), Alice is accessing Item X, triggering a token
have been randomly selected for the exchange.                exchange procedure; copying tokens from the user to the
   When the same user interacts with several items, or the   item and vice versa. In this step, both Alice and Item X
same item is involved in interactions with several users,    have only one token; these are the exchanged tokens
tokens are spread around, resulting in statistical           resulting step c), in which both Alice and Item X do have
similarities among different token collections in the        common tokens. (In the figure bold & italics font
system. Since tokens are copied in user-item interactions    highlights the most recently acquired tokens.)
only, it is likely that similarities between two token           Next, Bob is accessing the same item X in the Service
collections originate from similar user behavior. The        1, and once again, a couple of tokens are copied over (step
method is collaborative by nature and requires no content    d). It should be noted that this time Item X is able to
analysis.                                                    provide more than one token. After Bob’s action, it
   Collections are dynamic by nature and, unlike cookies     should be noted that also Bob and Alice have similar
and other tracking means, tokens have no persistent          tokens, as can be seen in step e).
associations with real world. In particular, they do not         Still in e): Bob is accessing Item Y in the Service 2. A
have any association with persons: tokens are mere           couple of tokens are requested for exchange; since Bob
random values that, over time, will be copied to and         has more than one token to give, some tokens are picked
deleted from collections.                                    randomly from his collection.
   The following example[Oll17] in Figure 1 explains             As the last action in this example, Alice requests a
how tokens are able to provide recommendations.              recommendation for herself from the Service 2, which she
                                                             is now visiting the very first time (step f). Finally, Service
                                         a)                  2 goes through all available items and compares their
                                                             tokens with the provided tokens. It is likely that Item Y
                                                             will be in the recommendation list, since there is a token
                                         b)                  in common. For a similarity measure in general, previous
                                                             papers [Oll13, Oll16, Oll17] suggest using Jaccard index.

                                                             2.2     Book Club Data Set and a Previous S tudy
                                         c)                  The recommendation method was first introduced in
                                                             [Oll13], using a data set collected from book club
                                                             members of Bonnier Books Finland in 2013. This
                                         d)                  publication presents some details of the data set.
                                                             Concisely, book club members (users) were asked,
                                                             without limitation, which books (items) they had read and
                                                             liked from a collection of 1041 books. 1575 members
                                         e)                  responded, of which 1532 selecting at least one book,
                                                             providing 55434 individual selections total. Hence,
                                                             density was 3.5%. 18 users selected exactly one book.
                                                                 The selections were converted into user-item pairs
                                        f)                   (“transactions”), shuffled into random order and divided
                                                             into two subsets, each consisting of 27717 user-item pairs
                                                             in random order. The first subset was used as training
                                                             data, while the second subset remained for validation. In
  Figure 1: Example of Two Users and Two Services;           the baseline study[Oll13], exchanged tokens were
Recommendation for Alice at Her First Visit in Service 2     selected randomly while a maximum of 15% of token
                                                             collections were exchanged in each transaction. When a
   The step a) in Figure 1 illustrates two independent       token collection reached its maximum size (1024 tokens),
services on the top, the upper service (“Service 1”)         tokens were deleted at random to make space.
having just a single item (“Item X”) and the lower service       The quantitative assessment was focusing on a
(“Service 2”) having a plurality of items, including “Item   practical question: how long should a recommendation
Y”. Two users, “Alice” and “Bob”, are presented for          list be, in order to have at least one successful
simplicity. In the beginning, each user and item has a       recommendation, i.e. a book that a user has selected
single random number, “token”, in their collections.         exists in the validation subset.
Tokens are illustrated as 823, 463, 71 and 343.
                                                                dimension. A generic illustration of a hypercube is
                                                                presented in Figure 3: An N-dimensional hypercube can
                                                                be created by adding a copy of an N-1 dimensional
                                                                hypercube with edges connecting respective corners. In
                                                                the illustration, the corners are numbered in such a way
                                                                that each dimension adds one most significant bit, in the
                                                                original hypercube the bit value being 0 and in the copy
                                                                1.


           Figure 2: Median Length of a Successful
     Recommendation List (vertical axis, with 10%- 90%
    intervals) for Users With Specific Number of Books in
            Training Data (horizontal axis) [Oll13]
                                                                Figure 3: N-dimensional Hypercube Is a Doubled Copy of
    Figure 2 (above) from the baseline study illustrates that    an N-1-Dimensional Hypercube with an Edge Between
shorter recommendation lists were adequate for users                the Original (grey) and the Copied (black) Corner
who had a higher number of books selected in the training
                                                                   In the novel concept, each token represents one corner
data [Oll13]. For example, half of the users having ~28
                                                                in the hypercube, while a token collection is a set of
books in the training data got at least one successful
                                                                corners, respectively.
recommendation in a list with five books. The figure also
                                                                   Tokens in each individual collection (separately) were
illustrates intervals that exclude the upper and lower 10%
                                                                then clustered with bisecting K-means, recursively
of the users in each group, and a best matching trend line.
                                                                splitting a hypercube into clusters and sub clusters each
2.3      News Portal Data S et Studies                          time by applying a cutting hyperplane through a previous
                                                                cluster, until a pre-defined maximum number of clusters
“Ilkka” news portal dataset consists of 2123 users, 2439        was reached or the remaining sub clusters were dense
news articles and 35891 news clicks in chronological            enough.
order over a period of 30 days. Success was evaluated by           For each cluster, we pick a best representing token, a
checking if a user eventually clicked a recommended             “center token”. When tokens are exchanged, the “item”
news article. In recommendations, each article had a 24-        discloses its center token and the “user” activates the
hour active lifespan from its first click. Training period      cluster, which is closest to it. Only the tokens in the active
was 7 days individually, beginning at each user’s first         cluster take part in token exchange. If the collection
click. Recommendations were created and evaluated at            reached its maximum size, those tokens that are farthest
each click, excluding articles already clicked by the user.     away from any cluster were deleted first.
With these parameters, recommendation lists with at least          If an already existing token was received, a new token
one success were possible in 14552 clicks (hereinafter:         was created as a randomly selected neighboring corner
100% ‘Click Coverage’). Maximum token collection size           (out of 24), making the particular cluster more dense.
was 256; otherwise, the setup was the same as in Chapter           A 24-dimensional hypercube has 224 ≈ 16 million
2.2, with 15% token exchange (i.e. max. 38).                    corners, being capable to accommodate a relatively high
                                                                number of clusters reflecting a multitude of our interests.
3      Methodology
                                                                3.2     New Similarity Metric
3.1      Clustering of Token C ollections
                                                                Instead of Jaccard [Oll13], a new similarity metric (1) is
The motivation behind clustering is, that a recommender         introduced in this paper to take into account cluster
should not mix multiple interests, such as “cooking”,           densities, while not requiring exact matches as in Jaccard.
“motorcycles” and “gardening”, as mentioned in the              It sums up, how well each token in collection A matches
Introduction.                                                   the other collection B:
   As a novelty, we introduce now a conceptual view of
tokens as points in a hyperspace by converting token’s
numeric value values into N-bit representation (e.g.
N=24), resulting an N-dimensional binary hypercube.
                                                                where A and B are token collections and dHAD is
Each token represents one corner in this hypercube, each
                                                                Hamming distance (i.e. number of differentiating bits).
bit value (0 or 1) defining its projection in the respective
                                                                The new similarity metric improves coverage, since it
does not require exact matches between token                       Exchanging 3 tokens provided fastest learning: For
collections. As a drawback, it is computationally heavier.      instance, half of the users having ~8 books in training data
                                                                got a successful recommendation in a list of mere five
3.3     Experiment                                              books; the previous study indicated similar performance
The experiment was carried out as in [Oll13], with the          for users with ~25 books in training.
same data and other parameters when applicable, in order           Table 1 presents respective recommendation coverage
to compare the effectiveness of clustering. The maximum         and number of successful recommendations (out of 1523
number of clusters for any single user was set to four, after   users) in the book club experiment. It is notable that
some preliminary experiments. Items had a single cluster        recommendations were most diverse when only 1 token
(i.e. no clustering). The number of exchanged tokens was        was exchanged: 31% of all books were in top-5
reduced to 1, 3 and 10 tokens, however with an additional       recommendations at least once. The number of
condition that the number of received tokens was never          unsuccessful recommendations were ~10 (i.e. 99.3%
allowed to exceed the number of already existing tokens.        success): failures typically related to users with one rare
Tokens that were closest to cluster centers were selected       book in training data. Only two exceptions were
for exchange, instead of random picks used in the baseline      observed: users with two and three such classic books (at
method.                                                         1 and 3 token exchange, respectively) in training data that
    In the book experiment, the quantitative assessment         could be classified as bestsellers.
was similar as in [Oll13]. Recommendations were given
only to those users that were in the training data,                Table 1: Recommendation Coverage and Number of
containing 1523 users; some users having only one book               Successful Recommendations in the Book Club
selected had that book not in training but in validation                             Experiment
data.
    A second experiment was carried out with the news
portal data set, comparing results of non-clustered and
clustered methods with recommending random (with
Monte Carlo evaluation) and most popular articles. The
clustering method with 3 token exchange was compared
                                                                   As can be expected, the average number of tokens in
to the baseline method as described in Chapter 2.3.
                                                                each users’ collections varied according to the number of
                                                                exchanged tokens: e.g., when exchanging exactly 1 token
4     Results                                                   at a time, a collection also grows by one at most. While
                                                                the collections had different number of tokens, the
As Figure 4 illustrates, clustering reduced learning time       number of clusters did not substantially vary.
(# of books in training data) to up to one-third, while the        Figure 5 below illustrates proportions of users that
overall quality was invariably improved. As found in            would get successful recommendations with given length
[Oll13], the method seems to be not critical of the number      of a recommendation list, when exchanging 3 tokens.
of tokens exchanged: there was notable improvement
even when exchanging only one token at a time.


                                                                   Figure 5: Length of a Recommendation List When
                                                                     Exchanging 3 Tokens in Each Transaction; the
        Figure 4: Median Length of a Successful                         Proportion of Users Getting Successful
      Recommendation List With Clustering Token                         Recommendations at Given Length of a
Collections, Exchanging Max 1, 3 and 10 Tokens in Each                          Recommendation List
        Transaction; all Outperform the Baseline
   Finally, Table 2 resents quantitative results of the news    References
recommendation experiment. The baseline method
outperformed ‘most popular’ recommendations, but had            [Bin12] Bin Xu, Jiajun Bu, Chun Chen, and Deng Cai.
a reduced click coverage. Compared to the baseline, the                 2012. An exploration of improving collaborative
clustering method fell behind in recommendation quality                 recommender systems via user-item subgroups.
but in turn provided full click coverage (c.f. 2.3). 3 tokens           In Proceedings of the 21st International
were exchanged, versus max. 38 in the baseline.                         Conference on World Wide Web. ACM, 21–30.

     Table 2: Results of the News Recommendation                [Gao13] Gao, S., Luo, H., Chen, D., Li, S., Gallinari, P.,
                       Experiment                                       &     Guo,     J.    (2013).      Cross-domain
                                                                        recommendation via cluster-level latent factor
                                                                        model. Lecture Notes in Computer Science,
                                                                        8189      LNAI(PART          2),       161–176.
                                                                        doi:10.1007/978-3-642-40991-2_11
5    Conclusions and Discussion                                 [Hua15] Huang, S., Ma, J., Cheng, P., Wang, S. (2015) A
                                                                        Hybrid         Multigroup         Coclustering
This paper was focusing on comparing the results with a                 Recommendation Framework Based on
previous baseline study[Oll13], with the same data set                  Information Fusion. ACM Trans. Intell. Syst.
and evaluation metrics. As a novelty, unsupervised N-                   Technol. 6, 2, Article 27 (March 2015), 22
dimensional (N=24) clustering was applied to each users’                pages. DOI=http://dx.doi.org/10.1145/2700465
token collection individually, and only one cluster at a
time was selected for token exchange. Each user had a           [Oll13] Ollikainen, V., Mensonen, A., Tavakolifard, M.
maximum on four clusters.                                               (2013) UPCV Distributed recommendation
   In book club study this enhancement improved                         system based on token exchange. Journal of
recommendation quality, while it at the same time                       Print and Media Technology Research, Vol. 2,
reduced learning time. A previous publication[Oll13],                   No. 3, pp. 195 – 201 available at www.vtt.fi/inf/
reported that the method seems stable while varying the                 julkaisut/muut/2013/OA_JPMTR_1314_Ollikai
percentage of exchanged tokens. Clustering makes no                     nen.pdf
difference: The results remained better in all cases, when      [Oll16] Ollikainen, V., Niemi, V. (2016) Privacy
the number of exchanged tokens was reduced from                         Analysis of a Networked Collaborative
maximum of over 100 (15% out of 1024) to 1, 3 and 10.                   Recommendation System. International Journal
   Recommendations were diverse: About one-fourth of                    of Humanities and Management Sciences
the entire 1041 book collection were presented in Top-5                 (IJHMS) Volume 4, Issue 4 (2016) ISSN 2320–
recommendations. The evaluation suggested the best                      4044 (Online).
results with 3 token exchange. It is in further studies to
optimize token exchange with different data sets.               [Oll17] Ollikainen, V., Niemi, V. (2017) Evaluating the
   Tokens provide privacy-protecting abstraction. In                    Performance and Privacy of a Token-Based
certain arrangements, a single token exchange will enable               Collaborative Recommender. In Proceedings of
privacy, since it solves the remaining returning-user                   WI ’17, Leipzig, Germany, August 23-26, 2017,
privacy issue introduced in [Oll16].                                    5 pages.
   In the news recommendation experiment, quality                       http://dx.doi.org/10.1145/3106426.3109434
improvement was less clear. Although the                        [Sap16] Sappelli, M., Verberne, S., & Kraaij, W. (2016).
recommendation quality was close to popularity even                     Evaluation of context-aware recommendation
with full coverage, a tradeoff between coverage and                     systems for information re-finding. Journal of the
recommendation quality may exist: also worst cases get                  Association for Information Science and
recommendations with full click coverage, adversely                     Technology,            68(4),           895-910.
affecting quality evaluation. Further studies could                     doi:10.1002/asi.23717
quantify this phenomenon.
   While the N-dimensional clusters can be assumed to           [Zha12] Zhang, Y., Cao, B., & Yeung, D.-Y. (2012).
reflect multiple interests of users, the true relation                  Multi-domain collaborative filtering. arXiv
between clusters and multiple interests, perhaps utilizing              Preprint arXiv:1203.3535.
newspaper sections, would be worth studying, together
with optimizing the number of clusters.