=Paper=
{{Paper
|id=None
|storemode=property
|title=Predicting Component Utilities for Linear-Weighted Hybrid Recommendation
|pdfUrl=https://ceur-ws.org/Vol-1271/Paper7.pdf
|volume=Vol-1271
|dblpUrl=https://dblp.org/rec/conf/recsys/VahedianB14
}}
==Predicting Component Utilities for Linear-Weighted Hybrid Recommendation==
<pdf width="1500px">https://ceur-ws.org/Vol-1271/Paper7.pdf</pdf>
<pre>
                        Predicting Component Utilities for
                    Linear-Weighted Hybrid Recommendation

                                              Fatemeh Vahedian & Robin Burke
                              Center for Web Intelligence, Depaul University, Chicago, IL 60604
                                              {fvahedia, rburke}@cs.depaul.edu


ABSTRACT                                                                nesses, and other related elements. One obvious recommen-
An effective technique for recommendation in social media               dation task within Yelp is to recommend new businesses
and other heterogeneous networks is the weighted hybrid                 to users, but there are a variety of others. Recommend-
of low-dimensional components (WHyLDR). Recent stud-                    ing other users to befriend, recommending locations, and
ies have shown this technique is comparable to other in-                recommending categories of businesses are all user-focused
tegrative approaches while being considerably more flexi-               recommendation tasks. A site like Yelp may also be in-
ble. One key issue for the implementation of a WHyLDR                   terested in recommending users to businesses for marketing
system is the choice of components to generate. Research                purposes. In addition, a user may wish to constrain the rec-
has shown that the contribution of components based on                  ommendations in various ways: looking for a recommended
different network paths varies in unexpected and domain-                business in a particular category or in a particular location,
dependent ways. This work examines an information theo-                 for example.
retic technique for estimating component performance. Us-
ing a real-world social media dataset, we show that this tech-
nique is useful both for optimization (estimating component
weights) and for determining which components to include
in a hybrid.

1.     INTRODUCTION
Social media sites are an important element of today’s In-
ternet, drawing millions of users each day. The wealth of
information found in these sites makes recommender sys-
tems essential. Such sites often must integrate recommen-
dations of many types: recommending content, like-minded
users or appropriate tags to name just a few possibilities.                         Figure 1: Yelp network schema
Our approach, called the Weighted Hybrid of Low-Dimen-
sional Recommenders (WHyLDR), is designed to support
                                                                        As in other work with heterogeneous networks [10], the WHy-
the flexible creation and rapid deployment of a wide vari-
                                                                        LDR model views the network structure as a set of map-
ety of recommenders in a heterogeneous environment. We
                                                                        pings, or projections from nodes to nodes: projA (n) →
have demonstrated its effectiveness in prior work focusing
                                                                        {m0 , m1 , ..., mi } where A is a set of paths, n is a starting
on social tagging systems [8, 7, 6, 3, 4].
                                                                        node and the ms are nodes reachable from n via some path in
                                                                        A. Sets of paths that pass from one type of node to another
A social media site can be viewed as a heterogeneous network
                                                                        over specific categories of edges are known as meta-paths.
defined by a diversity of objects and relations. This diver-
sity gives rise to a wide variety of recommendation tasks.
                                                                        Meta-paths can also be envisioned as paths through a net-
For example, consider the popular social media site Yelp,
                                                                        work schema, an abstract view of a heterogeneous network
which allows users to review and rate restaurants and other
                                                                        that shows the node types and the possible connections be-
types of businesses.1 In Yelp, there are users, reviews, busi-
                                                                        tween them. Figure 1 shows the schema for part of the
1
    www.yelp.com                                                        Yelp social network. In this network, a path from any user
                                                                        node to the businesses that individual has reviewed to the
                                                                        categories associated with that business would be a UBC
                                                                        metapath. Following such a path for any given user yields
                                                                        a type of user profile – in this case, a user represented as
                                                                        the categories of rated businesses. Using such profiles, we
                                                                        can build standard collaborative filtering components to se-
                                                                        lect neighbors of users and generate recommendations. Out
                                                                        of a collection of such meta-paths, we can build an ensem-
Proceedings of the 6th Workshop on Recommender Systems and the Social
Web (RSWeb 2014), collocated with ACM RecSys 2014, 10/06/2014, Foster   ble of components, each capturing a different aspect of the
City, CA, USA. Copyright held by the authors.                           network.
While this model has proved successful in multiple social me-          items and users are compared directly based on their
dia settings, a key problem remains of how to control hybrid           profiles (the rows in their respective matrices), and
size. The set of components, like the set of possible meta-
paths, is unbounded. We have found that in some settings             • a non-personalized recommendation component based
components built using longer paths can outperform those               solely on item popularity.
using shorter paths. There is therefore no simple way to
choose a limited number of components and be sure of opti-
mal performance. In this paper, we examine an information       Table 1: Meta-paths for recommendation compo-
gain metric that can be used to estimate the potential con-     nents
tribution of components. We show that this metric can be         Type          Meta-paths
                                                                 User-based    UB, UBC, UBL, UBH, UBCB, UBLB, UBHB
used in two ways:
                                                                 Item-based    BU, BL, BC, BH, BLBC, BLBU, BUBU, BUBL
                                                                 Cosine        UBC, UBH
     • to estimate the hybrid weights directly, and
                                                                3.    CONTROLLING META-PATH
     • to filter components based on their expected contribu-
       tion to the hybrid.                                            GENERATION
                                                                There is no requirement that meta-paths be simple: nodes
                                                                and edges can be revisited, as seen in components like kNNUBLB ,
2.    WEIGHTED HYBRID                                           where the meta-path loops from businesses to locations and
A weighted hybrid recommender is a system comprised of
                                                                back to businesses again. Component generation could in
multiple recommendation components, each of which re-
                                                                theory continue indefinitely. However, there are significant
turns a real-valued score for a combination of user and item.
                                                                computational costs in generating components and in opti-
The scores from all the components are combined in a weighted
                                                                mizing a hybrid with a large number of components. More-
sum [2]. More formally,
                                                                over, some components will make only a minor contribution
                                                                to recommendation performance. It is therefore important
                                X                               to control this process. Ideally, we would like to be able
                    s(u, i) =       αj sj (u, i)                to estimate in advance what components are likely to make
                                j                               a substantial contribution to the hybrid and make an in-
                                                                formed decision to trade off expected accuracy against the
                                                                computational costs of additional components.
where s(u, i) is the overall score computed for a user-item
combination, sj (u, i) is the score computed by the jth com-
                                                                We have developed a measure based on mutual information
ponent, and αj is the weight associated with the jth compo-
                                                                for each meta-path to estimate the utility of recommenda-
nent.
                                                                tion components. For a given two-dimensional projection
                                                                AB, the mutual information can be calculated as
The weights are learned through an optimization procedure
as discussed below. The components are a function of the                         I(A, B) = H(A) − H(A|B)
recommendation task and the structure of the network.
                                                                where H(A) is entropy of dimension A and H(A|B) is the
WHyLDR components are built from two-dimensional ma-            conditional entropy. Entropy is defined as
trices familiar to researchers in collaborative recommenda-
                                                                                         X
                                                                               H(A) = −      p(ai )log(p(ai ))
tion [5]. A user-based matrix is one in which the rows are                                   i
users and the columns are the destination nodes for a given
                                                                The entropy is therefore a function of the probability of oc-
meta-path. Users are compared on the basis of their profiles
                                                                currence of nodes in each dimension. In our networks, we
and peer users form a neighborhood from which a target
                                                                define probability of node ai from dimension A based on the
user’s preferences for unknown items can be extrapolated.2
                                                                node degree:
The experiments reported here are for the task of recom-                                     Degree(ai )
                                                                                   p(ai ) = P
mending businesses to users. Four types of recommendation                                    i Degree(ai )
components are included in our Yelp model:
                                                                The intuition behind this measure comes from results re-
                                                                lated to random walks. As the length of a random walk
     • user-based KNN components constructed based on meta-     approaches infinity, the probability of hitting a node con-
       paths starting from a user node,                         verges to its normalized degree [?].

     • item-based KNN components from meta-paths start-         Conditional entropy measures the uncertainty of one dimen-
       ing from a business node,                                sion given another dimension, computed by summing up in-
                                                                dividual conditional probabilities. In our networks, we de-
     • cosine components in which two separate matrices are     fine the conditional probability PM (bi |aj ) as the fraction of
       constructed (one for users and one for items) and then   meta-paths of type M leaving node ai and arriving at bj out
2
  All of the optimizations that have been applied to collabo-   of all meta-paths of type M .
rative recommenders can therefore be applied to the individ-
ual WHyLDR components, for example, matrix factoriza-                                            |aj −→ bi |
                                                                                                      M
tion. We plan to study the properties of such optimizations                      PM (bi |aj ) = P
                                                                                                  |aj −→ bk |
on our hybrids in future work.                                                                   k     M
For example, consider the user-business-category meta-path        3.1   Methodology
and associated recommendation component. The values for           For the experiments reported here, we used the Yelp Aca-
H(U ) and HU BC (U |C) can be calculated using the formu-         demic Dataset, and followed the four fold cross validation
las above. If these values were roughly the same then the         methodology described in [4]. The α weights were learned
IU BC (U, C) will be around zero. This suggests that the          using Particle Swarm Optimization (PSO) [9] with recall
meta-path does not add much information beyond what is            as the optimization objective. In order to compare hybrid
already contained in the U dimension and that the UBC             model performance, we calculate recall and precision at list
meta-path is unlikely to give rise to a useful recommenda-        sizes from 1 to 10. These are averaged for each partition
tion component. The same principle can be applied to any          and then averaged over all partitions. We also calculate F1
user-based or item-based component. For the cosine com-           at list size 10, the harmonic mean of precision and recall.
ponents, we used the minimum information gain from either         We also measured the diversity of the results returned by
constituent meta-path. For the popularity component, we           these recommendations, but omit these results for reasons
used minimum information gain across all components.              of space.

In our previous experiments, we found that there was a sig-       4.    RESULTS
nificant correlation between the information gain associated      Figure 3 shows the recall-precision curve for the top indi-
with a meta-path and the learned α weight for a component         vidual recommendation component (kNNUB ) and the four
built using that meta-path [4]. In this work, we sought to        hybrids. There are several points to note here. One is there
build on that result by substituting our measure of meta-         is a relatively small difference between the performance of
path information for weights learned through optimization.        hybrid with learned weights HM-1 (dashed line) and the hy-
We use a simple normalization of the information gain mea-        brid with weights estimated from our information gain mea-
sure, so the contribution of ith component can be calculated      sure HM-IGW (solid line with diamond marks) and indeed,
as:                                                               some data points of the HM-IGW curve are above those of
                       Inf ormationGain(i)                        the learned hybrid. The results for the thresholded hybrids
                αi = k                                   (1)
                      P                                           (HM-T0.1 and HM-T0.2) are generally lower except at list
                         Inf omationGain(j)                       size of 1 and list size of 10.
                     j=1

                                                                  Figure 2 shows a different perspective on these experiments
                                                                  with the F1 measure computed at a recommendation list size
                                                                  of 10. Again, the hybrid with the learned weights shows the
                                                                  top performance with the other hybrids about 6-7% below.

                                                                  What we find therefore is that there is a modest trade-off be-
                                                                  tween recommendation performance with a fully-optimized
                                                                  hybrid and with weights computed using information gain.
                                                                  As the number of components increases the dimensionality
                                                                  of the optimization space increases, making the optimization
                                                                  step more time-consuming and more prone to over-fitting.
                                                                  Information gain can be computed directly from the meta-
                                                                  path expansions used to create the recommendation compo-
                                                                  nents and so is essentially free.
     Figure 2: F1 values for each hybrid model
                                                                  We also see that information gain may be useful in con-
Another way to use the information gain is as a filter con-       trolling the size of the hybrid. HM-T0.2 has one third fewer
trolling which components are incorporated into the hybrid.       components than the full hybrid and approximately 6% lower
In these experiments, we used two different thresholds (0.1       F1 performance. This suggests a process whereby meta-
and 0.2) for rejecting components with low information gain.      paths are generated and their information gain assessed be-
Table 2 shows which components were dropped in each case:         fore components are constructed, thus saving both off-line
the only difference between the lower threshold and the           optimization time and run-time computation.
higher one was the exclusion of the BUBU component at
the higher threshold.                                             5.    CONCLUSIONS
                                                                  A key challenge in social media recommendation is to in-
The results below report on four different configurations of      tegrate the different types of information available in such
the system: HM-1, a hybrid model including all the com-           systems to enhance recommendations and to offer recom-
ponents discussed in Section 2 with the weights learned           mendations of multiple types. The WHyLDR approach has
through optimization; HM-IGW, a hybrid including the same         been demonstrated to be successful in both of these respects.
components as HM-1 but with weights calculated as normal-         As there are an unbounded number of possible components
ized information gain; HM-T0.1 and HM-T0.2, hybrids with          in a WHyLDR hybrid, the questions arise of how to choose
learned weights but components with low information gain          components for a hybrid and when to stop adding to it.
filtered out using thresholds of 0.1 and 0.2, respectively. Ta-   In this paper, we show that our information gain measure
ble 2 shows which components were removed in each case.           shows promise for controlling hybrid creation and possibly
                                                                  for estimating component weights. While the full hybrid
                                                                  with learned weights showed the strongest performance, the
                                  Table 2: Removed components at each threshold
                                       Threshold    Components removed
                                       0.1          UBH, UBHB, BLBC, BUBC, BUBL
                                       0.2          Above, plus BUBU


                                             Figure 3: Recall vs. Precision


versions where our information gain heuristic was applied           of neighborhood-based recommendation methods. In
offered comparable results at lower computational cost.             F. Ricci, L. Rokach, B. Shapira, and P. B. Kantor,
                                                                    editors, Recommender Systems Handbook, pages
One of the key problems unaddressed by this work is the in-         107–144. Springer, 2011.
fluence of recommendation task. We know from prior work         [6] J. Gemmell, T. Schimoler, B. Mobasher, and
that, for most recommendation problems, the most strongly-          R. Burke. Recommendation by example in social
contributing component is the one that directly maps to the         annotation systems. E-Commerce and Web
recommendation task. For example, in this paper, kNNUB              Technologies, pages 209–220, 2011.
is the most important single component because we are rec-      [7] J. Gemmell, T. Schimoler, B. Mobasher, and
ommending businesses to users. If on the other hand, we             R. Burke. Tag-based resource recommendation in
were recommending locations, we would expect the kNNUBL             social annotation applications. User Modeling,
component to be a stronger contributor. This effect is not          Adaption and Personalization, pages 111–122, 2011.
captured by our information gain measure, which is a func-      [8] J. Gemmell, T. Schimoler, B. Mobasher, and
tion of the network structure alone. In future work, we will        R. Burke. Resource recommendation in social
examine additional recommendation tasks and try to deter-           annotation systems: A linear-weighted hybrid
mine how to modify our information gain measure to take             approach. Journal of Computer and System Sciences,
the task into account. We will also be experimenting with           78(4):1160–1174, 2012.
other heterogeneous network data sets to understand the         [9] J. Kennedy and R. C. Eberhart. Particle swarm
generalizability of these results.                                  optimization. In Proceedings of the IEEE International
                                                                    Conference on Neural Networks, pages 1942–1948,
6.   REFERENCES                                                     1995.
 [1] https://www.yelp.com/academic dataset.                    [10] Y. Sun and J. Han. Mining Heterogeneous Information
 [2] R. Burke. Hybrid recommender systems: Survey and               Networks: Principles and Methodologies. Synthesis
     experiments. User Modeling and User-Adapted                    Lectures on Data Mining and Knowledge Discovery.
     Interaction, 12(4):331–370, 2002.                              Morgan & Claypool Publishers, 2012.
 [3] R. Burke and F. Vahedian. Social web
     recommendation using metapaths. In RSWeb@RecSys,
     2013.
 [4] R. Burke, F. Vahedian, and B. Mobasher. Hybrid
     recommendation in heterogeneous networks. In
     Proceedings of the 22nd Conference on User Modeling,
     Adaptation and Personalization, 2014, to appear.
 [5] C. Desrosiers and G. Karypis. A comprehensive survey

</pre>