<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>ACM RecSys</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Predicting Component Utilities for Linear-Weighted Hybrid Recommendation</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Fatemeh Vahedian</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Robin Burke</string-name>
          <email>rburke@cs.depaul.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Center for Web Intelligence, Depaul University</institution>
          ,
          <addr-line>Chicago, IL 60604</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2014</year>
      </pub-date>
      <volume>10</volume>
      <abstract>
        <p>An e ective technique for recommendation in social media and other heterogeneous networks is the weighted hybrid of low-dimensional components (WHyLDR). Recent studies have shown this technique is comparable to other integrative approaches while being considerably more exible. One key issue for the implementation of a WHyLDR system is the choice of components to generate. Research has shown that the contribution of components based on di erent network paths varies in unexpected and domaindependent ways. This work examines an information theoretic technique for estimating component performance. Using a real-world social media dataset, we show that this technique is useful both for optimization (estimating component weights) and for determining which components to include in a hybrid.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>
        A social media site can be viewed as a heterogeneous network
de ned by a diversity of objects and relations. This
diversity gives rise to a wide variety of recommendation tasks.
For example, consider the popular social media site Yelp,
which allows users to review and rate restaurants and other
types of businesses.1 In Yelp, there are users, reviews,
businesses, and other related elements. One obvious
recommendation task within Yelp is to recommend new businesses
to users, but there are a variety of others.
Recommending other users to befriend, recommending locations, and
recommending categories of businesses are all user-focused
recommendation tasks. A site like Yelp may also be
interested in recommending users to businesses for marketing
purposes. In addition, a user may wish to constrain the
recommendations in various ways: looking for a recommended
business in a particular category or in a particular location,
for example.
As in other work with heterogeneous networks [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ], the
WHyLDR model views the network structure as a set of
mappings, or projections from nodes to nodes: projA(n) !
fm0; m1; :::; mig where A is a set of paths, n is a starting
node and the ms are nodes reachable from n via some path in
A. Sets of paths that pass from one type of node to another
over speci c categories of edges are known as meta-paths.
Meta-paths can also be envisioned as paths through a
network schema, an abstract view of a heterogeneous network
that shows the node types and the possible connections
between them. Figure 1 shows the schema for part of the
Yelp social network. In this network, a path from any user
node to the businesses that individual has reviewed to the
categories associated with that business would be a UBC
metapath. Following such a path for any given user yields
a type of user pro le { in this case, a user represented as
the categories of rated businesses. Using such pro les, we
can build standard collaborative ltering components to
select neighbors of users and generate recommendations. Out
of a collection of such meta-paths, we can build an
ensemble of components, each capturing a di erent aspect of the
network.
      </p>
      <p>While this model has proved successful in multiple social
media settings, a key problem remains of how to control hybrid
size. The set of components, like the set of possible
metapaths, is unbounded. We have found that in some settings
components built using longer paths can outperform those
using shorter paths. There is therefore no simple way to
choose a limited number of components and be sure of
optimal performance. In this paper, we examine an information
gain metric that can be used to estimate the potential
contribution of components. We show that this metric can be
used in two ways:
to estimate the hybrid weights directly, and
to lter components based on their expected
contribution to the hybrid.</p>
    </sec>
    <sec id="sec-2">
      <title>2. WEIGHTED HYBRID</title>
      <p>
        A weighted hybrid recommender is a system comprised of
multiple recommendation components, each of which
returns a real-valued score for a combination of user and item.
The scores from all the components are combined in a weighted
sum [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. More formally,
s(u; i) = X jsj(u; i)
      </p>
      <p>j
where s(u; i) is the overall score computed for a user-item
combination, sj(u; i) is the score computed by the jth
component, and j is the weight associated with the jth
component.</p>
      <p>
        The weights are learned through an optimization procedure
as discussed below. The components are a function of the
recommendation task and the structure of the network.
WHyLDR components are built from two-dimensional
matrices familiar to researchers in collaborative
recommendation [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. A user-based matrix is one in which the rows are
users and the columns are the destination nodes for a given
meta-path. Users are compared on the basis of their pro les
and peer users form a neighborhood from which a target
user's preferences for unknown items can be extrapolated.2
The experiments reported here are for the task of
recommending businesses to users. Four types of recommendation
components are included in our Yelp model:
user-based KNN components constructed based on
metapaths starting from a user node,
item-based KNN components from meta-paths
starting from a business node,
cosine components in which two separate matrices are
constructed (one for users and one for items) and then
2All of the optimizations that have been applied to
collaborative recommenders can therefore be applied to the
individual WHyLDR components, for example, matrix
factorization. We plan to study the properties of such optimizations
on our hybrids in future work.
items and users are compared directly based on their
pro les (the rows in their respective matrices), and
a non-personalized recommendation component based
solely on item popularity.
      </p>
    </sec>
    <sec id="sec-3">
      <title>3. CONTROLLING META-PATH</title>
    </sec>
    <sec id="sec-4">
      <title>GENERATION</title>
      <p>There is no requirement that meta-paths be simple: nodes
and edges can be revisited, as seen in components like kNNUBLB,
where the meta-path loops from businesses to locations and
back to businesses again. Component generation could in
theory continue inde nitely. However, there are signi cant
computational costs in generating components and in
optimizing a hybrid with a large number of components.
Moreover, some components will make only a minor contribution
to recommendation performance. It is therefore important
to control this process. Ideally, we would like to be able
to estimate in advance what components are likely to make
a substantial contribution to the hybrid and make an
informed decision to trade o expected accuracy against the
computational costs of additional components.</p>
      <p>We have developed a measure based on mutual information
for each meta-path to estimate the utility of
recommendation components. For a given two-dimensional projection
AB, the mutual information can be calculated as</p>
      <p>I(A; B) = H(A)</p>
      <p>H(AjB)
where H(A) is entropy of dimension A and H(AjB) is the
conditional entropy. Entropy is de ned as</p>
      <p>H(A) =</p>
      <p>X p(ai)log(p(ai))</p>
      <p>i
The entropy is therefore a function of the probability of
occurrence of nodes in each dimension. In our networks, we
de ne probability of node ai from dimension A based on the
node degree:
p(ai) =</p>
      <sec id="sec-4-1">
        <title>Degree(ai)</title>
        <p>Pi Degree(ai)
The intuition behind this measure comes from results
related to random walks. As the length of a random walk
approaches in nity, the probability of hitting a node
converges to its normalized degree [?].</p>
        <p>Conditional entropy measures the uncertainty of one
dimension given another dimension, computed by summing up
individual conditional probabilities. In our networks, we
dene the conditional probability PM (bijaj) as the fraction of
meta-paths of type M leaving node ai and arriving at bj out
of all meta-paths of type M .</p>
        <p>PM (bijaj) =</p>
        <p>
          jaj M! bij
P jaj M! bkj
k
For example, consider the user-business-category meta-path
and associated recommendation component. The values for
H(U ) and HUBC (U jC) can be calculated using the
formulas above. If these values were roughly the same then the
IUBC (U; C) will be around zero. This suggests that the
meta-path does not add much information beyond what is
already contained in the U dimension and that the UBC
meta-path is unlikely to give rise to a useful
recommendation component. The same principle can be applied to any
user-based or item-based component. For the cosine
components, we used the minimum information gain from either
constituent meta-path. For the popularity component, we
used minimum information gain across all components.
In our previous experiments, we found that there was a
signi cant correlation between the information gain associated
with a meta-path and the learned weight for a component
built using that meta-path [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ]. In this work, we sought to
build on that result by substituting our measure of
metapath information for weights learned through optimization.
We use a simple normalization of the information gain
measure, so the contribution of ith component can be calculated
as:
i =
        </p>
      </sec>
      <sec id="sec-4-2">
        <title>Inf ormationGain(i)</title>
        <p>k
P Inf omationGain(j)
j=1
(1)
Another way to use the information gain is as a lter
controlling which components are incorporated into the hybrid.
In these experiments, we used two di erent thresholds (0.1
and 0.2) for rejecting components with low information gain.
Table 2 shows which components were dropped in each case:
the only di erence between the lower threshold and the
higher one was the exclusion of the BUBU component at
the higher threshold.</p>
        <p>The results below report on four di erent con gurations of
the system: HM-1, a hybrid model including all the
components discussed in Section 2 with the weights learned
through optimization; HM-IGW, a hybrid including the same
components as HM-1 but with weights calculated as
normalized information gain; HM-T0.1 and HM-T0.2, hybrids with
learned weights but components with low information gain
ltered out using thresholds of 0.1 and 0.2, respectively.
Table 2 shows which components were removed in each case.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>3.1 Methodology</title>
      <p>
        For the experiments reported here, we used the Yelp
Academic Dataset, and followed the four fold cross validation
methodology described in [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. The weights were learned
using Particle Swarm Optimization (PSO) [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] with recall
as the optimization objective. In order to compare hybrid
model performance, we calculate recall and precision at list
sizes from 1 to 10. These are averaged for each partition
and then averaged over all partitions. We also calculate F1
at list size 10, the harmonic mean of precision and recall.
We also measured the diversity of the results returned by
these recommendations, but omit these results for reasons
of space.
      </p>
    </sec>
    <sec id="sec-6">
      <title>4. RESULTS</title>
      <p>Figure 3 shows the recall-precision curve for the top
individual recommendation component (kNNUB) and the four
hybrids. There are several points to note here. One is there
is a relatively small di erence between the performance of
hybrid with learned weights HM-1 (dashed line) and the
hybrid with weights estimated from our information gain
measure HM-IGW (solid line with diamond marks) and indeed,
some data points of the HM-IGW curve are above those of
the learned hybrid. The results for the thresholded hybrids
(HM-T0.1 and HM-T0.2) are generally lower except at list
size of 1 and list size of 10.
What we nd therefore is that there is a modest trade-o
between recommendation performance with a fully-optimized
hybrid and with weights computed using information gain.
As the number of components increases the dimensionality
of the optimization space increases, making the optimization
step more time-consuming and more prone to over- tting.
Information gain can be computed directly from the
metapath expansions used to create the recommendation
components and so is essentially free.</p>
      <p>We also see that information gain may be useful in
controlling the size of the hybrid. HM-T0.2 has one third fewer
components than the full hybrid and approximately 6% lower
F1 performance. This suggests a process whereby
metapaths are generated and their information gain assessed
before components are constructed, thus saving both o -line
optimization time and run-time computation.</p>
    </sec>
    <sec id="sec-7">
      <title>5. CONCLUSIONS</title>
      <p>A key challenge in social media recommendation is to
integrate the di erent types of information available in such
systems to enhance recommendations and to o er
recommendations of multiple types. The WHyLDR approach has
been demonstrated to be successful in both of these respects.
As there are an unbounded number of possible components
in a WHyLDR hybrid, the questions arise of how to choose
components for a hybrid and when to stop adding to it.
In this paper, we show that our information gain measure
shows promise for controlling hybrid creation and possibly
for estimating component weights. While the full hybrid
with learned weights showed the strongest performance, the
versions where our information gain heuristic was applied
o ered comparable results at lower computational cost.
One of the key problems unaddressed by this work is the
inuence of recommendation task. We know from prior work
that, for most recommendation problems, the most
stronglycontributing component is the one that directly maps to the
recommendation task. For example, in this paper, kNNUB
is the most important single component because we are
recommending businesses to users. If on the other hand, we
were recommending locations, we would expect the kNNUBL
component to be a stronger contributor. This e ect is not
captured by our information gain measure, which is a
function of the network structure alone. In future work, we will
examine additional recommendation tasks and try to
determine how to modify our information gain measure to take
the task into account. We will also be experimenting with
other heterogeneous network data sets to understand the
generalizability of these results.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>[1] https://www.yelp.com/academic dataset.</mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>R.</given-names>
            <surname>Burke</surname>
          </string-name>
          .
          <article-title>Hybrid recommender systems: Survey and experiments</article-title>
          .
          <source>User Modeling</source>
          and
          <string-name>
            <surname>User-Adapted</surname>
            <given-names>Interaction</given-names>
          </string-name>
          ,
          <volume>12</volume>
          (
          <issue>4</issue>
          ):
          <volume>331</volume>
          {
          <fpage>370</fpage>
          ,
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>R.</given-names>
            <surname>Burke</surname>
          </string-name>
          and
          <string-name>
            <given-names>F.</given-names>
            <surname>Vahedian</surname>
          </string-name>
          .
          <article-title>Social web recommendation using metapaths</article-title>
          .
          <source>In RSWeb@RecSys</source>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>R.</given-names>
            <surname>Burke</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Vahedian</surname>
          </string-name>
          , and
          <string-name>
            <given-names>B.</given-names>
            <surname>Mobasher</surname>
          </string-name>
          .
          <article-title>Hybrid recommendation in heterogeneous networks</article-title>
          .
          <source>In Proceedings of the 22nd Conference on User Modeling, Adaptation and Personalization</source>
          ,
          <year>2014</year>
          , to appear.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>C.</given-names>
            <surname>Desrosiers</surname>
          </string-name>
          and
          <string-name>
            <given-names>G.</given-names>
            <surname>Karypis</surname>
          </string-name>
          .
          <article-title>A comprehensive survey of neighborhood-based recommendation methods</article-title>
          . In F. Ricci,
          <string-name>
            <given-names>L.</given-names>
            <surname>Rokach</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Shapira</surname>
          </string-name>
          , and P. B. Kantor, editors,
          <source>Recommender Systems Handbook</source>
          , pages
          <volume>107</volume>
          {
          <fpage>144</fpage>
          . Springer,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>J.</given-names>
            <surname>Gemmell</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Schimoler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Mobasher</surname>
          </string-name>
          , and
          <string-name>
            <given-names>R.</given-names>
            <surname>Burke</surname>
          </string-name>
          .
          <article-title>Recommendation by example in social annotation systems</article-title>
          .
          <source>E-Commerce and Web Technologies</source>
          , pages
          <volume>209</volume>
          {
          <fpage>220</fpage>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>J.</given-names>
            <surname>Gemmell</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Schimoler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Mobasher</surname>
          </string-name>
          , and
          <string-name>
            <given-names>R.</given-names>
            <surname>Burke</surname>
          </string-name>
          .
          <article-title>Tag-based resource recommendation in social annotation applications</article-title>
          .
          <source>User Modeling, Adaption and Personalization</source>
          , pages
          <volume>111</volume>
          {
          <fpage>122</fpage>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>J.</given-names>
            <surname>Gemmell</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Schimoler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Mobasher</surname>
          </string-name>
          , and
          <string-name>
            <given-names>R.</given-names>
            <surname>Burke</surname>
          </string-name>
          .
          <article-title>Resource recommendation in social annotation systems: A linear-weighted hybrid approach</article-title>
          .
          <source>Journal of Computer and System Sciences</source>
          ,
          <volume>78</volume>
          (
          <issue>4</issue>
          ):
          <volume>1160</volume>
          {
          <fpage>1174</fpage>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>J.</given-names>
            <surname>Kennedy</surname>
          </string-name>
          and
          <string-name>
            <given-names>R. C.</given-names>
            <surname>Eberhart</surname>
          </string-name>
          .
          <article-title>Particle swarm optimization</article-title>
          .
          <source>In Proceedings of the IEEE International Conference on Neural Networks</source>
          , pages
          <year>1942</year>
          {
          <year>1948</year>
          ,
          <year>1995</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Sun</surname>
          </string-name>
          and
          <string-name>
            <given-names>J.</given-names>
            <surname>Han</surname>
          </string-name>
          .
          <source>Mining Heterogeneous Information Networks: Principles and Methodologies. Synthesis Lectures on Data Mining and Knowledge Discovery</source>
          . Morgan &amp; Claypool Publishers,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>