<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>September</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Graph Embeddings for Movie Visualization and Recommendation</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Michail Vlachos IBM Research - Zurich</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Switzerland</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Daniel Svonava Slovak University of Technology</institution>
          ,
          <country country="SK">Slovakia</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2012</year>
      </pub-date>
      <volume>13</volume>
      <issue>2012</issue>
      <fpage>56</fpage>
      <lpage>59</lpage>
      <abstract>
        <p>In this work we showcase how graph-embeddings can be used as a movie visualization and recommendation interface. The proposed low-dimensional embedding carefully preserves both local and global graph connectivity structure. The approach additionally offers: a) recommendations based on a pivot movie, b) interactive deep graph exploration of the movie connectivity graph, c) automatic movie trailer retrieval.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>INTRODUCTION</title>
      <p>As we are moving gradually from the era of information
to the era of recommendation, interactive interfaces that
engage the users and let them easily discover the data of
interest will be of increasing importance. In this work we
explore how graph-embedding techniques can be used as the
basis of an interactive recommendation engine for movies.</p>
      <p>
        Several recommendation systems have appeared in the
literature in the recent years for recommending videos [
        <xref ref-type="bibr" rid="ref5 ref9">5, 9</xref>
        ]
or movies [
        <xref ref-type="bibr" rid="ref10 ref2 ref6 ref8">2, 6, 8, 10</xref>
        ]. These, however, rarely focus on
visually driven interfaces. In our scenario, we use a
movieactor database as the underlying graph structure. We use
textual features to describe the movie objects. Given a
selected pivot movie the system can retrieve a set of similar
movies which are portrayed and clustered on two
dimensions. Our method presents a novel way of capturing both
neighborhood and cluster structure. Neighborhood
information is preserved by retaining the Minimum Spanning Tree
(MST) structure on two-dimensions. This also partially
preserves the global graph structure, as the MST represents the
dataset ‘backbone’. In addition to the neighborhood
structure, the method also retains the cluster structure which can
be visualized at different granularities. Finally, the proposed
mapping can accommodate both metric and non-metric
distance functions.
      </p>
      <p>
        Data-embedding techniques have been extensively used
for visualizing high-dimensional data. Examples include the
Bourgain embedding [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], FastMap [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], BoostMap [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] and
      </p>
      <p>
        ISOMAP [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]. The goal of those projection methods is
to preserve all distances approximately, while our approach
preserves a subset of distances (spanning-tree distances)
exactly.
      </p>
      <p>We use the proposed graph embedding as the entry point
for a movie recommendation interface. Our methodology
allows the exploratory visualization of the movie graph space
and incorporates additional capabilities, such a filtering of
the graph based on various criteria, real-time graph
exploration and automatic retrieval of the movie trailer from the
Internet.
2.</p>
    </sec>
    <sec id="sec-2">
      <title>DESCRIPTION OF OUR APPROACH</title>
      <p>The proposed method combines the visual simplicity and
comprehensibility of the minimum spanning tree (MST) with
the grouping properties of hierarchical clustering approaches
(dendrograms). Given pairwise distances describing the
relationship of high-dimensional objects, the objective is to
preserve a subset of important distances as well as
possible on 2D, while at the same time accurately conveying the
cluster information.
! " #</p>
      <p>Our approach works as follows. First, we construct a
Minimum Spanning Tree (MST) layout on the 2D plane in such
a way that all distances to a user-selected pivot point and
the neighborhood distances on the MST are exactly
preserved. This construction carefully considers how to best
portray the original object relationships for either metric or
non-metric distance functions. Secondly, the dendrogram
cluster hierarchy is constructed so that it can be positioned
exactly on top of the MST mapping of objects. The cluster
hierarchy can be frozen at any resolution level (tomographic
view) to convey the multi-granular clusters that are formed.
This concept is elucidated in Figure 1: objects are properly
mapped on the 2D plane whereas on the third dimension
the hierarchy of the clustering structure is portrayed.
Naturally, this is only a conceptual illustration of our approach.
In practice, cluster information is also projected onto two
dimensions, e.g., by properly coloring the nodes belonging
to the same cluster. Therefore, by ‘cutting’ the dendrogram
derived on a user-defined level, clusters on 2D can be formed,
expanded and contracted appropriately, as the user drills up
or down on the cluster hierarchy.</p>
      <p>
        The work presented here constitutes an demo prototype
of hhe visualization technique presented in [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. Here we
explore in more detail how the proposed high-dimensional
data embedding methodology can be used as the interface
of a movie search engine. In Table 1 we present briefly the
differences of our approach with respect to prevalent movie
search and recommendation systems.
2.1
      </p>
    </sec>
    <sec id="sec-3">
      <title>Neighborhood Preservation</title>
      <p>
        We will first explain how to capture on two dimensions the
relationship between a set of high-dimensional objects. As
not all pairwise distances can be retained on two dimensions
we choose to maintain, as well as possible, the spanning tree
distances which partially capture the local relationships and
also record information about the general global structure
[
        <xref ref-type="bibr" rid="ref11">11</xref>
        ].
      </p>
      <p>We begin by constructing the spanning tree on the
original high-dimensional objects. One object is selected as pivot
and mapped in the center of the 2D plane coordinate
system. By traversing the spanning tree, objects are positioned
on the 2D plane by triangulating the distances to two
objects: the pivot object and the neighboring point previously
mapped on the spanning tree.
Example: Suppose the first two points (A and B) of the
MST have already been mapped, as shown in Fig. 2(a).
Let’s assume that the second distance preserved per object
is the distance to a reference point, which in our case is the
first point. The third point is mapped at the intersection
of circles centered at the reference points. The circles are
centered at A and B with radii of d(A, C) – the distance
between points A and C– and d(B, C), respectively. Owing
to the triangle inequality, the circles either intersect in two
positions or are tangent. Any position on the intersection of
the circles will retain the original distances towards the two
reference points. The position of point C is shown in Fig.
2(a). The fourth point is mapped at the intersection of the
circles centered at A and C (Fig. 2(b)) and the fifth point
is mapped similarly (Fig. 2(c)). The process continues until
all the points of the ST are positioned on the 2D plane and
the final result is shown in Fig. 2 (d).</p>
      <p>The mapping technique presented will retain exactly the
distances between all points and the pivot sequence, and
also between the nodes that lie at the edges of the spanning
tree. This creates a powerful visualization technique that
not only allows to preserve nearest neighbor distances (local
structure), but in addition retains distances with respect to
a single reference point, providing the option of a global data
view using that object as a pivot.</p>
      <p>Layout optimization using simulated annealing: For
reaching maximum visual clarity we try to minimize the
number of intersected graph edges. Recall that when
triangulating the position of a third point to its neighbor and
the pivot, the algorithm proceeds by identifying the
intersection between two circles. One can readily see this in Figure
3; there are two positions in which a newly mapped point
can be placed.</p>
      <p>
        We employ a probabilistic global optimization technique
based on simulated annealing (SA) [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] that intelligently
selects which of the two mapping positions to use, so as to
minimize the number of crossed edges. SA is an effective
optimization method when the search space is discrete, which
is exactly the situation we face. Our experiments on the
movie graph database suggest that the simulated annealing
process is very effective in reaching an improved layout.
Using Non-metric Distances: When the underlying
distance measure obeys the triangle inequality the circles around
the reference points are guaranteed to intersect. However,
many widely used distance functions (e.g., dynamic
warping, longest common subsequence) violate the triangle
inequality, and thus the corresponding reference circles may
not necessarily intersect. In such cases, one needs to
identify the position where to place an object with respect to
the two circles, in such a way that the object is mapped as
close as possible to the circumference of both circles. We
need to identify the locus of points that minimize the sum
of distances to the perimeters of two circles. One can show
that the desired locus always lies on the line connecting the
centers of the two circles. An example is shown in Fig. 4.
2.2
      </p>
    </sec>
    <sec id="sec-4">
      <title>Cluster Preservation</title>
      <p>L</p>
      <p>A1</p>
      <p>C</p>
      <p>A2</p>
      <p>Now we turn our attention to capturing and conveying
the cluster information on 2D. Recall that the input for the
algorithm is a matrix of pairwise distances. Based on the
pairwise distances given, one can build a hierarchical
dendrogram. The dendrogram construction is based on a single
linkage approach that merges closest singleton objects and
clusters.</p>
      <p>Given the minimum spanning tree, the clustering process
can be sped up, because the merging order of the MST
algorithm is the same as the merging order of the single linkage
hierarchical clustering approach. In addition, this reflects
the fact that one can achieve an exact registration of the
constructed hierarchy on top of the MST-mapped points,
because the clustering order is the same as the order
crystallized in the spanning-tree mapping. This can be verified
in Fig. 1 where the single linkage dendrogram is positioned
exactly on top of the spanning-tree mapping.</p>
      <p>The above observation combined with the hierarchical
cluster information provides the capacity for multi-granular
cluster views on 2D by interactively setting variable threshold
levels on the resulting dendrogram. In this way, a
‘tomographic view’ of the clusters with formation of variable size
clusters is possible. The concept is illustrated in Fig. 5:
by cutting the dendrogram at a lower threshold, six
clusters are created on the left. Imposing a higher threshold,
clusters are merged progressively as shown on the right side.
Our prototype implementation conveys cluster information
by coloring the node perimeters and the connected edges.
3.</p>
    </sec>
    <sec id="sec-5">
      <title>GRAPHICAL INTERFACE</title>
      <p>On top of the proposed visualization methodology we have
built a movie recommendation engine that allows the
interactive exploration of a large movie graph. Our sample
database consists of 125,000 movies and 955,000 actors. We
augment the information on each movie by attaching
additional unstructured information from the web. So, each
movie is described by a bag-of-words pertinent to the genre,
actors, director(s), language, and a set of keywords relevant
to the plot. To evaluate movie similarities we follow an
IRdriven approach by considering the cosine similarity between
the bag-of-words. More complex functions can be also
accommodated, however this approach already gave very
satisfactory and intuitive results. So, similarity between movies
is based solely on the content of the movie rather than on
any collaborative features (or ratings). We follow this path
in an effort to introduce a factor of serendipity in the
recommendation process. Like this, more obscure movies can
appear in the recommendation if they share similar content
(e.g. plot or mood) with the one that the user selected. In
general, we have noticed that movie recommender systems
that consider collaborative features (e.g. users that have
rated highly movie A have also rated highly movie B) tend
to have a strong bias toward blockbuster movies that most
users (typically) have already watched.</p>
      <p>GUI and Functionalities: The data visualization is
accommodated through a graphical web interface, shown in
Figure 6. The interface allows the user to search for a movie,
and subsequently displays the proposed visualization graph,
placing the movie selected as the center (pivot) object. The
user can then easily identify other relevant movies, with the
option to retrieve detailed information about the movie, such
as participating actors or the movie plot. The user can
modify the number of displayed movie clusters or even watch the
movie trailer. The application allows the exploration of both
sides of the movie-actor bipartite graph: either by deeper
exploring of the movie graph (by clicking on a movie) or by
searching/filtering for movies of a particular actor (by
clicking on an actor’s image). Additional filtering functionalities
include: a) Filtering by rating, e.g., when interested in
retrieving movies with a rating &gt; Y . b) Filtering by year,
when the user is interested only in recent movie releases.</p>
      <p>Below we provide specific examples of our mapping,
highlighting its ability to be used as an effective and interactive
movie recommendation system. Number of clusters can be
interactively modified and clusters are conveyed using
varying border and edge colors.</p>
      <p>Examples: Selecting ‘The Titanic’ as pivot movie produces
the 2D mapping shown on the left-hand side of Fig. 7.
The user can navigate through the graph and identify
similar movies. Clustered with Titanic are the movies:
‘Titanic (1953)’, ‘Poseidon’ and ‘Shakespeare in Love’.
Another cluster displayed, shown in light green, includes movies
like: ‘The Notebook’, ‘Atonement’, ‘Purple Rain’; all
romantic drama movies. Similarly, selecting ‘Star Wars’ as
pivot movie correctly packs closely the remaining Star Wars
movies (Fig. 7, right-hand side). An adjacent cluster
includes parodies of Star Wars, like ‘Spaceballs’ or ‘Thumb
Wars’. Other related movies include adventure films like
the ‘Lord of the Rings’ trilogy, or sci-fi action thrillers like
‘Aliens’ and ‘Star Trek’. Additional illustrative examples
can be found in the provided video.
Benefits of simulated annealing (SA): The proposed SA
component probes multiple node placement configurations,
and picks the one that minimizes the number of intersected
edges. To measure its effectiveness we select 1000 movies
at random and retrieve the k nearest neighbors. Table 2
reports the median number of edge intersections with and
without the SA component. We observe that the SA
implementation significantly reduces the edge intersections and
hence the screen clutter, providing a more intelligent node
placement. Note, that both variations provide the same
distance preservation with respect to the pivot and the MST
distances.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>V.</given-names>
            <surname>Athitsos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Alon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Sclaroff</surname>
          </string-name>
          , and
          <string-name>
            <given-names>G.</given-names>
            <surname>Kollios. BoostMap</surname>
          </string-name>
          :
          <article-title>An Embedding Method for Efficient Nearest Neighbor Retrieval</article-title>
          .
          <source>In IEEE Trans. Pattern Anal. Mach. Intell</source>
          .
          <volume>30</volume>
          (
          <issue>1</issue>
          ), pages
          <fpage>89</fpage>
          -
          <lpage>104</lpage>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>J.</given-names>
            <surname>Bennett</surname>
          </string-name>
          and
          <string-name>
            <given-names>S.</given-names>
            <surname>Lanning</surname>
          </string-name>
          .
          <article-title>The Netflix Prize</article-title>
          .
          <source>In Proc. of KDD Cup and Workshop</source>
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>J.</given-names>
            <surname>Bourgain</surname>
          </string-name>
          .
          <article-title>On Lipschitz embeddings of finite metric spaces in Hilbert space</article-title>
          .
          <source>In Israel Journal of Mathematics</source>
          ,
          <volume>52</volume>
          , pages
          <fpage>46</fpage>
          -
          <lpage>52</lpage>
          ,
          <year>1985</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>V.</given-names>
            <surname>Cerny</surname>
          </string-name>
          .
          <article-title>A thermodynamical approach to the travelling salesman problem: an efficient simulation algorithm</article-title>
          .
          <source>In J. of Opt. Theory and Applications</source>
          <volume>45</volume>
          , pages
          <fpage>41</fpage>
          -
          <lpage>51</lpage>
          ,
          <year>1985</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>J.</given-names>
            <surname>Davidson</surname>
          </string-name>
          et al.
          <article-title>The YouTube Video Recommendation System</article-title>
          .
          <source>In Proc. of ACM Recommender Systems</source>
          , pages
          <fpage>293</fpage>
          -
          <lpage>296</lpage>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>E. A.</given-names>
            <surname>Eyjolfsdottir</surname>
          </string-name>
          , G. Tilak, and
          <string-name>
            <given-names>N.</given-names>
            <surname>Li</surname>
          </string-name>
          .
          <article-title>MovieGEN: A Movie Recommendation System</article-title>
          .
          <source>In UC Santa Barbara: Technical Report</source>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>C.</given-names>
            <surname>Faloutsos</surname>
          </string-name>
          and
          <string-name>
            <surname>K. I. Lin.</surname>
          </string-name>
          <article-title>FastMap: A fast algorithm for indexing, data-mining and visualization of traditional and multimedia datasets</article-title>
          .
          <source>In Proc. of SIGMOD</source>
          ,
          <year>1995</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>F.</given-names>
            <surname>Gruvstad</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Gupta</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.</given-names>
            <surname>Agrawal</surname>
          </string-name>
          .
          <article-title>Shiniphy - Visual Data Mining of movie recommendations</article-title>
          . In Stanford University:
          <source>Technical Report</source>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>T.</given-names>
            <surname>Mei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.-S.</given-names>
            <surname>Hua</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.-Q.</given-names>
            <surname>Yang</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.</given-names>
            <surname>Li</surname>
          </string-name>
          .
          <article-title>VideoReach: an online video recommendation system</article-title>
          .
          <source>In Proc. of SIGIR</source>
          , pages
          <fpage>767</fpage>
          -
          <lpage>768</lpage>
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>B. N.</given-names>
            <surname>Miller</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Albert</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. K.</given-names>
            <surname>Lam</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. A.</given-names>
            <surname>Konstan</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>Riedl</surname>
          </string-name>
          . MovieLens Unplugged:
          <article-title>Experiences with an Occasionally Connected Recommender System</article-title>
          .
          <source>In Proc. of IUI</source>
          , pages
          <fpage>263</fpage>
          -
          <lpage>266</lpage>
          ,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>D.</given-names>
            <surname>Svonava</surname>
          </string-name>
          and
          <string-name>
            <given-names>M.</given-names>
            <surname>Vlachos</surname>
          </string-name>
          .
          <article-title>Graph Visualization using Minimum Spanning Dendrograms</article-title>
          .
          <source>In IEEE International Conference on Data Mining</source>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <surname>J. B. Tenenbaum</surname>
            , V. de Silva, and
            <given-names>J. C.</given-names>
          </string-name>
          <string-name>
            <surname>Langford</surname>
          </string-name>
          .
          <article-title>A global geometric framework for nonlinear dimensionality reduction</article-title>
          .
          <source>Science</source>
          <volume>290</volume>
          , pages
          <fpage>2319</fpage>
          -
          <lpage>2323</lpage>
          ,
          <year>2000</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>