=Paper= {{Paper |id=None |storemode=property |title=TopicLens: An Interactive Recommender System based on Topical and Social Connections |pdfUrl=https://ceur-ws.org/Vol-891/interfacers12_submission_5.pdf |volume=Vol-891 }} ==TopicLens: An Interactive Recommender System based on Topical and Social Connections== https://ceur-ws.org/Vol-891/interfacers12_submission_5.pdf
TopicLens: An Interactive Recommender System based on
            Topical and Social Connections

                  Laura Devendorf                              John O’Donovan                               Tobias Höllerer
                University of California                     University of California                    University of California
                      Berkeley                                   Santa Barbara                               Santa Barbara
                      CA 94720                                  CA 93106, USA                               CA 93106, USA
           ldevendorf@berkeley.edu                            jod@cs.ucsb.edu                             holl@cs.ucsb.edu

ABSTRACT
This paper describes TopicLens, an interactive tool for exploring
and recommending items within large corpora, based on both so-
cial metadata and topical associations. The system uses a hybrid
visualization model that represents topics and content items side by
side, allowing the user to actively explore recommendations rather
than passively viewing them. The approach provides insight into
the composition of relevant topics as they relate to the meta-data
of underlying texts. We describe a novel approach to sorting and
filtering, which can be topic or document-driven, and two novel in-
teraction styles termed “view inversion” and “human-review”, each
of which enables novel perspectives on topic-modeled sets of documents.
To evaluate the system, three use cases are presented to
highlight interesting insights across three different data sets using
our novel recommendation interface.

1.    INTRODUCTION
   Recommender systems attempt to ease the information overload
problem by providing the right information to the right person at
the right time [31, 19, 33]. However, presentation mechanisms for
these systems are becoming increasingly important, as they are ap-               Figure 1: A snapshot of TopicLens interactively recommending
plied to increasingly more diverse data on the social web. For ex-               movies from Facebook API. The segment shows the popularity
ample, Herlocker’s early experiments on the value of explaining                  of each item among friend groups on the outer ring, and high-
recommendations [19] have informed and influenced many of to-                     lights recommendations in bold on the inner ring. This view is
day’s recommender system designs. Tintarev and Masthoff [38]                     highly dynamic and changes based on mouseover interactions
survey the role of explanation as an integral part of the recom-
mendation process and outline seven distinct advantages of pro-
viding explanation. More recent efforts to analyse the effect of “in-
spectability and control” [21], interactive visual feedback [6], and             large data set. Successful visualizations are especially effective at
dynamic critiquing [30, 10] clearly show that the interface compo-               highlighting patterns within high dimensional data. Such visual-
nents play an important role in a user’s acceptance and overall trust            izations may also allow the user to navigate and dynamically filter
in a recommendation.                                                             information in order to extract specific and relevant items. Example
   In this paper we focus on one specific interface design (Figure 1)             use cases are:
for exploration of recommendations which have been derived from                     • To augment the user’s ability, beyond keyword-based search
a topic modeling algorithm. Topic modeling is a statistical method                    and navigation, to discover topical composition and inter-
for extracting relevant topics from a large corpus of text. Visual-                   relationships in texts (i.e. recommendation via topic asso-
ization of connections formed through topic modeling can enable                       ciations).
users to quickly identify trends and other insightful details from a
                                                                                    • To highlight popular trends and conversations within social
                                                                                      networks.
                                                                                    • To compare bodies of text, visually exploring similarities,
                                                                                      differences and patterns in the underlying texts for better per-
Interface@RecSys’12, September 13, 2012, Dublin, Ireland. Paper pre-
                                                                                      sonalized result sets.
sented at the Workshop on Interfaces for Recommender Systems 2012,               The focus of this paper is largely on the UI design and on novel
in conjunction with the 6th ACM conference on Recommender Systems.               interaction techniques to represent connections formed over large
Copyright © 2012 for the individual papers by the papers’ authors. This         interaction techniques to represent connections formed over large
volume is published and copyrighted by its editors.                              text datasets using topic modeling or other automated text analysis
                                                                                 algorithms. The key elements in our visual representations include:


                                                                                       Mathematical Theory     theorem lemma proof follow constant bound exist definition
     • Recommendable Item: An abstract entity which can translate                      Software Engineering    software process tool project development design system developer
                                                                                         Gene Expression       protein genes expression network motif interaction pathway genome
       to either a text document or a user within a social network.                     Politics and Society   political social policy economic china law government national
       These are conceptually grouped because they are both repre-                       Business and IT       business firm services customer technology management market product
                                                                                         Fluid Dynamics        flow velocity wall fluid turbulence reynold pressure channel
       sented by collections of terms. For example, in the Twitter
       data set, a user is represented as a collection of Tweets.
     • Topic: Multinomial distributions over a set of terms, which               Table 1: Examples of LDA topics learned on a corpus of re-
       can be associated with content items.                                     search papers

   While established representations, such as word clouds and tree
maps [35] can be useful for visualizing frequency in topic-item re-              proactive, query-based solutions in the fields of search [12] and
lationships, we describe a model that also preserves and represents              reactive or filter-based approaches in the field of recommendation
relationships at the meta-data level. This allows users not only to              [20, 7]. In the context of this work, we are especially interested
see which topics arise, but also how they arose and under what con-              in approaches that employ visual and interactive methods to tailor
ditions. The approach enables more informed reasoning about doc-                 an information space to a user’s individual needs. The novel ap-
uments a user wishes to investigate, while highlighting trends over              proach presented in this paper employs a statistical method known
a number of different types of networks with respect to a particular             as Latent Dirichlet Allocation (LDA) or “topic modeling” [4, 3] to
investigation.                                                                   discover useful linkages between documents upon which
   Microsoft’s “Twahpic” [29] approach to visualizing topics in                  visualizations are built.
conjunction with meta-data leverages a composite view that optimizes                While there has been a significant amount of research in this
its visualization strategy for each different facet of the data. This            domain from a variety of perspectives, from early approaches such
strategy is effective for illustrating and highlighting the multifaceted         as [27, 40, 18] to more recent work in [36, 37, 39, 22, 25], visual
nature of the data, but is difficult to navigate due to the separation            techniques for exploring large sets of documents have not yet been
of each frame and the segregation of the data networks. In short, the            widely adopted.
interaction model helps a user form impressions of the data rather
than supporting investigations into the data.                                    2.1      Topic Modeling
   Work by Cao et al. in [8] shows a benefit of using multiple ap-                   LDA or “topic modeling” is a statistical technique introduced by
proaches to visualizing the different facets of the data, and in this            Blei et al. [4] that computes focused probability distributions over
paper, we will present a model that takes a hybrid approach rather               the words in a set of documents. The algorithm functions by map-
than a segregated approach in order to facilitate navigation and in-             ping documents onto a smaller number of “topics”. In this sense, a
teraction with the data. The key features of the proposed technique              topic consists of a multinomial distribution over words or stemmed
are as follows:                                                                  terms in a document set. A topic t is written as p(w|t), for t ∈ {1, . . . , T},
                                                                                 where T is the number of topics [4, 16]. In many cases, topics are
     • Presents a choice of view modes, sorting parameters and con-
                                                                                 displayed as a list of the top n words with the highest probability
       trols for navigation and dynamic filtering.
                                                                                 in the set. Table 1 from [15] shows some example topics produced
     • Enables a user to filter topics in relation to the pre-existing            by an LDA algorithm. In this case, the words “theorem, lemma,
       networks in the data.                                                     proof, follow, constant...” seem to relate to the topic “Mathematical
                                                                                 Theory”. Recent research in [9, 26] has shown that although LDA
     • Allows for human oversight of algorithmically generated re-               topics can be misinterpreted, they are generally well understood by
       sults.                                                                    users. Techniques for the automatic labeling of topics have been
     • Enables exploration of a dataset as a map, traversing and isolating       presented in [23].
       regions of particular interest in order to extract relevant                  In TopicLens, topics are leveraged to form associations among
       items.                                                                    items in a large corpus, and these associations are used to produce
                                                                                 informative and highly flexible representations of the broader con-
     • Caters to diverse topic modeling scenarios, including addi-               tent item space, using novel layout and interaction techniques. Be-
       tional data such as social and information networks.                      fore describing our approach to visualizing a topic space, we now
                                                                                 present a discussion of existing approaches to visualization of large
   In the remaining sections, we will discuss the related research               document sets.
and provide a brief background of topic modeling before describing                  Many approaches in the literature deal with the representation
in detail the design decisions made when developing the TopicLens                of large text collections, ranging from traditional static representations,
interface. The design decisions include those related to overall                 e.g. [18], to more recent and highly interactive representations
structure and the mapping of formal elements to relational information.          which use advanced methods to relate documents together,
Novel aspects of the interface are also discussed, particularly                  e.g. [?]. They can rely on pre-existing meta data, or can compute
new techniques that we have termed view inversion and                            relations on the fly. In this paper, we present a novel interactive
human review. We will then present three applications of the                     design and layout for exploring topic-based and social network
system, one of which uses data that does not contain topic-based                 relations in large document sets. Before presenting the prototype
relations, thus highlighting a more generalized application of the               system in detail, the following section provides a brief account of
design.                                                                          the design choices for using a combination of river and graph-like
                                                                                 visual representations in the system.
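
To make the top-n word listing of Section 2.1 concrete, the following minimal Python sketch ranks the words of each topic by p(w|t). The probability values and the names TOPIC_WORD_PROBS, top_words and is_distribution are illustrative assumptions; they are not taken from TopicLens or from an actual LDA run.

```python
from typing import Dict, List

# Hypothetical toy values for p(w|t); a real LDA run would estimate these.
TOPIC_WORD_PROBS: Dict[str, Dict[str, float]] = {
    "topic_0": {"theorem": 0.30, "lemma": 0.25, "proof": 0.20,
                "flow": 0.15, "fluid": 0.10},
    "topic_1": {"flow": 0.35, "velocity": 0.25, "fluid": 0.20,
                "proof": 0.12, "lemma": 0.08},
}


def is_distribution(word_probs: Dict[str, float], tol: float = 1e-9) -> bool:
    """Check that the values are non-negative and sum to 1 (within tol)."""
    non_negative = all(p >= 0.0 for p in word_probs.values())
    return non_negative and abs(sum(word_probs.values()) - 1.0) <= tol


def top_words(word_probs: Dict[str, float], n: int) -> List[str]:
    """Return the n words with the highest probability under one topic.

    Ties are broken alphabetically so that the output is deterministic.
    """
    ranked = sorted(word_probs.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:n]]


if __name__ == "__main__":
    for topic, probs in TOPIC_WORD_PROBS.items():
        print(topic, top_words(probs, 3))
```

A display label such as the ones in Table 1 would then be chosen by a person (or by an automatic labeling technique) after reading the ranked words.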
2.     RELATED WORK
   Due to the proliferation of data available on the web, there is               2.2      The Need for a Hybrid Model
an increasing need for better techniques for exploration of large                   As shown in Figure 2, we are supporting exploration of multi-
amounts of text data. This is commonly known as addressing an in-                faceted data in a variety of ways. Specifically, examples are demon-
formation overload problem [20]. Ongoing research has produced                   strated on three different network types: social network data with


Figure 2: Two detail views of the TopicLens visualization, each showing connections between items and related topics. In this case
the data is from the Twitter social network, so our generic “items” represent Twitter users. Frequency measures are shown on the
outer river-like component. The two views are of the same data, with items and topics inverted.



unidirectional edges (followers and followees) from Twitter,                  great care to ensure clarity and consistency across multiple
augmented with topic relations; a topic-modeled network of news               view modes. In our informal tests and observations of interactions
articles from the New York Times; and social network data with                with the system, we have found that it is easy to learn and that users take
bidirectional connections from Facebook. Across all examples, the             quickly to the dynamic filtering and sorting tools we provide.
goal is to use simple interaction and novel layouts to facilitate user
comprehension of complex data, particularly to communicate the                4.    VISUALIZATION DESIGN
“credibility” factor of peers in a network with respect to particular            The most prominent feature of the visualization, shown in Figure
topics of interest. This complexity would be inherently difficult to           2, is its use of the wheel to structure information. Using a circular
communicate with a single visualization technique such as a river             structure allows us to accommodate variability in the size of the
or graph visualization. Accordingly, we have opted for a hybrid               datasets. The wheel dynamically expands to fit the data and con-
approach which uses a graph-like mechanism similar to TopicNets               tracts upon filtering. Zooming and font size are adjusted in order to
[15] for highlighting relations between document and topic nodes,             keep information present within the visualization space, regardless
and a river-like view similar to ThemeRiver [17] overlaid to com-             of how much there is to display.
municate frequency or “credibility” of different sets of peers within            The visualization is designed to fit within a rectangular window
the context of a topic selection. This approach has been successful           with width larger than height. The exact dimensions can vary and in
in applications such as Freire’s ManyNets [13].                               our examples, we found it most effective to use a full screen view
                                                                              on a high-resolution display (1280x1024 and higher), especially
                                                                              when dealing with large sets. The left side of the screen contains
3.    DESIGN CONSIDERATIONS                                                   the controls and legends and the wheel rotates on an axis in the
   At the core, the TopicLens interface seeks to empower the user             center of the screen. A static camera is also positioned at the center
to explore large datasets based on a number of factors. We designed           of the screen, allowing the user to zoom in towards and away from
the interface with the idea that potential users would benefit                the center. The river is positioned along the outer edge of the wheel
most from learning and engaging with the system rather than making            and protrudes in different directions depending on the current data
sense of the data at a glance. We see applications of our system              selection.
being beneficial for any researcher who is looking to glean insights
into a large body of text. This includes analysts of social networks          4.1    Organization
as well as scholars in the humanities who may want to use TopicLens              In order to support the user in exploring the data at varying levels
to explore trends in the bodies of work by a single author                    of detail, the organization of the visualization needs to clearly distinguish
or works belonging to a single genre or set of genres. We provide             the different relationships that are represented. We classify
functionality with the goal of avoiding a crowded interface and we took       those relationships into three types: primary, secondary and ternary.


The data we collect has pre-existing relationships as formed through           pending on the underlying dataset, visual features may be turned
meta-data (primary relations), the topic modeling algorithm pro-               on and off in order to keep the visual complexity to a minimum.
vides information about relationships between items and topics (sec-              At the root, our information display consists of two basic entities:
ondary relationships), and we found it helpful to further analyze              topics and content items. Items are mapped to circles and topics are
the topics in relation to items and item meta-data (ternary rela-              mapped to rectangles with the text label of the topic in the center.
tionships). By dividing the wheel into three concentric regions,               We made these entities distinct in order to visually and conceptually
we were able to map each type of relationship to its own location              separate them. The topic text is always visible but the item text is
on the wheel. As you travel from the center out, the information               only present on demand. Similarly, the circular shape of the item is
represented reflects an increasing number of factors. The wheel,                always visible but the rectangular shape of the topic is only visible
combined with zooming, was intended to give the user the idea                  on selection.
that zooming out will provide them with a big picture, bird’s-eye                 Color is used to visually group items based on meta-data. For
overview of the data and zooming in closer will focus on the finer              instance, if there is meta-information about item categories, each
detailed relationships. The following paragraphs provide a detailed            category type would map to a unique color. This mapping was cho-
explanation of the relationship types and the regions they map to.             sen partly because it enables a quick visual grouping of items and
                                                                               extends to a large number of categorizations. Another reason for
4.1.1     Primary Relations: Center                                            choosing color was its ability to support a visual connection between
   Primary relations are formed through associations in item meta-data.        the meta-data of the individual item and the corresponding
In the analysis of Twitter networks, a single item represents                  meta-data represented in the river. This offers the user two levels
a Twitter user. Item meta-data includes, but is not limited to, a list         of understanding by illustrating how the meta information is con-
of followers of this user and a list of other Twitter users that this          nected to the item as well as the topic.
user is following. In the case of topic modeling run over New York                Opacity is used to illustrate secondary relationships, relation-
Times articles, primary relationships would be formed between two              ships between topics and content items. These relationships occur
or more articles that share the same author.                                   with a probability specified by the LDA algorithm. Opacity is an
Primary relationships are mapped to the center so these relation-              effective means of illustrating these connections as it indicates rel-
ships can be viewed in a local space. Figure 2 shows primary rela-             ative strength. Darker nodes have strong probabilities of relation,
tions through coloring in the view on the right. In the view on the            lighter have weaker ones. If a node is unrelated, it is removed from
left, topics are featured in the center. Since primary relations don’t         the space. Secondary relations are highlighted upon interaction as
exist within topics, no explicit color mapping is represented.                 the user must specify a single item or topic in order to view its con-
                                                                               nections. If multiple items or topics are selected, then the opacity
4.1.2     Secondary Relations: Center & Inner Ring                             value is determined by the average probability from all nodes in the
   Secondary relationships occur as a result of the topic modeling             selected set.
and define the relationships between topic and item nodes. Each                    Position and order are used in conjunction to highlight patterns in
of these relationships occurs with a given probability as defined by            the data. Patterns are exposed by using the ordering of the items or
the LDA algorithm. These relationships as well as their respective             topics on the inner ring to position the items or topics in the center.
probabilities are represented by interactions between the center and           Each value begins in the center of the circle and is pulled towards
inner ring. While the nodes in the center are not bound to any axis            all of its related nodes in the inner ring. The strength of attraction
or predetermined path, the nodes in the inner ring are equidistantly           depends on the probability of the connection between the item and
laid out in a circle. This is primarily because the inner ring also            the topic. The result is a spatial grouping of items or topics that
functions as the axis points for the river visualization but also rein-        share similar relationships. A number of interaction techniques for
forces simplicity by defining only one type of data to be related spa-          positioning items on the inner ring will be discussed in the follow-
tially. On the left side of Figure 2, highlighting Wikileaks changed           ing sections.
the opacity of the nodes on the inner ring in order to indicate how               Size is used to illustrate measures of numerical magnitude such
related each item is to this topic. On the right, highlighting User            as frequency or number of relations. Similar to position and order-
16 changed the opacity of the topics in the inner ring, similarly              ing, some mappings of attributes to size can be more informative
showing the strength of the connection.                                        than others. For this reason, we allow the user to identify the node
                                                                               attribute that determines node size.
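
The opacity and position mappings of Section 4.2 can be sketched as follows. This is a minimal illustration under stated assumptions, not the TopicLens implementation: opacity is taken to be the mean probability clamped to [0, 1], and the resting position of a center node is approximated as the probability-weighted sum of the positions of its related inner-ring nodes, measured from the center of the wheel. All function names are hypothetical.

```python
import math
from typing import Dict, Sequence, Tuple

Point = Tuple[float, float]


def ring_positions(names: Sequence[str], radius: float = 1.0) -> Dict[str, Point]:
    """Lay out inner-ring nodes at equal angular steps on a circle."""
    if not names:
        return {}
    step = 2.0 * math.pi / len(names)
    return {
        name: (radius * math.cos(i * step), radius * math.sin(i * step))
        for i, name in enumerate(names)
    }


def selection_opacity(probs_to_selected: Sequence[float]) -> float:
    """Opacity of one node given its probabilities of relation to each
    selected node: the mean probability, clamped to [0, 1].
    An empty selection yields 0.0 (assumed here to mean 'not shown')."""
    if not probs_to_selected:
        return 0.0
    mean = sum(probs_to_selected) / len(probs_to_selected)
    return max(0.0, min(1.0, mean))


def center_position(link_probs: Dict[str, float], ring: Dict[str, Point]) -> Point:
    """Approximate resting point of a center node.

    The node starts at the origin and is displaced towards every related
    ring node in proportion to the link probability (an assumed
    simplification of the attraction described in the text).
    """
    x = sum(p * ring[name][0] for name, p in link_probs.items())
    y = sum(p * ring[name][1] for name, p in link_probs.items())
    return (x, y)


if __name__ == "__main__":
    ring = ring_positions(["t1", "t2", "t3", "t4"])
    print(center_position({"t1": 0.7, "t2": 0.3}, ring))
    print(selection_opacity([0.2, 0.6]))
```

Because the link probabilities of one item sum to at most 1, the computed point stays inside the ring, and items with similar probability profiles end up close together.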
4.1.3     Ternary Relations: Outer Ring / River
   Ternary relationships are formed between the topic modeled re-              5.    IMPLEMENTATION
sults and the meta-information of the items related to those results.             This visualization evolved through a number of design iterations.
Using the river visualization to graph these relationships allows us           Using Processing to program the design and interaction allowed us
to see an overall frequency of the node in addition to the meta-               to easily explore changes in the design and instantly see the re-
information frequencies within the same space. Depending on the                sults. The Processing framework also made it simple to program
data and filtering, the river model can be customized to show any               animations and transitions between states. A number of libraries
particular facet of the meta-information. Figure 2 shows average               were used to extend the scope and flexibility of Processing. The
probabilities over each facet of item meta-data in relation to                 PeasyCam library provided the basic virtual viewpoint control, the
the selected item. The colors in the river match the colors of the             ControlP5 library was used to implement text boxes, range sliders
meta-data in the center, reinforcing this relationship.                        and list boxes and an OpenGL library was used to add custom func-
                                                                               tionality into the system such as smoothing and alpha blending.
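
One plausible way to draw the river of Section 4.1.3 is to stack one band per meta-data facet outward from each inner-ring axis point, so that the total height reflects the overall value at that node and each band reflects one facet. The sketch below assumes such a stacked layout; the function name stack_bands, the facet names, and the scale parameter are hypothetical and are not taken from the TopicLens source.

```python
from typing import Dict, List, Tuple

Band = Tuple[float, float]  # (inner_radius, outer_radius)


def stack_bands(
    facet_values: Dict[str, float],
    facet_order: List[str],
    base_radius: float,
    scale: float = 1.0,
) -> Dict[str, Band]:
    """Compute the radial extent of each facet band at one axis point.

    Bands are stacked outward from base_radius in the given facet order.
    A facet missing from facet_values gets a zero-height band.
    """
    bands: Dict[str, Band] = {}
    inner = base_radius
    for facet in facet_order:
        outer = inner + scale * facet_values.get(facet, 0.0)
        bands[facet] = (inner, outer)
        inner = outer
    return bands


if __name__ == "__main__":
    # Hypothetical average probabilities for one inner-ring node.
    values = {"followers": 0.25, "followees": 0.5}
    order = ["followers", "followees", "self"]
    for facet, (r0, r1) in stack_bands(values, order, base_radius=1.0).items():
        print(f"{facet}: {r0:.2f} to {r1:.2f}")
```

Keeping the facet order fixed across all axis points lets each colored band be followed around the wheel.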
4.2     Visual Mappings                                                           The TopicLens application creates node and edge objects by parsing
   Because the TopicLens visualization needs to encode a rich variety          configuration and data files on load. During the execution of the
of data, we took care to make the visual encoding of different                 program, node and edge objects are referenced in order to create
relationships and concepts distinct. In order to maintain simplicity           dynamic links. Links are the elements that are drawn to the can-
we map objects and relationships to specific formal elements. De-               vas and much of the code is devoted to maintaining those links and


Figure 3: TopicLens view details for varying data sets. Left: TopicLens view of news articles from the New York Times showing
topics on the inner ring and articles/items in the center. The view shows selection of an individual topic (peak oil) and edges linking
to related articles. Right: TopicLens view of a Facebook social network showing an individual’s friend network and associated item
preferences.



dynamically updating their values to indicate relationships. The              represent probabilities over the meta data. In this example, the user
smooth transitions were created using an integrator class that allows         meta data contains a list of other users following this user and a list
the user to specify characteristics such as mass, position, damping           of the users this user is following. One type of ternary relationship
and attraction. When a link targets a given position, the integrator          in this example is the relationship between a topic and the average
dynamically updates its position depending on its physical                    probability that the user’s friends are discussing that topic.
characteristics.                                                                 As noted in Section 4.1, ternary relations are mapped to the river
                                                                              and the nodes in the inner ring form the axis points. The river
                                                                              displays specific information depending on the organization of the
6.    USE CASES AND DISCUSSION                                                nodes within the space. This approach affords the user an opportunity
   In order to showcase the flexible applicability of our visual model,        to uncover potentially interesting relations in the following six
we present three use cases that explore different dynamic datasets.           view configurations.
Each use case discusses the design decisions made to cater to the                With topic nodes in the center, and user nodes on the outer ring:
specific data domain as well as a usage scenario to illustrate its
potential for a variety of applications.                                         • Upon selection of an individual user, the river view shows
                                                                                   that user, their friends and their followers’ probabilistic as-
6.1    Recommending Credible Information In                                        sociation with each topic on the outer ring.
       Twitter                                                                   • When no user is selected, the river shows the average proba-
   Preserving social network relations in topic modeled systems                    bility for each topic across all users.
allows us to glean insights into the networks and salient topics
therein. This example is catered specifically as an attempt to vi-                • When a topic is selected, the river shows each user’s associ-
sualize credibility in Twitter networks. Our definition of “credibil-               ation with that topic.
ity” relates to the probability by which a user is connected with a              With user/item nodes in the center, and topic nodes on the outer
particular topic, based on LDA analysis over a bag of words rep-              ring:
resentation of all of that user’s tweets. In analyzing credibility, we
also examine that user’s followers and followees and their respec-                • When a topic is selected, the opacity of the user’s friends and
tive associations with the given topic.                                             followers is varied to represent association with that topic.
   In this visualization, which is represented earlier in Figure 2,              • When a user is selected, their association with each topic on
each topic node contains a label that represents the list of words                 the outer ring is shown in the river view.
in a mined topic. Primary relationships are formed between a user
and their followers and followees. Secondary relationships occur                 • When no user is selected, the probability of each topic in the
between users and extracted keywords and the ternary relationships                 global space is shown on the outer ring.
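
The view configurations listed above can be illustrated with a minimal sketch of the values the river would need for each selection state. This is not the authors' implementation: the names `topic_probs`, `friends`, `followers`, `river_values` and `users_for_topic` are hypothetical, the probabilities are made-up placeholders standing in for per-user LDA output, and plain averaging over each group is an assumption.

```python
from statistics import mean

# Hypothetical per-user topic probabilities (placeholders for LDA output).
topic_probs = {
    "alice": {"wikileaks": 0.6, "asylum": 0.1},
    "bob":   {"wikileaks": 0.2, "asylum": 0.5},
    "carol": {"wikileaks": 0.1, "asylum": 0.7},
}
# Hypothetical social links: accounts each user follows, and accounts following them.
friends = {"alice": ["bob"], "bob": [], "carol": ["alice"]}
followers = {"alice": ["carol"], "bob": ["alice"], "carol": []}


def group_mean(users, topic):
    """Average probability of `topic` over `users`; 0.0 for an empty group."""
    return mean(topic_probs[u][topic] for u in users) if users else 0.0


def river_values(topic, selected_user=None):
    """River values for one topic, depending on whether a user is selected."""
    if selected_user is None:
        # No selection: average probability of the topic across all users.
        return {"all_users": group_mean(list(topic_probs), topic)}
    # User selected: that user, their friends, and their followers.
    return {
        "user": topic_probs[selected_user][topic],
        "friends": group_mean(friends[selected_user], topic),
        "followers": group_mean(followers[selected_user], topic),
    }


def users_for_topic(topic):
    """Topic selected: each user's association with that topic."""
    return {user: probs[topic] for user, probs in topic_probs.items()}


if __name__ == "__main__":
    print(river_values("asylum", selected_user="alice"))
    print(river_values("wikileaks"))
    print(users_for_topic("wikileaks"))
```

Keeping the three selection states in separate, small functions mirrors the way the river's meaning changes with the current selection, which is why the interface needs a legend that updates with it.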


   For this scenario, the river represents three probabilities for each         lated documents. Lighter lines then extend from each of those re-
node, the average probability of the user using the topic, the average          lated documents to all of the other topics they are related to. This
probability of the user’s followees using the topic and the average             conveys information to the user about other topics related to their
probability that the people following this user are using the topic.            selection. The user is able to specifically locate the documents that
Since topics are represented along the inner ring, this information             contributed to this relationship by following the lines or selecting
is available for every topic. Each of the probabilities is represented          multiple topics and browsing the filtered document space.
on the river, using color matching to indicate the group or single                  The lines are particularly useful for illustrating how two topics
user it applies to. To further explain what the river is visualizing,           are related to each other and upon what criteria. This is helpful
a legend on the bottom left of the interface dynamically updates,               when browsing for articles associated with a given theme. Let’s say
explaining the current model. In this case mode visibility is of par-           a researcher is looking for references on "peak oil." Searching for
ticular importance as the river maps different values through the               and selecting "peak oil" from the space would show the researcher
life of the visualization.                                                      other related topics as well as articles that specifically contain the re-
   When a Twitter user is highlighted in the space, interactions take           lation. If one of the related articles contains a topic that is also
place at each of the three levels. In the center, the primary relation-         of interest to him or her, or presents a particularly interesting comparison,
ships are presented through colors. All users who don’t belong to               he or she can easily isolate and obtain information about the arti-
this user’s network are removed and the remaining users are color               cles containing both topics by filtering the space and hovering over
coordinated to indicate whether they are a follower of the selected             the document node, revealing information about the article such as
user, or someone the selected user is following. Spatially, each user           title, date and author.
is attracted to the topic nodes in the inner ring by the positioning al-            TopicLens could also be used to visualize trends associated with a
gorithm mentioned above. Topics related to the selected node vary               subset of articles. Say a user read a few articles in the Times Opin-
by opacity in order to indicate the strength of connection.                     ion section and they would like to find other articles about similar
   The probability mappings were specifically designed to investi-               and related subjects. This could be accomplished by typing each
gate credibility or trust. The top left of Figure 2 shows a network             article name into the search field. This would in turn select each
with two people selected. All of the nodes in the set represent both            of the corresponding articles in the space and illustrate the topics
of the selected people’s networks. On the outer river, one can see              associated with them. In order to remove outliers, they would ad-
the probability distributions for this network over each topic in the           just the slider to specify the number of documents that need to be
network. From the river you can conclude that these two users are               associated with a topic in order for it to appear in the space. After this
using the topic "Crowley" quite a bit, however their friends and fol-           filtering step, they are presented with a number of related topics,
lowers are not. For this reason, they may not be a trusted source               the most popular being the largest and darkest. By selecting that
for this topic since their followers do not appear to be interested in          topic, the space is reorganized to show all the articles related
similar topics. On the other side of the visualization is the topic             to that topic. The user can now visually browse these articles and
"asylum" which is being used largely by the network and not so                  quickly identify which one appeared in the opinion section, based
much by these users.                                                            on color. Hovering the mouse over a document node would reveal
   Drawing firm conclusions at this level is not necessarily reliable            its specific information and provide access to the full text.
but better information can be introduced by selecting the right net-
work. For instance, if you know the terms "Assange", "Julian" and               6.3    Recommending Movies via Facebook API
"Wikileaks" are all terms related to Wikileaks, then you could se-                 In this example, shown in the right of Figure 3, we use the pro-
lect those terms from the visualization and view the results over the           posed framework to visualize data that is not topic modeled in or-
given network of users associated with those terms. By investigat-              der to show how the interface also operates on similarly structured
ing the probability of these three words occurring together across              datasets. Reinterpreting the definitions of recommendable item and
the social network you may be able to visualize trends about who                topic allows us to use the existing visual model for this dataset. In
is followed, by whom and for what reasons.                                      this example, a single Facebook user takes the place of a recom-
                                                                                mendable item and a movie takes the place of a topic. Since movies
6.2     Recommending New York Times Articles                                    can be related to any number of Facebook users and Facebook
                                                                                users can be associated with any number of movies, this dataset
   In the example shown on the left of Figure 3, topic modeling was
                                                                                can function similarly to the topic model examples. Each item-
performed on a set of New York Times articles and is used for in-
                                                                                topic, or rather user-movie, combination is assigned the probability
vestigation and discovery of related articles that may not have been
                                                                                of 1 since the user has specified explicitly that they like the given
discovered through traditional search models. Each document node
                                                                                movie. This visualization is able to provide exploratory views of
represents an article and topic nodes represent the topics extracted
                                                                                the most popular movies within a Facebook friend network as well
from those articles. Each article contains information about the
                                                                                as the least popular movies. It can also isolate pockets of users that
section of the paper which it belonged to, such as opinion, world or
                                                                                are fans of these most or least popular movies. Essentially this view
national news.
                                                                                is a visual representation of a social collaborative filtering process,
   Two unique design features were included in this interface to
                                                                                since items which are popular among Facebook friends are pro-
improve the functionality in relation to the underlying data. The
                                                                                moted for a single target user receiving the recommendation.
first one is colored rectangles on topics. These colors are used to
reinforce ternary connections through the use of color averaging.
The color of the rectangle is determined by the category of each of             7.    CONCLUSION
the articles associated with it. Should a color tend heavily towards               In summary, this paper has presented a novel interactive interface
a single category’s color, one could deduce that the topic tends to             for recommending interesting topics and documents from within a
appear most frequently within that category. The actual distribution            large corpus. The design is a hybrid which combines river and
of the categories is explicitly represented in the river.                       graph-like representations of recommended items and can be eas-
   The second unique feature is the use of lines. When hovering                 ily adapted and customized by the end user for different use cases.
over a topic, darkened lines extend from the topic itself to all re-            We have also introduced novel interaction methods that support hu-


man skills in the exploration of topic modeled data sets. In doing                             [17] S. Havre, E. Hetzler, P. Whitney, and L. Nowell. Themeriver: Visualizing
so, we have extended the efficacy of both the system and the al-                                     thematic changes in large document collections. IEEE Transactions on
                                                                                                    Visualization and Computer Graphics, 8(1):9–20, 2002.
gorithm, allowing the user to navigate large datasets and uncover                              [18] M. A. Hearst. Tilebars: visualization of term distribution information in full text
patterns. Details of our design choices and methodology have been                                   information access. In CHI ’95: Proc. of the SIGCHI Conf., pages 59–66, New
discussed, and demonstrated over three example applications, in-                                    York, NY, USA, 1995. ACM Press/Addison-Wesley Publishing Co.
cluding social network data from Twitter augmented with topic                                  [19] J. L. Herlocker, J. A. Konstan, and J. Riedl. Explaining collaborative filtering
                                                                                                    recommendations. In CSCW, pages 241–250, 2000.
modeling over users’ tweets, a topic modeled set of New York                                   [20] J. L. Herlocker, J. A. Konstan, L. G. Terveen, and J. T. Riedl. Evaluating
Times news articles, and social network data from Facebook, in-                                     collaborative filtering recommender systems. ACM Trans. Inf. Syst., 22:5–53,
cluding item preferences. In each example case, we have discussed                                   January 2004.
ways in which the approach facilitates discovery of relevant infor-                            [21] B. Knijnenburg, J. O’Donovan, S. Bostandjiev, and A. Kobsa. Inspectability and
                                                                                                    control in social recommenders. In 6th ACM Conference on Recommender
mation which may go undiscovered in traditional analysis tools.                                     Systems, Dublin, Ireland, September 9 to 13th, 2012.
We have also demonstrated TopicLens’ ability to act as a flexible                               [22] S. Koch, H. Bosch, and T. Ertl. Towards content-oriented patent document
interaction layer, supporting exploration of multiple application do-                               processing. IEEE Symposium on Visual Analytics, Science and Technology,
mains.                                                                                              pages 203–210, 2009.
                                                                                               [23] Q. Mei, X. Shen, and C. Zhai. Automatic labeling of multinomial topic models.
                                                                                                    In KDD, 2007.
8.     ACKNOWLEDGMENT                                                                          [24] N. E. Miller, P. Chung Wong, M. Brewster, and H. Foote. Topic islands - a
                                                                                                    wavelet-based text visualization system. In Proceedings of the conference on
   The authors would like to thank Peter Pirolli and Bongwon Suh                                    Visualization ’98, VIS ’98, pages 189–196, Los Alamitos, CA, USA, 1998.
for use of their Twitter data set. This work was partially supported                                IEEE Computer Society Press.
by the U.S. Army Research Laboratory under Cooperative Agree-                                  [25] D. Newman, T. Baldwin, L. Cavedon, E. Huang, S. Karimi, D. Martinez,
                                                                                                    F. Scholer, and J. Zobel. Visualizing search results and document collections
ment No. W911NF-09-2-0053; by NSF grant IIS-1058132; and                                            using topic maps. Web Semantics: Science, Services and Agents on the World
by the U.S. Army Research Laboratory under MURI grant No.                                           Wide Web, 8(2-3):169–175, July 2010.
W911NF-09-1-0553. The views and conclusions contained in this                                  [26] D. Newman, Y. Noh, E. Talley, S. Karimi, and T. Baldwin. Evaluating topic
document are those of the authors and should not be interpreted                                     models for digital libraries. In JCDL ’10: Proceedings of the 10th annual joint
                                                                                                    conference on Digital libraries, pages 215–224, New York, NY, USA, 2010.
as representing the official policies, either expressed or implied, of                               ACM.
ARL, NSF, or the U.S. Government. The U.S. Government is au-                                   [27] K. A. Olsen, R. R. Korfhage, K. M. Sochats, M. B. Spring, and J. G. Williams.
thorized to reproduce and distribute reprints for Government pur-                                   Visualization of a document collection: the vibe system. Inf. Process. Manage.,
poses notwithstanding any copyright notation hereon.                                                29(1):69–81, 1993.
                                                                                               [28] Questel. Qpat, intellectual property patent and trademark searching.
                                                                                                    http://www.qpat.com/, 2010.
9.     REFERENCES
 [1] M. Analyzer. Matheo analyzer, database analysis and information mapping.                [29] D. Ramage, S. Dumais, and D. Liebling. Characterizing microblogs with topic
     http://www.matheo-analyzer.com/, 2010.                                                       models. In A. Cohn, editor, Proceedings of the Fourth International on Weblogs
                                                                                                    and Social Media, pages 130–137. AAAI Press, 23–26 May 2010.
 [2] K. Andrews, W. Kienreich, V. Sabol, J. Becker, G. Droschl, F. Kappe,                      [30] J. Reilly, K. McCarthy, L. McGinty, and B. Smyth. Dynamic critiquing. In
     M. Granitzer, P. Auer, and K. Tochtermann. The infosky visual explorer:                        ECCBR, pages 763–777, 2004.
     exploiting hierarchical structure and document similarities. Information
                                                                                               [31] P. Resnick, N. Iacovou, M. Suchak, P. Bergstrom, and J. Riedl. Grouplens: An
     Visualization, 1:166–181, December 2002.
                                                                                                    open architecture for collaborative filtering of netnews. In CSCW, pages
 [3] D. Blei and J. Lafferty. Correlated topic models. In Advances in NIPS 18, pages                175–186, 1994.
     147–154. MIT Press, Cambridge, MA, 2006.
                                                                                               [32] D. A. Rushall and M. R. Ilgen. Depict: Documents evaluated as pictures.
 [4] D. M. Blei, A. Y. Ng, and M. I. Jordan. Latent Dirichlet allocation. Journal of                visualizing information using context vectors and self-organizing maps. In
     Machine Learning Research, 3:993–1022, 2003.                                                   Proceedings of the 1996 IEEE Symposium on Information Visualization
 [5] I. Borg and P. Groenen. Modern Multidimensional Scaling: Theory and                            (INFOVIS ’96), INFOVIS ’96, pages 100–, Washington, DC, USA, 1996. IEEE
     Applications. Springer, 2005.                                                                  Computer Society.
 [6] S. Bostandjiev, J. O’Donovan, and T. Höllerer. Tasteweights: An interactive               [33] U. Shardanand and P. Maes. Social information filtering: Algorithms for
     hybrid recommender system. In 6th ACM Conference on Recommender                                automating "word of mouth". In CHI, pages 210–217, 1995.
     Systems, Dublin, Ireland, September 9 to 13th, 2012.                                      [34] B. Shneiderman. The eyes have it: A task by data type taxonomy for
 [7] R. Burke. Knowledge-based recommender systems. In Encyclopedia of Library                      information visualizations, 1996.
     and Information Systems, volume 69, 2000.                                                 [35] B. Shneiderman and M. Wattenberg. Ordered treemap layouts. Information
 [8] N. Cao, J. Sun, Y.-R. Lin, D. Gotz, S. Liu, and H. Qu. Facetatlas: Multifaceted                Visualization, IEEE Symposium on, 0:73, 2001.
     visualization for rich text corpora. IEEE Transactions on Visualization and               [36] W. Spangler, J. Kreulen, and J. Lessler. Mindmap: Utilizing multiple
     Computer Graphics, 16:1172–1181, November 2010.                                                taxonomies and visualization to understand a document collection. Hawaii
 [9] J. Chang. Reading Tea Leaves: How Humans Interpret Topic Models. 2009.                         International Conference on System Sciences, 4:102, 2002.
[10] L. Chen and P. Pu. Critiquing-based recommenders: survey and emerging                     [37] J. T. Stasko, C. Görg, and Z. Liu. Jigsaw: supporting investigative analysis
     trends. User Model. User-Adapt. Interact., 22(1-2):125–150, 2012.                              through interactive visualization. Information Visualization, 7(2):118–132,
[11] D. R. Cutting, D. R. Karger, and J. O. Pedersen. Constant interaction-time                     2008.
     scatter/gather browsing of very large document collections. In Proceedings of             [38] N. Tintarev and J. Masthoff. A survey of explanations in recommender systems.
     the 16th annual international ACM SIGIR conference on Research and                             In Data Engineering Workshop, 2007 IEEE 23rd International Conference on,
     development in information retrieval, SIGIR ’93, pages 126–134, New York,                      pages 801–810. IEEE, 2007.
     NY, USA, 1993. ACM.                                                                       [39] Wanner. Towards content-oriented patent document processing. World Patent
[12] B. D. Davison, T. Suel, N. Craswell, and B. Liu, editors. Proceedings of the                   Information, 30(1):21–33, 2008.
     Third International Conference on Web Search and Web Data Mining, WSDM                    [40] J. A. Wise, J. J. Thomas, K. Pennock, D. Lantrip, M. Pottier, A. Schur, and
     2010, New York, NY, USA, February 4-6, 2010. ACM, 2010.                                        V. Crow. Visualizing the non-visual: Spatial analysis and interaction with
[13] M. Freire, C. Plaisant, B. Shneiderman, and J. Golbeck. Manynets: an interface                 information from text documents. In N. D. Gershon and S. Eick, editors, IEEE
     for multiple network analysis and visualization. In CHI ’10: Proceedings of the                Information Visualization ’95, pages 51–58. IEEE Computer Soc. Press,
     28th international conference on Human factors in computing systems, pages                     30–31 Oct. 1995.
     213–222, New York, NY, USA, 2010. ACM.                                                    [41] P. C. Wong, B. Hetzler, C. Posse, M. Whiting, S. Havre, N. Cramer, A. Shah,
[14] B. Fry. Visualizing data - exploring and explaining data with the processing                   M. Singhal, A. Turner, and J. Thomas. In-spire infovis 2004 contest entry.
     environment. O’Reilly, 2008.                                                                   Information Visualization, IEEE Symposium on, 0:r2, 2004.
[15] B. Gretarsson, J. O’Donovan, A. Asuncion, D. Newman, S. Bostandjiev,
     T. Hllerer, and P. Smyth. Topicnets: Visual analysis of large text corpora with
     topic modeling. ACM Transactions on the Web: Special Issue on Intelligent Text
     Visualization, 16:1172–1181, November 2011.
[16] T. L. Griffiths and M. Steyvers. Finding scientific topics. PNAS, 101(Suppl
     1):5228–5235, 2004.


