Interactive Visualization Tools for Exploring the Semantic Graph of Large Knowledge Spaces

Christian Hirsch, John Hosking, John Grundy
Department of Computer Science
The University of Auckland
Private Bag 92019, Auckland, New Zealand
{chir008@ec. | john@cs. | john-g@cs.}@auckland.ac.nz


ABSTRACT
While the amount of available information on the Web is increasing rapidly, the problem of managing it becomes more difficult. We present two applications, Thinkbase and Thinkpedia, which aim to make Web content more accessible and usable by utilizing visualizations of the semantic graph as a means to navigate and explore large knowledge repositories. Both of our applications implement a similar concept: they extract semantically enriched content from a large knowledge space (Freebase and Wikipedia respectively), create an interactive graph-based representation of it, and combine this representation with the original text-based content in one interface. We describe the design and implementation of our applications, and provide a discussion based on an informal evaluation.

Author Keywords
Semantic Web, Social Web, Wiki, Visualization, User Interface, HCI.

ACM Classification Keywords
H5.m. Information interfaces and presentation (e.g., HCI): Miscellaneous.

Workshop on Visual Interfaces to the Social and the Semantic Web (VISSW2009), IUI2009, Feb 8 2009, Sanibel Island, Florida, USA.
Copyright is held by the author/owner(s).

INTRODUCTION
This research focuses on the design and implementation of two interactive visualization tools. Both applications are built on top of large knowledge repositories. The first prototype, Thinkbase, is built on top of Freebase. The second prototype, Thinkpedia, is built on top of Wikipedia. The purpose of our applications is to provide a visual navigation and exploration tool for the underlying knowledge space. We aim to provide a proof of concept of how visualizations can improve and support Semantic Web applications.

The remainder of this paper is organized as follows. In the Background section we briefly introduce the concepts of “Web 2.0”, “Semantic Web”, and Information Visualization, and their relevance to our work. After clarifying our Approach and Objective, we describe the two prototypes in the main sections, followed by a Discussion and a short section on Future Work.


BACKGROUND
In today’s globalized Information Age the problem of information overload (having more information available than one can efficiently process) has become a ubiquitous issue. Recent estimates predict that in the next five years more information will be created than has been created in the whole of human history [4]. Most of this information is or will be accessible through the Internet and intranets. Tackling the problem of information overload has thus become particularly interesting for the web community.

The “Web 2.0” [14] or “Social Web” already addresses the issue of information overload in several ways. The Web 2.0 is a loosely defined set of technologies, tools, and concepts, which has had an enormous impact on how web-based information is processed. Besides new enabling technologies (e.g. XML) and tools (e.g. Wikis), this “New Web” has introduced significant new behavioral and usage patterns like content sharing, personalization, and mass collaboration [16]. The resulting widespread adoption of Wikis and other social software tools transforms the way information is created and annotated. By using social software, users annotate the content with meta-data in an organic, bottom-up fashion. This enables software agents to better process the information, and as a result more tasks can be delegated to those agents. Search algorithms and recommendation systems are successful examples of how a bottom-up creation of meta-data can help to better cope with an overwhelming amount of information.

The Semantic Web [2] represents a further, more recent approach to addressing the information overload issue. Instead of creating semi-structured meta-data in a bottom-up fashion (as in the Web 2.0), the Semantic Web provides the possibility to formally define meta-data supported by knowledge representation languages (e.g. OWL) and formal specifications (e.g. RDF). The resulting structured content can then be understood not only by humans but also by machines. Therefore more and more tasks can be delegated to software agents. Research in fields such as semantic search (e.g. [7]) and semantic recommendation systems (e.g. [22]) is well underway. Further advantages of semantically enriched data, such as interoperability and transformability, allow for better integration of different sources as well as easy transformation between different representations (e.g. different languages).

A further and more general approach to coping with information overload can be found in the field of Information Visualization. Visualizations provide effective methods for representing and organizing knowledge- and information-rich scenarios [11]. They are tools for knowledge management which make use of the human cognitive processing system in order to create and convey content more efficiently. Information and knowledge visualizations both employ similar techniques. Based on specific mapping rules, they translate resource objects into visual objects, offering easy and comprehensive access to the underlying content [9].

APPROACH AND OBJECTIVE
Even though the three approaches mentioned above (Web 2.0, Semantic Web, and Information Visualization) attempt to solve the information overload issue differently, there is plenty of space for synergies. Approaches to combining the Web 2.0 and the Semantic Web can be seen in two different directions. On the one side, organically grown content within the Web 2.0 (e.g. Wikipedia) is being semantically enriched with the help of natural language processing and knowledge extraction. DBpedia [1] is a well known example of this. On the other side, semantic information repositories start to allow end-users to edit and create semantics in a collaborative wiki-style manner. An example of this is Freebase [5]. One of the better known applications which demonstrate synergies between the Web 2.0 and Information Visualization is Many Eyes [21]. It allows everyone to create, share, and discuss visualizations online. Lastly, there exists a variety of possibilities and approaches to visualize Semantic Web content. This is discussed for example in [6].

Our approach when building two prototypes of interactive visualization and exploration tools for large knowledge spaces was to combine elements from all three areas. As knowledge spaces we have chosen Freebase (a “Semantic Wiki”) for the one application, and Wikipedia (semantically enriched) for the other application. At the core of our tools, we utilize one crucial benefit of Semantic Web data: the ability to be easily transformed from one representation into another. More precisely, we transform the content from a textual representation into a visual representation. The interactive visualizations are displayed alongside the text-based repositories, providing a focus-plus-context view. The results are applications which present visually enriched user interfaces for Semantic Web content.

The objective of our research is to provide a proof of concept of how interactive visualizations can improve Semantic Web applications. This is two-fold. On the one hand, our objective is to demonstrate how it is possible to easily transform Semantic Web content into meaningful visual representations. On the other hand, our objective is to demonstrate how these resulting applications can be used as efficient information discovery tools.

THINKBASE
Our first prototype, Thinkbase [18], is a visual navigation and exploration tool for Freebase, an open, shared database of the world’s knowledge [3]. Freebase can also be described as a “Semantic Wiki”. This means its content is semantically enriched, everyone can edit it, and furthermore, the meta-model itself is also editable by everyone. Figure 1 shows the general user interface of Thinkbase (in this case displaying the movie “The Departed”). The application is divided into two frames. The right frame displays the current Freebase topic, which consists of a short textual description as well as all the details in tabular form. The left frame displays an automatically generated, interactive, force-directed layout
graph of that same topic including all related topics. We have chosen to use the Thinkmap visualization framework to implement this [19]. Thinkmap is a software platform for developing customized visualization interfaces. It consists of loosely coupled components which provide users the ability to retrieve a result set from data sources, and then visualize, navigate, and organize it. The Thinkmap Software Development Kit (SDK) provides ways to easily extend and adjust the suite as well as to integrate it with other web and database technologies.

Figure 1. The user interface of Thinkbase.

Thinkbase accesses the Freebase API, retrieves information about the current Freebase topic as well as all related topics, and creates a graph-based visual representation of it with the help of Thinkmap. Each Freebase topic is represented as a node using an icon which corresponds to its type (e.g. person, movie). Edges between those nodes are annotated with the type of the relationship. These labels become visible when the mouse hovers over an edge. For example, Figure 2 shows the Thinkbase graph for “Homer”. There, one can see that the “Place of birth” of “Homer” is “Greece”. Related topics of the same type are combined in an aggregation node (the grey circles), as seen for example for the type “Influenced”. These aggregation nodes can be expanded and collapsed through a context menu, which helps to focus on specific contents while hiding others (e.g. “Quotations”). Further visual cues such as the length of edges, size of aggregation nodes, and text color are used to encode additional information. Users can navigate from node to node by clicking on them. This will refresh the graph as well as the Freebase frame. The graph is animated, which allows for a smooth transition between different visualizations. This helps users to preserve the “mental map” [12] of the knowledge space. The two alternative representations (textual and visual) of the same underlying body of knowledge enable a focus-plus-context view. This is a further means to support navigation and help users to maintain the mental map. While the textual representation gives a good focus on the current topic, the visual representation allows the user to see the topic embedded in the wider context.

Figure 2. The Thinkbase graph for “Homer”.
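The grouping of related topics into aggregation nodes described above can be illustrated with a minimal sketch. This is not the Thinkmap-based implementation; the names Node, build_topic_graph, and visible_labels, as well as the min_group threshold, are hypothetical, and the sample relations are illustrative rather than actual Freebase output.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str
    kind: str  # e.g. "person", "movie", or "aggregation"
    children: list = field(default_factory=list)
    collapsed: bool = False


def build_topic_graph(center, center_kind, relations, min_group=2):
    """Build a topic-centered graph (hypothetical helper).

    relations: iterable of (relationship, target_label, target_kind).
    Targets that share a relationship are grouped under one aggregation
    node once the group has at least `min_group` members (an assumption).
    Returns (center_node, edges); edges are (source, relationship, target).
    """
    groups = defaultdict(list)
    for relationship, target, kind in relations:
        groups[relationship].append(Node(target, kind))

    center_node = Node(center, center_kind)
    edges = []
    for relationship, targets in groups.items():
        if len(targets) >= min_group:
            # The label carries the group size, which a renderer could map to node size.
            agg = Node(f"{relationship} ({len(targets)})", "aggregation", children=targets)
            center_node.children.append(agg)
            edges.append((center, relationship, agg.label))
            edges.extend((agg.label, relationship, t.label) for t in targets)
        else:
            for t in targets:
                center_node.children.append(t)
                edges.append((center, relationship, t.label))
    return center_node, edges


def visible_labels(node):
    """Labels currently shown; children of collapsed nodes are hidden."""
    labels = [node.label]
    if not node.collapsed:
        for child in node.children:
            labels.extend(visible_labels(child))
    return labels


# Illustrative sample data only.
sample = [
    ("Place of birth", "Greece", "location"),
    ("Influenced", "Virgil", "person"),
    ("Influenced", "Plato", "person"),
]
homer, homer_edges = build_topic_graph("Homer", "person", sample)
```

Setting collapsed to True on an aggregation node hides its members in visible_labels, which mirrors the expand and collapse behavior offered through the context menu.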
The visual representation in Thinkbase presents a topic-centered view. That is, a specific Freebase topic is at the center of the visualization and connections to all directly related topics are shown around it. As all of the Freebase content is basically one huge graph, this means that at any one time we show a small subset of this whole graph. From a cognitive perspective, it would not make sense to display a very large amount of data [8]. However, we allow users to extend the visualization by repeatedly expanding and collapsing not only aggregation nodes but any node of the graph to their liking. This feature gives the user the ability to create unique and informative visualizations. Figure 3 shows an example where the lower and higher classifications of an animal class (here: “Reptile”) have been expanded repeatedly. The resulting visualization represents a small subset of the tree of life, ranging from “Vertebrate” to “Dinosaur”.

Figure 3. A small extract of the “tree of life”.
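One way to think about this repeated expansion is as a visible subgraph that grows outward from the center topic. The following sketch is an assumption about how such a view could be modeled, not a description of the Thinkbase code; the class TopicView and the sample adjacency data are hypothetical.

```python
class TopicView:
    """Visible subset of a larger graph, grown by expanding nodes (hypothetical)."""

    def __init__(self, neighbors, center):
        self.neighbors = neighbors  # dict: node -> list of adjacent nodes
        self.center = center
        self.expanded = {center}    # the center topic starts expanded

    def expand(self, node):
        self.expanded.add(node)

    def collapse(self, node):
        # The center stays expanded so the view never becomes empty.
        if node != self.center:
            self.expanded.discard(node)

    def visible(self):
        """Nodes reachable from the center by passing through expanded nodes only."""
        seen = {self.center}
        stack = [self.center]
        while stack:
            node = stack.pop()
            if node not in self.expanded:
                continue  # the node is shown, but its neighbors stay hidden
            for other in self.neighbors.get(node, []):
                if other not in seen:
                    seen.add(other)
                    stack.append(other)
        return seen


# Illustrative adjacency data only.
tree_of_life = {
    "Reptile": ["Vertebrate", "Dinosaur"],
    "Vertebrate": ["Reptile", "Mammal"],
    "Dinosaur": ["Reptile", "Tyrannosaurus"],
}
view = TopicView(tree_of_life, "Reptile")
view.expand("Dinosaur")
```

Only the set of expanded nodes is stored, so collapsing a node simply removes it from that set and the visible subgraph is recomputed on demand.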
Further features of Thinkbase include: zoom functions; the ability to navigate the browsing history; printing the visualization; the possibility to share a direct link to a specific page; and the option to trigger a search of a node in Google or Wikipedia. Our research prototype also provides some functionality to edit the content of Freebase through the visual representation (e.g. add new relationships). This is only possible due to the semantically enriched content.

THINKPEDIA
Our second prototype, Thinkpedia [20], is a visual navigation and exploration tool for Wikipedia. The objective for this prototype was to investigate the possibility of creating a visual exploration tool similar to Thinkbase, only for a less structured knowledge space. The “Social Web”, of which Wikipedia is a part, has produced a huge amount of interesting content. However, most of it is unstructured or semi-structured. Therefore it is hard for machines to reason with it and, in our case, to automatically translate the content into meaningful visualizations. What is needed is a way to extract semantics from the unstructured contents of Wikipedia. Knowledge extraction is a well established research area, and tools like DBpedia [1] demonstrate successful approaches to doing this. We decided to use the SemanticProxy web service, which is part of the Calais initiative by Thomson Reuters [15]. The SemanticProxy takes plain text or a URL as input, processes this, and returns the identified concepts and their relationships in RDF format. Using a general “semantifier” like SemanticProxy allows us to easily switch between different MediaWikis (not only Wikipedia) or even other unstructured sources.

Figure 4. The user interface of Thinkpedia.
Figure 5. The Thinkpedia graph for “Semantic Web”.
Figure 6. The same graph as Figure 5, reduced to the most relevant concepts.

Figure 4 shows the general user interface of Thinkpedia (in this case displaying the article for “Albert Einstein”). The concept of the application is similar to Thinkbase. It is divided into two frames, the right
one displays the current Wikipedia article, the left one displays an interactive, force-directed layout graph which was created from the same Wikipedia article with the help of SemanticProxy. We use Thinkmap for this, the same visualization framework as used for Thinkbase.

When a Wikipedia article is requested through Thinkpedia (e.g. through a keyword search), the application first accesses the Wikipedia API in order to retrieve the most relevant article(s). The SemanticProxy API is then used to process the content (i.e. to identify concepts and their relationships) and return the result in RDF format. The RDF content is parsed and visualized as an interactive graph. For example, Figure 5 shows the graph for the “Semantic Web” Wikipedia article. The article itself is visualized as the center node. All identified related concepts are shown around the center. These concepts are things like “Person”, “Company”, or “Country”. Each of these is represented as a node in the graph using an icon which corresponds to its type. Concepts of the same type are combined in an aggregation node which can also be collapsed in order to reduce the amount of information shown. The size of an aggregation node corresponds to the number of concepts within this type. One particularly interesting feature of the SemanticProxy is that it annotates each identified concept with a relevance value. This value expresses how relevant the concept is within the processed text. We visually encode this value as edge thickness: the thicker an edge is, the more relevant is its connected concept. For example, in Figure 5 one can see that the “Person” “Tim Berners-Lee” is more relevant to the “Semantic Web” Wikipedia article than any of the “Industry Terms”. The edge thickness between the center node and an aggregation node is the average over all edges going out from that aggregation node. Furthermore, we use the relevance value for an interactive range slider (see the lower part of the visualization in Figure 4). Moving this slider up or down shows more or fewer nodes in the graph, depending on their relevance. Figure 6 for example shows the same graph as Figure 5, only that its visible content has been reduced to the most relevant concepts.

Navigating the graph is quite similar to Thinkbase. Clicking on a related node will refresh both the Wikipedia frame and the graph. A difference here is that the concepts in the graph do not correspond directly to a Wikipedia page. Clicking on a concept therefore triggers a further Wikipedia search, which returns the most relevant page. Further features of Thinkpedia again include a zoom and printing function, the ability to navigate the browsing history, and the possibility to share a direct link.

DISCUSSION
Clearly, both of our prototypes have their strengths and weaknesses. Some of these are related to the contents and structure of the underlying knowledge repositories (Freebase and Wikipedia). Others are related to how we implement our visual exploration tools on top of those repositories. We conducted a small informal survey in order to better discuss potential strengths and weaknesses. Our prototypes depend to a large extent on the efficiency and usability of visualizations. These, however, are fundamentally hard to evaluate. Therefore we chose a quite informal and anecdotal evaluation method proposed in [13]. Instead of giving users a clearly defined task (e.g. finding a specific piece of information) and then measuring the time or accuracy when using different visualization tools, we gave the users one open-ended task and let them report on interesting findings or insights. Seven users participated in the survey, all of them postgraduates or staff members at the Computer Science department of The University of Auckland. Participants were asked to choose any starting topic they were interested in (e.g. their favorite movie, next holiday destination, a famous person) and then explore this topic and its related topics according to their interest. Additionally they were asked to write a short report about how they experienced both of the applications, what kind of insights they gained, as well as notable differences between the tools.

A general observation which was made in similar ways by a majority of the participants was that even though Thinkbase provides more structured content, the coverage of its content is rather limited (that is, for many topics the semantic content is still very sparse). Thinkpedia on the other hand has much more coverage but the semantically enriched graph still lacks some structure. Not surprisingly, this also roughly translates into general strengths and weaknesses of the Semantic and Social Web.

More precisely, participants reported that Thinkbase is “very well structured”, the “[connections] seem very solid”, and “navigation felt very natural”. Furthermore, the application has been described as “effective and beautiful”, and that it is “lot of fun [browsing the content]”. On the downside, participants reported that the “richness of content [is] rather less than [the one in] Thinkpedia”, e.g. it is “limited for some topics” and not as “full as [one] would have liked”. For Thinkpedia, participants reported that the “richness of information is much better” and “more comprehensive”. “Due to the fuller amount of information available”, the application “[gives] an interesting perspective”. However, there clearly are weaknesses. Participants found Thinkpedia to be “less solid” and that it sometimes “seems a little bit disorder”. Furthermore the visualization presents some “odd mistakes”, due to ambiguities within the process of extracting semantics. The implementation of the search function in Thinkpedia is still a little bit flawed and was described as “frustrating”.

The insights participants reported were mostly along the lines of discovering related information which they were either not aware of, or which they already knew of but found noteworthy to see visualized. A typical report would for example look like this: “I found it interesting to see X connected with Y”. Exploring content along those kinds of connections seems to be a very useful feature. One
participant for example described how he navigated from a television show to a city to a state and finally he discovered a “mountain [where he could] go skiing”. This relates to a concept called Orienteering [17], which describes a type of search in which the target is not (well) known. Instead of jumping or “teleporting” directly to the target (which is usually the case in keyword search), one rather performs a directed situated navigation. This means a user takes a series of smaller steps while navigating through the information space. The advantages of Orienteering are that it decreases the cognitive load, maintains a sense of location, and gives a better feeling for context. Our applications seem to support such navigation behavior, as they allow starting e.g. with a general topic and then drilling down on it. Lastly, participants reported on the benefits of having information condensed in a visual form. This helps to reveal “information that is otherwise difficult to notice when presented in a textual environment”. Furthermore one can “easily [see] key words and [does not] need to waste time reading [all the text]”.

Graphs are arguably not always the best way to represent large amounts of content, depending on the task a system is meant to support [10]. Instead of simply displaying one “big fat graph”, we have focused on several ways to filter the graph (e.g. aggregation nodes and range slider). Furthermore our graph visualizations do not replace but go along with existing user interface approaches (e.g. tabular displays in Freebase). The informal evaluation as discussed above suggests that our approach has several benefits when exploring large knowledge spaces.

FUTURE WORK
Future work of our research will focus on two different areas. Firstly, we will work further on improving the existing two prototypes and adding new features. This will include fixing weaknesses identified in the evaluation, such as the poor search function in Thinkpedia, and smaller user interface improvements. Further work might also focus on improving the usability of Thinkbase by adding more advanced filtering mechanisms and giving more control over the display. Improving Thinkpedia might include exploring alternative knowledge extraction tools. Secondly, our future work will include extending our concept to further information repositories. We have provided a proof of concept of how visual user interfaces can improve Social and Semantic Web applications. This same concept could be explored for many more applications and domains.

REFERENCES
1. Auer, S., et al., DBpedia: A Nucleus for a Web of Open Data. Lecture Notes in Computer Science, 2007. 4825.
2. Berners-Lee, T., J. Hendler, and O. Lassila, The semantic Web. Scientific American, 2001. 284(5).
3. Bollacker, K., R. Cook, and P. Tufts, Freebase: A Shared Database of Structured General Human Knowledge. Proceedings of the national conference on Artificial Intelligence, 2007. 22(2): p. 1962.
4. Department of Education, Science and Training. Backing Australia's Ability - An Ongoing Commitment. 2007. http://backingaus.innovation.gov.au/info_booklet/on_commit.htm.
5. Freebase. 2008. http://www.freebase.com.
6. Geroimenko, V. and C. Chen, Visualizing the Semantic Web: Xml-based Internet And Information Visualization. 2006: Springer.
7. Guha, R., R. McCool, and E. Miller, Semantic search, in Proceedings of the 12th international conference on World Wide Web. 2003, ACM New York, NY, USA. p. 700-709.
8. Herman, I., G. Melancon, and M.S. Marshall, Graph Visualization and Navigation in Information Visualization: A Survey. IEEE Transactions on Visualization and Computer Graphics, 2000: p. 24-43.
9. Jaeschke, G., M. Leissler, and M. Hemmje, Modeling Interactive, 3-Dimensional Information Visualizations Supporting Information Seeking Behaviors. in Knowledge and Information Visualization: Searching for Synergies. Springer 2005: p. 119-135.
10. Karger, D. and M.C. Schraefel. The pathetic fallacy of RDF. 2006.
11. Keller, T. and S.O. Tergan, Visualizing Knowledge and Information: An Introduction. in Knowledge and Information Visualization: Searching for Synergies. Springer 2005: p. 1-23.
12. Misue, K., et al., Layout Adjustment and the Mental Map. Journal of Visual Languages and Computing, 1995. 6(2): p. 183-210.
13. North, C., Toward Measuring Visualization Insight. IEEE Computer Graphics and Applications, 2006.
14. O’Reilly, T., What Is Web 2.0: Design Patterns and Business Models for the Next Generation of Software. O'Reilly Media 2005.
15. Reuters, T. SemanticProxy. 2008. http://semanticproxy.com.
16. Tapscott, D. and A.D. Williams, Wikinomics: how mass collaboration changes everything. 2006 Portfolio.
17. Teevan, J., et al., The perfect search engine is not enough: a study of orienteering behavior in directed search, in Proceedings of the SIGCHI conference on Human factors in computing systems. 2004, ACM New York, NY, USA. p. 415-422.
18. Thinkbase. 2008. http://thinkbase.cs.auckland.ac.nz.
19. Thinkmap. 2008. www.thinkmap.com.
20. Thinkpedia. 2008. http://thinkpedia.cs.auckland.ac.nz.
21. Viégas, F.B., et al., Many Eyes: A Site for Visualization at Internet Scale. IEEE Transactions on Visualization and Computer Graphics, 2007: p. 1121-1128.
22. Ziegler, C.N., L. Schmidt-Thieme, and G. Lausen, Exploiting semantic product descriptions for recommender systems, in Proceedings of the 2nd ACM SIGIR Semantic Web and Information Retrieval Workshop. 2004.