Interactive Visualization Tools for Exploring the Semantic Graph of Large Knowledge Spaces

Christian Hirsch, John Hosking, John Grundy
Department of Computer Science
The University of Auckland
Private Bag 92019, Auckland, New Zealand
{chir008@ec. | john@cs. | john-g@cs.}@auckland.ac.nz

ABSTRACT
While the amount of available information on the Web is increasing rapidly, the problem of managing it becomes more difficult. We present two applications, Thinkbase and Thinkpedia, which aim to make Web content more accessible and usable by utilizing visualizations of the semantic graph as a means to navigate and explore large knowledge repositories. Both of our applications implement a similar concept: they extract semantically enriched content from a large knowledge space (Freebase and Wikipedia respectively), create an interactive graph-based representation of it, and combine this representation with the original text-based content in one interface. We describe the design and implementation of our applications, and provide a discussion based on an informal evaluation.

Author Keywords
Semantic Web, Social Web, Wiki, Visualization, User Interface, HCI.

ACM Classification Keywords
H5.m. Information interfaces and presentation (e.g., HCI): Miscellaneous.

INTRODUCTION
This research focuses on the design and implementation of two interactive visualization tools. Both applications are built on top of large knowledge repositories. The first prototype, Thinkbase, is built on top of Freebase. The second prototype, Thinkpedia, is built on top of Wikipedia. The purpose of our applications is to provide a visual navigation and exploration tool for the underlying knowledge space. We aim to provide a proof of concept of how visualizations can improve and support Semantic Web applications.

The remainder of this paper is organized as follows. In the Background section we will briefly introduce the concepts
of “Web 2.0”, “Semantic Web”, and Information Visualization, and their relevance to our work. After clarifying our Approach and Objective, we will then describe the two prototypes in the main sections, followed by a Discussion and a short section on Future Work.

Workshop on Visual Interfaces to the Social and the Semantic Web (VISSW2009), IUI2009, Feb 8 2009, Sanibel Island, Florida, USA. Copyright is held by the author/owner(s).

BACKGROUND
In today’s globalized Information Age the problem of information overload (having more information available than one can efficiently process) has become a ubiquitous issue. Recent estimates predict that in the next five years more information will be created than has been created in the whole of human history [4]. Most of the information is or will be accessible through the internet and intranets. Tackling the problem of information overload has thus become particularly interesting for the web community.

The “Web 2.0” [14] or “Social Web” already addresses the issue of information overload in several ways. The Web 2.0 is a loosely defined set of technologies, tools, and concepts, which has had an enormous impact on how web-based information is processed. Besides new enabling technologies (e.g. XML) and tools (e.g. Wikis), this “New Web” has introduced significant new behavioral and usage patterns like content sharing, personalization, and mass collaboration [16]. The resulting widespread adoption of Wikis and other social software tools transforms the way information is created and annotated. By using social software, users annotate the content with meta-data in an organic, bottom-up fashion. This enables software agents to better process the information, and as a result more tasks can be delegated to those agents. Search algorithms and recommendation systems are successful examples of how a bottom-up creation of meta-data can help to better cope with an overwhelming amount of information.

The Semantic Web [2] represents a further, more recent, approach to addressing the information overload issue. Instead of creating semi-structured meta-data in a bottom-up fashion (as in the Web 2.0), the Semantic Web provides the possibility to formally define meta-data supported by knowledge representation languages (e.g. OWL) and formal specifications (e.g. RDF). The resulting structured content can then be understood not only by humans but also by machines. Therefore more and more tasks can be delegated to software agents. Research in fields like semantic search (e.g. [7]) and semantic recommendation systems (e.g. [22]) is well underway. Further advantages of semantically enriched data, such as interoperability and transformability, allow for better integration of different sources as well as easy transformation between different representations (e.g. different languages).

A further and more general approach to coping with information overload can be found in the field of Information Visualization. Visualizations provide effective methods for representing and organizing knowledge- and information-rich scenarios [11]. They are tools for knowledge management which make use of the human cognitive processing system in order to create and convey content more efficiently. Information and knowledge visualizations both employ similar techniques. Based on specific mapping rules, they translate resource objects into visual objects, offering easy and comprehensive access to the underlying content [9].

APPROACH AND OBJECTIVE
Even though the three approaches mentioned above (Web 2.0, Semantic Web, and Information Visualization) attempt to solve the information overload issue differently, there is plenty of space for synergies. Approaches that combine the Web 2.0 and the Semantic Web can be seen in two different directions. On the one side, organically grown content within the Web 2.0 (e.g. Wikipedia) is being semantically enriched with the help of natural language processing and knowledge extraction. DBpedia [1] is a well known example of this. On the other side, semantic information repositories start to allow end-users to edit and create semantics in a collaborative wiki-style manner. An example of this is Freebase [5]. One of the better known applications which demonstrate synergies between the Web 2.0 and Information Visualization is Many Eyes [21]. It allows everyone to create, share, and discuss visualizations online. Lastly, there exists a variety of possibilities and approaches to visualize Semantic Web content. This is discussed for example in [6].

Our approach when building two prototypes of interactive visualization and exploration tools for large knowledge spaces was to combine elements from all three areas. As knowledge spaces we have chosen Freebase (a “Semantic Wiki”) for the one application, and Wikipedia (semantically enriched) for the other application. At the core of our tools, we utilize one crucial benefit of Semantic Web data: the ability to be easily transformed from one representation into another. More precisely, we transform the content from a textual representation into a visual representation. The interactive visualizations are displayed alongside the text-based repositories, providing a focus-plus-context view. The results are applications which present visually enriched user interfaces for Semantic Web content.

The objective of our research is to provide a proof of concept of how interactive visualizations can improve Semantic Web applications. This is two-fold. On the one hand, our objective is to demonstrate how it is possible to easily transform Semantic Web content into meaningful visual representations. On the other hand, our objective is to demonstrate how these resulting applications can be used as efficient information discovery tools.

THINKBASE
Our first prototype, Thinkbase [18], is a visual navigation and exploration tool for Freebase, an open, shared database of the world’s knowledge [3]. Freebase can also be described as a “Semantic Wiki”. This means its content is semantically enriched, everyone can edit it, and furthermore, the meta-model itself is also editable by everyone. Figure 1 shows the general user interface of Thinkbase (in this case displaying the movie “The Departed”). The application is divided into two frames. The right frame displays the current Freebase topic, which consists of a short textual description as well as all the details in tabular form. The left frame displays an automatically generated, interactive, force directed layout graph of that same topic including all related topics. We have chosen to use the Thinkmap visualization framework to implement this [19]. Thinkmap is a software platform for developing customized visualization interfaces. It consists of loosely coupled components which provide users the ability to retrieve a result set from data sources, and then visualize, navigate, and organize it. The Thinkmap Software Development Kit (SDK) provides ways to easily extend and adjust the suite as well as to integrate it with other web and database technologies.

Figure 1. The user interface of Thinkbase.

Thinkbase accesses the Freebase API, retrieves information about the current Freebase topic as well as all related topics, and creates a graph-based visual representation of it with the help of Thinkmap. Each Freebase topic is represented as a node using an icon which corresponds to its type (e.g. person, movie). Edges between those nodes are annotated with the type of the relationship. These labels become visible when the mouse hovers over them. For example, Figure 2 shows the Thinkbase graph for “Homer”. There, one can see that the “Place of birth” of “Homer” is “Greece”.
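The mapping from topics to typed nodes and labelled edges can be illustrated with a minimal Python sketch. It is not the Thinkbase implementation, which relies on the Freebase API and the Thinkmap SDK; the sample triples, the type names, and the helper `topic_centered_graph` are hypothetical and serve only to show the data structure involved.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    name: str
    kind: str  # topic type; a renderer would pick the icon from this


@dataclass(frozen=True)
class Edge:
    source: str
    target: str
    label: str  # relationship type, shown when hovering over the edge


def topic_centered_graph(center, triples, kinds):
    """Keep only the triples that touch the center topic (hypothetical helper)."""
    nodes = {center: Node(center, kinds.get(center, "unknown"))}
    edges = []
    for subject, relation, obj in triples:
        if center not in (subject, obj):
            continue  # not directly related to the current topic
        neighbour = obj if subject == center else subject
        nodes.setdefault(neighbour, Node(neighbour, kinds.get(neighbour, "unknown")))
        edges.append(Edge(subject, obj, relation))
    return nodes, edges


# Illustrative sample data, not retrieved from Freebase.
TRIPLES = [
    ("Homer", "Place of birth", "Greece"),
    ("Homer", "Influenced", "Virgil"),
    ("Virgil", "Influenced", "Dante"),
]
KINDS = {"Homer": "person", "Greece": "location", "Virgil": "person", "Dante": "person"}

nodes, edges = topic_centered_graph("Homer", TRIPLES, KINDS)
for edge in edges:
    print(f"{edge.source} --[{edge.label}]--> {edge.target}")
```

Only the two triples that involve “Homer” end up in the graph; the third one would appear after navigating to “Virgil”.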
Related topics of the same type are combined in an aggregation node (the grey circles), as seen for example for the type “Influenced”. These aggregation nodes can be expanded and collapsed through a context menu, which helps to focus on specific contents while hiding others (e.g. “Quotations”). Further visual cues such as the length of edges, size of aggregation nodes, and text color are used to encode additional information. Users can navigate from node to node by clicking on them. This will refresh the graph as well as the Freebase frame. The graph is animated, which allows for a smooth transition between different visualizations. This helps the users to preserve the “mental map” [12] of the knowledge space.

Figure 2. The Thinkbase graph for “Homer”.

The two alternative representations (textual and visual) of the same underlying body of knowledge enable a focus-plus-context view. This is a further means to support navigation and help users to maintain the mental map. While the textual representation gives a good focus on the current topic, the visual representation allows the user to see the topic embedded in the wider context.

Figure 3. A small extract of the “tree of life”.

The visual representation in Thinkbase presents a topic-centered view. That is, a specific Freebase topic is at the center of the visualization and connections to all directly related topics are shown around it. As all of the Freebase content is basically one huge graph, this means that at any one time we show a small subset of this whole graph. From a cognitive perspective, it would not make sense to display a very large amount of data [8]. However, we allow users to extend the visualization metaphor by (repeatedly) expanding and collapsing not only aggregation nodes but all nodes of the graph to one’s liking. This feature gives the user the ability to create unique and informative visualizations.
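The expand and collapse behaviour of aggregation nodes can be sketched as follows. This is a simplified illustration under stated assumptions rather than the actual Thinkmap-based code: the function names and sample triples are hypothetical, and the rule that a relationship with a single target is drawn directly (without an aggregation node) is our assumption for the sketch.

```python
from collections import defaultdict


def group_by_relation(center, triples):
    """Collect the targets of each relationship type leaving the center topic."""
    groups = defaultdict(list)
    for subject, relation, obj in triples:
        if subject == center:
            groups[relation].append(obj)
    return dict(groups)


def visible_labels(groups, expanded):
    """Return the node labels to draw, given the set of expanded aggregation nodes."""
    labels = []
    for relation, members in groups.items():
        if len(members) == 1:
            labels.append(members[0])  # assumption: single targets are shown directly
            continue
        # The member count stands in for the size of the aggregation node.
        labels.append(f"{relation} ({len(members)})")
        if relation in expanded:
            labels.extend(members)
    return labels


# Illustrative sample data, not retrieved from Freebase.
TRIPLES = [
    ("Homer", "Place of birth", "Greece"),
    ("Homer", "Influenced", "Virgil"),
    ("Homer", "Influenced", "Plato"),
    ("Homer", "Quotations", "Quotation 1"),
    ("Homer", "Quotations", "Quotation 2"),
]

groups = group_by_relation("Homer", TRIPLES)
collapsed = visible_labels(groups, expanded=set())
expanded = visible_labels(groups, expanded={"Influenced"})
print(collapsed)
print(expanded)
```

Expanding “Influenced” reveals its members while “Quotations” stays collapsed, which mirrors how a user can focus on some contents and hide others.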
Figure 3 shows an example where the lower and higher classifications of an animal class (here: “Reptile”) have been expanded repeatedly. The resulting visualization represents a small subset of the tree of life, ranging from “Vertebrate” to “Dinosaur”.

Further features of Thinkbase include: zoom functions; the ability to navigate the browsing history; printing the visualization; the possibility to share a direct link to a specific page; and the option to trigger a search of a node in Google or Wikipedia. Our research prototype also provides some functionality to edit the content of Freebase through the visual representation (e.g. add new relationships). This is only possible due to the semantically enriched content.

THINKPEDIA
Our second prototype, Thinkpedia [20], is a visual navigation and exploration tool for Wikipedia. The objective for this prototype was to investigate the possibility of creating a visual exploration tool similar to Thinkbase, but for a less structured knowledge space.

Figure 4. The user interface of Thinkpedia.

Figure 5. The Thinkpedia graph for “Semantic Web”.

The “Social Web”, of which Wikipedia is a part, has produced a huge amount of interesting content. However, most of it is unstructured or semi-structured. Therefore it is hard for machines to reason with it and, in our case, to automatically translate the content into meaningful visualizations. What is needed is a way to extract semantics from the unstructured contents of Wikipedia. This field of knowledge extraction is a well established research area, and tools like DBpedia [1] demonstrate successful approaches to doing this. We decided to use the SemanticProxy web service, which is part of the Calais initiative by Thomson Reuters [15]. The SemanticProxy takes plain text or a URL as input, processes this, and returns the identified concepts and their relationships in RDF format.
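To give an impression of what consuming such RDF output involves, the following Python sketch extracts typed concepts from a small RDF/XML document. The `ex:` vocabulary, the sample document, and the function `extract_concepts` are hypothetical; they do not reproduce the schema actually returned by SemanticProxy, and the parser used inside Thinkpedia is not described here.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
EX_NS = "http://example.org/schema#"  # hypothetical vocabulary, not the Calais schema

# Hand-written sample standing in for the output of a semantic extraction service.
SAMPLE = f"""<rdf:RDF xmlns:rdf="{RDF_NS}" xmlns:ex="{EX_NS}">
  <ex:Person rdf:about="http://example.org/id/1">
    <ex:name>Tim Berners-Lee</ex:name>
  </ex:Person>
  <ex:Country rdf:about="http://example.org/id/2">
    <ex:name>United Kingdom</ex:name>
  </ex:Country>
</rdf:RDF>"""


def extract_concepts(rdf_xml):
    """Return (type, name) pairs for every typed resource in the document."""
    root = ET.fromstring(rdf_xml)
    concepts = []
    for element in root:
        # ElementTree tags look like "{namespace}LocalName"; keep the local name.
        concept_type = element.tag.split("}", 1)[1]
        name = element.findtext(f"{{{EX_NS}}}name")
        concepts.append((concept_type, name))
    return concepts


concepts = extract_concepts(SAMPLE)
print(concepts)
```

Each resulting pair carries the information needed to draw one node: the concept type selects the icon and the name becomes the label.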
Using a general “semantifier” like SemanticProxy allows us to easily switch between different MediaWikis (not only Wikipedia) or even other unstructured sources.

Figure 4 shows the general user interface of Thinkpedia (in this case displaying the article for “Albert Einstein”). The concept of the application is similar to Thinkbase. It is divided into two frames: the right one displays the current Wikipedia article, the left one displays an interactive, force directed layout graph which was created from the same Wikipedia article with the help of SemanticProxy. We use Thinkmap for this, the same visualization framework as used for Thinkbase.

When a Wikipedia article is requested through Thinkpedia (e.g. through a keyword search), the application first accesses the Wikipedia API in order to retrieve the most relevant article(s). The SemanticProxy API is then used to process the content (i.e. to identify concepts and their relationships) and return the result in RDF format. The RDF content is parsed and visualized as an interactive graph. For example, Figure 5 shows the graph for the “Semantic Web” Wikipedia article. The article itself is visualized as the center node. All identified related concepts are shown around the center. These concepts are things like “Person”, “Company”, or “Country”. Each of these is represented as a node in the graph using an icon which corresponds to its type. Concepts of the same type are combined in an aggregation node which can also be collapsed in order to reduce the amount of information shown. The size of the aggregation nodes corresponds to the number of concepts within this type. One particularly interesting feature of the SemanticProxy is that it annotates each identified concept with a relevance value. This value expresses how relevant the concept is within the processed text. We visually encode this value in the form of the edge thickness. The thicker an edge is, the more relevant is its connected concept. For example, in Figure 5 one can see that the “Person” “Tim Berners-Lee” is more relevant to the “Semantic Web” Wikipedia article than any of the “Industry Terms”. The edge thickness between the center node and the aggregation node is an average of all the edges going out from the aggregation node. Furthermore, we can use this value for an interactive range slider (see Figure 4 in the lower part of the visualization). This range slider can be increased and decreased, which will show more or fewer relevant nodes in the graph. Figure 6 for example shows the same graph as Figure 5, only that its visible content has been reduced to the most relevant concepts.

Figure 6. The same graph as Figure 5, reduced to the most relevant concepts.

Navigating the graph is quite similar to Thinkbase. Clicking on a related node will refresh both the Wikipedia frame and the graph. A difference here is that the concepts in the graph do not correspond directly to a Wikipedia page. Clicking on a concept therefore triggers a further Wikipedia search, which returns the most relevant page. Further features of Thinkpedia again include a zoom and printing function, the ability to navigate the browsing history, and the possibility to share a direct link.

DISCUSSION
Clearly, both of our prototypes have their strengths and weaknesses. Some of these are related to the contents and structure of the underlying knowledge repositories (Freebase and Wikipedia). Others are related to how we implement our visual exploration tools on top of those repositories. We conducted a small informal survey in order to better discuss potential strengths and weaknesses. Our prototypes depend to a large extent on the efficiency and usability of visualizations. These, however, are fundamentally hard to evaluate. Therefore we chose a quite informal and anecdotal evaluation method proposed in [13]. Instead of giving users a clearly defined task (e.g. finding a specific piece of information) and then measuring the time or accuracy when using different visualization tools, we gave the users one open-ended task and let them report on interesting findings or insights. Seven users participated in the survey, all of them postgraduates or staff members at the Computer Science department of The University of Auckland. Participants were asked to choose any starting topic they were interested in (e.g. their favorite movie, next holiday destination, a famous person) and then explore this topic and its related topics according to their interest. Additionally they were asked to write a short report about how they experienced both of the applications, what kind of insights they gained, as well as notable differences between the tools.

A general observation which was made in similar ways by a majority of the participants was that even though Thinkbase provides more structured content, the coverage of its content is rather limited (that is, for many topics the semantic content is still very sparse). Thinkpedia on the other hand has much more coverage but the semantically enriched graph still lacks some structure. Not surprisingly, this also roughly translates into general strengths and weaknesses of the Semantic and Social Web.

More precisely, participants reported that Thinkbase is “very well structured”, the “[connections] seem very solid”, and “navigation felt very natural”. Furthermore, the application has been described as “effective and beautiful”, and participants noted that it is “lot of fun [browsing the content]”. On the downside, participants reported that the “richness of content [is] rather less than [the one in] Thinkpedia”, e.g. it is “limited for some topics” and not as “full as [one] would have liked”. For Thinkpedia, participants reported that the “richness of information is much better” and “more comprehensive”. “Due to the fuller amount of information available”, the application “[gives] an interesting perspective”. However, there clearly are weaknesses. Participants found Thinkpedia to be “less solid” and that it sometimes “seems a little bit disorder”. Furthermore the visualization presents some “odd mistakes”, due to ambiguities within the process of extracting semantics. The implementation of the search function in Thinkpedia is still a little bit flawed and was described as “frustrating”.

The insights which the participants reported were mostly along the lines of discovering related information which they were either not aware of, or which they already knew of, but found noteworthy to see visualized. A typical report would for example look like this: “I found it interesting to see X connected with Y”. Exploring content along those kinds of connections seems to be a very useful feature. One participant for example described how he navigated from a television show to a city to a state and finally discovered a “mountain [where he could] go skiing”. This relates to a concept called Orienteering [17], which describes a type of search in which the target is not (well) known. Instead of jumping or “teleporting” directly to the target (which is usually the case in keyword search), one rather performs a directed situated navigation. This means a user takes a series of smaller steps while navigating through the information space. Advantages of Orienteering are: it decreases the cognitive load, maintains a sense of location, and gives a better feeling for context. Our applications seem to support such a navigation behavior as they allow starting e.g. with a general topic and then drilling down on it.

Lastly, participants reported on the benefits of having information condensed in a visual form. This helps to reveal “information that is otherwise difficult to notice when presented in a textual environment”. Furthermore one can “easily [see] key words and [does not] need to waste time reading [all the text]”.

Graphs are arguably not always the best way to represent large amounts of content, depending on the task a system is meant to support [10]. Instead of simply displaying one “big fat graph”, we have focused on several ways to filter the graph (e.g. aggregation nodes and range slider). Furthermore our graph visualizations do not replace but go along with existing user interface approaches (e.g. tabular displays in Freebase). The informal evaluation as discussed above suggests that our approach has several benefits when exploring large knowledge spaces.

FUTURE WORK
Future work of our research will focus on two different areas. Firstly we will further work on improving the existing two prototypes and adding new features. This will include fixing weaknesses identified in the evaluation, such as the poor search function in Thinkpedia, and smaller user interface improvements. Further work might also focus on improving the usability of Thinkbase by adding more advanced filtering mechanisms and giving more control over the display. Improving Thinkpedia might include exploring alternative knowledge extraction tools. Secondly our future work will include extending our concept to further information repositories. We have provided a proof of concept of how visual user interfaces can improve Social and Semantic Web applications. This same concept could be explored for many more applications and domains.

REFERENCES
1. Auer, S., et al., DBpedia: A Nucleus for a Web of Open Data. Lecture Notes in Computer Science, 2007. 4825.
2. Berners-Lee, T., J. Hendler, and O. Lassila, The semantic Web. Scientific American, 2001. 284(5).
3. Bollacker, K., R. Cook, and P. Tufts, Freebase: A Shared Database of Structured General Human Knowledge. Proceedings of the national conference on Artificial Intelligence, 2007. 22(2): p. 1962.
4. Department of Education, Science and Training. Backing Australia's Ability - An Ongoing Commitment. 2007. http://backingaus.innovation.gov.au/info_booklet/on_commit.htm.
5. Freebase. 2008. http://www.freebase.com.
6. Geroimenko, V. and C. Chen, Visualizing the Semantic Web: Xml-based Internet And Information Visualization. 2006: Springer.
7. Guha, R., R. McCool, and E. Miller, Semantic search, in Proceedings of the 12th international conference on World Wide Web. 2003, ACM New York, NY, USA. p. 700-709.
8. Herman, I., G. Melancon, and M.S. Marshall, Graph Visualization and Navigation in Information Visualization: A Survey. IEEE Transactions on Visualization and Computer Graphics, 2000: p. 24-43.
9. Jaeschke, G., M. Leissler, and M. Hemmje, Modeling Interactive, 3-Dimensional Information Visualizations Supporting Information Seeking Behaviors, in Knowledge and Information Visualization: Searching for Synergies. Springer 2005: p. 119-135.
10. Karger, D. and M.C. Schraefel. The pathetic fallacy of RDF. 2006.
11. Keller, T. and S.O. Tergan, Visualizing Knowledge and Information: An Introduction, in Knowledge and Information Visualization: Searching for Synergies. Springer 2005: p. 1-23.
12. Misue, K., et al., Layout Adjustment and the Mental Map. Journal of Visual Languages and Computing, 1995. 6(2): p. 183-210.
13. North, C., Toward Measuring Visualization Insight. IEEE Computer Graphics and Applications, 2006.
14. O’Reilly, T., What Is Web 2.0: Design Patterns and Business Models for the Next Generation of Software. O'Reilly Media 2005.
15. Reuters, T. SemanticProxy. 2008. http://semanticproxy.com.
16. Tapscott, D. and A.D. Williams, Wikinomics: how mass collaboration changes everything. 2006 Portfolio.
17. Teevan, J., et al., The perfect search engine is not enough: a study of orienteering behavior in directed search, in Proceedings of the SIGCHI conference on Human factors in computing systems. 2004, ACM New York, NY, USA. p. 415-422.
18. Thinkbase. 2008. http://thinkbase.cs.auckland.ac.nz.
19. Thinkmap. 2008. www.thinkmap.com.
20. Thinkpedia. 2008. http://thinkpedia.cs.auckland.ac.nz.
21. Viégas, F.B., et al., Many Eyes: A Site for Visualization at Internet Scale. IEEE Transactions on Visualization and Computer Graphics, 2007: p. 1121-1128.
22. Ziegler, C.N., L. Schmidt-Thieme, and G. Lausen, Exploiting semantic product descriptions for recommender systems, in Proceedings of the 2nd ACM SIGIR Semantic Web and Information Retrieval Workshop. 2004.