=Paper=
{{Paper
|id=Vol-1409/paper-05
|storemode=property
|title=DBpedia Atlas: Mapping the Uncharted Lands of Linked Data
|pdfUrl=https://ceur-ws.org/Vol-1409/paper-05.pdf
|volume=Vol-1409
|dblpUrl=https://dblp.org/rec/conf/www/ValsecchiABTM15
}}
==DBpedia Atlas: Mapping the Uncharted Lands of Linked Data==
<pdf width="1500px">https://ceur-ws.org/Vol-1409/paper-05.pdf</pdf>
<pre>
                DBpedia Atlas: Mapping the Uncharted Lands
                               of Linked Data

     Fabio Valsecchi, Matteo Abrate, Clara Bacciu, Maurizio Tesconi, Andrea Marchetti
                  Institute of Informatics and Telematics, CNR Pisa
     fabio.valsecchi@iit.cnr.it, matteo.abrate@iit.cnr.it, clara.bacciu@iit.cnr.it,
               maurizio.tesconi@iit.cnr.it, andrea.marchetti@iit.cnr.it

ABSTRACT
In the last few years, Linked Open Data sources have increased dramatically in number.
Despite their enormous potential, it is really hard to find effective and efficient ways
for navigating and exploring them, mainly because of complexity and volume issues. In
fact, application developers, students and researchers that are not experts in Semantic
Web technologies often lose themselves in the intricacies of the Web of Data. We propose
to address this problem by providing users with a map-like visualization that acts as an
entry point for the exploration of a dataset. To this end, we adapt a spatialization
approach, based on cartographic and information visualisation techniques, to make it
suitable for Linked Data sets with a hierarchical ontological structure. Finally, we
apply our method on DBpedia, implementing and testing a prototype web application that
shows a comprehensive and organic representation of the more than 4 million instances
defined by the dataset.

Categories and Subject Descriptors
H.5.0 [Information Interfaces and Presentation]: General

Keywords
Linked Data, Information Visualisation, Cartography

1.     INTRODUCTION
  During the last few years, the number of available datasets based on the Linked Open
Data (LOD) paradigm has increased dramatically^1. However, virtually no one outside the
Semantic Web community is able to completely understand Linked Data and put its full
potential to use. Other categories of users surely have interest in LOD sets, but,
lacking a deep expertise, they may find it difficult to make sense of their content or
structure [6]. In our opinion, such non-expert users (e.g., application developers,
students, researchers in other fields) often have the need to look at a dataset and see
the whole picture, getting an answer to the somewhat naive question “What is the dataset
like?”. More specifically, they can benefit from having a feel of how big it is in terms
of instances, relationships and properties, what kind of entities it contains, how they
are organized, how they are connected to each other, and so on. Answering those
questions can prove to be fundamental in promoting knowledge about these datasets,
fostering their growth and driving their adoption for a variety of applications.
Information visualization techniques have already been proposed to address similar needs
[4], because of their effective exploitation of the innate human ability of acquiring
information through vision. Nevertheless, to the best of our knowledge, the existing
works are either focused on the exploration of small groups of entities or on the
presentation of aggregated data. What is currently missing is an entry point, something
that could lead a user from an overview of the main features of a dataset to its tiniest
details.
   We propose to use a map-like interactive visualization to serve as such an entry
point. If designed by taking cartographic principles into account, a map can leverage
both innate visual perception abilities and learned map-reading skills to attain a high
level of efficacy in communicating features of large-scale, complex structures [15, 1].
A zoomable map also nicely embodies Ben Shneiderman’s well-known Visual
Information-Seeking Mantra (“Overview first, zoom and filter, then details-on-demand”)
[14, 6], according to which the overview should always come first in a visualization,
since it provides the general context of a dataset, and only afterwards should users be
able to load more detailed information. To obtain such a map, a process of
spatialization (i.e., the assignment of position and shape to abstract, non-geometrical
data) becomes necessary. We propose an adaptation of the work by Auber et al. on Gosper
treemaps [2] to the case of LOD sets with a hierarchical ontological structure. The
approach enables the automatic generation of stable 2D maps that show the entirety of
the entities contained in the dataset, forming a hierarchy of regions according to their
ontological class. Such maps can

^1 Statistics are available at http://lod-cloud.net.

Copyright is held by the author/owner(s).
WWW2015 Workshop: Linked Data on the Web (LDOW2015).
Figure 1: A screenshot of DBpedia Atlas, available online at http://wafi.iit.cnr.it/lod/dbpedia/atlas. The
code is open source and hosted on GitHub (https://github.com/fabiovalse/dbpedia_atlas). The search box
on the top left allows users to search for a specific instance. On the map, a yellow placemark identifies the
selected instance, while the red links show the locations of the resources related to it. The infobox on the
right reports the information about the selected instance such as its label, classes, data properties, incoming
and outgoing relations.


then be used as a foundational layer for the creation of a collection of thematic maps
and ancillary charts, forming an atlas describing many different aspects of the dataset.
Our method is applied to the English version of the DBpedia knowledge base [3],
obtaining a comprehensive interactive visualization of the more than 4 million instances
defined by its RDF triples, as well as additional representations of different aspects
of the dataset. Users involved in preliminary tests of the resulting prototype were able
to get insights about some non-obvious and little-known features of DBpedia, proving the
usefulness of the approach not only as a presentation tool, but also as a visual
exploration system.

1.1    Related Work

  The need to visualize LOD is an important issue in the Semantic Web community. In
fact, several works have already tackled the problem. LodLive [5] is an RDF browser that
allows users to explore LOD by manually creating a node-link diagram. Starting from a
given URI, the user can expand the diagram by following links to other resources.
RelFinder [8] addresses the task of revealing if and how two given resources are
connected, by visually showing all the paths between them. gFacet [9] allows the
navigation of a LOD set combining graph-based visualization with faceted filtering
techniques. All the aforementioned applications make use of a node-link representation
that makes it possible to clearly identify the relations between resources, but fails to
scale to large amounts of data. Among other solutions, DBpedia Viewer [12] is a web
application for searching resources and consulting the available information as text,
images, geographical maps and raw data. LodView^2 is a tool for navigating LOD sources
through a user-friendly interface based on a single-instance view. Spacetime [16] allows
users to implicitly perform SPARQL queries over spatio-temporal data and visualize their
result on a geographical map connected to a timeline. Linked Data Query Wizard [10] is
an analysis tool for searching resources, filtering them, refining and visualizing the
output in the form of different diagrams. All the works mentioned above provide useful
techniques for navi-

^2 http://lodview.it/
                                            Scale
  Application                 Whole       Subset      Single        Visualization Technique
                              Dataset                 Instance
  LodLive                                                           node-link, infobox
  RelFinder                                                         node-link, infobox
  gFacet                                                            list, node-link
  DBpedia Viewer                                                    infobox
  LodView                                                           infobox
  Spacetime                                                         geomap, timeline, infobox
  Linked Data Query Wizard                                          table, node-link, various
  LOD Visualization                                                 treemap, tree
  DBpedia Atlas                                                     map-like visualization, infobox

Table 1: This table shows a comparison of our proposal with eight applications found in literature. Most of
the applications represent a subset of a given Linked Data set and give a view of single instances. Only LOD
Visualization provides a visualization of the whole dataset but it does not represent single instances.


gating LOD. However, they are focused on the exploration of single entities or a small
group of them, neglecting to show an effective overview of the whole data source. This
aspect is one of the key points of Shneiderman’s Mantra. Other works present some kind
of overview: LODVisualization^3 is a prototype based on the Linked Data Visualization
Model [4], and offers different diagrams such as an interactive treemap and an indented
tree representing class hierarchies. The former shows a compact overview of a data set,
but it does not provide the detailed information about the resources within it. In the
latter, the ontology is clearly visualized but no overview is shown, since the number of
classes makes the diagram too long to be displayed in a single view.

^3 http://lodvisualization.appspot.com/

2.     DESIGN
   DBpedia Atlas is designed as an interactive, web-based visualization that allows
different kinds of users to understand and benefit from a complex RDF dataset such as
DBpedia. The application is primarily meant for those users who are not proficient in
semantic web technologies but are interested in learning, researching, or developing
applications specifically on DBpedia. To a lesser extent, casual users interested in
doing some research about a given subject could benefit from the map as a complementary
way of accessing Wikipedia content.
   Our primary goal is to provide these users with an overview. Hence, we first define
some high-level tasks that they should be able to perform by looking at the
visualization at a glance: i) get a feel of the size of the dataset; ii) see the main
aspects of its structure; iii) approximately compare different parts of its structure in
terms of both size and complexity. Secondly, we define more specific tasks, to
characterize the user’s wish to get detailed information by interacting with the
visualization space: i) locate a class; ii) search for or locate an instance; iii)
consult its properties; iv) browse the list of its connections; v) explore to find the
location of its related instances; vi) discover which are the classes to which it is
most connected; vii) compare its connections with the ones of other instances.

2.1   Data abstraction

Figure 2: A graph and an associated tree define a compound network. In our case, it is
composed of class nodes, instance nodes, vocabulary links (VLs), relationship links
(RLs) and type links (TLs).

  Since hierarchical ontologies are often the structure upon which Linked Data sets are
based [6], we consider the set of RDF triples of DBpedia to form a compound network,
i.e., a structure defined by a graph with an associated tree. In our case (Figure 2), it
comprises two kinds of nodes: class nodes, which define the hierarchical structure, and
instance nodes, which are the nodes of the graph. More precisely, we define an instance
node for each distinct URI found as subject or object of an RDF triple. To avoid taking
external resources into account, we filter out URIs not prefixed by
http://dbpedia.org/resource/. Three kinds of links are also defined [7]: vocabulary
links (VL) are derived from the DBpedia infobox ontology (i.e., rdfs:subClassOf),
relationship links (RL) express various types of connections between two instances
(e.g., dbpedia-owl:birthPlace for Galileo Galilei and Pisa), and type links (TL) connect
class nodes to instance nodes, describing the membership of an instance to a class
(i.e., rdf:type). Of the many TLs that a single instance could feature (e.g., Scientist,
Person, Agent and Thing for Galileo Galilei), we consider only the one leading to the
most specific class in the ontology (e.g., Scientist for Galileo Galilei), since the
other ones can be inferred by walking up the ontology tree. We ran an ad-hoc
script that verified that no instance node is connected to multiple class nodes
belonging to different branches (i.e., no entity has incompatible classes). In the
resulting compound network, 476 class nodes constitute the tree, while 4,232,628
instance nodes and 15,077,186 RLs compose the graph. We do not consider all the 721
class nodes currently included in the DBpedia ontology tree^4 because we prune the tree
branches to which no instances are connected. Since the automatic attribution of a class
to a DBpedia entity from the corresponding Wikipedia infobox may lead to errors [13],
our compound network is characterized by a large number of instance nodes connected to
very generic class nodes (e.g., Leonardo da Vinci is classified simply as Person, while
it could have been more specifically typed as Artist or Scientist). It is also worth
noticing that about 500,000 instance nodes in our network have no associated class node.
Such entities may have a URI but still lack their own Wikipedia page (i.e., the “red
links” appearing in Wikipedia articles), or be the result of an error of the
aforementioned automatic classification.

^4 http://mappings.dbpedia.org/server/ontology/classes/

2.2     Interactive Visualization
   The spatialization process upon which our visualization is based adopts a treemap
approach [11], following the results of Auber et al. on Gosper treemaps [2]. Treemaps
are in general able to represent big and complex trees in a small amount of space,
trading the explicit representation of hierarchical links for compactness. Gosper
treemaps have the additional feature of being able to represent each leaf of the tree as
a hexagonal tile with a specific position, at the expense of some compactness and
simplicity. In both cases, internal nodes of the tree are implicitly represented as a
hierarchy of regions contained in one another.
   Gosper treemaps come with the additional benefit of producing geographic-like
regions, which helps users to instinctively read the visualization as they would with a
geographic map. Thus, in our approach, each instance node (i.e., each entity from
DBpedia) is given a position in a hexagonal tiling. Entities belonging to the same class
are placed near one another, and positioned in the same region. Unfortunately, though,
two entities that are neighbors in the tiling do not necessarily belong to the same
class. By construction, the size of a region corresponding to a class node is
proportional to the number of instance nodes having that class or a subclass of it
(e.g., Person takes Galileo Galilei into account, even if its most specific type is
Scientist).
   The layout algorithm of Gosper treemaps is also order-preserving and stable, i.e., a
small modification of the dataset would cause only a small change in the map^5, making
it ideal for an ever-changing Linked Data set like DBpedia. It would in fact be
confusing for users to explore a newer map of the same dataset and see a very different
spatial arrangement.
   The interface of the application (Figure 1) comprises three main components that work
together in order to provide overview, zoom and filter, and details on demand.

    1. Map. It initially provides the overview of all the instances and classes in
       DBpedia, allowing the user to zoom and pan at will. The main island represents
       owl:Thing (i.e., the root of the ontology) while the colored regions identified
       by the uppercase labels represent its direct children (e.g., Agent, Place, Work,
       Species and so on). Instances with missing types are shown in the smaller island
       at the bottom left. Regions having an area of suitable size show a label from the
       beginning, while labels of minor regions are loaded when zooming in. The zoom
       behaviour makes it possible to filter out certain regions and to focus attention
       on other ones. Some notable instances have been manually identified and have been
       given a label that is always visible, in order to provide the users with
       additional, city-like landmarks to get orientation in the map and to identify
       some basic categories. Selecting an instance on the map loads its details in the
       infobox (on the right). All the instances connected to it are also depicted in
       the map as a distribution of red dots. Two thematic maps can also be loaded: one
       showing the depth of the classes in the DBpedia ontology hierarchy, and the other
       showing the average outdegree of instances contained in each class (Figure 6).

    2. Search box. This component (top left of the interface) allows users to perform a
       text search for a specific instance by using the DBpedia lookup service [3]. The
       selection of one of the resulting instances triggers the displaying of its
       position and distribution of connected entities on the map, and the loading of
       its details in the infobox.

    3. Infobox. Shows the title, classes, data properties, incoming and outgoing
       relations of an instance. Links to DBpedia online and Wikipedia are also
       provided. Data is loaded within this container when the user selects an instance
       from the map or from the search box. Moreover, by clicking on an outgoing or
       incoming property, it is possible to follow the connection to another instance.

^5 This is true only when both the original and the modified tree are ordered by
following the same criterion. In order to ensure this and be able to keep a similar map
for future updates of the dataset, we transform the tree from our compound network into
its canonical ordering form [17].

3.     PRELIMINARY EVALUATION

   To assess the usefulness of our approach and get early feedback, we carried out a
preliminary formative evaluation of our prototype. We briefly presented the purpose of
DBpedia Atlas to five users with different backgrounds: three technical users without a
specific expertise on Semantic Web technologies, and two lay users with no scientific or
technical background. Then, we observed their free interaction with the system, and
asked them to answer some questions to assess their ability to perform the tasks
introduced in Section 2. Finally, we asked them to compare the application with other
solutions and to complete a short questionnaire.
   Participants found DBpedia Atlas easy to read and operate, giving it an average score
of 4 on a scale from 0 to 5. They also found it useful (3.6/5 on average), especially to
get a general feel of the dataset. Two of them were skeptical about the level of detail
of the map, expressing the need to see more information as they progressed with the
zoom. All of them reported preferring DBpedia Atlas over LodLive [5] and RelFinder [8]
as an entry point for the exploration of the dataset, but RelFinder was pointed out to
be more useful for a specific task unsupported by our map (i.e., to find paths between
two instances).
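
The data abstraction of Section 2.1 and the region sizing rule of Section 2.2 can be
illustrated with a minimal Python sketch. It is not the authors' script: the toy class
tree, the toy triples, the http://example.org/ URI and all function names are
assumptions made for illustration only. It keeps URIs with the DBpedia resource prefix,
selects the most specific type of each instance while rejecting types from different
branches, and counts for each class the instances typed with it or with a subclass.

```python
from collections import defaultdict

PREFIX = "http://dbpedia.org/resource/"

# Toy vocabulary links (child -> parent), standing in for rdfs:subClassOf.
PARENT = {"Agent": "Thing", "Person": "Agent", "Scientist": "Person",
          "Place": "Thing", "City": "Place", "Work": "Thing"}

# Toy type links: every rdf:type stated for each instance.
TYPES = {
    PREFIX + "Galileo_Galilei": {"Scientist", "Person", "Agent", "Thing"},
    PREFIX + "Leonardo_da_Vinci": {"Person", "Agent", "Thing"},
    PREFIX + "Pisa": {"City", "Place", "Thing"},
}

# Toy triples (subject, predicate, object); the last object is an external URI.
TRIPLES = [
    (PREFIX + "Galileo_Galilei", "dbpedia-owl:birthPlace", PREFIX + "Pisa"),
    (PREFIX + "Pisa", "owl:sameAs", "http://example.org/Pisa"),
]


def instance_nodes(triples):
    """Distinct subject/object URIs that carry the DBpedia resource prefix."""
    uris = {uri for s, _, o in triples for uri in (s, o)}
    return {uri for uri in uris if uri.startswith(PREFIX)}


def path_to_root(cls):
    """Classes met while walking up the tree, starting from cls itself."""
    path = [cls]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def most_specific(classes):
    """Deepest class of the set; fails if the set spans different branches."""
    deepest = max(classes, key=lambda c: len(path_to_root(c)))
    if not classes.issubset(path_to_root(deepest)):
        raise ValueError(f"incompatible classes: {sorted(classes)}")
    return deepest


def region_sizes(types):
    """Per class, the number of instances having it or one of its subclasses."""
    sizes = defaultdict(int)
    for classes in types.values():
        for cls in path_to_root(most_specific(classes)):
            sizes[cls] += 1
    return dict(sizes)
```

In this sketch, classes that never receive an instance (Work in the toy tree) are simply
absent from the result of region_sizes, which mirrors the pruning of empty branches
described in Section 2.1.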
Figure 3: The distribution of instances (red dots) connected to the entity Google
(yellow placemark). A large number of dots gathers in the Website region (top left) and
in the Software region (top middle).

Figure 4: When Apple Inc. is selected, the amount of websites decreases significantly,
while Software becomes much more prominent. This is especially true for the lowest part
of the region (Video Game). An interesting conglomerate appears on the left (Device).

Figure 5: The distribution for the instance Microsoft is more similar to the one for
Apple Inc. than it is for Google. However, with regards to both Device and Website, it
seems that Microsoft falls somewhere in between the other two.
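
The distributions compared in Figures 3, 4 and 5 amount to grouping the instances
connected to a selected instance by their class. A minimal sketch follows; the links and
class assignments are invented toy data (not actual DBpedia triples), the function name
is hypothetical, and counting both incoming and outgoing links is an assumption.

```python
from collections import Counter

# Invented relationship links (subject, object) and most specific classes.
RLS = [
    ("Google", "Gmail"),
    ("Google", "Android"),
    ("YouTube", "Google"),
    ("Google", "Nexus_5"),
    ("Gmail", "YouTube"),   # does not involve Google
]
CLASS_OF = {
    "Gmail": "Website",
    "YouTube": "Website",
    "Android": "Software",
    "Nexus_5": "Device",
}


def connected_class_distribution(instance, links, class_of):
    """Count the instances linked to or from `instance`, grouped by class."""
    neighbours = set()
    for subject, obj in links:
        if subject == instance:
            neighbours.add(obj)
        elif obj == instance:
            neighbours.add(subject)
    # Instances without a known class are grouped under "untyped".
    return Counter(class_of.get(n, "untyped") for n in neighbours)


distribution = connected_class_distribution("Google", RLS, CLASS_OF)
```

Comparing two such counters, one per selected instance, corresponds to the visual
comparison of dot distributions across regions.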


Figure 6: Two examples of thematic maps. The first one shows the depth of the classes in the DBpedia
ontology hierarchy (the darker, the deeper). The second one shows the average outdegree of instances
contained in each class (the darker the color, the higher the average outdegree). By inspecting the interactive
maps, it can be seen that the deepest level of the ontology corresponds to the small Diocese class (top right),
and that the highest average outdegree is found in Soccer Manager, Jockey and Horse Trainer (bottom right).
Conversely, CareerStation, PersonFunction and TimePeriod, while vast, have the lowest depth and the lowest
average outdegree.
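
The two quantities shown by the thematic maps can be sketched as follows. The data is
invented for illustration and the function names are hypothetical; the sketch also
assumes that each instance contributes only to its most specific class, which the
caption does not state explicitly.

```python
from collections import defaultdict

# Invented toy data: class tree, most specific classes, relationship links.
PARENT = {"Agent": "Thing", "Person": "Agent", "Place": "Thing", "City": "Place"}
CLASS_OF = {"a": "Person", "b": "Person", "c": "City"}
RLS = [("a", "c"), ("a", "b"), ("b", "c")]  # (subject, object)


def class_depth(cls, parent=PARENT):
    """Number of subclass steps separating cls from the root (root = 0)."""
    depth = 0
    while cls in parent:
        cls = parent[cls]
        depth += 1
    return depth


def average_outdegree(links, class_of):
    """Mean number of outgoing links per instance, grouped by class."""
    out = defaultdict(int)
    for subject, _ in links:
        out[subject] += 1
    totals, counts = defaultdict(int), defaultdict(int)
    for instance, cls in class_of.items():
        totals[cls] += out[instance]
        counts[cls] += 1
    return {cls: totals[cls] / counts[cls] for cls in counts}
```

Mapping either value to a color scale per region would then give a choropleth-like
thematic layer over the base map.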


   When asked to estimate the number of instances in the map, almost all the
participants replied with a number greater than a few million, showing that they got a
feel for the vastness of the dataset. All the participants showed no difficulties in
interpreting the regions as more and more refined classifications of the entities
composing the map, nor in relating the size of regions to the number of instances of
that class. The largest classes of the ontology (e.g., Agent, Place, Work and Species)
were quickly identified from the initial overview, while minor ones were inspected by
zooming in. In one case, a user reported giving more importance to detailed regions
(i.e., with many subdivisions) rather than to big ones. Three participants got curious
about the big and flat CareerStation class, and tried to understand its meaning by
selecting random entities from the region (discovering that it contains information
about the career of people, mostly athletes).
   Users selected various instances and compared their dot distributions of connected
entities, sometimes noting a steep difference in the number of connections. Some
interesting patterns were also found, as in the case of the comparison between Google,
Apple Inc. and Microsoft (see Figures 3, 4 and 5 for more details). Uncommon connections
sometimes caught the eye of participants when a selection showed a dot in an unexpected
region. For example, when one of them selected the instance Dog from the Species class,
he noticed a lone connection in the Food region, revealing that Saksang is an Indonesian
dish made of dog and pork. Thematic maps (Figure 6) got mixed reactions from users, who
described them as very informative but harder to read than the base map, especially
because of difficulties in the interpretation of label-region correspondence.

4.   CONCLUSIONS AND FUTURE WORK
   We presented DBpedia Atlas, a web application for exploring instances, relations and
classes of DBpedia. By using this application, users can obtain a grasp of the
fundamental properties of the dataset, browse it, and get several interesting insights,
without the need to be experts in Semantic Web technologies. The underlying approach we
propose, based on cartography and information visualisation techniques, can be reused
for visualizing and exploring other LOD sets with hierarchical ontologies. Several
improvements can be introduced to the current prototype. Data can be updated to reflect
the current status of DBpedia online^6. A formal user study with a greater number of
participants can be carried out to better validate the approach and to get more
feedback. Specific improvements can be made to the map visualization, in order to
increase its expressive power. In particular, a ranking factor (based for example on the
degree of an instance node, or on the length or the popularity of the corresponding
Wikipedia article) could be adopted to display the most important instances (i.e.,
“cities”) at each zoom level. Moreover, a concept of distance between instances can be
introduced to complement the treemap approach. We are currently investigating an
ontology-independent similarity measure that would pack similar entities together
regardless of their class. This approach could prove to be useful to define a meaningful
spatialization for vast regions of entities having the same class or no class at all,
and it would open our approach to datasets without a hierarchical ontology.

^6 Our work is based on the latest available DBpedia dump (2014). Subsequent updates are
not included in our map.

5.   REFERENCES
 [1] M. Abrate. Data Cartography: atlases and maps for non-geographical data. PhD
     thesis, 2014.
 [2] D. Auber, C. Huet, A. Lambert, B. Renoust, A. Sallaberry, and A. Saulnier.
     GosperMap: Using a Gosper curve for laying out hierarchical data. IEEE Trans. on
     Visualization and Computer Graphics, 2013.
 [3] C. Bizer, J. Lehmann, G. Kobilarov, S. Auer, C. Becker, R. Cyganiak, and
     S. Hellmann. DBpedia - a crystallization point for the web of data. Web Semantics:
     science, services and agents on the world wide web, 2009.
 [4] J. M. Brunetti, S. Auer, and R. García. The linked data visualization model. In
     International Semantic Web Conference, 2012.
 [5] D. V. Camarda, S. Mazzini, and A. Antonuccio. LodLive, exploring the web of data.
     In Proc. of the International Conference on Semantic Systems, 2012.
 [6] A.-S. Dadzie and M. Rowe. Approaches to visualising linked data: A survey. Semantic
     Web, 2011.
 [7] T. Heath and C. Bizer. Linked data: Evolving the web into a global data space.
     Synthesis lectures on the semantic web: theory and technology, 2011.
 [8] P. Heim, S. Hellmann, J. Lehmann, S. Lohmann, and T. Stegemann. RelFinder:
     Revealing relationships in RDF knowledge bases. In Semantic Multimedia, 2009.
 [9] P. Heim, J. Ziegler, and S. Lohmann. gFacet: A browser for the web of data. In
     Proc. of the International Workshop on Interacting with Multimedia Content in the
     Social Semantic Web, 2008.
[10] P. Hoefler, M. Granitzer, E. Veas, and C. Seifert. Linked data query wizard: A
     novel interface for accessing SPARQL endpoints. In Proc. of Linked Data on the Web
     at WWW, 2014.
[11] B. Johnson and B. Shneiderman. Tree-maps: A space-filling approach to the
     visualization of hierarchical information structures. In IEEE Proc. of Conference
     on Visualization, 1991.
[12] D. Lukovnikov, C. Stadler, D. Kontokostas, S. Hellmann, and J. Lehmann. DBpedia
     viewer - an integrative interface for DBpedia leveraging the DBpedia service eco
     system. In Proc. of Linked Data on the Web at WWW, 2014.
[13] H. Paulheim and C. Bizer. Type inference on noisy RDF data. In International
     Semantic Web Conference, 2013.
[14] B. Shneiderman. The eyes have it: A task by data type taxonomy for information
     visualizations. In IEEE Symposium on Visual Languages, 1996.
[15] A. Skupin. From metaphor to method: Cartographic perspectives on information
     visualization. In IEEE Symposium on Information Visualization, 2000.
[16] F. Valsecchi and M. Ronchetti. Spacetime: a two dimensions search and visualisation
     engine based on linked data. In The Eighth International Conference on Advances in
     Semantic Processing, 2014.
[17] R. A. Wright, B. Richmond, A. Odlyzko, and B. D. McKay. Constant time generation of
     free trees. SIAM Journal on Computing, 1986.

</pre>