=Paper= {{Paper |id=Vol-2083/paper-15 |storemode=property |title=An Interactive 3D Visualization for the LOD Cloud |pdfUrl=https://ceur-ws.org/Vol-2083/paper-15.pdf |volume=Vol-2083 |authors=Maria-Evangelia Papadaki,Panagiotis Papadakos,Michalis Mountantonakis,Yannis Tzitzikas |dblpUrl=https://dblp.org/rec/conf/edbt/PapadakiPMT18 }} ==An Interactive 3D Visualization for the LOD Cloud== https://ceur-ws.org/Vol-2083/paper-15.pdf

An Interactive 3D Visualization for the LOD Cloud
Maria-Evangelia Papadaki, Panagiotis Papadakos, Michalis Mountantonakis, Yannis Tzitzikas
Institute of Computer Science, FORTH-ICS, GREECE, and
Computer Science Department, University of Crete, GREECE
{marpap|papadako|mountant|tzitzik}@ics.forth.gr

ABSTRACT
The LOD (Linked Open Data) cloud currently contains thousands of published datasets. Existing visualizations, like the Linking Open Data cloud diagram, are useful for getting an overview of its size, the datasets and their connectivity. An interesting question is whether we could come up with more informative and more interactive visualizations that could make evident more features of the datasets for aiding the inspection and the discovery of related datasets. To this end we propose an interactive 3D visualization that adopts the metaphor of an urban area. In brief, each dataset is visualized as a building, whose features (e.g. volume) reflect various features of the dataset (e.g. number of triples), while the proximity of the buildings (and other features) indicates the commonalities of the datasets. The introduced approach supports various shapes of buildings and various placement algorithms: mountainside, orthogonal spiral, concentric spiral, and similarity-based adaptations of force-directed algorithms. The visualization is interactive, i.e. it allows the user to zoom in on any part of the model, to change the perspective, to change the shape of the buildings and their placement, to see all the connections or only those of one dataset, and others. The paper details the construction process and provides examples over real datasets including the entire LOD cloud, and describes the pros and cons of each layout.

KEYWORDS
Linked Data, Connectivity of Linked datasets, Interactive 3D Visualization

1 INTRODUCTION
During the last years we have observed an increasing trend towards publishing data as LOD. Thousands of datasets have been published and various visualizations that give an overview of their number and interconnections have been proposed (e.g. see [1, 7, 8]). The classical visualization of the LOD cloud (Figure 1(left)) depicts each dataset as a circle (whose size indicates the size of the dataset in triples). The commonalities between two datasets (in terms of common URIs) are made evident by an edge that connects the datasets’ circles. Such visualizations are useful for getting an overview of the entire LOD cloud, or of a part of it, or of a particular set of RDF datasets. There are various visualization-driven tasks. In our work we focus mainly on tasks related to dataset inspection, dataset monitoring, dataset selection and navigation across multiple linked datasets. The basic question we address here is: can we come up with visualizations of the LOD cloud which are more informative (i.e. which can make evident more “features” of the datasets) and are easily conceivable? Based on this motivation, in this paper we propose an interactive 3D visualization that adopts a quite familiar metaphor, specifically that of an urban area where each dataset is visualized as a building. An indicative screenshot of the LOD cloud according to the interactive 3D visualization that we propose is shown in Figure 1(right).

In a nutshell the contributions of this paper are: (i) it introduces and motivates a novel interactive 3D model for LOD datasets that adopts the metaphor of an urban area, (ii) it introduces several variations of the model, and discusses the pros and cons of each one, and (iii) it demonstrates the application of the model over the datasets of each domain (government, media, etc.) and the entire LOD cloud.

The rest of this paper is organized as follows: §2 describes the context, §3 describes the main components of the interactive 3D model and its application, §4 describes the implementation of the visualization system as well as directions that are worth further work and research, and finally §5 concludes the paper. A running prototype is already available to the community and is accessible through http://www.ics.forth.gr/isl/3DLod (needs a recent web browser supporting WebGL).

2 CONTEXT
Visualization has been recognized as important for dataset discovery and dataset selection [10], which constitute two of the emerging challenges for the web of data [5, 9]. A number of visualization approaches and tools for Linked Data have been proposed, some indicative of which are described in [6]. The most widely known visualization diagram of the LOD is the 2D Linking Open Data cloud diagram, which consists of datasets that have been published in Linked Data format by contributors to the Linking Open Data community project and other individuals and organisations. It is based on metadata collected and curated by contributors to datahub.io as well as on metadata extracted from periodic crawls of the Linked Data web. The 2014 crawled version of the diagram is shown in Figure 1(left). We refer to the Linking Open Data cloud that was available from 2014-08-30 to 2017-01-25,¹ which contains datasets from the following nine domains (in parentheses the percentages of datasets that fall in each category): government (23.85%), publications (23.33%), social web (15.78%), life sciences (11.05%), cross-domain (7.19%), user-generated content (7.36%), geographic (4.21%), media (3.68%), and linguistics (3.50%). The size of the circles corresponds to the number of triples in each dataset. Only five sizes of circles (very large, large, medium, small, very small) are supported, each corresponding to a particular size interval (> 1B, 10M-1B, 500K-10M, 10K-500K, < 10K resp.). The arrows between two circles indicate the existence of at least 50 links between the corresponding two datasets. A link is considered to be an RDF triple where subject and object URIs are in the namespaces of different datasets, while the direction of the arrows indicates the dataset that contains the links. The thickness of the arrow corresponds to the number of links. Three levels of thickness are supported (thin, medium, thick), each corresponding to one interval ((0, 1K), [1K, 100K) and [100K, ∞) respectively). Finally,

¹ Accessible through http://lod-cloud.net/versions/2014-08-30/lod-cloud_colored.svg

© 2018 Copyright held by the owner/author(s). Published in the Workshop Proceedings of the EDBT/ICDT 2018 Joint Conference (March 26, 2018, Vienna, Austria) on CEUR-WS.org (ISSN 1613-0073). Distribution of this paper is permitted under the terms of the Creative Commons license CC-by-nc-nd 4.0.

Figure 1: Left: The LOD cloud diagram. Right: One perspective of the introduced interactive LOD 3D model.

each circle is colored differently to indicate the 9 different domains of the datasets. A new version of the Linking Open Data cloud diagram was released on 2017-01-26.² That version contains almost double the number of datasets (i.e. 1163). Datasets are again visualized as circles; however, only three sizes of circles (large, medium, small) are supported. The links are interactive and their direction is indicated through color. However, the clustering of the datasets is not favorable in all cases and the labels are less readable in comparison to the 2014-2017 version of the diagram.

3 AN INTERACTIVE URBAN 3D VISUALIZATION FOR LOD DATASETS

3.1 Dataset Notations
Let $S = \{S_1, \ldots, S_k\}$ be the set of datasets. Each dataset $S_i$ consists of a set of triples (i.e., a set of subject-predicate-object statements), denoted by $triples(S_i)$. We shall use $U_i$ to denote the URIs, $L_i$ to denote the literals and $BN_i$ to denote the blank nodes that appear in $triples(S_i)$. Hereafter, we consider only those URIs that appear as subjects or objects in a triple, as our primary focus is on the data (not on the schema). The number of common URIs between two datasets $S_i$ and $S_j$ is given by $|U_i \cap U_j|$. We define the Links between two datasets as follows: $Links_{i,j} = U_i \cap U_j$. If $T$ is a set of triples, then we can define the degree of a URI $e$ in $T$ as $deg_T(e) = |\{(s, p, o) \in T \mid s = e \text{ or } o = e\}|$, while for a set of URIs $E$ we can define their average degree in $T$ as $deg_T(E) = avg_{e \in E}(deg_T(e))$. Now for each dataset $S_i$ we can compute the average degree of the elements in $U_i$ by considering $triples(S_i)$, i.e.: $Deg(U_i) = avg_{e \in U_i}(deg_{triples(S_i)}(e))$.

3.2 Buildings Representation
The main idea is that we visualize each dataset $S_i$ as a building $b_i$. The volume of each building represents the number of triples of the respective dataset ($|triples(S_i)|$). As regards the types of the buildings, we support the following options: (a) cubes, (b) “context”-dependent cuboids, and (c) “feature”-based cuboids.

In (a), each dataset $S_i$ is represented by a cube with edge length equal to $\sqrt[3]{|triples(S_i)|}$.

In (b) we use “context-dependent” cuboids. The footprint of the buildings is computed based on either the biggest dataset ($b_{Big}$ mode) or the smallest dataset ($b_{Small}$ mode). In the $b_{Big}$ mode the building of the biggest dataset is a cube, while in the $b_{Small}$ mode the cube corresponds to the smallest dataset. Consequently, the buildings of the datasets that have enough triples tend to become skyscrapers.

In (c), i.e. “feature-based” cuboids, the shape depends on the features of the corresponding datasets. Since $|triples(S_i)| \approx (|U_i| + |L_i| + |BN_i|) \cdot Deg(U_i)$, the height of the building is set proportional to $|U_i| + |L_i| + |BN_i|$, and the footprint of the building proportional to $Deg(U_i)$. Specifically, assuming square footprints, we have $height(b_i) = |U_i| + |L_i| + |BN_i|$ and $width(b_i) = \sqrt{Deg(U_i)}$. The volume of the building $b_i$ approximates $|triples(S_i)|$; if its degree is low it will become a high building with a small footprint, whereas if its degree is high then the building will be less tall but will have a big footprint.

For getting building sizes that resemble those of a real urban area, a calibration is required. For this reason we introduce an additional parameter $F$, through which we can obtain the desired average ratio of height/width of the buildings. Specifically, let $r$ be the desired ratio (e.g. 3 for three-floor buildings) provided by the user. We can add a parameter $F$ to the definition of height and width: $height(b_i) = (|U_i| + |L_i| + |BN_i|) / F$ and $width(b_i) = \sqrt{Deg(U_i) \cdot F}$. Note that any positive value of $F$ yields a pair of $height(b_i)$ and $width(b_i)$ that preserves the volume. What is left to do is to select the $F$ for obtaining the desired average ratio $r$. This reduces to finding the $F$ such that

$$r = avg\left\{ \frac{height(b_i)}{width(b_i)} \;\middle|\; 1 \le i \le k \right\}.$$

The solution of this equation is:

$$F = \sqrt[3]{\left( \frac{\sum_{i=1}^{|S|} (|U_i| + |L_i| + |BN_i|) / \sqrt{Deg(U_i)}}{r \cdot |S|} \right)^{2}}.$$

Obviously, instead of avg one can specify the min or max desired ratio, and in that case the formula is changed accordingly. For example, for the max desired ratio, we should first compute for each dataset the following number:

$$F_i = \sqrt[3]{\left( \frac{(|U_i| + |L_i| + |BN_i|) / \sqrt{Deg(U_i)}}{r} \right)^{2}}.$$

Then, we should sort all $F_i$ in descending order and select the max $F_i$. Selecting the max $F_i$ as the $F$ value in all $height(b_i)$ and $width(b_i)$ guarantees that all buildings will have ratio $\le r$.

3.3 Placement of the Buildings
Below we describe the four building layout approaches that our system supports.

1. Mountainside Layout. The $k$ buildings are placed in an orthogonal $\lceil\sqrt{k}\rceil \times \lceil\sqrt{k}\rceil$ grid. The biggest building is placed at one edge of the square area. The second biggest is placed next to the first, and so on, until reaching the end of a row, where it continues the same procedure in the next one until there are no

² by Andrejs Abele, John P. McCrae, Paul Buitelaar, Anja Jentzsch and Richard Cyganiak, http://lod-cloud.net/
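The feature-based cuboids and the calibration parameter F of §3.2 can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the function names and the `(n, deg)` pair representation (with n = |U_i| + |L_i| + |BN_i| and deg = Deg(U_i)) are assumptions introduced here, and the sample numbers are made up.

```python
from math import sqrt


def cube_edge(num_triples):
    """Option (a): edge of a cube whose volume equals the number of triples."""
    return num_triples ** (1 / 3)


def feature_cuboid(n, deg, f=1.0):
    """Option (c): (height, width) of a square-footprint cuboid.

    height = n / F and width = sqrt(deg * F), so that
    height * width**2 = n * deg for every positive F.
    """
    return n / f, sqrt(deg * f)


def calibration_factor(datasets, r):
    """F for which the average height/width ratio over all datasets equals r."""
    total = sum(n / sqrt(deg) for n, deg in datasets)
    return (total / (r * len(datasets))) ** (2 / 3)


def max_ratio_factor(datasets, r):
    """Largest per-dataset F_i; using it as F keeps every ratio at most r."""
    return max(((n / sqrt(deg)) / r) ** (2 / 3) for n, deg in datasets)


# Hypothetical datasets given as (n, deg) pairs.
sample = [(1000, 4.0), (200, 2.0), (50, 1.5)]
f_avg = calibration_factor(sample, r=3.0)
for n, deg in sample:
    height, width = feature_cuboid(n, deg, f_avg)
    print(f"n={n} deg={deg} height={height:.2f} width={width:.2f}")
```

Because F divides the height and multiplies the footprint area, the volume is untouched by the calibration; only the shape changes.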

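The notation of §3.1 (subject/object URIs, degree of a URI, average degree of a dataset, and Links between two datasets) can be illustrated with a small Python sketch. The triple representation as plain string tuples, the `is_uri` heuristic and the example URIs are assumptions made for illustration only; a real system would rely on an RDF parser.

```python
from statistics import mean


def is_uri(term):
    # Assumption: in this toy representation, URIs are the terms starting with "http".
    return term.startswith("http")


def subject_object_uris(triples):
    """U_i: URIs appearing as subject or object (predicates are ignored)."""
    return {t for s, _, o in triples for t in (s, o) if is_uri(t)}


def degree(triples, e):
    """deg_T(e): number of triples that have e as subject or object."""
    return sum(1 for s, _, o in triples if s == e or o == e)


def average_degree(triples):
    """Deg(U_i): mean degree of a dataset's URIs within its own triples."""
    return mean(degree(triples, e) for e in subject_object_uris(triples))


def links(triples_i, triples_j):
    """Links_{i,j}: the URIs shared by the two datasets."""
    return subject_object_uris(triples_i) & subject_object_uris(triples_j)


# Two hypothetical toy datasets.
s1 = [
    ("http://ex.org/a", "http://ex.org/knows", "http://ex.org/b"),
    ("http://ex.org/a", "http://ex.org/name", "Alice"),
    ("http://ex.org/b", "http://ex.org/knows", "http://ex.org/c"),
]
s2 = [
    ("http://ex.org/b", "http://ex.org/name", "Bob"),
    ("http://ex.org/d", "http://ex.org/knows", "http://ex.org/b"),
]
print(average_degree(s1), links(s1, s2))
```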
Table 1: Comparison of visualizations for Linked Datasets

Aspect | Mountain Side | Orthogonal Spiral | Cyclic Spiral | Similarity-based layout | LOD Cloud Diagram (2D)
Accurate size | X | X | X | X | only 5 sizes
Features (e.g. Degree) | X | X | X | X | -
Connectivity | X | X | X | X | X
Interactive | rich | rich | rich | rich | poor (only 2D with zoom)
Time Complexity | O(n) | O(n) | O(n) | O(e ∗ n²) | not mentioned
Distinctive characteristic | Readability | Fast overview of datasets’ sizes | Effective space exploitation even for power law distributed datasets | Focus on connectivity (less edge crossings) | Label readability, connections, clustered datasets by domain

buildings to draw. The result resembles a mountainside (Figure 2 - upper left).

2. Orthogonal Spiral. The $k$ buildings are placed in an orthogonal $\lceil\sqrt{k}\rceil \times \lceil\sqrt{k}\rceil$ grid (see the two screenshots at the bottom part of Figure 2). The biggest building is placed in the centre of the area (summit). The process continues by adding growing enclosing squares of size N. For example, the first contains 8 squares (3 above the summit, one to the left and one to the right of the summit, and 3 below the summit). The next enclosing square contains 16 more buildings and so on. The buildings are drawn following the clockwise direction. The result resembles a mountain whose summit is at the centre of the 2D area. One shortcoming of this algorithm is that if we represent buildings with cubes then it yields very sparse peripheral areas. This was actually the motivation for the subsequent algorithm.

Figure 2: Visualizations of the social datasets in LOD 2014

3. Cyclic Spiral. Based on the weaknesses of the previous layout algorithms, we identified the following requirements for a better layout algorithm:
(a) bigger buildings should be placed at the center,
(b) a spiral-like placement seems beneficial as it would result in a round coverage of the space,
(c) collisions should never occur,
(d) no big empty spaces, especially in the outer area that hosts the majority of the buildings, which are small.
For the above requirements, we devised a new 2D placement algorithm called Concentric Spiral. The buildings, in descending order with respect to their size, are placed on concentric rings. The radius of the first (smallest) ring is the size of the biggest building. The placement of the subsequent buildings is done as follows. We compute the minimum chord that is required to avoid collisions based on the sizes of the current and the previous building. Then we compute the corresponding angle, and we place the new building at the corresponding spot of the circle. The sought angle is $\theta = 2 \arcsin\left(\frac{chord}{2 \cdot radius}\right)$. Just before we reach $2\pi$, we start the next bigger ring, whose radius is the radius of the previous ring increased by the size of the last drawn building (plus a number accounting for “roads”). In this way the concentric rings become denser as the buildings get smaller, avoiding unnecessary empty spaces. The algorithm is appropriate for sets of datasets whose sizes vary a lot, even if they exhibit a power law distribution (i.e. very few big datasets and very many small ones, see [3, 9] for measurements about current RDF datasets). A screenshot of the layout based on Cyclic Spiral is shown in Figure 1 - right and Figure 2 - upper right. The algorithm has O(n) time complexity (n is the number of buildings).

4. Similarity-based layout. According to this algorithm, the more commonalities two sources have (common URIs, common literals, owl:sameAs relationships, etc.) the closer the corresponding buildings are placed. One way to specify the location of each building is to adopt a force-directed placement algorithm. In our case, we have modified the Fruchterman-Reingold force-directed algorithm [4] as adapted to three.js.³ This algorithm satisfies the following two principles: a) vertices connected by an edge should be drawn near each other and b) vertices should not be drawn too close to each other. Figure 3 shows an indicative layout produced by the similarity-based algorithm.

Figure 3: Similarity-based layout

Figure 4: Two ways to visualize the connections

Comparison. Table 1 summarizes the distinctive characteristics of each visualization approach, including the 2D LOD Cloud diagram. The value “rich” in the row “Interactive” refers to interactive selection, zooming, panning, rotation, and control of visibility of labels and connections.

3.4 Visualizing the Links of Datasets
If there are links between two datasets $S_i$ and $S_j$ then a line segment is created, resembling a road that connects the corresponding buildings (see the left side of Figure 4). The links can also be visualized as bridges (see the right side of Figure 4). The width of these bridges/roads indicates the strength of the connection that

³ https://github.com/davidpiegza/Graph-Visualization
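The Concentric (Cyclic) Spiral placement of §3.3 can be sketched as follows. This is a hedged illustration rather than the system's actual code: the paper does not specify how the minimum chord is derived from the two building sizes, so `min_chord` below assumes square footprints and uses the sum of their half-diagonals. The `road` default, the start of every ring at angle 0, and the extra "closing" check against the first building of a ring are likewise assumptions.

```python
from math import asin, cos, pi, sin, sqrt


def chord_angle(chord, radius):
    """Central angle of a chord: theta = 2 * arcsin(chord / (2 * radius))."""
    return 2 * asin(chord / (2 * radius))


def min_chord(w_a, w_b):
    # Assumption: two squares whose centres are at least the sum of their
    # half-diagonals apart cannot overlap, whatever their orientation.
    return (w_a + w_b) / sqrt(2)


def concentric_spiral(widths, road=1.0):
    """Return a list of (x, y, width), biggest building first."""
    widths = sorted(widths, reverse=True)
    radius, angle = widths[0], 0.0      # first ring: radius = biggest size
    first = prev = widths[0]
    placed = [(radius, 0.0, widths[0])]
    for w in widths[1:]:
        step = chord_angle(min_chord(prev, w), radius)
        closing = chord_angle(min_chord(w, first), radius)
        if angle + step + closing > 2 * pi:
            # The ring is full: open the next, bigger ring.
            radius += prev + road
            angle, first = 0.0, w
        else:
            angle += step
        placed.append((radius * cos(angle), radius * sin(angle), w))
        prev = w
    return placed


# Hypothetical footprint widths with a few big and many small buildings.
layout = concentric_spiral([10, 6, 5, 3, 3, 2, 2] + [1] * 40)
print(len(layout), layout[:3])
```

Since the angular step shrinks with the building sizes, outer rings hold many more buildings than inner ones, which is the densification effect the text describes.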

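For the Orthogonal Spiral of §3.3, the enclosing squares around the summit contain 8, 16, 24, ... grid cells. A minimal Python sketch of this enumeration is given below; the choice of the top-left corner as the starting cell of each ring and the `(x, y)` grid coordinates with the summit at the origin are assumptions, since the text only states that buildings are drawn clockwise.

```python
def ring_cells(m):
    """Grid cells of the m-th enclosing square, clockwise from its top-left corner."""
    if m == 0:
        return [(0, 0)]                                   # the summit
    top = [(x, m) for x in range(-m, m + 1)]              # left to right
    right = [(m, y) for y in range(m - 1, -m - 1, -1)]    # top to bottom
    bottom = [(x, -m) for x in range(m - 1, -m - 1, -1)]  # right to left
    left = [(-m, y) for y in range(-m + 1, m)]            # bottom to top
    return top + right + bottom + left


def orthogonal_spiral(sizes):
    """Map each building index to a grid cell, biggest building at the centre."""
    order = sorted(range(len(sizes)), key=lambda i: sizes[i], reverse=True)
    cells, m = [], 0
    while len(cells) < len(order):
        cells.extend(ring_cells(m))
        m += 1
    return {i: cells[rank] for rank, i in enumerate(order)}


# Hypothetical building sizes; index 1 is the biggest and becomes the summit.
print(orthogonal_spiral([5, 100, 20, 1]))
```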
the correlated datasets have, and it is calculated by dividing the number of links between $S_i$ and $S_j$ by the number of links of the most connected pair (i.e., maxLinks): $width(i, j) = \frac{|Links(i, j)|}{|maxLinks|}$.

3.5 Application Cases
We manually downloaded 287 RDF datasets including their content (i.e., triples, URIs, etc.) from the following resources: (a) the dump of the data which were used in [11], (b) online datasets from the datahub.io website and (c) a subset of DBpedia version 3.9. To test the algorithms on an even bigger collection, we managed to find metadata from datahub.io for 600 datasets of various domains. In contrast to the 287 datasets (see Figure 1(right)), for most of these 600 datasets we were not able to access and download their content (i.e., triples, URIs, etc.); we only managed to find some basic metadata for them in datahub.io. Unfortunately, datahub.io lacks information about other features of these datasets, such as the number of URIs, literals, blank nodes and the degree of URIs. Therefore, it is not possible to produce feature-based buildings for these datasets, although the proposed visualizations can support feature-based buildings for thousands of datasets. Figure 5 shows on the left side the cyclic spiral layout and on the right side the orthogonal layout for this set of 600 datasets.

Figure 5: 3D Visualizations of 600 datasets

4 IMPLEMENTATION AND FUTURE STEPS
We have implemented a web-based visualization system, which is easily accessible by any user. We used the JavaScript library Three.js,⁴ which in turn uses the WebGL API,⁵ which is widely supported by all modern desktop and mobile browsers without the use of plugins. WebGL is a JavaScript API that allows the creation of GPU-accelerated 3D graphics and animations inside a web browser; Three.js offers a less tedious programming environment by abstracting away many of the WebGL details [2].⁶

Figure 6 shows an overview of the web-based visualization system. The visualization is interactive, allowing the user to zoom in on any part of the model. For instance, one can change the perspective, the shape of the buildings or their placement, search for a dataset through an auto-completion search, see all the connections or only those of one dataset, and others. The presented model could be improved in several ways. Below we sketch two indicative enrichments: (a) for aiding the user to get a more informative and “live” overview immediately, the system could be enriched with “guided tours”, i.e. with trails of camera movements over the space occupied by the buildings, and (b) for reducing the crossings of the edges, each set of buildings that forms a strongly connected component could be visualized as a small round park (or roundabout) where only one line segment connects each building to that park.

⁴ three.js is available at http://threejs.org/
⁵ https://www.khronos.org/webgl/
⁶ There are similar JavaScript libraries like GLGE, SceneJS, PhiloGL, etc.

Figure 6: Overview of the web GUI of the system

5 CONCLUDING REMARKS
The proposed 3D interactive system: (i) illustrates accurately the relative sizes of the datasets in triples, (ii) can indicate the average degree of the datasets, (iii) allows the user to control which connections to show or hide, (iv) makes evident (through the layout algorithms) the differences in the sizes of datasets or their commonalities. It supports various building types (cubes, context-dependent cuboids, feature-based cuboids), as well as several layout algorithms (mountainside, orthogonal spiral, cyclic spiral, similarity-based adaptations of force-directed algorithms) that order the buildings appropriately, depending on the user needs.

Acknowledgements. Work partially supported by a) the EU project BlueBRIDGE (Building Research environments for fostering Innovation, Decision making, Governance and Education to support Blue growth), H2020-EINFRA-2015-1, 2015-2018 and b) the General Secretariat for Research and Technology (GSRT) and the Hellenic Foundation for Research and Innovation (HFRI).

REFERENCES
[1] Aba-Sah Dadzie and Matthew Rowe. 2011. Approaches to visualising Linked Data: A survey. Semantic Web 2, 2 (2011), 89–124. http://dblp.uni-trier.de/db/journals/semweb/semweb2.html#DadzieR11
[2] Brian Danchilla. 2012. Three.js Framework. In Beginning WebGL for HTML5. Apress, 173–203. https://doi.org/10.1007/978-1-4302-3997-0_7
[3] Javier D Fernández, Miguel A Martínez-Prieto, Pablo de la Fuente Redondo, and Claudio Gutiérrez. 2017. Characterising RDF data sets. Journal of Information Science (2017).
[4] Thomas MJ Fruchterman and Edward M Reingold. 1991. Graph drawing by force-directed placement. Software: Practice and experience 21, 11 (1991), 1129–1164.
[5] James Hendler. 2014. Data Integration for Heterogenous Datasets. Big data 2, 4 (2014), 205–215.
[6] Shah Khusro, Fouzia Jabeen, Syed Rahman Mashwani, and Iftikhar Alam. 2014. Linked open data: towards the realization of semantic web-a review. Indian Journal of Science and Technology 7, 6 (2014), 745–764.
[7] Jakub Klimek, Jiri Helmich, and Martin Necasky. 2015. Use Cases for Linked Data Visualization Model. In Proceedings of the Workshop on Linked Data on the Web (LDOW) (CEUR Workshop Proceedings). Aachen. http://ceur-ws.org/Vol-1409/#paper-08
[8] Luca Matteis. 2014. VoID-graph: Visualize Linked Datasets on the Web. CoRR abs/1408.6691 (2014). http://arxiv.org/abs/1408.6691
[9] Michalis Mountantonakis and Yannis Tzitzikas. 2016. On Measuring the Lattice of Commonalities Among Several Linked Datasets. Proceedings of the VLDB Endowment 9, 12 (2016), 1101–1112.
[10] Peng Peng, Lei Zou, M Tamer Özsu, Lei Chen, and Dongyan Zhao. 2015. Processing SPARQL queries over distributed RDF graphs. The VLDB Journal (2015), 1–26.
[11] Max Schmachtenberg, Christian Bizer, and Heiko Paulheim. 2014. Adoption of the linked data best practices in different topical domains. In The Semantic Web–ISWC 2014. Springer, 245–260.
