=Paper= {{Paper |id=Vol-2083/paper-15 |storemode=property |title=An Interactive 3D Visualization for the LOD Cloud |pdfUrl=https://ceur-ws.org/Vol-2083/paper-15.pdf |volume=Vol-2083 |authors=Maria-Evangelia Papadaki,Panagiotis Papadakos,Michalis Mountantonakis,Yannis Tzitzikas |dblpUrl=https://dblp.org/rec/conf/edbt/PapadakiPMT18 }} ==An Interactive 3D Visualization for the LOD Cloud== https://ceur-ws.org/Vol-2083/paper-15.pdf
               An Interactive 3D Visualization for the LOD Cloud
     Maria-Evangelia Papadaki, Panagiotis Papadakos, Michalis Mountantonakis, Yannis Tzitzikas
                                         Institute of Computer Science, FORTH-ICS, GREECE, and
                                        Computer Science Department, University of Crete, GREECE
                                             {marpap|papadako|mountant|tzitzik}@ics.forth.gr

ABSTRACT                                                                                that of an urban area where each dataset is visualized as a build-
The LOD (Linked Open Data) cloud currently contains thousands                           ing. An indicative screenshot of the LOD cloud according to the
of published datasets. Existing visualizations, like the Linking                        interactive 3D visualization that we propose, is shown in Fig-
Open Data cloud diagram, are useful for getting an overview of                          ure 1(right). In a nutshell the contributions of this paper are:
its size, the datasets and their connectivity. An interesting ques-                     (i) it introduces and motivates a novel interactive 3D model for
tion is whether we could come up with more informative and                              LOD datasets that adopts the metaphor of urban area, (ii) it in-
more interactive visualizations that could make evident more                            troduces several variations of the model, and discusses the pros
features of the datasets for aiding the inspection and the dis-                         and cons of each one, and (iii) it demonstrates the application of
covery of related datasets. To this end we propose an interac-                          the model over the datasets of each domain (government, me-
tive 3D visualization that adopts the metaphor of urban area.                           dia, etc.) and the entire LOD cloud. The rest of this paper is
In brief, each dataset is visualized as a building, whose features                      organized as follows: §2 describes the context, §3 describes the
(e.g. volume) reflect various dataset’s features (e.g. number of                        main components of the interactive 3D model, and its applica-
triples), while the proximity of the buildings (and other features)                     tion, §4 describes the implementation of the visualization system
indicates the commonalities of the datasets. The introduced ap-                         as well as directions that are worth further work and research,
proach supports various shapes of buildings and various place-                          and finally §5 concludes the paper. A running prototype is al-
ment algorithms: mountainside, orthogonal spiral, concentric spi-                       ready available to the community and it is accessible through
ral, and similarity-based adaptations of force-directed algorithms.                     http://www.ics.forth.gr/isl/3DLod (needs a recent web browser
The visualization is interactive, i.e. it allows the user to zoom in                    supporting WebGL).
any part of the model, to change the perspective, to change the
shape of the buildings and their placement, to see all the con-                         2    CONTEXT
nections or only those of one dataset, and others. The paper de-                        Visualization has been recognized as important for dataset dis-
tails the construction process and provides examples over real                          covery and dataset selection [10], which constitute two of the most
datasets including the entire LOD cloud, and describes the pros                         emerging challenges for the web of data [5, 9]. A number of vi-
and cons of each layout.                                                                sualization approaches and tools for Linked Data have been pro-
                                                                                        posed, some indicative of which are described in [6]. The most
KEYWORDS                                                                                widely known visualization diagram of the LOD is the 2D Link-
Linked Data, Connectivity of Linked datasets, Interactive 3D Vi-                        ing Open Data cloud diagram, which consists of datasets that
sualization                                                                             have been published in Linked Data format by contributors to
                                                                                        the Linking Open Data community project and other individu-
                                                                                        als and organisations. It is based on metadata collected and cu-
1    INTRODUCTION                                                                       rated by contributors to the datahub.io as well as on metadata ex-
During the last years we observe an increasing trend towards                            tracted from periodic crawls of the Linked Data web. The 2014
publishing data as LOD. Thousands of datasets have been pub-                            crawled version of the diagram is shown in Figure 1(left). We
lished and various visualizations that give an overview of their                        refer to the Linking Open Data cloud that was available from
number and interconnections have been proposed (e.g. see [1, 7,                         2014-08-30 to 2017-01-25¹ that contains datasets from the fol-
8]). The classical visualization of the LOD cloud (Figure 1(left)),                     lowing nine domains (in parenthesis the percentages of datasets
depicts each dataset as a circle (whose size indicates the size of                      that fall in each category): government (23.85%), publications
the dataset in triples). The commonalities between two datasets                         (23.33%), social web (15.78%), life sciences (11.05%), cross-domain
(in terms of common URIs) are made evident by an edge that con-                         (7.19%), user-generated content (7.36%), geographic (4.21%), me-
nects the dataset’s circles. Such visualizations are useful for get-                    dia (3.68%), and linguistics (3.50%). The size of the circles corre-
ting an overview of the entire LOD cloud, or for a part of it, or for                   sponds to the number of triples in each dataset. Only five sizes
a particular set of RDF datasets. There are various visualization-                      of circles (very large, large, medium, small, very small) are sup-
driven tasks. In our work we focus mainly on tasks related to                           ported each corresponding to a particular size interval (> 1 B,
datasets inspection, datasets monitoring, dataset selection and nav-                    10M-1B, 500K-10M, 10K-500K, < 10K resp.). The arrows between
igation across multiple linked datasets. The basic question we                          two circles indicate the existence of at least 50 links between
address here, is: can we come up with visualizations of the LOD                         the corresponding two datasets. A link is considered as an RDF
cloud which are more informative (i.e. which can make evident                           triple where subject and object URIs are in the namespaces of
more “features" of the datasets) and are easily conceivable? Based                      different datasets, while the direction of the arrows indicates
on this motivation, in this paper we propose an interactive 3D                          the dataset that contains the links. The thickness of the arrow
visualization that adopts a quite familiar metaphor, specifically                       corresponds to the number of links. Three levels of thickness
                                                                                        are supported (thin, medium, thick) each corresponding to one
© 2018 Copyright held by the owner/author(s). Published in the Workshop                 interval ((0, 1K), [1K, 100K) and [100K, ∞) respectively). Finally,
Proceedings of the EDBT/ICDT 2018 Joint Conference (March 26, 2018, Vienna,
Austria) on CEUR-WS.org (ISSN 1613-0073). Distribution of this paper is permit-
                                                                                        1 Accessible through http://lod-cloud.net/versions/2014-08-30/lod-cloud_colored.svg
ted under the terms of the Creative Commons license CC-by-nc-nd 4.0.




         Figure 1: Left: The LOD cloud diagram. Right: One perspective of the introduced interactive LOD 3D model.


each circle is colored differently for indicating the 9 different do-                 $b_{Small}$ mode the cube corresponds to the smallest dataset. Con-
mains of the datasets. A new version of the Linking Open Data                         sequently, the buildings of the datasets that have enough triples
cloud diagram was released on 2017-01-26.² That version con-                          tend to become skyscrapers.
tains almost double the number of datasets (i.e. 1163). Datasets                         In (c), i.e. “feature-based” cuboids, the shape depends on the
are again visualized as circles; however, only three sizes of cir-                    features of the corresponding datasets. Since
cles (large, medium, small) are supported. The links are interac-                     $|triples(S_i)| \approx (|U_i| + |L_i| + |BN_i|) \cdot Deg(U_i)$, the height of
tive and their direction is indicated through color. However, the                     the building is set proportional to $|U_i| + |L_i| + |BN_i|$, and the
clustering of the datasets is not favorable in all cases and the la-                  footprint of the building proportional to $Deg(U_i)$. Specifically,
bels are less readable in comparison to the 2014-2017 version of                      assuming square footprints, we have
the diagram.                                                                          $height(b_i) = |U_i| + |L_i| + |BN_i|$ and $width(b_i) = \sqrt{Deg(U_i)}$.
                                                                                      The volume of the building $b_i$ approximates $|triples(S_i)|$; if its
                                                                                      degree is low it will become a high building with a small foot-
3  AN INTERACTIVE URBAN 3D                                                            print, whereas if its degree is high then the building will be less
   VISUALIZATION FOR LOD DATASETS                                                     tall but will have a big footprint.
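The feature-based cuboid can be illustrated with a minimal Python sketch. It computes the average degree Deg(U) of a set of URIs from a list of triples and derives the height and width of a square-footprint building, as in option (c). The function names and the toy triples are hypothetical and are not taken from the described system.

```python
import math
from collections import Counter


def average_degree(triples, uris):
    """Deg(U): average number of triples in which a URI of `uris`
    appears as subject or object."""
    degree = Counter()
    for s, _p, o in triples:
        # A triple counts once per URI, even if the URI is both subject and object.
        for element in {s, o}:
            if element in uris:
                degree[element] += 1
    return sum(degree[u] for u in uris) / len(uris)


def feature_cuboid(n_uris, n_literals, n_blank_nodes, avg_degree):
    """Return (height, width) of a square-footprint building."""
    height = n_uris + n_literals + n_blank_nodes
    width = math.sqrt(avg_degree)
    return height, width


# Hypothetical toy dataset: two URIs ("a", "b") and two literals.
triples = [("a", "p", "b"), ("b", "p", "a"), ("a", "q", "lit1"), ("b", "q", "lit2")]
uris = {"a", "b"}
deg = average_degree(triples, uris)
height, width = feature_cuboid(len(uris), 2, 0, deg)
volume = height * width ** 2  # height times footprint area
```

The volume equals (|U| + |L| + |BN|) multiplied by Deg(U), which is the quantity the text uses as an approximation of the number of triples.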
                                                                                         For getting building sizes that resemble those of a real urban
3.1 Dataset Notations                                                                 area, a calibration is required. For this reason we introduce an
Let $S = \{S_1, \ldots, S_k\}$ be the set of datasets. Each dataset $S_i$ con-        additional parameter $F$, through which we can obtain the de-
sists of a set of triples (i.e., a set of subject-predicate-object                    sired average height/width ratio of the buildings. Specifically,
statements), denoted by $triples(S_i)$. We shall use $U_i$ to denote the              let $r$ be the desired ratio (e.g. 3 for three-floor buildings) pro-
URIs, $L_i$ to denote the literals and $BN_i$ to denote the blank nodes               vided by the user. We add $F$ to the definitions of height and width:
that appear in $triples(S_i)$. Hereafter, we consider only those URIs                 $height(b_i) = (|U_i| + |L_i| + |BN_i|)/F$ and
that appear as subjects or objects in a triple, as our primary focus                  $width(b_i) = \sqrt{Deg(U_i) \cdot F}$. Note that any positive value of $F$
is on the data (not on schema). The number of common URIs be-                         yields a pair of $height(b_i)$ and $width(b_i)$ that preserves the vol-
tween two datasets $S_i$ and $S_j$ is given by $|U_i \cap U_j|$. We define            ume. What is left to do is to select the $F$ that yields the de-
the Links between two datasets as follows: $Links_{i,j} = U_i \cap U_j$.              sired average ratio $r$. This reduces to finding the $F$ such that
If $T$ is a set of triples, then we can define the degree of a URI                    $r = avg\{ height(b_i)/width(b_i) \mid 1 \le i \le k \}$. The solution of this
$e$ in $T$ as $deg_T(e) = |\{(s, p, o) \in T \mid s = e \lor o = e\}|$, while        equation is:
for a set of URIs $E$ we can define their average degree in $T$ as                    $F = \sqrt[3]{\left( \frac{\sum_{i=1}^{|S|} (|U_i| + |L_i| + |BN_i|)/\sqrt{Deg(U_i)}}{r \cdot |S|} \right)^2}$.
$deg_T(E) = avg_{e \in E}(deg_T(e))$. Now for each dataset $S_i$ we can
compute the average degree of the elements in $U_i$ by considering                    Obviously, instead of $avg$ one can specify the min or max desired ratio, and in
$triples(S_i)$, i.e.: $Deg(U_i) = avg_{e \in U_i}(deg_{triples(S_i)}(e))$.            that case the formula is changed accordingly, e.g., for the max
                                                                                      desired ratio, we should first compute for each dataset the fol-
3.2     Buildings Representation                                                      lowing number:
The main idea is that we visualize each dataset $S_i$ as a building                   $F_i = \sqrt[3]{\left( \frac{(|U_i| + |L_i| + |BN_i|)/\sqrt{Deg(U_i)}}{r} \right)^2}$.
$b_i$. The volume of each building represents the number of triples
of the respective dataset ($|triples(S_i)|$). As regards the types of                 Then we select the largest $F_i$. If we use this largest $F_i$ as
the buildings, we support the following options: (a) cubes, (b)                       the $F$ value in all $height(b_i)$ and $width(b_i)$,
“context"-dependent cuboids, and (c) “feature"-based cuboids.                         then that guarantees that all buildings will have ratio ≤ r .
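The calibration step can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the system's code: each dataset is reduced to a hypothetical pair (n, d), where n stands for |U| + |L| + |BN| and d for Deg(U), and the function names and sample values are made up for the example.

```python
import math


def calibrate_avg(datasets, r):
    """F such that the average height/width ratio over all buildings equals r."""
    total = sum(n / math.sqrt(d) for n, d in datasets)
    return (total / (r * len(datasets))) ** (2.0 / 3.0)


def calibrate_max(datasets, r):
    """F such that no building has a height/width ratio above r."""
    return max(((n / math.sqrt(d)) / r) ** (2.0 / 3.0) for n, d in datasets)


def dimensions(n, d, F):
    """(height, width) of a feature-based cuboid calibrated with F."""
    return n / F, math.sqrt(d * F)


# Hypothetical datasets: (|U| + |L| + |BN|, Deg(U)).
datasets = [(1000, 4.0), (200, 1.0), (50, 9.0)]
r = 3.0

F_avg = calibrate_avg(datasets, r)
ratios_avg = [h / w for h, w in (dimensions(n, d, F_avg) for n, d in datasets)]

F_max = calibrate_max(datasets, r)
ratios_max = [h / w for h, w in (dimensions(n, d, F_max) for n, d in datasets)]
```

Since height is divided by F while the footprint area is multiplied by F, the volume n times d of every building is unchanged by the calibration.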
    In (a), each dataset $S_i$ is represented by a cube with edge length
equal to $\sqrt[3]{|triples(S_i)|}$.                                                  3.3 Placement of the Buildings
    In (b) we use “context-dependent” cuboids. The footprint of                       Below we describe four different building layout approaches that
the buildings is computed based on either the biggest dataset                         our system supports.
($b_{Big}$ mode) or the smallest dataset ($b_{Small}$ mode). In the $b_{Big}$        1. Mountainside Layout. The $k$ buildings are placed in an or-
mode the building of the biggest dataset is a cube, while in the                      thogonal $\lceil\sqrt{k}\rceil \times \lceil\sqrt{k}\rceil$ grid. The biggest building is placed in
                                                                                      one edge of the square area. The second biggest is placed next
2 by Andrejs Abele, John P. McCrae, Paul Buitelaar, Anja Jentzsch and Richard         to the first, and so on, until reaching the end of a row, where it
Cyganiak, http://lod-cloud.net/                                                       continues the same procedure in the next one until there are no




                                      Table 1: Comparison of visualizations for Linked Datasets

   Aspect                      Mountainside   Orthogonal Spiral   Cyclic Spiral                  Similarity-based layout   LOD Cloud Diagram (2D)
   Accurate size               X              X                   X                              X                         only 5 sizes
   Features (e.g. Degree)      X              X                   X                              X
   Connectivity                X              X                   X                              X                         X
   Interactive                 rich           rich                rich                           rich                      poor (only 2D with zoom)
   Time complexity             O(n)           O(n)                O(n)                           O(e · n²)                 not mentioned
   Distinctive characteristic  Readability    Fast overview of    Effective space exploitation   Focus on connectivity     Label readability, connections,
                                              datasets’ sizes     even for power-law             (fewer edge crossings)    clustered datasets by domain
                                                                  distributed datasets


                                                                                (c) collisions should never occur,
                                                                                (d) no big empty spaces, especially in the outer area that hosts
                                                                                    the majority of the buildings which are small.
                                                                            To meet the above requirements, we devised a new 2D placement
                                                                            algorithm called Concentric Spiral. The buildings, in descending
                                                                            order with respect to their size, are placed on concentric rings.
                                                                            The radius of the first (smallest) ring is the size of the biggest
                                                                            building. The placement of the subsequent buildings is done as
                                                                            follows. We compute the minimum chord that is required to
                                                                            avoid collisions based on the sizes of the current and the
                                                                            previous building. Then we compute the corresponding angle, and
                                                                            we place the new building at the corresponding spot of the
                                                                            circle. The sought angle is
                                                                            $\theta = 2 \arcsin\left(\frac{chord}{2 \cdot radius}\right)$. Just before we
Figure 2: Visualizations of the social datasets in LOD 2014                 reach $2\pi$, we start the next bigger ring, whose radius is the
                                                                            radius of the previous ring increased by the size of the last drawn
                                                                            building (plus a number accounting for “roads”). In this way
                                                                            the concentric rings become denser as the buildings get smaller,
                                                                            avoiding unnecessary empty spaces. The algorithm is
                                                                            appropriate for sets of datasets whose sizes vary a lot, even if they
                                                                            exhibit a power law distribution (i.e. very few big datasets and
                                                                            very many small ones, see [3, 9] for measurements about current
                                                                            RDF datasets). A screenshot of the layout based on Cyclic Spiral
                                                                            is shown in Figure 1 - right and Figure 2 - upper right. The algo-
                                                                            rithm has O(n) time complexity (n is the number of buildings).
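The ring-by-ring placement described above can be sketched in Python. This is only an illustration under several assumptions that the text does not fix: footprints are squares identified by their side length, the minimum chord between two neighbours is taken as the sum of their half-diagonals, the first building of every ring sits at angle 0, and the `road` gap defaults to 1.0. The function name `concentric_spiral` is hypothetical.

```python
import math


def concentric_spiral(sizes, road=1.0):
    """Return a list of (x, y, size) placements, biggest footprint first."""
    ordered = sorted(sizes, reverse=True)
    if not ordered:
        return []

    def chord(a, b):
        # Assumed collision-free distance between two square footprints.
        return (a + b) / 2.0 * math.sqrt(2.0)

    def angle_for(c, radius):
        # theta = 2 * arcsin(chord / (2 * radius)), clamped for safety.
        return 2.0 * math.asin(min(1.0, c / (2.0 * radius)))

    radius = ordered[0]          # first ring: size of the biggest building
    angle = 0.0
    first = prev = ordered[0]    # first building of the current ring
    placed = [(radius, 0.0, ordered[0])]
    for size in ordered[1:]:
        step = angle_for(chord(prev, size), radius)
        closing = angle_for(chord(size, first), radius)
        if angle + step + closing > 2.0 * math.pi:
            # The ring is full: open the next, bigger ring.
            radius += prev + road
            angle = 0.0
            first = size
        else:
            angle += step
        placed.append((radius * math.cos(angle), radius * math.sin(angle), size))
        prev = size
    return placed


placed = concentric_spiral([10, 8, 8, 5, 5, 5, 2, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 1])
```

Each building is handled once after the initial sort, which matches the linear behaviour stated for the placement step itself.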
Figure 3: Similarity-based layout                                           4. Similarity-based layout. According to this algorithm, the
Figure 4: Two ways to visualize the connections                             more commonalities two sources have (common URIs, common
                                                                            literals, owl:sameAs relationships, etc.) the closer the
buildings to draw. The result resembles a mountainside (Figure              corresponding buildings are placed. One way to specify the location
2 - upper left).                                                            of each building is to adopt a force-directed placement algorithm.
                                                                            In our case, we have modified the Fruchterman-Reingold
2. Orthogonal Spiral. The $k$ buildings are placed in an orthogonal         force-directed algorithm [4] as adapted to three.js³. This algorithm
$\lceil\sqrt{k}\rceil \times \lceil\sqrt{k}\rceil$ grid (see the two        satisfies the following two principles: a) vertices connected by
screenshots at the bottom part of Figure 2). The biggest building is        an edge should be drawn near each other and b) vertices should not
placed in the centre of the area (summit). The process continues by         be drawn too close to each other. Figure 3 shows an indicative
adding growing enclosing squares of size $N$. For example, the first        layout produced by the similarity-based algorithm.
contains 8 squares (3 above the summit, one left and one right of           Comparison. Table 1 summarizes the distinctive characteristics
the summit, and 3 below the summit). The next enclosing square              of each visualization approach including the 2D LOD
contains 16 more buildings and so on. Each building is drawn                Cloud diagram. The value “rich” in the row “Interactive” refers
following the clockwise direction. The result resembles a mountain          to interactive selection, zooming, panning, rotation, and control
whose summit is at the centre of the 2D area. One shortcoming of            of visibility of labels and connections.
this algorithm is that if we represent buildings with cubes then
this algorithm yields very sparse peripheral areas. This was actually       3.4 Visualizing the Links of Datasets
the motivation for the subsequent algorithm.                                If there are links between two datasets $S_i$ and $S_j$ then a line
3. Cyclic Spiral. Based on the weaknesses of the previous layout            segment is created, resembling a road that connects the
algorithms, we identified the following requirements for a better           corresponding buildings (see the left side of Figure 4). The links can
layout algorithm:                                                           also be visualized as bridges (see the right side of Figure 4). The
    (a) bigger buildings should be placed at the center,                    width of these bridges/roads indicates the strength of the connection that
    (b) a spiral-like placement seems beneficial as it would result
        in a round coverage of the space,                                   3 https://github.com/davidpiegza/Graph-Visualization




the correlated datasets have, and it is calculated by dividing                     a strongly connected component could be visualized as a small
the number of links between $S_i$ and $S_j$ by the number of                       round park (or roundabout) where only one line segment con-
links of the most connected pair (i.e., $maxLinks$):                               nects each building to that park.
$width(i, j) = \frac{|Links_{i,j}|}{|maxLinks|}$.
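As a small illustration of the road/bridge width, the following Python sketch derives the links between every pair of datasets as the intersection of their URI sets and normalizes each count by that of the most connected pair. The dataset names, the URI sets, and the function name `link_widths` are hypothetical.

```python
from itertools import combinations


def link_widths(uri_sets):
    """Map each linked pair (i, j) to |Links_{i,j}| / |maxLinks|."""
    links = {}
    for i, j in combinations(sorted(uri_sets), 2):
        common = len(uri_sets[i] & uri_sets[j])  # |U_i ∩ U_j|
        if common > 0:
            links[(i, j)] = common
    if not links:
        return {}
    max_links = max(links.values())
    return {pair: count / max_links for pair, count in links.items()}


# Hypothetical URI sets of four datasets; integers stand in for URIs.
uri_sets = {
    "A": {1, 2, 3, 4},
    "B": {3, 4, 5},
    "C": {4, 9},
    "D": {100},
}
widths = link_widths(uri_sets)
```

The most connected pair gets width 1.0, and pairs without common URIs get no road at all.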

3.5      Application Cases
We downloaded manually 287 RDF datasets including their con-
tent (i.e., triples, URIs, etc.) from the following resources: (a) the
dump of the data which were used in [11], (b) online datasets
from datahub.io website and (c) a subset of DBpedia version
3.9. To test the algorithms on an even bigger collection, we found
metadata on datahub.io for 600 datasets of various domains. In
contrast to the 287 datasets (see Figure 1(right)), for most of
these 600 datasets we were not able to access and download
their content (i.e., triples, URIs, etc.). However, we managed
to find some basic metadata for these datasets in datahub.io. Un-                        Figure 6: Overview of the web GUI of the system
fortunately, in datahub.io there is a lack of information for other
features of these datasets such as the number of URIs, literals,
blank nodes and degree of URIs. Therefore, it is not possible to                   5    CONCLUDING REMARKS
produce feature-based buildings for these datasets, although the                   The proposed 3D interactive system: (i) illustrates accurately the
proposed visualizations can support feature-based buildings for                    relative sizes of the datasets in triples, (ii) can indicate the aver-
thousands of datasets. Figure 5 shows on the left side the cyclic                  age degree of the datasets, (iii) allows the user to control which
spiral layout and on the right side the orthogonal layout for this                 connections to show or hide, (iv) makes evident (through the
set of 600 datasets.                                                               layout algorithms) the differences in the sizes of datasets or their
                                                                                   commonalities. It supports various building types (cubes, context-
                                                                                   dependent cuboids, feature-based cuboids), as well as several
                                                                                   layout algorithms (mountainside, orthogonal spiral, cyclic spiral,
                                                                                   similarity-based adaptations of force-directed algorithms) that
                                                                                   order the buildings appropriately, depending on the user needs.
                                                                                      Acknowledgements. Work partially supported by a) the EU pr-
                                                                                   oject BlueBRIDGE (Building Research environments for foster-
            Figure 5: 3D Visualizations of 600 datasets                            ing Innovation, Decision making, Governance and Education to
                                                                                   support Blue growth), H2020-EINFRA-2015-1, 2015-2018 and b)
                                                                                   the General Secretariat for Research and Technology (GSRT) and
                                                                                   the Hellenic Foundation for Research and Innovation (HFRI).
4     IMPLEMENTATION AND FUTURE STEPS
We have implemented a web-based visualization system, which                        REFERENCES
is easily accessible by any user. We used the JavaScript                            [1] Aba-Sah Dadzie and Matthew Rowe. 2011. Approaches to visual-
library Three.js⁴, which in turn uses the WebGL API⁵, which is                          ising Linked Data: A survey. Semantic Web 2, 2 (2011), 89–124.
widely supported by modern desktop and mobile browsers                                  http://dblp.uni-trier.de/db/journals/semweb/semweb2.html#DadzieR11
without the use of plugins. WebGL is a JavaScript API that                          [2] Brian Danchilla. 2012. Three.js Framework. In Beginning WebGL for HTML5.
allows the creation of GPU-accelerated 3D graphics and                                  Apress, 173–203. https://doi.org/10.1007/978-1-4302-3997-0_7
animations inside the environment of a web browser. Three.js                        [3] Javier D Fernández, Miguel A Martínez-Prieto, Pablo de la Fuente Redondo,
offers a less tedious programming environment in comparison                             and Claudio Gutiérrez. 2017. Characterising RDF data sets. Journal of Infor-
to WebGL, by abstracting away many of the WebGL details [2].⁶                           mation Science (2017).
                                                                                    [4] Thomas MJ Fruchterman and Edward M Reingold. 1991. Graph drawing
                                                                                        by force-directed placement. Software: Practice and experience 21, 11 (1991),
                                                                                        1129–1164.
   Figure 6 shows an overview of the web-based visualization                        [5] James Hendler. 2014. Data Integration for Heterogenous Datasets. Big data
system. The visualization is interactive, allowing the user to zoom                     2, 4 (2014), 205–215.
                                                                                    [6] Shah Khusro, Fouzia Jabeen, Syed Rahman Mashwani, and Iftikhar Alam.
in any part of the model. For instance, one can change the per-                         2014. Linked open data: towards the realization of semantic web-a review.
spective, the shape of the buildings or their placement, search for                     Indian Journal of Science and Technology 7, 6 (2014), 745–764.
a dataset through an auto-completion search, see all the connec-                    [7] Jakub Klimek, Jiri Helmich, and Martin Necasky. 2015.                Use Cases
                                                                                        for Linked Data Visualization Model. In Proceedings of the Workshop on
tions or those of one dataset, and others. The presented model                          Linked Data on the Web (LDOW) (CEUR Workshop Proceedings). Aachen.
could be improved in several ways. Below we sketch two in-                              http://ceur-ws.org/Vol-1409/#paper-08
                                                                                    [8] Luca Matteis. 2014. VoID-graph: Visualize Linked Datasets on the Web. CoRR
dicative enrichments: (a) for aiding the user to get a more in-                         abs/1408.6691 (2014). http://arxiv.org/abs/1408.6691
formative and “live" overview immediately the system could be                       [9] Michalis Mountantonakis and Yannis Tzitzikas. 2016. On Measuring the Lat-
enriched with “guided tours" , i.e. with trails of camera move-                         tice of Commonalities Among Several Linked Datasets. Proceedings of the
                                                                                        VLDB Endowment 9, 12 (2016), 1101–1112.
ments over the space occupied by the buildings and (b) for reduc-                  [10] Peng Peng, Lei Zou, M Tamer Özsu, Lei Chen, and Dongyan Zhao. 2015. Pro-
ing the crossings of the edges, each set of buildings that forms                        cessing SPARQL queries over distributed RDF graphs. The VLDB Journal
                                                                                        (2015), 1–26.
4 three.js is available at http://threejs.org/
                                                                                   [11] Max Schmachtenberg, Christian Bizer, and Heiko Paulheim. 2014. Adoption
5 https://www.khronos.org/webgl/
                                                                                        of the linked data best practices in different topical domains. In The Semantic
6 There are similar JavaScript libraries like GLGE, SceneJS, PhiloGL, etc.              Web–ISWC 2014. Springer, 245–260.



