=Paper=
{{Paper
|id=Vol-1184/paper13
|storemode=property
|title=Application of the Linked Data Visualization Model on Real World Data from the Czech LOD Cloud
|pdfUrl=https://ceur-ws.org/Vol-1184/ldow2014_paper_13.pdf
|volume=Vol-1184
|dblpUrl=https://dblp.org/rec/conf/www/KlimekHN14
}}
==Application of the Linked Data Visualization Model on Real World Data from the Czech LOD Cloud==
Application of the Linked Data Visualization Model on Real World Data from the Czech LOD Cloud Jakub Klímek Jiří Helmich Martin Nečaský Czech Technical University in Charles University in Prague Charles University in Prague Prague Faculty of Mathematics and Faculty of Mathematics and Faculty of Information Physics Physics Technology helmich@ksi.mff.cuni.cz necasky@ksi.mff.cuni.cz klimek@fit.cvut.cz http://xrg.cz/ http://xrg.cz/ ABSTRACT can be geocoded. In that case, the user may require to start In the recent years the Linked Open Data phenomenon has the exploration on a visualization of those entities on a map. gained a substantial traction. This has lead to a vast amount Or the dataset may contain a hierarchical structure and var- of data being available on the Web in what is known as the ious techniques for hierarchy visualizations can be used. LOD cloud. While the potential of this linked data space is huge, it fails to reach the non-expert users so far. At the same time there is even larger amount of data that is so Demogra far not open yet, often because its owners are not convinced phy of its usefulness. In this paper we refine our Linked Data Budgets Visualization Model (LDVM) and show its application via LAU OVM its implementation Payola. On a real-world scenario built regions Agendas on real-world Linked Open Data created from Czech open Elections data sources we show how end-user friendly visualizations COI.CZ results can be easily achieved. Our first goal is to show that using Payola, existing Linked Open Data can be easily mashed RUIAN NUTS Research projects up and visualized using an extensible library of analyzers, codes ARES transformers and visualizers. Our second goal is to give po- Business tential publishers of (Linked) Open Data a proof that simply Entities Consolida ted Law by publishing their data in a right way can bring them pow- erful visualizations at virtually no additional cost. 
Czech Public Institution Contracts s of public Categories and Subject Descriptors TED Geocoordi power (OVM) Court Public nates decisions H.5.2 [User interfaces]: GUIs, Interaction styles; H.3.5 Contracts [Online Information Services]: Data sharing; H.3.5 [Online CPV 2008 Information Services]: Web-based services Governmental Exchange Business-entities rates Geographical Keywords Statistical Linked Data, Visualization, Semantic Web Figure 1: Czech Linked Open Data (CzLOD) Cloud 1. INTRODUCTION The reader may argue that this does not depend on the fact that we work with LOD, which is true. Any non- Recently, vast amount of data represented in a form of LOD dataset can also be explored this way. However, LOD Linked Open Data (LOD) has appeared on the Web. The brings an important new dimension (besides the uniform key LOD principle is linking data entities in a machine in- data model – RDF) to the problem of data presentation, terpretable way so that they form a huge data space dis- especially when we talk about visualization. Suppose that tributed across the Web: the LOD cloud. The LOD cloud we have a dataset with addresses, we geocoded them and is not interesting for end-users until there are useful tools display them on a map. Suppose now that this is a LOD available built on top of it. Very important are tools which dataset and that we have other LOD datasets without GPS are able to present various kinds of LOD to users who want coordinates, but linked to the geocoded entities of the for- to explore the data. This includes LOD browsers and visual- mer one. We can build a tool which displays any of these ization tools. An end-user often does not know more about linked datasets on a map as well. This, of course, applies datasets than that there are some data structures contained only when the links make sense in terms of location, but the which could be visualized. 
For example, entities in a dataset point is that compared to non-LOD datasets, it is easy to create and use links in LOD. Copyright is held by the author/owner(s). In our previous work, we presented the Linked Data Vi- LDOW2014, April 8, 2014, Seoul, Korea. sualisation Model (LDVM) [4]. It enables us to combine various LOD datasets and visualize them with various kinds scope of this paper. of visualizations. It separates the part which prepares the Business entities datasets. In the heart of the CzLOD data for visualizations from the actual visualizers. Visual- cloud is the ARES dataset. It is data from various Czech izers then specify their expected input data structure (e.g., registries and mainly the Business registry. In the Czech Re- hierarchical with labels, geo-coordinates, etc.) in RDF using public, every business entity has its unique 8-digit identifica- broadly accepted vocabularies. This allows reuse of visual- tion number. Based on this number, it is easy to devise a rule izers for a broad range of LOD datasets. We focus on two for unique business entity URI creation. For example, the groups of users and their cooperation. First are expert users Czech Ministry of Interior is identified by http://linked. who can easily prepare analyses and visualization compo- opendata.cz/resource/business-entity/CZ00007064. For nents and the second group are lay users. They can use and each business entity in the Business registry, the dataset con- combine components prepared for them by the experts us- tains its official name, type, address of headquarters, kinds ing a LDVM implementation without extensive knowledge of of its activities, names of shareholders, etc. RDF and SPARQL. An example of such a use by a lay user Another dataset about business entities is COI.CZ, which can be visualizing a given analysis using various visualizers contains data about inspections, resulting fines, and bans is- or running the same analysis on various data sources. 
sued by the Czech Trade Inspection Agency. Each inspection Contributions. The primary purpose of this paper is record contains the business entity identification number, lo- to demonstrate the benefits that LDVM brings to users on cation, region (NUTS code and LAU code) and information real-world data. We show that our implementation Payola about the resulting fine or ban. Again, this data links easily allows any expert user to easily access his SPARQL endpoint to our other datasets about business entities via the URL of choice or upload his RDF file, perform analyses using based on the identification number. The source data is pub- SPARQL queries and visualize the results using a library lished as 3 star open data2 (CSV files) by the agency3 . of visualizers. At the same time, the components created Our Research projects dataset contains information about by experts can be also easily used and reused by lay users. research grants funded by various Czech grant agencies. For We present several visualization scenarios of real-world data each project there is data about amounts of money paid to from the Czech LOD (CzLOD) cloud. The Czech LOD cloud each of the participants for each year of the project as well contains various datasets we publish in our research group. as identification numbers of all participants and additional We describe these datasets briefly in this paper as well. Each information about the projects. The source data can be scenario takes several datasets from the CzLOD cloud, com- exported as Excel files from a web portal maintained by the bines them together, extracts a data structure which can be Research, Development and Innovation Council 4 . visualized and offers appropriate visualizers to the user. Outline. The rest of this article is organized as follows: Geographical datasets. 
Our newest and biggest geo- In Section 2 we survey the CzLOD cloud and describe the graphical dataset is RUIAN - register of territorial iden- currently available datasets. In Section 3 we present the tification, addresses and real estates. It has more than 600 Linked Data Visualization Model (LDVM), a simple yet million triples and contains data about all address points, powerful formalism for building analyses, transformations streets, parts of towns, towns, cities, various levels of regions and visualizations of Linked Data. In Section 4 we briefly and also about every building and every lot in the Czech Re- describe Payola, our implementation of LDVM. In Section 5 public including the hierarchy. Each object in RUIAN has we present our real world examples of analyzers, transform- assigned geocoordinates, which can be transformed to GPS ers and visualizers on our running LDVM instance and put coordinates. This creates a powerful base for geocoding in our contributions in a perspective of publishing data of pub- the Czech Republic. RUIAN is also linked to NUTS and lic administration bodies. In Section 6 we briefly survey LAU codes, which are 5-level European codes for towns, re- related work and finally, in Section 7 we conclude. gions etc. The source data is in XML and freely accessible5 . Other geographical datasets contain the already mentioned NUTS and LAU codes hierarchies. Additionally, the (Geoco- 2. CZECH LOD CLOUD ordinates) dataset contains geocoordinates for each address In this section, we introduce a survey of the Czech Linked found in our datasets created by geocoding. Open Data (CzLOD) cloud. As the data itself is not the Governmental datasets. Currently, there are three kinds main contribution of this paper, we will not go into too of governmental datasets in the CzLOD cloud. The first much detail. 
We are working on the cloud continuously in kind contains information about institutions of public power the OpenData.cz initiative since 2012 to show the owners of (OVM ), e.g., ministries, cities, but even public notaries, etc. the data, who are mainly public bodies, what benefits can For each institution, that is also a business entity, there is be gained from proper publication. The cloud is accessible its identification number, address, type and also information at http://linked.opendata.cz/sparql and runs on Open- about its offices and their opening hours. In addition, there Link Virtuoso 7 Open-Source triplestore 1 and currently is a dataset with agendas of these institutions, that are also contains approximately 100 million triples not counting the linked to laws according to which they are established. This largest dataset, RUIAN, which is described later. Figure 1 data gives a good base for, e.g., mobile applications that contains a map of the CzLOD cloud similar to the global give the user his location and opening hours of the nearest LOD cloud. It is also color coded, red are the datasets about notary, etc. The second kind of our governmental datasets Czech business entities, green are the geographical datasets, yellow are the governmental datasets and blue are the statis- 2 http://5stardata.info/ 3 tical datasets. In addition, the CzLOD cloud also includes http://www.coi.cz/cz/spotrebitel/ various e-Health datasets, which are, however, beyond the open-data-databaze-kontrol-sankci-a-zakazu/ 4 http://www.isvav.cz 1 5 https://github.com/openlink/virtuoso-opensource http://vdp.cuzk.cz/ (e.g. XML, CSV) as well as semi-structured or even non-structured data (e.g. HTML pages or raw text). Source RDF and non-RDF 2. Analytical abstraction: extraction and representation Data of relevant data in RDF obtained from source data. 3. 
Visualization abstraction: preparation of an RDF data Data Transformation structure required by a particular visualization tech- nique (e.g., 1D, 2D, 3D or multi-dimensional data, tree Analyzer data, etc.) Analytical 4. View: creation of a visualization for the end user. Analytical RDF SPARQL Abstraction Data is propagated through the LDVM pipeline by apply- Operators ing 3 types of transformation operators: 1. Data transformation: transforms the raw data repre- Visualization sented in a source data model or format into a repre- Visualization Transformation Transformer sentation in the RDF data model; the result forms the base for creating the analytical RDF abstraction. Visualization RDF Visualization 2. Visualization transformation: transforms the obtained Abstraction Operators analytical abstraction into a visualization abstraction. 3. Visual mapping transformation: maps the visualiza- tion abstraction data structure to a concrete visual Visualizer Visual Mapping Transformation structure on the screen using a particular visualization technique specified using a set of parameters. There are operators within the stages that allow for in- View View stage data transformations: Operators 1. Analytical SPARQL operators: transform the output of the data transformation to the final analytical ab- straction (e.g. aggregations, enrichment from LOD). Figure 2: High level LDVM overview. 2. Visualization operators: further refine the visualiza- tion abstraction data structure (e.g., its condensation if it is too large for a clear visualization). are law datasets. The main part consists of consolidated laws 3. View operators: allow a user to interact with the view of the Czech Republic. The other part consists of decisions (e.g., rotate, scale, zoom, etc.). of Czech Supreme court linked to laws. In addition, there are datasets with information about public contracts. 3.2 LDVM stages Statistical datasets. 
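The four LDVM stages and the three transformations between them can be sketched as a chain of functions. The following Python sketch is purely illustrative (all function names and the toy data model are assumptions, not part of LDVM or Payola, where the stages operate on real RDF via SPARQL):

```python
# Minimal sketch of the four LDVM stages as a function pipeline.
# Plain Python values stand in for the RDF data of each stage.

def data_transformation(source):
    """Stage 1 -> 2: turn raw source records into RDF-like triples."""
    return [("ex:" + row["id"], "ex:value", row["value"]) for row in source]

def analytical_operators(triples):
    """In-stage analytical operators, e.g. an aggregation."""
    total = sum(t[2] for t in triples)
    return triples + [("ex:all", "ex:total", total)]

def visualization_transformation(triples):
    """Stage 2 -> 3: extract just the structure a visualizer expects."""
    return [(s, o) for s, p, o in triples if p == "ex:value"]

def visual_mapping(pairs):
    """Stage 3 -> 4: map the visualization abstraction to a (textual) view."""
    return "\n".join(f"{s}: {'#' * o}" for s, o in pairs)

source = [{"id": "a", "value": 3}, {"id": "b", "value": 5}]
view = visual_mapping(visualization_transformation(
    analytical_operators(data_transformation(source))))
print(view)
```

The point of the sketch is only the shape of the pipeline: each stage consumes the previous stage's output, and in-stage operators refine data without leaving the stage.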
Our statistical datasets include data about demography and budgets of cities linked to the cloud Source RDF and non-RDF Data Stage. The first stage via NUTS and LAU codes. We also have exchange rates of considers RDF as well as non-RDF data sources. The data all currencies to Euro from the European Central Bank. Fi- transformation transforms the source data to an RDF rep- nally, there are results of elections to the Czech parliament. resentation that forms a base for creating an analytical ab- straction. If the source RDF data does not have a suitable structure for the following analysis, the transformation can 3. LINKED DATA VISUALIZATION MODEL be a sequence of one or more SPARQL queries that map the In this section we briefly go through the Linked Data Visu- source data to the required structure. alization Model (LDVM), which we defined in our previous Analytical RDF Abstraction Stage. The output of the sec- work [4]. First, we give an overview of the model and then ond stage (analytical RDF abstraction) is produced by ap- we formalize its key elements. plying a sequence of various analytical SPARQL operators on the RDF output produced by the data transformation. 3.1 Overview of LDVM We call the sequence an analyzer (see Figure 2). Our goal is The Linked Data Visualization Model (LDVM) is an adap- to enable users to reuse existing analyzers for analyzing vari- tation of the general Data State Reference Model (DSRM) [5] ous datasets. We want to enable users to find analyzers that for the specifics of the visualization of RDF and Linked Data. can be applied for analyzing a given data set and, vice versa, It is an abstract data process inspired by a typical Knowl- to find datasets that may be analyzed by a given analyzer edge Discovery Process [10]. We extend DSRM with three automatically. Therefore, it is necessary to be able to decide additional concepts – analyzers, transformers and visualiz- whether an analyzer can be applied on a given dataset, i.e. ers. 
They denote reusable software components that can whether the analyzer is compatible with the dataset. We be chained to form an LDVM instance. Figure 2 shows an formalize the notion of compatibility later in Section 3.3. overview of the LDVM. The names of the stages, transforma- Visualization Abstraction Stage. We want to ensure that tions and operators proposed by DSRM have been slightly visualizers are reusable for different analytical abstractions. adapted to the context of RDF and Linked Data. LDVM However, building specific visualizers for particular analyti- resembles a pipeline starting with raw source data (not nec- cal abstractions would not enable such reuse. This is because essarily RDF) and results with a visualization of the source each visualization tool visualizes particular generic charac- data. It is organized into 4 stages that source data needs to teristics captured by analytical abstractions. For example, pass through: there can be a visualizer of tree structures using the TreeMap 1. Source RDF and non-RDF data: raw data that can technique or another visualizer of the same structures using be RDF or adhering to other data models and formats the SunBurst technique. And, another visualizer may visu- its input RDF data. We formalized the model in our pre- EU Public vious work [4]. However, since then as the implementation Contracts progressed, we have simplified the formalization for it to be DBpedia more practical and with no effect on its power. Given the formalization, we are then able to decide whether a given analyzer can be applied on a given RDF dataset. Similarly, we can decide whether a visualization transformer can be ap- Class Hierarchy Property Public Spending plied on a given analytical abstraction, etc. Our approach is Analyzer Hierarchy Analyzer Analyzer based on the idea to describe the expected input of a LDVM component with an input signature and the expected out- put with an output data sample. 
The signature and the ClassProp-2-SKOS Place-2-SKOS Vis. PCO-2-GeoLocQB data sample are provided by the creator of the component. Vis. Transformer Transformer Vis. Transformer Each component can then check whether its input signature is compatible with the output sample of the previous com- ponent. The input signature comprises a set of SPARQL Sunburst TreeMap Columns on ASK queries which should be inexpensive so that they can Visualizer Visualizer GMaps Visualizer be evaluated quickly on a number of datasets. The output data sample is a small RDF data sample that shows the for- mat of the output of the component. The input signature of one component is then compatible with the output data sample of another component when all the SPARQL ASK Figure 3: Sample analyzers and visualizers queries of the signature are evaluated on the data sample as true. Our rationale is to provide a simple and lightweight so- alize 2-dimensional structures on Google Maps. An analyti- lution, which allows to check the compatibility of a number cal abstraction may contain encoded both the tree structure of components without complex reasoning. as well as the 2-dimensional structure. All three mentioned Definition 1 (Input signature). A set of SPARQL visualizers can be applied to the analytical abstraction as ASK queries SC = {Q1 , Q2 , . . . , Qn } is an input signature well as on any other abstraction which contains the same of a LDVM component C. structures encoded. Therefore, we need to transform the analytical abstraction into the form accepted by the desired Note that an analyzer can potentially extract data from visualizers. An example can be that we have a visualizer for multiple data sources, e.g., SPARQL endpoints or RDF files. a tree-like structure which accepts the SKOS6 vocabulary Then the analyzer would have to have a separate input sig- with its skos:broader property for the hierarchy. The ana- nature for each data source. 
However, for simplicity, we omit lytical abstraction might already contain this hierarchy, then this slight extension. Analyzers and visualization transform- no visualization transformation is required. Or, the analyt- ers provide an output data sample, against which an input ical abstraction might contain a tree-like hierarchy modeled signature of another LDVM component can be checked. using rdfs:subClassOf property and it needs to be trans- formed here. This transformation is performed by the vi- Definition 2 (Output data sample). RDF data DC sualization abstraction stage. We call the transformation a representing the maximum possible structure of the output visualization transformer. Again, a user can reuse various data format produced by a LDVM component C using mini- transformers for extracting visualization abstractions of the mum amount of triples is an output data sample of C. This desired kind from compatible analytical abstractions. only applies to analyzers and visualization transformers. View Stage. The output of the (view ) stage is produced by a component called a visualizer. A view is a visual rep- Definition 3 (Compatibility). We say that LDVM resentation of a visualization abstraction on the screen. A component C with input signature SC is compatible with visualizer performs visual mapping transformation that may LDVM component D with output data sample DD iff each be configured by a user using various parameters, e.g. vi- Qi ∈ SC =Q{Q1 , Q2 , . . . , Qn } returns true when executed on sualization technique, colors and shapes. The user can also DD , i.e., ni=1 E(Qi , DD ) = 1 where E(Qi , DD ) ∈ {0, 1} is manipulate the final view using the view in-stage operators the evaluation of SPARQL ASK query Qi against data DD . such as zoom and move. A visualizer can be reused for visu- alizing various visualization abstractions that contain data Given the output data samples are small and the SPARQL structures accepted by the visualizer. 
ASK queries are inexpensive, we can, for a given SPARQL endpoint, automatically offer all possible visualizations us- 3.3 Formalization ing available LDVM components to our lay users. The pro- The core concepts of LDVM are reusable components, i.e. cess for checking of available visualizations using LDVM analyzers, visualization transformers and visualizers. An- starts with the analyzers. Each analyzer performs SPARQL alyzers and visualization transformers consume RDF data ASK queries from its input signature. If it is compatible (all via their input interfaces and produce RDF data as their ASKs return true), it is marked as available. Next are the output. Visualizers consume RDF data and produce a vi- visualization transformers. Because they are optional and sualization a user can interact with. The goal is to formally also can be chained, they need to perform their checks in it- introduce the concept of compatibility of a component with erations. In the first iteration, all transformers perform their ASKs from their input signatures on the output data sam- 6 http://www.w3.org/TR/skos-reference/ ples of available analyzers. Those who succeed are marked available. In the next iteration, all transformers that are not 5. DEMONSTRATION OF LDVM available perform their ASKs on the output data samples of In this section, we present our real world example of im- the newly available transformers. This ends when there is plementation and usage of LDVM. We present various ana- no new available transformer. Finally, all visualizers per- lyzers, visualization transformers and visualizers, which are form their ASKs on all available analyzers and visualization actual LDVM components with input signatures and out- transformers. The result is a set of all possible combinations put data samples. The examples run in Payola, our LDVM of what can be visualized in the given SPARQL endpoint. implementation (see Section 4). See Figure 3 for illustration. 5.1 Analyzers 4. 
IMPLEMENTATION: PAYOLA In this section, we describe two analyzers that create ana- Payola 7 is a web framework for analyzing and visualiz- lytical abstractions from the CzLOD cloud, their input sig- ing Linked Data [12]. It enables users to build their own natures and output data samples. An analyzer is a software instances of LDVM pipelines. Payola provides an LDVM component that produces RDF data and for a given data analyzer editor in which SPARQL queries and custom plu- source (or possibly more data sources) can say whether it gins can be combined. Firstly, the user defines a set of data can extract data from this data source or not. It can, for sources such as SPARQL endpoints or RDF files as input instance, represent a complex computation over simple data data and then connects other plugins to them. Some of the or it can simply be a SPARQL query, which is a case of our plugins are designed to provide simple SPARQL constructs. two examples. Note that when an analyzer is in a form of Join and Union plugins enable users to analyze a dataset a SPARQL CONSTRUCT query, its input signature corre- created from multiple datasets stored in separate SPARQL sponds to its WHERE clause and its output data sample is endpoints. It is also possible to transform results of an an- an instance of its CONSTRUCT clause. alyzer with a custom transformer. When the pipeline is 5.1.1 A1: Institutions of public power evaluated, the user can choose a visualizer to see the results in various forms. Throughout the LDVM pipeline all data The first analyzer A1 takes data from 2 datasets: Institu- is RDF and the user can download the results in a form of tions of public power (OVM) and Geocoordinates. From the an RDF file. OVM dataset, it extracts the institutions with their types Payola also offers collaborative features. A user is able to and addresses8 . 
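For an analyzer given as a single SPARQL CONSTRUCT query, the paper observes that its input signature corresponds to its WHERE clause. A naive Python sketch of deriving such a signature by wrapping the WHERE pattern in an ASK query (string handling for illustration only; a real implementation would use a SPARQL parser, and the query below is a made-up example):

```python
# Sketch: derive the input signature of a CONSTRUCT-based analyzer as an
# ASK query over its WHERE clause (Section 5.1). Naive string slicing;
# assumes one WHERE clause whose braces are the last in the query.

def signature_from_construct(query):
    """Wrap the WHERE pattern of a CONSTRUCT query in an ASK query."""
    upper = query.upper()
    start = upper.index("WHERE")
    brace_open = query.index("{", start)
    brace_close = query.rindex("}")
    pattern = query[brace_open + 1:brace_close]
    return "ASK {" + pattern + "}"

construct = """
CONSTRUCT { ?inst ex:type ?type }
WHERE { ?inst ovm:typSubjektu ?type . ?type skos:prefLabel [] }
"""
print(signature_from_construct(construct))
```

The analogous derivation of the output data sample would instantiate the CONSTRUCT template with placeholder terms, as in the D_A1 and D_A2 samples shown in Section 5.1.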
The types of the institutions are expected create an analyzer and share it with the rest of the Payola to be skos:Concepts and the labels of the types are expected users. That enables them to run such an analyzer as well to be skos:prefLabels. From the Geocoordinates dataset, as to create a new analytical plugin, which is based on that the analyzer extracts geocoordinates gained by geocoding analyzer. As analytical plugins have parameters that affect the OVM addresses. The input signature of the analyzer their behavior, a new analyzer–based plugin may also have consists of one ASK query SA1 = {QA1 } : parameters, which can be chosen from the parameters of the # Q of A1 plugins of the original analyzer. This feature supports for- [] s : name [] ; mation of an ecosystem where expert users create analyzers ovm : typSubjektu ? type ; s : address ? address . for those who are less experienced. Combining those analyz- ? address s : streetAddress []; ers into new ones enables even inexperienced users to create s : postalCode []; a complex analyzer with less effort. s : addressRegion []; s : a ddre ssLo cal ity []; It is possible to extend Payola with custom plugins for s : geo ? g . analysis and visualization. For instance, a user is allowed to ? type skos : prefLabel [] . upload a code snippet of a new analytical plugin via our web ? g s : longitude []; s : latitude [] . interface. The framework compiles the code and integrates the created plugin immediately into the application. And an example of its output data sample DA1 is: Let us briefly describe some of the latest Payola features. Based on the previous user evaluation presented in [4], we # D of A1 focused our work on improving the user experience. Wes : geo ; s : title " title " ; introduced changes to make it even easier for non–expert s : description " desc " ; users to browse LDVM pipeline results without extensive ovm : typSubjektu < type >. knowledge of LOD principles or Payola itself. 
< type > skos : prefLabel " Type " . s : latitude " 50.088289 " ; The latest Payola version offers a one–click solution for s : longitude " 14.404446 " . presenting results of an LDVM pipeline in a chosen visu- alizer. When an LDVM pipeline is created, it is assigned Our SPARQL endpoint contains this kind of data and a unique URL. When a user accesses such a URL, Payola therefore QA1 returns true, which means that A1 is com- automatically loads the pipeline and creates the desired vi- patible with our SPARQL endpoint. sualization (see Section 5.3.2). To speed things up, we also implemented caching of analyzer results so that we can serve 5.1.2 A2: Inspections of COI.CZ more users in a shorter time without repeated analysis eval- The second analyzer A2 takes data from 5 datasets: In- uation. This brings us very close to what we see as a final spections of the Czech Trade Inspection Agency (COI.CZ), stage of delivering a visualization to a non–expert user – em- ARES (Business Registry), NUTS codes hierarchy, LAU bedding an LDVM visualization based on an LDVM pipeline codes hierarchy and also Geocoordinates. From COI.CZ it into an external website. That is a part of our future work. extracts information about inspections which resulted into 7 8 http://payola.cz s is a prefix for http://schema.org/ sanctions. Specifically, it extracts their dates, places, result- 5.2 Visualization transformers ing fines, links to business entities inspected and links to In this section we describe a sample visualization trans- LAU regions in which the inspection took place. From LAU former. It can be used to connect output RDF data from our regions, the analyzer takes names of the regions and links analyzers or any other compatible analyzers to the inputs of to broader NUTS codes. From NUTS codes, the analyzer our visualizers. A visualization transformer can be any soft- takes names of the regions and their hierarchy. 
ware component that transforms data between different for- From ARES, the analyzer extracts names of inspected mats or performs aggregations for better visualization. Note business entities. Finally, from Geocoordinates, it extracts that because we use RDF, the visualization transformers are the geocoordinates of addresses found in COI.CZ. The in- in fact SPARQL CONSTRUCT queries. Again, their input put signature of this analyzer consists of 2 SPARQL ASK signatures correspond to their FROM clauses and their out- queries SA2 = {QA A2 1 , Q2 } : 2 put data samples correspond to their CONSTRUCT clauses. # Q1 of A2 [] a s : CheckAction ; 5.2.1 T1: Region hierarchy to SKOS hierarchy s : location / s : location ? region ; Because we have various tree structure visualizers that s : location / s : geo ? geo ; s : object ? object ; accept tree structures using skos:Concepts for nodes with dcterms : date ? date ; skos:prefLabel for labels and skos:broader properties for s : result ? result . edges and also accept optional rdf:value for the size of a ? result a coicz : Sanction ; s : result / gr : ha s Cu r re n cy V al u e [] . leaf, we need to transform the hierarchy extracted in ana- ? object gr : legalName [] . lyzer A2 (see Section 5.1.2) accordingly. The region hierar- ? region a ec : LAURegion ; chy, that is in the output data sample of analyzer A2 con- ec : level 2 . ? geo s : latitude []; sists of ec:LAURegions for regions, s:CheckAction for the s : longitude []. inspections made by COI.CZ. In addition, the inspections FILTER ( datatype (? date ) = xsd : date ) have their sanction amounts in rdf:value, which we want to visualize as sizes of their corresponding leaves in the re- QA1 2 checks for the inspections s:CheckAction, its region sulting tree visualization. Therefore, the input signature of (LAU), geocoordinates, business entity, date and fine. 
The T1 consists of one SPARQL ASK query ST1 = {QT1 } which fine has to have an amount, the business entity has to have corresponds to the output data sample of A2 : a legal name, the region must be LAU level 2. QA 2 2 checks the LAU and NUTS datasets whether there is the region # Q of T1 [] a s : CheckAction ; hierarchy present and whether the regions have their names. s : location ? region ; # Q2 of A2 s : title [] ; [] a s : CheckAction ; rdf : value [] . s : location / s : location ? region ; ? region a ec : LAURegion ; s : result / s : result [] . ec : level 2 ; ? region a ec : LAURegion ; rdfs : label [] ; ec : level 2 ; ec : h asP aren tReg ion ? lau1 . dcterms : title [] ; ? lau1 rdfs : label [] ; ec : hasP aren tReg ion ? lau1 . ec : h asP aren tReg ion ? nuts3 . ? lau1 dcterms : title [] ; ? nuts3 rdfs : label [] ; ec : h asP aren tReg ion ? nuts3 . ec : hasP aren tReg ion ? nuts2 . ? nuts3 rdfs : label [] ; ? nuts2 rdfs : label [] ; ec : h asPa ren tReg ion ? nuts2 . ec : hasP aren tReg ion ? nuts1 . ? nuts2 rdfs : label [] ; ? nuts1 rdfs : label [] . ec : h asPa ren tReg ion ? nuts1 . ? nuts1 rdfs : label []. An example of its output data sample DT1 will correspond to the input signature of visualizer V1 (see Section 5.3.2): This data is present in our SPARQL endpoint so both the queries return true. Therefore, A2 is compatible with our # D of T1 data source. An example of the output data sample DA2 is: a skos : Concept ; skos : prefLabel " title " ; # D of A2 rdf : value 100 ; a s : CheckAction ; skos : broader < region > . s : location < region > ; < region > a skos : Concept ; s : geo ; skos : prefLabel " label " ; s : title " title " ; skos : broader < lau1 > . s : description " description " ; < lau1 > a skos : Concept ; dcterms : date " 2014 -02 -16 " ^^ xsd : date ; skos : prefLabel " label " ; rdf : value 2 . skos : broader < nuts3 >. s : latitude " 50.088289 " ; < nuts3 > a skos : Concept ; s : longitude " 14.404446 " . 
5.3 Visualizers
In this section, we present sample visualizers which visualize the results of the aforementioned analyzers. Moreover, they demonstrate how visualizers benefit from the concept of input signatures and compatibility checks. For each of the visualizers, we will describe its input signature. Since the product of a visualizer is not a dataset but a visualization, there is no specification of an output data sample. The compatibility check is, once again, a SPARQL ASK query which, in the case of a visualizer, is executed against the output data sample of the last transformer in a given LDVM pipeline.

Since one of the main reasons why the LDVM was proposed is to facilitate the process of LOD exploration, we have chosen to utilize some well-known visualization techniques to present a dataset in a form which is understandable by non-expert users. We have experimented with two commonly used visualization techniques: a tree hierarchy visualization and a map visualization. One of the goals of our experiments was to show that it is possible to integrate well-known visualization libraries into an application which works with RDF and is based on the LDVM.

Figure 4: Tree hierarchy visualizations of a pipeline based on analyzer A2 and transformer T1. A treemap on the left side, sunburst and circle layout packing on the right side.
5.3.1 V1: Tree hierarchy visualizers
Tree hierarchy visualization is a commonly used visualization technique. The results of the analyzers A1 and A2 (followed by the transformer T1) contain hierarchical data structures which can be visualized in this way. As described before, we chose the SKOS vocabulary as the format for tree visualizations, and therefore we present QV1 as the input signature of a tree hierarchy visualizer V1:

    # Q of V1
    [] a skos:Concept ;
        skos:prefLabel [] ;
        rdf:value [] ;
        skos:broader ?b .
    ?b a skos:Concept ;
        skos:prefLabel [] .

Query QV1 enforces that the visualized dataset contains a leaf node with a value specified, as well as a reference to its parent. To traverse the hierarchy, we use the skos:broader property, which stands for the "has parent" relationship. It is now easy to check the compatibility of QV1 with DA1 and DT1.

We chose to implement 4 different tree hierarchy visualizers. To demonstrate the flexibility of an LDVM-compliant framework, we decided to use freely available visualization techniques based on the well-known and commonly used document manipulation library D3.js (http://d3js.org). Specifically, we introduce the following visualizers: Zoomable Treemap, Sunburst, Zoomable Sunburst, and Layout Packing. The library provides a module which produces adjacency diagrams or a hierarchical layout using recursive circle-packing for a given tree structure. It is not a hard task to build the expected tree structure of JavaScript objects from data that conforms to the described input signature. Among others, we use Apache Jena (https://jena.apache.org/) to serialize the results of an analyzer or a transformer into RDF/JSON (https://dvcs.w3.org/hg/rdf/raw-file/default/rdf-json/index.html). The serialization is transferred to the user's web browser, processed by a visualizer and passed to the visualization library, which computes the visualization itself. Note that LDVM does not specify implementation details; we could use JSON-LD, RDF/XML, Turtle or any other serialization format, or even an arbitrary non-RDF format.

We present some visualizations based on the aforementioned analyzers in Figure 4 (live demos: http://vis.payola.cz/coi-treemap, http://vis.payola.cz/coi-z-sunburst, http://vis.payola.cz/coi-packed).

5.3.2 V2: Geo data visualizers
Geo data visualization is another example of a commonly used visualization technique. There are many Open Data mashups that integrate map visualizations in order to provide an eye-catching presentation of arbitrary datasets. The input signature of a map visualizer can actually be very simple. We define QV2 to be the only query of the input signature of the visualizer V2:

    # Q of V2
    [] s:geo ?c ;
        s:description [] ;
        s:title [] .
    ?c s:geoCoordinates [
        s:latitude [] ;
        s:longitude [] ] .

Again, it is easy to see that such a query returns true when matched against DA1 or DA2. Hence, V2 is compatible with A1 and A2.

We integrated two different map visualization libraries in order to provide three different visualizers. Two of them are based on the Google Maps JavaScript API (https://developers.google.com/maps/documentation/javascript/) and the third one utilizes the ArcGIS API for JavaScript (https://developers.arcgis.com/javascript/).

Figure 5: Map visualization with the faceted browsing feature enabled. It presents data of a pipeline based on analyzer A1.

Figure 6: Map visualization of the data produced by analyzer A2.
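For illustration, a minimal Turtle document that would satisfy QV2 might look as follows (the resource URI and literal values are hypothetical placeholders; s: abbreviates http://schema.org/):

```turtle
@prefix s: <http://schema.org/> .

<http://example.org/inspection/1>
    s:title "title" ;
    s:description "description" ;
    s:geo [ s:geoCoordinates [ s:latitude "50.088289" ;
                               s:longitude "14.404446" ] ] .
```

Any dataset containing at least one resource of this shape makes the ASK query QV2 return true, and is therefore accepted by the map visualizers.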
The first one stands for a classic map visualization where a resource with geo data is represented on the map by a single marker. The second one generates a heatmap layer. The signature does not contain any additional values: each resource contributes to the generated heatmap layer equally, locally increasing the intensity of the heatmap by one. The third visualizer utilizes a clustering layer which automatically groups markers that are close to each other. When zooming in, markers get further apart; therefore, the layer dissolves clusters into smaller ones or reveals single markers.

These map visualizers clearly motivate the notion of input signatures. We have three different software components doing the same task, so it is very natural to unify their input format. As in the case of hierarchies, we could utilize other vocabularies with properties such as wgs84:lat or geo:point, which, in fact, have the same semantic meaning. That is also one of the reasons for the support of transformer chaining in LDVM. We could have an LDVM pipeline where an analyzer outputs data in a proprietary geographic coordinate system, followed by a transformer which converts such a system to WGS84 using the wgs84 ontology, followed by a transformer which converts wgs84 to s:geoCoordinates.
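The second conversion step of such a chain is not shown in the paper; a sketch of what a wgs84-to-s:geoCoordinates transformer could look like, assuming the standard wgs84 vocabulary namespace, is:

```sparql
# Hypothetical sketch of a wgs84 -> s:geoCoordinates chaining transformer.
PREFIX wgs84: <http://www.w3.org/2003/01/geo/wgs84_pos#>
PREFIX s:     <http://schema.org/>

CONSTRUCT {
  ?resource s:geo [ s:geoCoordinates [ s:latitude ?lat ;
                                       s:longitude ?long ] ] .
}
WHERE {
  ?resource wgs84:lat ?lat ;
            wgs84:long ?long .
}
```

A transformer like this makes any wgs84-annotated dataset compatible with the input signature QV2 above without touching the visualizers themselves.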
To make the basic map visualizer more usable, we extended it to provide a faceted browsing capability. Let us use the results of analyzer A1 for demonstration. Consider the input signature QV2 and the output data sample DA1. A visualizer with this signature ignores other properties, such as ovm:typSubjektu (type of institution). The types of institutions are instances of skos:Concept, which is often used as a type. This might suggest that the property links the resource (institution) to its type. In our case, the institution type can be a notary, a municipality, a ministry, etc. We allow the users to use these properties as facets to customize the visualization when exploring the dataset, e.g., by letting them change the color of a specific type of institutions or even hide them. To detect this type of properties, the following query is executed against the visualized dataset:

    SELECT DISTINCT ?p
    WHERE {
        [] <http://schema.org/geo> [] ;
           ?p [] .
    }

It gives us a list of properties that might be used for grouping. Since such a list also contains properties other than those defined by the input signature, we need to involve property blacklisting to exclude properties that would probably create a group for each marker separately (titles, descriptions, etc.).

Examples of visualizations with faceted browsing can be seen in Figure 5 (live demo: http://vis.payola.cz/ovm-gmaps) and Figure 6 (live demo: http://vis.payola.cz/coi-gmaps). When multiple properties are matched, we need to solve some minor issues, such as computing visibility based on all filters or applying multiple color settings to a single marker. In the case of color, we let the user decide which property is to be used to change the colors of the markers.
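One way to realize such blacklisting is to fold it directly into the detection query; a hedged sketch (the blacklisted properties listed here are illustrative, not Payola's actual blacklist):

```sparql
# Hypothetical facet-detection query with an inline property blacklist.
PREFIX s: <http://schema.org/>

SELECT DISTINCT ?p
WHERE {
  [] s:geo [] ;
     ?p [] .
  FILTER (?p NOT IN (s:geo, s:title, s:description))
}
```

The remaining properties can then be offered to the user as facet candidates.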
In addition, we have concrete examples as evidence of the feasibility of this approach, as shown in this paper. We also showed that the implementation of open-source visualizers, such as those from D3js.org, as plugins to Payola can be done easily. The data from COI.CZ presented in this paper are one of the COMSODE datasets. The publisher of this data, COI.CZ, now gets a free and powerful visualization of its data and integration with other datasets, and all it had to do was publish a CSV file.

6. RELATED WORK
More and more projects are focused on analyzing, exploring and visualizing Linked Data. The most sophisticated survey to date has been presented by Dadzie and Rowe [6]. They concluded that most of the tools were not suitable for lay users, and the situation has not significantly improved since. One is still required to understand the basics of the Semantic Web when using Linked Data browsers such as Tabulator [2] and Explorator [1]. The user is expected to navigate a graph through tabular views displaying property-value pairs of explored resources. However, these tools do not offer features that would enable a user to get an overview of a whole dataset. At the time of writing of this paper, Tabulator did not support current versions of web browsers and therefore it was not possible to completely check up on its progress. Compared to our one-click pipeline execution, which enables an expert user to prepare a visualization and share it with a non-expert one, we find Explorator a rather complicated tool; it is not very easy even for an expert user to start using it. Another exploration tool is Freebase Parallax (http://parallax.freebaseapps.com), which offers advanced visualizations like timelines, maps and other rich snippets, but works with a fixed data source – Freebase. SemaPlorer [15] is an exploration mashup based on multiple large datasets. It demonstrates the power of Linked Data while being focused on the tourism data domain. It provides faceted browsing capabilities for 4 different types of facets (person, time, tag and location).

There are several tools that visualize data based on vocabularies in a way similar to our new visualizers. Let us start with visualizers like map4rdf (http://oeg-dev.dia.fi.upm.es/map4rdf/), the LinkedGeoData browser [16] and Facete (http://cstadler.aksw.org/facete/), which understand spatial data. The first two focus on geographical data visualizations, but both are built on top of specific datasets. That means that, compared to Payola, the user is not able to apply the visualizer to his own dataset. map4rdf supports faceted discovery of Spanish institutions and enables the user to add a specific overlay containing statistical SCOVO-based data in the form of a timeline visualization. According to [7], the authors are currently working on DataCube vocabulary support. Its most interesting feature is the filtering of values by choosing an arbitrary region on a map. The LinkedGeoData browser enables its users to explore POIs all over the world. Facete is a JavaScript SPARQL-based faceted search library. It enables users to explore an arbitrary dataset stored on an arbitrary SPARQL endpoint. Using facets, a user is able to narrow down the volume of the explored data. Facete offers a basic table view, but it also provides some more sophisticated visualization widgets, one of which is focused on the visualization of spatial data. Since Facete is an exploration tool, it completely lacks features that would provide data analysis or transformation like Payola does; it just enables a user to explore and filter data from a chosen SPARQL endpoint.

Another group are vocabulary-based visualizers. CubeViz [9] offers the same circle-packing layout visualization as Payola does, but based on the DataCube vocabulary. Payola also offers an experimental version of a DataCube visualizer, but is not limited to it. FoaF Explorer (http://xml.mfd-consult.dk/foaf/explorer/) is focused on visualizing FOAF profiles. One can also mention ViDaX [8], a Java desktop Linked Data visualizer based on the Prefuse visualization library (http://prefuse.org/). Based on ontologies and property types, it suggests suitable visualizations to its users. However, we did not find a copy of the tool anywhere, so we were unable to experiment with it. Rhizomer [3] offers various types of visualizations for different datasets. It suggests a reasonable workflow for dataset exploration: 1) Overview, 2) Filter, 3) Visualize. It includes treemap, timeline, map and chart visualizers. However, it focuses just on the visualization stage of an LDVM pipeline.

There are also tools which let the user build a custom analyzer as Payola does. The best known is DERI Pipes [13], a platform that enables a user to create mashups and perform data transformations. However, it is focused just on data analysis, which means that there could be a Payola analytical plugin which would use a pipeline produced by DERI Pipes as another analyzer data source. Open Data Mashup (http://ogd.ifs.tuwien.ac.at/mashup/) provides very similar functionality, but it also offers visualizations based on vocabularies, including map visualizations. It is based on two types of widgets a user is able to combine together: the first one is a data source, the second one is a visualizer. However, it distinguishes only two dataset types – statistical and spatial – and lacks flexibility, since a visualizer receives data from a widget which combines a data source, an analyzer and a transformer.

We have also seen some generic graph visualizers like VisualRDF (https://github.com/alangrafu/visualRDF), which is a work in progress currently being developed using the D3.js library. Tools like IsaViz [14], Fenfire [11] and RDF-Gravity (http://semweb.salzburgresearch.at/apps/rdf-gravity/) use the well-known node-link visualization technique to represent a dataset. Payola also offers generic graph visualizations, but on top of that, it provides a way of customizing the visualization based on ontologies and user-defined vocabularies. Using an extensible library of visualizers, Payola is able to visualize an arbitrary dataset.

IsaViz also belongs to a group of tools implementing Fresnel – Display Vocabulary for RDF (http://www.w3.org/2005/04/fresnel-info/), which specifies how a resource should be visually represented by Fresnel-compliant tools like LENA (https://code.google.com/p/lena/) and Longwell (http://simile.mit.edu/issues/browse/LONGWELL). Those are also focused only on the visualization stage of LDVM. The Fresnel vocabulary could be perceived as an LDVM visualization abstraction.

We have already mentioned Facete, which is a SPARQL-based JavaScript library. There are also other similar libraries, like Sgvizler (http://dev.data2000.no/sgvizler/) or Visualbox (http://alangrafu.github.io/visualbox/), which enable a user to embed a dataset visualization into their website. Unlike Facete, they require a user to have a deep knowledge of the SPARQL language, since that is the only possible way of using those tools. Last but not least, we mention the publishing framework Exhibit (http://www.simile-widgets.org/exhibit/). It enables the user to create web pages with advanced search and filtering features, providing visualizations like maps, timelines or charts. However, it requires the input data to be in the form of JSON and recommends using the Babel service (http://service.simile-widgets.org/babel/) to transform RDF and other data formats into the desired JSON variant.
7. CONCLUSIONS
In this paper, we presented the Czech LOD cloud – a set of interlinked LOD datasets we have published in our research group – and we used it to demonstrate the benefits of the Linked Data Visualization Model (LDVM) for LOD visualization. We briefly recapitulated the basic principles of LDVM, updated its formalization and briefly described our own implementation of LDVM – Payola. Then we presented several visualization scenarios over datasets from the Czech LOD cloud. The scenarios demonstrated the benefits of LDVM for users – how they can combine various LDVM components to extract the required data structures from their datasets (with so-called analyzers and visualization transformers) and how even lay users can easily reuse suitable visualizers to visualize the extracted structures.

8. ACKNOWLEDGMENTS
This work was partially supported by a grant from the European Union's 7th Framework Programme number 611358 provided for the project COMSODE and also partially by the TAČR grant no. TA02010182.

9. REFERENCES
[1] S. Araujo, D. Shwabe, and S. Barbosa. Experimenting with Explorator: a Direct Manipulation Generic RDF Browser and Querying Tool. In WS on Visual Interfaces to the Social and the Semantic Web (VISSW2009), 2009.
[2] T. Berners-Lee, Y. Chen, L. Chilton, D. Connolly, R. Dhanaraj, J. Hollenbach, A. Lerer, and D. Sheets. Tabulator: Exploring and analyzing linked data on the semantic web. In 3rd Int. Semantic Web User Interaction WS, 2006.
[3] J. Brunetti, R. Gil, and R. Garcia. Facets and Pivoting for Flexible and Usable Linked Data Exploration. In Interacting with Linked Data Workshop, ILD'12, Crete, Greece, May 2012.
[4] J. M. Brunetti, S. Auer, R. García, J. Klímek, and M. Nečaský. Formal Linked Data Visualization Model. In Proceedings of the 15th International Conference on Information Integration and Web-based Applications & Services (IIWAS'13), pages 309–318, 2013.
[5] E. H. Chi. A Taxonomy of Visualization Techniques Using the Data State Reference Model. In IEEE Symposium on Information Vizualization 2000, INFOVIS '00, Washington, DC, USA, 2000. IEEE.
[6] A.-S. Dadzie and M. Rowe. Approaches to visualising Linked Data. Semantic Web, 2(2):89–124, 2011.
[7] A. de León, F. Wisniewki, B. Villazón-Terrazas, and O. Corcho. Map4rdf – Faceted Browser for Geospatial Datasets. In Proceedings of the First Workshop on USING OPEN DATA. W3C, June 2012.
[8] B. Dumas, T. Broché, L. Hoste, and B. Signer. ViDaX: An interactive semantic data visualisation and exploration tool. In Proceedings of the International Working Conference on Advanced Visual Interfaces, AVI '12, pages 757–760, New York, NY, USA, 2012. ACM.
[9] I. Ermilov, M. Martin, J. Lehmann, and S. Auer. Linked open data statistics: Collection and exploitation. In P. Klinov and D. Mouromtsev, editors, Knowledge Engineering and the Semantic Web, volume 394 of Communications in Computer and Information Science, pages 242–249. Springer Berlin Heidelberg, 2013.
[10] U. Fayyad, G. Piatetsky-Shapiro, and P. Smyth. From data mining to knowledge discovery in databases. AI Magazine, 17(3):37, 1996.
[11] T. Hastrup, R. Cyganiak, and U. Bojars. Browsing Linked Data with Fenfire. In Linked Data on the Web (LDOW2008) workshop, in conjunction with WWW 2008 conference, 2008.
[12] J. Klímek, J. Helmich, and M. Nečaský. Payola: Collaborative Linked Data Analysis and Visualization Framework. In 10th Extended Semantic Web Conference (ESWC 2013), pages 147–151. Springer, 2013.
[13] D. Le-Phuoc, A. Polleres, M. Hauswirth, G. Tummarello, and C. Morbidoni. Rapid prototyping of semantic mash-ups through semantic web pipes. In Proceedings of the 18th International Conference on World Wide Web, WWW '09, pages 581–590, New York, NY, USA, 2009. ACM.
[14] E. Pietriga. IsaViz: a Visual Environment for Browsing and Authoring RDF Models. In WWW 2002, the 11th World Wide Web Conference, Honolulu, Hawaii, USA, 2002. World Wide Web Consortium.
[15] S. Schenk, C. Saathoff, S. Staab, and A. Scherp. SemaPlorer – interactive semantic exploration of data and media based on a federated cloud infrastructure. Web Semantics: Science, Services and Agents on the World Wide Web, 7(4):298–304, 2009.
[16] C. Stadler, J. Lehmann, K. Höffner, and S. Auer. LinkedGeoData: A Core for a Web of Spatial Open Data. Semantic Web Journal, 2011.