Use Cases for Linked Data Visualization Model

Jakub Klímek (Czech Technical University in Prague, Faculty of Information Technology), klimek@fit.cvut.cz
Jiří Helmich (Charles University in Prague, Faculty of Mathematics and Physics), helmich@ksi.mff.cuni.cz
Martin Nečaský (Charles University in Prague, Faculty of Mathematics and Physics), necasky@ksi.mff.cuni.cz

ABSTRACT

There is a vast amount of Linked Data on the web, spread across a large number of datasets. One of the visions behind Linked Data is that the published data is conveniently reusable by others. This, however, depends on many details, such as conformance of the data with commonly used vocabularies and adherence to best practices for data modeling. Therefore, when an expert wants to reuse existing datasets, he still needs to analyze them to discover how the data is modeled and what it actually contains. This may include analysis of what entities are there, how they are linked to other entities, which properties from which vocabularies are used, etc. What is missing is a convenient and fast way of seeing what could be usable in a chosen unknown dataset without reading through its RDF serialization. In this paper we describe use cases based on this problem and their realization using our Linked Data Visualization Model (LDVM) and its new implementation. LDVM is a formal base that exploits the Linked Data principles to ensure interoperability and compatibility of compliant analytic and visualization components. We demonstrate the use cases on examples from the Czech Linked Open Data cloud.

Categories and Subject Descriptors

H.5.2 [User interfaces]: GUIs, Interaction styles; H.3.5 [Online Information Services]: Data sharing; H.3.5 [Online Information Services]: Web-based services

Keywords

Linked Data, RDF, visualization, discovery

1. INTRODUCTION

A vast amount of data represented in the form of Linked Open Data (LOD) is now available on the Web. Therefore, the focus of Linked Data experts now starts to shift from the creation of LOD datasets to their consumption, and several new problems arise. Consider a Linked Data expert working for a modern company who has the task of finding suitable datasets in the LOD Cloud1 that would enhance the company's internal data. As of today, he can search for the datasets in http://datahub.io, which is a CKAN2 catalog instance that provides full-text search and keyword and faceted browsing of the textual metadata of the datasets. To be able to decide whether a given dataset is valuable for his use case or not, the expert needs to find out whether it contains the expected entities and their properties. In addition, the entities and properties can be present in the dataset, but they may be described using a different vocabulary than expected. Good documentation of datasets is rare; therefore, the expert needs to go through the dataset manually, either by loading it into his triplestore and examining it using SPARQL queries, or by looking at the RDF serialization in a text editor. Only recently have other technical approaches started to emerge, such as LODeX [2], which shows some statistics about the number of RDF triples, classes, etc., and tries to extract the data schema from a SPARQL endpoint. However, what the expert would really need is a simple process where he would provide the data and see whether the entities he expects, and maybe even some others he does not expect, can be found in the given dataset, and see them immediately. In this paper, we will focus on this part of dataset consumption.

Figure 1: The 2015 Czech Linked Open Data Cloud. (The diagram shows interlinked datasets: Czech Ministry of Finance data, Budgets, Elections results, Exchange rates, Demography, NUTS codes, LAU regions, Contracts/Invoices/Payments, TED Public Contracts, Geocoordinates, RUIAN, Czech Business Entity IDs, Czech Public Contracts, CPV 2008, COI.CZ, ARES Business Entities, Institutions of Public Power (OVM), Consolidated Law, Czech Social Security Administration, OVM Agendas, Court decisions and Czech Research projects; dataset categories: Governmental, Business-entities, Geographical, Statistical.)

1 http://lod-cloud.net/
2 http://ckan.org/

Copyright is held by the author/owner(s).
WWW2015 Workshop: Linked Data on the Web (LDOW2015).
Now let us assume that the expert found a suitable dataset, such as a list of all cities and their geocoordinates, and saw its map visualization. This data could enhance his closed enterprise linked data containing a list of the company's offices. It would then be useful for the expert if he could just include the appropriate links from his dataset to the discovered one and see his offices on a map in a generic visualization. It would be even better if he could then refine this visualization instead of creating another one from scratch.

In this paper we define the use cases which have the potential to help Linked Data experts in their work. Then we briefly describe our Linked Data Visualization Model (LDVM) [4] and show its new implementation, using which we demonstrate that the use cases can be executed; this is the main contribution of this paper. We will demonstrate our approach using the datasets we prepare in our OpenData.cz initiative, which can be seen in Figure 1, namely Institutions of Public Power (OVM3) and the registry of land identification, addresses and properties of the Czech Republic (RUIAN4).

This paper is structured as follows. In Section 2 we define the use cases we want to support by our approach. In Section 3 we briefly describe the principles and basic concepts of LDVM. In Section 4 we introduce our new proof-of-concept LDVM implementation and describe its components and their relation to LDVM. In Section 5 we show a sample visualization pipeline. In Section 6 we show how we use our implementation to support the defined use cases. In Section 7 we survey related work and in Section 8 we conclude.

2. MOTIVATING USE CASES

In this section we motivate our work using a series of use cases with which we aim at helping Linked Data experts in various stages of their work.

2.1 What Can I See in the Given Data

The first use case is to show possible visualizations of data in a given dataset. The dataset must be given in an easy way: either as a link to an RDF dump, as a direct RDF file upload, or as a link to a SPARQL endpoint that contains the data. The result should be a list of possible visualizations that could be meaningful for the dataset. When the user clicks on a possible visualization, he should see his data visualized by the selected technique, e.g. on a map, using a hierarchy visualizer, etc. This use case has several motivations. Firstly, one should be able to quickly sample a previously unknown dataset that may be potentially useful based on its textual description, such as the one on http://datahub.io. Another motivation for this use case is the need to be able to quickly and easily show someone what can be done with his data in RDF. In addition, this use case can help Linked Data experts even during the process of Linked Data creation, which usually happens in iterations. In the first iteration of creating Linked Data, an expert usually writes a transformation of some basic information about entities in the source data, such as their names and types. Then he reviews the created RDF data, selects another portion of the source data, amends his transformation, executes it and again observes the resulting, more detailed RDF data. He repeats this process until all of the source data, or at least the desired parts, is transformed to RDF. The iterations can be fast, when the expert knows the source data and the desired RDF form well, or they can be slow, when, for example, the expert shows the result of each iteration to a customer and discusses which part of the source data he is going to transform next. Either way, it would be better to have a visualization accompanying the data in each iteration, which would show how the data gets better and more detailed. Also, trying to visualize the resulting data provides additional means of validating the transformation; e.g. when the data describes entities on a map, it is always better to see the result on a map than in a plain RDF text file. On the other hand, the visualization method needs to be quick and easy and not custom made for the data, because the data between iterations is only temporary; it lasts only until it gets improved in the next iteration. This is made possible by the Linked Data vocabulary reuse principle: all we need is a library of components supporting standard vocabularies, and usage of those vocabularies in the data, which is a well known best practice. Finally, when developing advanced visualizations, the designer can start with the automatically offered one and refine it instead of starting from scratch.

An example of this use case is that when a dataset containing a hierarchy is provided, a visualization using a hierarchy visualizer should be offered, and it should display some meaningful data from the source dataset. To be specific, we will show this use case on a dataset that contains a hierarchy of regional units ranging from individual address points to the whole country.

2.2 What Can I Combine My Data With To See More

The second use case is to show which additional visualizations of the input data can be used when the data is simply linked to another dataset. One motivation of this use case is to visually prove the value of linking by showing the additional visualization options gained by it. Another motivation is the expert in a modern company that has its internal linked data and wants to see the improvement gained by linking it to the public LOD cloud. For this use case the user should be able to provide his data as easily as in the previous use case. This time he is interested in seeing which additional visualizations of his data he can use when he links his data to another dataset. The result should again be a list of possible visualizations which, however, use not only the input dataset, but also some other dataset to achieve a better visualization. For example, a dataset with addresses of public institutions, linked to a geocoded dataset of all addresses, yields a map visualization with no additional effort.

2.3 What Data Can I Visualize Like This

The third use case is a reverse one compared to the previous two. It is to show datasets, or their combinations, which can be visualized using a selected visualizer. The motivation for this use case is that the user sees a visualization that he likes and wants to prepare his data so that it is compatible with the visualization. For that he wants to see which other datasets, possibly combined with some transformations, use this visualization. For this use case the user selects a visualization, and he should get a list of datasets, possibly with transformations, which can be visualized by the selected visualizer. For example, the user selects a map visualizer and he should see that a dataset with a list of cities can be visualized this way.

3 http://datahub.io/dataset/cz-ovm
4 http://datahub.io/dataset/cz-ruian
3. LINKED DATA VISUALIZATION MODEL

To realize the use cases defined in the previous section, we will use a new implementation of our Linked Data Visualization Model (LDVM), which we defined and refined in our previous work [4, 6, 8]. It is an abstract visualization process customized for the specifics of Linked Data. In short, LDVM allows users to create data visualization pipelines that consist of four stages: Source Data, Analytical Abstraction, Visualization Abstraction and View. The aim of LDVM is to provide means of creating reusable components at each stage that can be put together to create a pipeline even by non-expert users who do not know RDF. The idea is to let expert users create the components by configuring generic ones with proper SPARQL queries and vocabulary transformations. In addition, the components are configured in a way that allows the LDVM implementation to automatically check whether two components are compatible or not. If two components are compatible, then the output of one can be connected to the input of the other in a meaningful way. With these components and the compatibility checking mechanism in place, the visualization pipelines can then be created by non-expert users.

3.1 Model Components

There are four stages of the visualization model populated by LDVM components. The Source Data stage allows a user to define a custom transformation to prepare an arbitrary dataset for further stages, which require their input to be RDF. In this paper we only consider RDF data sources such as RDF files or SPARQL endpoints, e.g. DBpedia. The LDVM components at this stage are called data sources. The Analytical Abstraction stage enables the user to specify analytical operators that extract data to be processed from one or more data sources and then transform it to create the desired analysis. The transformation can also compute additional characteristics like aggregations. For example, we can query for resources of type dbpedia-owl:City and then compute the number of cities in individual countries. The LDVM components at this stage are called analyzers. In the Visualization Abstraction stage of LDVM we need to prepare the data to be compatible with the desired visualization technique. We could have prepared the analytical abstraction in a way that is directly compatible with a visualizer; in that case, this step can be skipped. However, the typical use case for Visualization Abstraction is to facilitate reuse of existing analyzers and existing visualizers that work with similar data, only in different formats. For that we need to use an LDVM transformer. In the View stage, data is passed to a visualizer, which creates a user-friendly visualization. The components, when connected together, create an analytic and visualization pipeline which, when executed, takes data from a source and transforms it to produce a visualization at the end. Not every component can produce meaningful results from any input. Typically, each component is designed for a specific purpose, e.g. visualizing map data, and therefore it does not work with other data. To create a meaningful pipeline, we need compatible components.
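As an illustration of such an analytical operator, the following is a minimal sketch of an analyzer query that counts cities per country; the dbpedia-owl:country property is our illustrative choice of linking property, not something prescribed by LDVM.

    PREFIX dbpedia-owl: <http://dbpedia.org/ontology/>

    # Count resources of type dbpedia-owl:City in each country.
    SELECT ?country (COUNT(?city) AS ?cityCount)
    WHERE {
      ?city a dbpedia-owl:City ;
            dbpedia-owl:country ?country .
    }
    GROUP BY ?country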
                                                                 periodic checking of data source content, e.g. whether the
3.2 Component Compatibility

Now that we have described the four basic types of LDVM components, let us take a look at the notion of their compatibility, which is the key feature of LDVM. We want to use the checking of component compatibility at design time to rule out component combinations that do not make any sense and to help the users use the right components before they actually run the pipeline. Therefore, we need a way to check the compatibility without the actual data.

Each LDVM component has a set of features, where each feature represents a part of the expected component functionality. A component feature can be either mandatory or optional. For example, a visualizer that displays points and their descriptions on a map can have 2 features. One feature represents the ability to display the points on a map. This one will be mandatory, because without the points, the whole visualization lacks purpose. The second feature will represent the ability to display a description for each point on the map. It will be optional, because when there is no data for the description, the visualization still makes sense: there are still points on a map. Whether a component feature can be used or not depends on whether the data needed for it is present on the input. Therefore, each feature is described by a set of input descriptors. An input descriptor describes what is expected on the inputs of the component. In this paper we use a set of SPARQL queries for the descriptor. We could also consider other forms of descriptors, but that is not in the scope of this paper. A descriptor is applied to certain inputs of its component.

In order to evaluate the descriptors at design time, we require that each LDVM component that produces data (data source, analyzer, transformer) also provides a sample of the resulting data, which is called an output data sample. For the data sample to be useful, it should be as small as possible, so that the input descriptors of other components execute as fast as possible on this sample. Also, it should contain the maximum amount of classes and properties whose instances can be produced by the component, making it as descriptive as possible. For example, when an analyzer transforms data about cities and their population, its output data sample will contain a representation of one city with all the properties that the component can possibly produce. Note that, e.g. for data sources, it is possible to implement the evaluation of descriptors over the output data sample as evaluation directly on the represented SPARQL endpoint. For other components, fast evaluation can be achieved by using a static data sample.

We say that a feature of a component in a pipeline is usable when all queries in all descriptors are evaluated true on their respective inputs. A component is compatible with the mapping of outputs of other components to its inputs when all its mandatory features are usable. The usability of optional features can be further used to evaluate the expected quality of the output of the component. For simplicity, we do not elaborate on the output quality in this paper. The described mechanism of component compatibility can be used at design time for checking the validity of the visualization pipeline. It can also be used for suggestions of components that can be connected to a given component output. In addition, it can be used at run time for verification of the compatibility using the actual data that is passed through the pipeline. Finally, this concept can also be used for periodic checking of data source content, e.g. whether the data has changed its structure and therefore became unusable or requires a pipeline change. For a detailed description and thorough examples of compatibility checking on real world data see our previous paper [8].
checking of component compatibility in design time to rule
Figure 2: LDVM Vocabulary


4. ARCHITECTURE OF THE NEW LDVM IMPLEMENTATION

In our previous LDVM implementation, Payola [7], we had the following workflow. First, the pipeline designer registered the data sources he was planning to use, if they were not already registered. Then he started to create an analysis by selecting data sources and then added analyzers and transformers to the analytic pipeline. Then the pipeline was executed and, when done, the pipeline designer selected an appropriate visualizer. There were no hints of which visualizer was compatible with the result of the analysis, and this workflow contained unnecessary steps. The pipeline and its components existed only inside Payola, with no means of their creation and management from the outside. Nevertheless, it demonstrated some advantages of our approach. It showed that pipelines created by expert users can be reused by lay users and that the technical details can be hidden from them. It also showed that the results of one analysis can be used by various visualizers, and that one analysis can run on various data sources.

In our new implementation of LDVM we aim for having individual components running as independent web services that accept configuration and exchange the information needed to get the input data and to store the output data. We also aim for easy configuration of components as well as easy configuration of the whole pipeline. In accordance with the Linked Data principles, we now use RDF as the format for storage and exchange of configuration, so that any code that works with RDF can create, maintain and use LDVM components both individually and in a pipeline. For this purpose we have devised a vocabulary for LDVM. In Figure 2 there is a UML class diagram depicting the structure of the LDVM vocabulary. Boxes represent classes, edges represent object properties (links) and properties listed inside the class boxes represent data properties. The architecture of our new implementation corresponds to the vocabulary. The data entities correspond to software components and their configuration. We chose the ldvm5 prefix for the vocabulary. The vocabulary and examples are developed on GitHub6. Let us now briefly go through the individual parts of the vocabulary, which correspond to parts of the LDVM implementation architecture.

4.1 Templates and Instances

In Figure 2 there are blue and green (dashed) classes. The blue classes belong to the template level of the vocabulary and the green classes belong to the instance level. These two levels directly correspond to the two main parts of the LDVM implementation. At the template level we register LDVM components as abstract entities described by their inputs, outputs and default configuration. At the instance level we have a pipeline consisting of interconnected specific instances of components and their configurations. The easiest way to imagine the division is to imagine a pipeline editor with a toolbox. In the toolbox, there are LDVM component templates, and when a designer wants to use an LDVM component in a pipeline, he drags it onto the editor canvas, creating an instance. There can be multiple instances of the same LDVM component template in a single pipeline, each with a different configuration that overrides the default one.

5 http://linked.opendata.cz/ontology/ldvm/
6 https://github.com/payola/ldvm
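As a minimal Turtle sketch of the two levels, using the ldvm prefix above and hypothetical example resources (the authoritative class names and shapes are in the vocabulary repository on GitHub):

    @prefix ldvm: <http://linked.opendata.cz/ontology/ldvm/> .
    @prefix ex:   <http://example.org/> .  # hypothetical resources

    # Template level: a component registered in the toolbox.
    ex:GoogleMapsTemplate a ldvm:ComponentTemplate .

    # Instance level: a concrete use of the template in a pipeline,
    # connected to its template via the instanceOf property.
    ex:mapsInstance a ldvm:ComponentInstance ;
        ldvm:instanceOf ex:GoogleMapsTemplate .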
The template holds the input descriptors and output data samples, which are used for the compatibility checking together with the instance input and output mappings. Each instance is connected to its corresponding template using the instanceOf property.

4.2 Component Types

There are four basic component types as described in Section 3.1: data sources, analyzers, transformers and visualizers. They have their representation on both the template level, as descendants of the ComponentTemplate class, and the instance level, as descendants of the ComponentInstance class. From the implementation point of view, transformers are just analyzers with one input and one output, so the difference is purely semantic. This is why transformers are a subclass of analyzers.

4.3 Data Ports

Components have input and output data ports. On the template level we distinguish the inputs and outputs of a component. To an InputDataPortTemplate the input descriptors of features can be applied. An OutputDataPortTemplate has the outputDataSample links to the output data samples. Both are subclasses of DataPortTemplate. The data ports are mapped to each other, output of one component to input of another, as instances of DataPortInstance using the boundTo property. This data port instance mapping forms the actual visualization pipeline, which can then be executed. Because data ports are not LDVM components, their instances are connected to their templates using a separate property, dataPortInstanceOf.

4.4 Features and Descriptors

On the template level, the features and descriptors (see Section 3.2) of components are represented. Each component template can have multiple features connected using the feature property. The features themselves, instances of either the MandatoryFeature class or the OptionalFeature class, can be described using standard Linked Data techniques and vocabularies such as dcterms and skos. Each feature can have descriptors, instances of Descriptor, connected using the descriptor property. The descriptors have their actual SPARQL queries as literals connected using the query property. In addition, the input data port templates to which the particular descriptor is applied are denoted using the appliesTo property.
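Continuing the sketch from Section 4.1, a template with one mandatory feature and its descriptor might look as follows; the example URIs are illustrative and the query literal is abbreviated (prefix declarations inside it are omitted for brevity):

    @prefix ldvm: <http://linked.opendata.cz/ontology/ldvm/> .
    @prefix ex:   <http://example.org/> .  # hypothetical resources

    ex:GoogleMapsTemplate ldvm:feature ex:pointsOnMap .

    ex:pointsOnMap a ldvm:MandatoryFeature ;
        ldvm:descriptor [
            a ldvm:Descriptor ;
            ldvm:query "ASK { ?p wgs84:lat ?lat ; wgs84:long ?long . }" ;
            ldvm:appliesTo ex:mapsInputTemplate
        ] .

    ex:mapsInputTemplate a ldvm:InputDataPortTemplate .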
4.5 Configuration

Now that we have the LDVM components, we need to represent their configuration. On the template level, components have their default configuration connected using the componentConfigurationTemplate property. On the instance level, components point to their configuration using the componentConfigurationInstance property when it is different from the default one. The configuration itself is the same whether it is on the template level or the instance level; therefore we do not distinguish the levels here and we only have one class, ComponentConfiguration.

The structure of the configuration of an LDVM component is completely dependent on what the component needs to function. It is also RDF data and it can use various vocabularies. It can even be linked to other datasets according to the Linked Data principles. Therefore, it is not a trivial task to determine the boundaries of the configuration data in the RDF data graph in general. On the other hand, each component knows precisely what is expected in its configuration and in what format. This is why we need each component to provide a SPARQL query that can be used to obtain its configuration data, so that the LDVM implementation can extract it. That SPARQL query is connected to every configuration using the mandatory configurationSPARQL property.
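For example, the configuration extraction query of a SPARQL endpoint data source could be a CONSTRUCT of the following shape; the configuration class and property here are hypothetical stand-ins, since each component defines its own configuration vocabulary:

    PREFIX ex: <http://example.org/>  # hypothetical configuration vocabulary

    # Cut exactly the triples this component understands
    # out of the surrounding configuration graph.
    CONSTRUCT {
      ?cfg a ex:SparqlDataSourceConfig ;
           ex:endpointUrl ?endpoint .
    }
    WHERE {
      ?cfg a ex:SparqlDataSourceConfig ;
           ex:endpointUrl ?endpoint .
    }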
4.6 Pipeline

Finally, the pipeline itself is represented by a Pipeline class instance. It links to all the instances of LDVM components used in the pipeline. Another feature supporting collaboration of expert and non-expert users is pipeline nesting. An expert can create a pipeline that is potentially complex in the number of components, their configuration and binding, but that can be reused in other pipelines as a black-box data source, analyzer or transformer. As this feature is not important in this paper, we do not describe it further. It is sufficient to say that the nestedPipeline and nestedBoundTo properties of LDVM serve this purpose.

4.7 Component Compatibility Checking

The component compatibility checks (see Section 3.2) are exploited in various places. The checks can happen at run time, when the actual data passed between components is verified as it travels along the pipeline. They can also happen at scheduled intervals, when existing pipelines are re-checked to determine possible changes in data sources that could cause pipelines to stop being executable. This can be easily used for verification of datasets that change frequently. Another usage of the checks is during the design of a pipeline in a future pipeline editor, which is not implemented yet, when a user wants to connect two components in a pipeline. However, the most valuable usage of component compatibility checking is in the pipeline discovery algorithm.
4.8 Pipeline Discovery Algorithm

The pipeline discovery algorithm is used to generate all possible pipelines based on a set of datasets, and it is therefore the core functionality for this paper and the use cases it demonstrates. It is inspired by the classical breadth-first search (BFS) algorithm where, simply put, an edge between two nodes representing LDVM components in a pipeline exists if and only if the descriptor of the second one matches the output data sample of the first one. The edge then represents the possibility to pass data from the output of the first component to the input of the second component. The algorithm works in iterations and builds up pipeline fragments in all compatible combinations. We will demonstrate it on an example of two data sources (RUIAN, Institutions), a RUIAN geocoder analyzer, which takes two inputs, a Towns extractor, and a Google Maps visualizer.

The algorithm starts with the inputs of all available LDVM components (analyzers, transformers, visualizers) checking the selected data sources, which form trivial, one-member pipeline fragments. In Figure 3 we can see the trivial fragments in the top right part. In the first step, every available component is checked against each selected data source. When a component's input is compatible with the output of the last component of a pipeline fragment (the output of a data source in the first iteration), it is connected. A successful check is denoted by a green arrow and an unsuccessful one by a red arrow. When all of a component's inputs are connected, the component is added to the pipeline fragment and this fragment gets checked again by all LDVM component inputs in the next iteration. This is visible in Figure 4: the pipeline fragments from iteration 1 ending with the RUIAN geocoder are not checked until both inputs of the geocoder are connected, which happens in iterations 2 and 3. When the algorithm reaches a visualizer and binds all of its inputs, the pipeline fragment leading to this visualizer is saved as a possible pipeline. This happens in Figure 3 with the 2-member pipeline and in Figure 4 in iterations 3 and 4. When there are no new pipeline fragments to consider in the next algorithm iteration, we have generated all possible pipelines and the algorithm ends. In the example we generated 3 pipelines. Note that the generated pipelines are in the form of trees oriented from leaves to root, where the leaves are data sources and the root is the visualizer. Of course, the complexity of this algorithm rises with the number of components and data sources to check. On the other hand, the compatibility checks greatly reduce the number of possibilities and leave only the compatible ones. More rigorous measurements of the time consumed are part of our future work.

Figure 3: Pipeline discovery iteration 1

Figure 4: Pipeline discovery - more iterations
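The following is a compact sketch of the discovery loop in Scala, purely for illustration and not the actual implementation. For brevity it treats every component as single-input; the real algorithm keeps fragments with unbound inputs, such as the two-input geocoder, open until all of their inputs are connected. The compatible parameter stands for the descriptor evaluation against output data samples from Section 3.2.

    // A sketch of the pipeline discovery loop described above.
    case class Component(name: String, isVisualizer: Boolean)
    case class Fragment(members: List[Component]) {
      def last: Component = members.last
    }

    def discover(dataSources: List[Component],
                 components: List[Component],
                 compatible: (Component, Component) => Boolean): List[Fragment] = {
      // Iteration 0: every selected data source is a trivial fragment.
      var frontier  = dataSources.map(ds => Fragment(List(ds)))
      var pipelines = List.empty[Fragment]
      while (frontier.nonEmpty) {
        // Extend each fragment with every component whose input
        // is compatible with the fragment's open output.
        val extended = for {
          f <- frontier
          c <- components
          if compatible(f.last, c)
        } yield Fragment(f.members :+ c)
        // Fragments ending in a visualizer are finished pipelines;
        // the rest are re-checked in the next iteration.
        pipelines = pipelines ++ extended.filter(_.last.isVisualizer)
        frontier  = extended.filterNot(_.last.isVisualizer)
      }
      pipelines
    }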
4.9 Component Implementations

Note that we have not talked about the actual LDVM component implementations yet, only templates, which are abstract descriptions and default configurations, and instances, which are representations in a pipeline with inputs and outputs bound to other components. Our vision is to have LDVM component implementations as standalone web services configurable by RDF, reusable among LDVM instances and other applications. They would live separately from our LDVM implementation instance, which would then serve only as a pipeline editor, catalog and launcher (see Figure 6). The components would register to the LDVM instance with access data and a link to the LDVM templates that can be processed using this component implementation. The RDF configuration sent to the component implementation would contain the actual component instance configuration together with access information for getting input data, storing output data and a callback URL for sending execution result information. The execution result information would include the status (success, failure) and optionally logs to be displayed to the pipeline user. However, the complete freedom of the components is not a trivial development task, so it still remains a part of our future work. Currently, the actual component implementations we use run on the same machine as the LDVM implementation and we have a hard-coded list of which components can process which LDVM component templates. Nevertheless, this implementation limitation does not affect the use cases we show in this paper.

Figure 6: LDVM implementation and LDVM components

5. LDVM PIPELINE EXAMPLE

Let us now briefly demonstrate what a simple LDVM pipeline looks like from the data point of view. See Figure 5. Again, we will have blue for the template level and green (dashed) for the instance level. The instance level is simpler, so let us start with it. On the bottom we have a pipeline, which points to the LDVM component instances that belong to the pipeline via the ldvm:member property. It is a simple pipeline consisting of one data source, one analyzer and a visualizer. The data source instance is configured to access our RUIAN SPARQL endpoint and is an instance of a generic SPARQL endpoint data source template. The analyzer instance extracts information about Czech towns from the data source, which contains far more information. The descriptor of the analyzer's feature "Input contains Towns" is a SPARQL ASK query which checks for the presence of a data source with towns information and is applied to the input data port template of the analyzer. The output data port template of the analyzer has a link to an output data sample, which is a Turtle file containing data about one Czech town as a sample. This brings us to the Google Maps visualizer, which only has an input data port template with one feature and a descriptor checking for the presence of geocoordinates in its input data. Note that the data port binding is done on the instance level, which is what will be seen and edited in a pipeline editor. On the other hand, features, descriptors and output data samples are all on the template level. Because RUIAN includes geocoordinates for each entity, the resulting visualization shows towns in the Czech Republic on a map.

Figure 5: Sample LDVM Pipeline
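In data, the instance level of this pipeline might be sketched as follows; ldvm:member and ldvm:boundTo are the properties named above, while the example resources are hypothetical:

    @prefix ldvm: <http://linked.opendata.cz/ontology/ldvm/> .
    @prefix ex:   <http://example.org/> .  # hypothetical resources

    ex:townsPipeline a ldvm:Pipeline ;
        ldvm:member ex:ruianSource , ex:townsAnalyzer , ex:mapsVisualizer .

    # The analyzer's input port instance is bound to the
    # data source's output port instance.
    ex:townsAnalyzerInput a ldvm:DataPortInstance ;
        ldvm:boundTo ex:ruianSourceOutput .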
6. REALIZATION OF USE CASES

In our new proof-of-concept implementation of LDVM (LDVMi) we aim at a more straightforward workflow utilizing the compatibility checking feature of LDVM. The goal is to provide the user with some kind of meaningful visual representation of his data as easily as possible. This means that the user specifies the location of his data, and that should be all that is needed to show some initial visualizations. This is achieved by our compatibility checking mechanism (see Section 3.2), using which we generate all possible pipelines that can be created from the data sources and LDVM components registered in the used LDVM instance. We are still in the early stages of the new implementation; it runs at http://ldvm.opendata.cz and all the mentioned LDVM components should be loaded there. The use cases from this section should be executable there, and anyone can experiment with any SPARQL endpoint or RDF data that is compatible with the loaded components.

Figure 7: Endpoint selection

6.1 What Can I See in the Given Data

In this use case we are in the role of a Linked Data expert who has a link to a dataset and wants to quickly see what can be seen in it. The actual offered visualizations depend solely on the set of analyzers, transformers and visualizers present in the LDVM instance. We assume that the expert has his LDVM instance populated with the components that he plans to use and that he checks a potentially useful dataset he has a link to. For the realization of this use case we will use the RUIAN dataset. It contains a lot of data, among which is a hierarchy of regional units ranging from address points to the whole country, modeled using SKOS. We have a link to the dataset in a SPARQL endpoint, so we point to it (see Figure 7). Next, we see the list of possible pipelines based on the evaluation of compatibility with the endpoint, see Figure 8. We can see that there are 3 pipelines. We can also see their data sources, their visualizer, and, in between,
the number of analyzers the pipeline contains. The first two end with a Google Map visualizer (the dataset contains geocodes for all objects) and the third one with a TreeMap hierarchy visualization, which means that a hierarchy was found in our dataset and it can be visualized. We select the third pipeline and we see the hierarchy as in Figure 9. This proves that using an LDVM instance populated with a set of LDVM components, we can easily check whether a given dataset contains usable data and see some initial results quickly. This is thanks to the static data samples and the SPARQL ASK based descriptors. As part of our future work we will do rigorous measurements to see the response times according to the number of components and the number and complexity of descriptors.

Figure 8: Possible pipelines list

Figure 9: Treemap visualization of a simple hierarchy

6.2 What Can I Combine My Data With To See More

We assume that we have the RUIAN data source and the Maps visualizer registered in our LDVM instance. In addition, we will have a RUIAN Towns geocoder analyzer registered, which takes a list of towns from RUIAN (the reference dataset) on one input and a dataset linked to RUIAN (the linked dataset) on the other input. It outputs the linked dataset enhanced with the GPS geocoordinates of the entities from the reference dataset. For this use case, we will use our dataset of Institutions of Public Power from DataHub.io. This could also be an internal dataset of a company. Note that we will also add the schema of the data for better filtering in the visualizer; this would, however, typically come in a second iteration, after we found out that without it we cannot filter properly in the visualizer, though we would still see the towns on the map. The dump and schema are in TriG serialization, which is not yet supported; however, it is easy to convert it to Turtle. We upload the schema7, the links8 to the RUIAN dataset and the data in Turtle9 as in Figure 10. After the file upload is finished, we see the possible pipelines as in Figure 11. Note that this pipeline discovery is the same as in the example in Section 4.8 with one difference. There, the algorithm searched for all pipelines that could be created from the two datasets, which included a direct visualization of RUIAN on a map. Here, the discovery algorithm does not return this pipeline because it searches for pipelines that visualize the linked dataset in combination with other data sources. Therefore, we have two possibilities of how to see our newly linked dataset on a map. One is applying the RUIAN Towns geocoder to the linked dataset, taking the whole RUIAN data source as the other input. This one, while theoretically possible, is not usable in our current implementation, because the whole RUIAN data source is large (600M triples) and contains tens of thousands of entities of various types. This is why we will choose the other possible pipeline, which, in addition, runs the RUIAN data through the RUIAN Towns extractor analyzer, which filters out data about other RUIAN entities. The chosen pipeline can be seen in Figure 12. All that is left is to evaluate the pipeline (press the Run button) and display the resulting visualization, which can be seen in Figure 13.

Figure 10: File upload

Figure 11: Possible pipelines for dataset combination

Figure 12: The chosen pipeline combining datasets

Figure 13: Maps visualization of linked dataset with filtering

7 http://opendata.cz/ldvm/ovm-vocab.ttl
8 https://raw.githubusercontent.com/payola/ldvm/master/rdf/examples/ovm-obce-links.ttl
9 http://opendata.cz/ldvm/ovm.ttl
The filtering of the displayed institutions of public power is possible thanks to the schema that we included in the beginning. Note that the compatibility checks so far have the form of a query that checks for the presence of certain class and property instances in a dataset. E.g. in this use case, we determine the presence of links to RUIAN by the presence of properties which we created for linking to RUIAN objects in the RUIAN vocabulary, and we rely on the creators of other datasets to use these properties when linking their data to RUIAN. What we do not check for at this time is whether the links in one dataset lead to existing or valid objects in the second dataset, because that would require non-trivial querying.
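Such a presence check could be an ASK query of the following shape; the prefix and linking property here are hypothetical stand-ins for the actual RUIAN vocabulary terms:

    PREFIX ruian: <http://example.org/ruian#>  # hypothetical prefix

    # Does the dataset link any entity to a RUIAN town?
    ASK {
      ?entity ruian:town ?town .  # hypothetical linking property
    }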
                                                                   in their ability to analyze and visualize parts of the Linked
                                                                   Data cloud. They do not provide configuration and descrip-
                                                                   tion using a reusable vocabulary and they do not aim at
                                                                   a more open environment with their implementation that
                                                                   would allow other applications to reuse their parts. Recent
                                                                   approaches include Hide the stack [5], where the authors
                                                                   describe a browser meant for end-users which is based on
                                                                   templates based on SPARQL queries. Also recent is LD-
                                                                   VizWiz [1], which is a very LDVM-like approach to detecting
                                                                   categories of data in SPARQL endpoints and extracting basic
                                                                   information about entities in those categories. An lightweight
                                                                   application of LDVM in enterprise is described in LinDa [10].
                                                                   Yet another similar approach that analyzes SPARQL end-
                                                                   points to generate faceted browsers is rdf:SynopsViz [3]. In
Figure 13: Maps visualization of linked dataset with               [2] the authors use their LODeX tool to summarize LOD
filtering                                                          datasets according to the vocabularies used.
                                                                      The most relevant related work to the specific topic of a
   This means that if someone connected a dataset of German towns, represented using our RUIAN vocabulary, to the RUIAN geocoder analyzer together with the Czech Institutions of Public Power dataset, he would get a compatible pipeline. However, this pipeline would not produce any usable data on evaluation, as the two input datasets simply do not have matching contents. We will look into possibilities of checking for these situations in our future work.
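One conceivable realization of such a check, assuming both datasets are exposed through SPARQL endpoints supporting SPARQL 1.1 federation, is sketched below. The endpoint URLs and the linking property IRI are again hypothetical placeholders, and this is a possible future direction rather than a feature of our implementation.

```python
from SPARQLWrapper import SPARQLWrapper, JSON

# Hypothetical endpoints and linking property -- placeholders only.
SOURCE_ENDPOINT = "http://example.org/towns/sparql"
TARGET_ENDPOINT = "http://example.org/ruian/sparql"
LINK_PROPERTY = "http://example.org/ruian/ontology/linkedTo"

def count_resolvable_links():
    """Count distinct link targets in the source dataset that actually
    exist in the target dataset, via a SPARQL 1.1 federated query."""
    query = """
    SELECT (COUNT(DISTINCT ?target) AS ?resolvable) WHERE {
      ?s <%s> ?target .
      SERVICE <%s> { ?target ?p ?o . }
    }
    """ % (LINK_PROPERTY, TARGET_ENDPOINT)
    sparql = SPARQLWrapper(SOURCE_ENDPOINT)
    sparql.setQuery(query)
    sparql.setReturnFormat(JSON)
    bindings = sparql.query().convert()["results"]["bindings"]
    return int(bindings[0]["resolvable"]["value"])

if __name__ == "__main__":
    # A count of zero would reveal a structurally compatible pipeline
    # whose input datasets have no matching contents.
    print(count_resolvable_links())
```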
6.3    What Data Can I Visualize Like This
   In this use case we will find datasets that are visualizable by a specific visualizer. It is in fact a simple filter of all known possible pipelines that contain this visualizer. For this use case we go to the list of available components in our implementation.
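Conceptually, the filter itself is straightforward, as the following sketch illustrates. The pipeline representation (a list of component IRIs ordered from data source to visualizer) is a deliberate simplification assumed for illustration, not the actual data structure of our implementation.

```python
# Hypothetical, simplified pipelines: lists of component IRIs,
# ordered from data source to visualizer.
PIPELINES = [
    ["ex:ruian-towns-source", "ex:ruian-geocoder", "ex:maps-visualizer"],
    ["ex:budget-source", "ex:datacube-visualizer"],
]

def pipelines_with_visualizer(pipelines, visualizer_iri):
    """Keep only the pipelines that contain the selected visualizer."""
    return [p for p in pipelines if visualizer_iri in p]

# The datasets visualizable by the maps visualizer are the sources
# of the pipelines that remain after filtering.
for pipeline in pipelines_with_visualizer(PIPELINES, "ex:maps-visualizer"):
    print(pipeline[0])
```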
7.    RELATED WORK
   For a survey of various Linked Data visualization tools see our previous paper [8]. Here, we will focus on the most recent approaches. With the LDVM vocabulary and our new implementation we aim at an open, web-service-like environment that is independent of the specific implementations of the LDVM components. This, of course, requires a proper definition of interfaces, and the LDVM vocabulary is the base for that. The other approaches so far, however, usually aim at a closed browser environment. Those are similar to our Payola [7] in their ability to analyze and visualize parts of the Linked Data cloud, but they do not provide configuration and description using a reusable vocabulary, and they do not aim at a more open environment that would allow other applications to reuse their parts. Recent approaches include Hide the Stack [5], where the authors describe a browser meant for end users which is based on templates defined by SPARQL queries. Also recent is LDVizWiz [1], a very LDVM-like approach to detecting categories of data in SPARQL endpoints and extracting basic information about entities in those categories. A lightweight application of LDVM in an enterprise setting is described in LinDA [10]. Yet another similar approach, which analyzes SPARQL endpoints to generate faceted browsers, is rdf:SynopsViz [3]. In [2], the authors use their LODeX tool to summarize LOD datasets according to the vocabularies used.
   The most relevant related work on the specific topic of a vocabulary supporting Linked Data visualization is Fresnel, the Display Vocabulary for RDF [9]. Fresnel specifies how a resource should be visually represented by Fresnel-compliant tools like LENA (https://code.google.com/p/lena/) and Longwell (http://simile.mit.edu/issues/browse/LONGWELL). Therefore, the Fresnel vocabulary could be perceived as a vocabulary for describing the LDVM visualization abstraction, but only partly: it was created before the Linked Data era and therefore focuses on visualizing RDF data without considering vocabularies and multiple sources.

8.    CONCLUSIONS AND FUTURE WORK
   In this paper we defined use cases that aid Linked Data experts in various stages of their work and showed how we can realize them using our implementation of the Linked Data Visualization Model (LDVM). The first use case was to easily show the contents of a given dataset using LDVM components, mainly visualizers. The second use case was to show, for a given dataset, which other known datasets it can be combined with to achieve a visualization. The third use case was to show which known datasets can be visualized using a selected visualizer, so that the expert can adjust his data accordingly. Then we briefly described LDVM, its vocabulary and implementation, and our vision of LDVM components as independent web services. Finally, we showed that, using the LDVM implementation populated by LDVM components, we are able to execute the defined use cases.
   During our work we have identified multiple directions to investigate further. When we evolve our LDVM implementation into a distributed system with components as individual web services, many new opportunities will arise. We could do load balancing, with multiple component implementations running on multiple machines, each able to process the same LDVM template and its instances. Also, the SPARQL implementations, while identical in principle, can be differentiated by various properties. One of those properties is the actual SPARQL engine used, since, in our experience, every engine supports SPARQL in a slightly different way or supports a slightly different subset of it. Moreover, the same SPARQL query can run substantially faster on one implementation and substantially slower on another.
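As a rough illustration of the load-balancing idea, the sketch below dispatches the same SPARQL query to several equivalent endpoints concurrently and keeps the first answer that arrives, masking slow or failing engines. The replica URLs are hypothetical placeholders, and this is only one conceivable strategy, not a description of our implementation.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from SPARQLWrapper import SPARQLWrapper, JSON

# Hypothetical replicas hosting the same data -- placeholders only.
REPLICAS = [
    "http://replica1.example.org/sparql",
    "http://replica2.example.org/sparql",
]

def query_endpoint(endpoint_url, query):
    """Run the query against a single endpoint and return the JSON result."""
    sparql = SPARQLWrapper(endpoint_url)
    sparql.setQuery(query)
    sparql.setReturnFormat(JSON)
    return sparql.query().convert()

def first_successful(query, endpoints=REPLICAS):
    """Send the query to all replicas concurrently and return the first
    result that arrives, ignoring replicas that fail or time out."""
    with ThreadPoolExecutor(max_workers=len(endpoints)) as pool:
        futures = [pool.submit(query_endpoint, e, query) for e in endpoints]
        for future in as_completed(futures):
            try:
                return future.result()
            except Exception:
                continue  # this replica failed; wait for the next one
    raise RuntimeError("no replica answered the query")
```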
   Another direction to investigate further is Linked Data Exploration, a process of searching the Linked Data Cloud for datasets that contain data we can reuse. Our approach so far requires selecting the dataset to investigate. However, that alone can be a non-trivial effort, and using Linked Data Exploration we could identify the datasets for LDVM processing based on some form of user requirements. Our closest goal, of course, is to make our new LDVM implementation more user friendly and to develop a more presentable library of visualizers, analyzers and transformers.
9.    ACKNOWLEDGMENTS
   This work was partially supported by a grant from the European Union's 7th Framework Programme, number 611358, provided for the project COMSODE.
10.    REFERENCES
[1] G. A. Atemezing and R. Troncy. Towards a Linked-Data based Visualization Wizard. In O. Hartig, A. Hogan, and J. Sequeda, editors, Proceedings of the 5th International Workshop on Consuming Linked Data (COLD 2014) co-located with the 13th International Semantic Web Conference (ISWC 2014), Riva del Garda, Italy, October 20, 2014, volume 1264 of CEUR Workshop Proceedings. CEUR-WS.org, 2014.
[2] F. Benedetti, S. Bergamaschi, and L. Po. Online Index Extraction from Linked Open Data Sources. In A. L. Gentile, Z. Zhang, C. d'Amato, and H. Paulheim, editors, Proceedings of the 2nd International Workshop on Linked Data for Information Extraction (LD4IE), volume 1267 of CEUR Workshop Proceedings, pages 9–20, Aachen, 2014.
[3] N. Bikakis, M. Skourla, and G. Papastefanatos. rdf:SynopsViz – A Framework for Hierarchical Linked Data Visual Exploration and Analysis. In V. Presutti, E. Blomqvist, R. Troncy, H. Sack, I. Papadakis, and A. Tordai, editors, The Semantic Web: ESWC 2014 Satellite Events, Lecture Notes in Computer Science, pages 292–297. Springer International Publishing, 2014.
[4] J. M. Brunetti, S. Auer, R. García, J. Klímek, and M. Nečaský. Formal Linked Data Visualization Model. In Proceedings of the 15th International Conference on Information Integration and Web-based Applications & Services (IIWAS'13), pages 309–318, 2013.
[5] A.-S. Dadzie, M. Rowe, and D. Petrelli. Hide the Stack: Toward Usable Linked Data. In G. Antoniou, M. Grobelnik, E. Simperl, B. Parsia, D. Plexousakis, P. De Leenheer, and J. Pan, editors, The Semantic Web: Research and Applications, volume 6643 of Lecture Notes in Computer Science, pages 93–107. Springer Berlin Heidelberg, 2011.
[6] J. Helmich, J. Klímek, and M. Nečaský. Visualizing RDF Data Cubes Using the Linked Data Visualization Model. In V. Presutti, E. Blomqvist, R. Troncy, H. Sack, I. Papadakis, and A. Tordai, editors, The Semantic Web: ESWC 2014 Satellite Events, Anissaras, Crete, Greece, May 25-29, 2014, Revised Selected Papers, volume 8798 of Lecture Notes in Computer Science, pages 368–373. Springer, 2014.
[7] J. Klímek, J. Helmich, and M. Nečaský. Payola: Collaborative Linked Data Analysis and Visualization Framework. In 10th Extended Semantic Web Conference (ESWC 2013), pages 147–151. Springer, 2013.
[8] J. Klímek, J. Helmich, and M. Nečaský. Application of the Linked Data Visualization Model on Real World Data from the Czech LOD Cloud. In C. Bizer, T. Heath, S. Auer, and T. Berners-Lee, editors, Proceedings of the Workshop on Linked Data on the Web co-located with the 23rd International World Wide Web Conference (WWW 2014), Seoul, Korea, April 8, 2014, volume 1184 of CEUR Workshop Proceedings. CEUR-WS.org, 2014.
[9] E. Pietriga, C. Bizer, D. R. Karger, and R. Lee. Fresnel: A Browser-Independent Presentation Vocabulary for RDF. In I. F. Cruz, S. Decker, D. Allemang, C. Preist, D. Schwabe, P. Mika, M. Uschold, and L. Aroyo, editors, The Semantic Web - ISWC 2006, 5th International Semantic Web Conference, ISWC 2006, Athens, GA, USA, November 5-9, 2006, Proceedings, volume 4273 of Lecture Notes in Computer Science, pages 158–171. Springer, 2006.
[10] K. Thellmann, F. Orlandi, and S. Auer. LinDA - Visualising and Exploring Linked Data. In Proceedings of the Posters and Demos Track of the 10th International Conference on Semantic Systems (SEMANTiCS 2014), Leipzig, Germany, September 2014.