Linked Data City - Visualization of Linked
                   Enterprise Data

      Joachim Baumeister (1,2), Sebastian Furth (1), Lea Roth (1) and Volker Belli (1)

         (1) denkbares GmbH, Friedrich-Bergius-Ring 15, 97076 Würzburg
         (2) University of Würzburg, Am Hubland, 97074 Würzburg


        Abstract. A generic technique for the visualization of hierarchical struc-
        tures is introduced. The actual visualization is not only defined by the
        underlying data but also the application of domain-driven metrics. The
        paper shows two use cases for the analysis of linked enterprise data in
        the domain of technical service information systems.


1     Introduction
In the age of digitalization and automation of industries, many companies are
consolidating their business information systems and product meta-data, such
as ERP, CRM, file directories, and extranet data. In many cases, not all elements
of these information resources are accessible to all relevant users. This lack of
transparent access hinders effective work processes and often threatens business
success. Therefore, the primary goal of many ICT projects is the linkage of
the existing information silos into an integrated information infrastructure. Here,
semantic technologies, and especially linked data models, are a successful enabler
for building such knowledge warehouses mediating the information silos. Linked
Enterprise Data [1] transfers the ideas and technologies of linked data [2] into
the much more restricted world of business and enterprises. Standard semantic
languages, such as RDF and SPARQL, are used to represent the core entities
of the enterprise. Useful de-facto standard vocabularies for enterprise use
already exist, for instance SKOS [14] and GoodRelations [8]. Within a semantic
infrastructure, all information resources are uniformly and semantically
accessible by the user and novel services. In consequence, a number of advanced
applications with business added value become possible [13]:
    – Semantic enterprise search
    – Semantic B2B portal with standardized data exchange
    – Semantic assistants
    – Automated data quality and curation processes
    During the migration from the existing information structure to linked enter-
prise data, existing information sources need to be linked with semantic concepts.
Here, a toolbox of core technologies ranging from Natural Language Process-
ing/Information Extraction to Information Retrieval methods [4] is employed.
    In Figure 1 the semantification process of enterprise data is depicted [5]. Each
step of the process includes a detailed analysis:

 [Figure 1: an enterprise corpus (e.g., Operations Manual #05, Repair Manual #4711,
 Spare Parts Data Base) is broken down into documents, information units, and
 finally tokens and terms such as "gear selector switch". The four steps are
 I Corpus Analysis, II Information Source Analysis, III Information Unit Analysis,
 and IV Concept/Term Analysis.]

 Fig. 1. Simplified process for the semantification of enterprise information sources.


I Corpus Analysis The existing data is collected and described. All relevant
    information systems are analyzed with respect to the included information
    sources.
II Information Source Analysis The included information sources are ana-
    lyzed in more detail, for instance, with respect to the size and number of
    elements in the corpus.
III Information Unit Analysis Information sources are composed of information
    units, i.e., segments of a source that cannot be divided any further. The
    number and distribution of segments per information source are relevant
    factors.
IV Concept/Term Analysis The use of ontological concepts in the informa-
    tion units is analyzed and improved by automated methods.

    During the process, the exploration and visualization of the data are of core
importance [11]. Since much of the data is generated during the process, the
manual inspection and evaluation of the results need to be supported. Visualiza-
tion methods help to understand existing deficiencies and motivate further pro-
cess steps. In this paper, we introduce the generic visualization method Linked
Data City that effectively supports the exploration and analysis of linked enterprise
data during the semantification phase. The visualization of linked data
differs from general ontology visualization methods [3, 6, 9], since linked
data models usually exploit less relational structure but tend to be larger by orders
of magnitude. We emphasize that the method is generally applicable to hierarchical
structures, so that it can be deployed easily in different scenarios.
    The rest of the paper is organized as follows: First we introduce the core
components of a linked data city, namely buildings and (nested) districts. Then,
we describe the current implementation of the approach and demonstrate the
usefulness of the visualization method by examples. In the conclusions we show
promising steps for further work.

2   Linked Data Cities
The metaphor for visualizing linked data as a city is inspired by the work of
Wettel [15]. In the original approach, the code of software applications is visu-
alized as structures of a city. Figure 2 shows an example of a city visualization.
Classes of software code are represented as buildings, and code packages
define the districts of a city. Special properties of classes and packages are
communicated via color and size of the artifacts. Later this idea was adapted for
the visualization of the test coverage after the evaluation of knowledge bases [7],
where knowledge base elements are represented as buildings.


 [Figure 2: a data city with labeled elements: city, district, nested sub-district,
 building, and building with different levels.]

                Fig. 2. Building, districts, and sub-districts of a data city.


    Complex artifacts with part-of or hierarchical relations can be naturally visualized
as a city: As we see in Figure 2, core elements of the artifacts are usually
depicted as buildings of the city, which are grouped within districts. For deeper
part-of or hierarchy relations the districts can be nested in sub-districts. The
size of districts correlates with the number and sizes of the included buildings,
whereas the size and color of the buildings represent specific performance indi-
cators that are defined by the current analysis query. The specific configuration
of buildings, districts, sizes, and colors is called data city metric. At the top of
the figure we also see that buildings can have different levels. Different levels
of a building are commonly used to include different metric attributes in the
visualization.
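
   The relation between district size and the contained buildings can be illustrated
by a minimal sketch. It assumes the field names of the JSON notation from Section 3
(Districts, Buildings, width, depth); the helper name districtFootprint and the
sample elements "Piston", "Cooling", and "Water pump" are hypothetical. The actual
layout of the library may add padding and spacing, so the computed value is only a
lower bound for the district area.

```javascript
// Hypothetical helper: ground area needed by all buildings of a district,
// including the buildings of nested sub-districts (no padding considered).
function districtFootprint(district) {
  const own = (district.Buildings || [])
    .reduce((sum, b) => sum + b.width * b.depth, 0);
  const nested = (district.Districts || [])
    .reduce((sum, d) => sum + districtFootprint(d), 0);
  return own + nested;
}

// Hypothetical sample data: one district with two buildings and one sub-district.
const engine = {
  Label: "Engine",
  Buildings: [
    { Label: "Engine block", width: 2, depth: 2, height: 5 },
    { Label: "Piston", width: 1, depth: 1, height: 2 },
  ],
  Districts: [
    {
      Label: "Cooling",
      Buildings: [{ Label: "Water pump", width: 2, depth: 1, height: 1 }],
    },
  ],
};

console.log(districtFootprint(engine)); // 4 + 1 + 2 = 7
```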
   The city metaphor is easy for humans to understand. As in the real
world, local areas of the city are represented as districts, and the size of
houses often corresponds to their importance. For large cities (real and artificial),
users are familiar with incrementally exploring particular districts and buildings.
For this reason, an advanced visualization application needs to allow for interactive
exploration by drill-down and roll-up operations within the city.

3    Implementation
The presented visualization technique is implemented as a JavaScript library.
That way, it can be easily integrated into (web-based) knowledge engineering
tools but also runs as a stand-alone tool in a web browser. The definition of
the city itself is represented as a JSON document. The following example shows
books of a technical documentation that are represented as districts. Sections of
a book are represented as buildings contained in the district.
{
    "Label": "Linked Data City 0815",
    "Districts": [
       {
         "Label": "Technical Documentation",
         "color": "#297B48",
         "Districts": [
            {
               "Label": "Repair Manual #4711",
               "color": "#848484",
               "Buildings": [
                  {
                     "Label": "Section #4711.1",
                     "color": "#0F2A65",
                     "depth": 2,
                     "height": 1,
                     "width": 2
                  },
                  {
                     "Label": ["Section #4711.2a",
                               "Section #4711.2b"],
                     "color": ["#0F2A65", "#AF5C0B"],
                     "height": [1, 3],
                     "depth": 2,
                     "width": 2
                  },
       ...
}
   The definition of the city is straightforward: districts can
be nested in other districts, and a leaf district contains a collection of buildings in
a corresponding element “Buildings”. For the district “Repair Manual #4711”
two buildings are shown. The first building “Section #4711.1” is an example of
a simple building with only one level. The second building has two
labels, colors, and heights, representing the two levels of the building.
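
   Since level attributes are given either as a single value or as a list, a consumer
of the notation has to treat both cases alike. The following sketch shows one way
to do this. The helper names asArray, buildingLevels, and allBuildings are
hypothetical and not part of the presented library; the sample object contains only
the part of the example city that is shown above.

```javascript
// Wrap a scalar in an array so that single- and multi-level buildings look alike.
const asArray = (value) => (Array.isArray(value) ? value : [value]);

// Hypothetical helper: list the levels of one building in the order given.
function buildingLevels(building) {
  const labels = asArray(building.Label);
  const colors = asArray(building.color);
  const heights = asArray(building.height);
  return heights.map((height, i) => ({
    label: labels[i] ?? labels[0],
    color: colors[i] ?? colors[0],
    height,
  }));
}

// Hypothetical helper: collect the buildings of a city, including nested districts.
function allBuildings(node) {
  const own = node.Buildings || [];
  const nested = (node.Districts || []).flatMap(allBuildings);
  return [...own, ...nested];
}

// The visible part of the example city (everything elided there is left out).
const city = {
  Label: "Linked Data City 0815",
  Districts: [{
    Label: "Technical Documentation",
    color: "#297B48",
    Districts: [{
      Label: "Repair Manual #4711",
      color: "#848484",
      Buildings: [
        { Label: "Section #4711.1", color: "#0F2A65", depth: 2, height: 1, width: 2 },
        { Label: ["Section #4711.2a", "Section #4711.2b"],
          color: ["#0F2A65", "#AF5C0B"], height: [1, 3], depth: 2, width: 2 },
      ],
    }],
  }],
};

// Number of levels and total height per building.
const summary = allBuildings(city).map((b) => {
  const levels = buildingLevels(b);
  return {
    levels: levels.length,
    totalHeight: levels.reduce((sum, level) => sum + level.height, 0),
  };
});
console.log(summary);
```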


4    Case Study: Metrics for Linked Enterprise Data in
     Technical Service Information Systems
The architecture of a city is defined by the applied data city metric, i.e., the
definition of colors, sizes, and levels of the buildings and districts. In this section,
we demonstrate the approach by two use cases.
    The presented metrics were used in the context of an industrial semantifi-
cation project, where information sources of technical service information were
analyzed. Here, the buildings of a city represent a specific kind of element of the
linked enterprise data. In enterprise systems the data refers to a domain-specific
ontology. For instance, machinery builders typically align their data (documenta-
tion, parts, 3D models, etc.) to an ontology of products, components and func-
tions, cf. [10, 12]. Enterprise information resources—document sections, parts,
and wiring diagrams—are usually annotated by one or more instances of this hi-
erarchy, for example a repair paragraph is annotated by the involved components
and influenced functions.
    In the following we present two basic metrics that investigate (a) the use
of the product structure within the available information resources and (b) the
availability of annotations in the information resources.

Use Case: Usage of Product Structure (UPS)
The product structure of an enterprise defines how products are organized in
different levels. This organization includes multiple hierarchies for representing
the relation of components and parts, but also for the functions of the product.
    The primary subject of the UPS analysis is the actual use of the elements
defined in the product structure. An element is used whenever data of the
investigated information systems is annotated with it. The metric is applied to
find out how well the product structure is used in current enterprise information.
    In the visualization, leaf elements of the product structure are represented
by buildings and upper elements of the product structure are represented as
wrapping districts. The height of a building correlates with the number of
uses within all considered information sources. Taller buildings thus represent
elements that are used more often.
    At the left side of Figure 3, a zoomed building representing the component
“Engine block” is shown. The building itself is located in the district with the
name “Engine”. We see that the building consists of multiple levels, each showing
the occurrences in one resource. Here, the element was used most often in the resource
“doc#1” and the resource “3d#3”.
 [Figure 3: zoomed view of the building for the component “Engine block” with one
 level each for the uses in doc#1, doc#2, 3d#3, and parts#4.]

Fig. 3. Example for the usage of a product structure for a selection of technical
documentation.


    The visualization gives a very quick overview of the actual application of a product
structure. Unused areas can be easily spotted, as can elements with heavy
use. Applied to a representative corpus of information resources, the visualization
method points to areas in the structure that need refinement, both for rarely
and frequently used elements.
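
    The UPS metric can be sketched as a small transformation into the city notation
of Section 3. The sketch assumes a hypothetical input format: a product structure
tree with label and children fields, whose root has at least one child, and flat
annotation records with element and resource fields. All function names and the
element "Oil pump" are illustrative only.

```javascript
// Hypothetical input: a product structure tree and flat annotation records.
const productStructure = {
  label: "Engine",
  children: [{ label: "Engine block" }, { label: "Oil pump" }],
};
const annotations = [
  { element: "Engine block", resource: "doc#1" },
  { element: "Engine block", resource: "doc#1" },
  { element: "Engine block", resource: "3d#3" },
];

const isLeaf = (node) => !node.children || node.children.length === 0;

// Count uses per element and resource: Map(element -> Map(resource -> count)).
function countUses(records) {
  const uses = new Map();
  for (const { element, resource } of records) {
    if (!uses.has(element)) uses.set(element, new Map());
    const perResource = uses.get(element);
    perResource.set(resource, (perResource.get(resource) || 0) + 1);
  }
  return uses;
}

// A leaf element becomes a building with one level per resource that uses it.
function toBuilding(node, uses) {
  const perResource = uses.get(node.label);
  if (!perResource) return { Label: node.label, height: 0, width: 1, depth: 1 };
  return {
    Label: [...perResource.keys()].map((r) => `${node.label}: uses in ${r}`),
    height: [...perResource.values()],
    width: 1,
    depth: 1,
  };
}

// An upper element becomes a district wrapping its leaf and inner children.
function toDistrict(node, uses) {
  const district = { Label: node.label };
  const leaves = node.children.filter(isLeaf);
  const inner = node.children.filter((child) => !isLeaf(child));
  if (leaves.length > 0) district.Buildings = leaves.map((n) => toBuilding(n, uses));
  if (inner.length > 0) district.Districts = inner.map((n) => toDistrict(n, uses));
  return district;
}

// Leaf elements that do not occur in any annotation.
function unusedElements(node, uses) {
  if (isLeaf(node)) return uses.has(node.label) ? [] : [node.label];
  return node.children.flatMap((child) => unusedElements(child, uses));
}

const uses = countUses(annotations);
const upsCity = { Label: "UPS", Districts: [toDistrict(productStructure, uses)] };
console.log(JSON.stringify(upsCity, null, 2));
console.log(unusedElements(productStructure, uses));
```

    Unused elements are kept as buildings of height 0, so that gaps in the use of
the product structure remain visible in the corresponding district.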
    An interactive visualization is appropriate for very deep hierarchical structures.
Then, buildings do not necessarily represent leaf elements of the hierarchy
but aggregated elements. When the user enters a building (e.g., by clicking on it), the
application drills down into the product structure and builds a city visualization for all
sub-elements contained in the aggregated element.
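
    The two operations can be sketched on the city notation of Section 3 as follows.
This is only an illustration under assumptions: the function names rollUp and
drillDown, the flag aggregated, and the choice of summing up heights for an
aggregated building are hypothetical and not taken from the presented library.

```javascript
// Sum of all level heights of a building (height is a number or a list).
const totalHeight = (b) => [].concat(b.height).reduce((sum, h) => sum + h, 0);

// Roll-up: collapse a district with all nested content into one aggregated building.
function rollUp(district) {
  const own = (district.Buildings || [])
    .reduce((sum, b) => sum + totalHeight(b), 0);
  const nested = (district.Districts || [])
    .reduce((sum, d) => sum + rollUp(d).height, 0);
  return { Label: district.Label, height: own + nested, width: 1, depth: 1, aggregated: true };
}

// Drill-down: follow a path of district labels and return the sub-district
// that is rendered as a new city (undefined if the path does not exist).
function drillDown(district, path) {
  return path.reduce(
    (current, label) =>
      current && (current.Districts || []).find((d) => d.Label === label),
    district
  );
}

const city = {
  Label: "Linked Data City 0815",
  Districts: [{
    Label: "Technical Documentation",
    Districts: [{
      Label: "Repair Manual #4711",
      Buildings: [
        { Label: "Section #4711.1", height: 1, width: 2, depth: 2 },
        { Label: ["Section #4711.2a", "Section #4711.2b"], height: [1, 3], width: 2, depth: 2 },
      ],
    }],
  }],
};

// Overview: every top-level district is shown as one aggregated building.
const overview = city.Districts.map(rollUp);
console.log(overview);

// Entering the aggregated building opens the nested district as its own city.
const manual = drillDown(city, ["Technical Documentation", "Repair Manual #4711"]);
console.log(manual.Buildings.length);
```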


Use Case: Corpus Annotation Frequency (CAF)

Besides the actual use of the product structure the annotation frequency of
the information resources is of prime interest. Usually, meta-data is attached to
the information units to formally describe the contents. This meta-data mainly
corresponds to elements of the product structure.
    For the metric CAF, the city visualization is created as follows: The informa-
tion sources in the corpus are represented as districts of the city, e.g., technical
documentation, spare parts catalog, or FAQ data base. Sub-elements of these
districts are further represented as nested sub-districts, e.g., a particular repair
manual contained in the technical documentation or the spare parts catalog of
a specific machine. Core information units are represented as buildings, for in-
stance, a specific chapter of a repair manual in the technical documentation.
The height of a building corresponds to its number of meta-data annotations; a
building can have more than one level when different types of meta-data
are included in the corresponding information unit. For instance, a chapter may include
annotations of a component hierarchy but also of the functional hierarchy.

Fig. 4. Visualization of the annotations existing for particular information units of a
technical documentation for a specific machine.

    This visualization gives an overview of the corpus size and the existing annotations.
Sparsely annotated areas can be easily spotted, as can districts (information
sources, book types, etc.) with a high annotation quality. The results help to
identify which areas of the structures need to be used much more frequently in
information resources. With this knowledge, annotation initiatives (automated
or manual) can be motivated and precisely planned.
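
    The construction of a CAF city can be sketched in the same way. The input format
is an assumption made for illustration: one record per information unit with the
hypothetical fields source, doc, unit, and annotations (number of annotations per
meta-data type). The sample values and the function names are hypothetical as well.

```javascript
// Hypothetical input: one record per information unit of the corpus.
const units = [
  { source: "Technical Documentation", doc: "Repair Manual #4711", unit: "Chapter 1",
    annotations: { component: 4, function: 2 } },
  { source: "Technical Documentation", doc: "Repair Manual #4711", unit: "Chapter 2",
    annotations: {} },
  { source: "Spare Parts Catalog", doc: "Machine A", unit: "Parts list",
    annotations: { component: 7 } },
];

// Return the sub-district with the given label, creating it if necessary.
function childDistrict(parent, label) {
  parent.Districts = parent.Districts || [];
  let child = parent.Districts.find((d) => d.Label === label);
  if (!child) {
    child = { Label: label };
    parent.Districts.push(child);
  }
  return child;
}

// Sources become districts, documents nested sub-districts, and information
// units buildings with one level per annotation type.
function buildCafCity(records) {
  const city = { Label: "Corpus Annotation Frequency", Districts: [] };
  for (const { source, doc, unit, annotations } of records) {
    const district = childDistrict(childDistrict(city, source), doc);
    const types = Object.keys(annotations);
    district.Buildings = district.Buildings || [];
    district.Buildings.push({
      Label: types.length > 0 ? types.map((t) => `${unit} (${t})`) : unit,
      height: types.length > 0 ? types.map((t) => annotations[t]) : 0,
      width: 1,
      depth: 1,
    });
  }
  return city;
}

const cafCity = buildCafCity(units);
console.log(JSON.stringify(cafCity, null, 2));
```

    Information units without any annotation are kept as buildings of height 0, which
makes sparsely annotated districts directly visible.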


5    Conclusions

Recently, more and more business information systems are transformed to linked
enterprise data models. Appropriate visualization and exploration techniques
support the semantification process of enterprise data. In this paper we presented
Linked Data Cities, an interactive and generic method for the visualization of
hierarchical structures. The actual visualization is defined by the application of
a domain-specific metric. We introduced two metrics that showed their
usefulness in an industrial semantification project.
    In the future we are planning to improve the simplicity of the visualization
by drill-down techniques, where similar buildings are clustered in aggregated
buildings or districts. Then, even very large system structures can be (interactively)
explored. Furthermore, we are working on the automated linkage of the
city structure to existing linked data. Currently, scripts are used to transfer the
information into the city data notation (the shown JSON). In the future, the
automated transformation by SPARQL queries could be a possible simplification
of this process.
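
    One possible shape of such a transformation is sketched below. It is not an
existing part of the approach but an illustration under assumptions: the result of
a SELECT query is taken to be available in the W3C SPARQL 1.1 Query Results JSON
Format, and the variable names district, building, and uses as well as the sample
bindings and the function name bindingsToCity are hypothetical.

```javascript
// Hypothetical result of a SELECT ?district ?building ?uses query in the
// SPARQL 1.1 Query Results JSON Format (literal values arrive as strings).
const sparqlResult = {
  head: { vars: ["district", "building", "uses"] },
  results: {
    bindings: [
      { district: { type: "literal", value: "Engine" },
        building: { type: "literal", value: "Engine block" },
        uses: { type: "literal", value: "12",
                datatype: "http://www.w3.org/2001/XMLSchema#integer" } },
      { district: { type: "literal", value: "Engine" },
        building: { type: "literal", value: "Oil pump" },
        uses: { type: "literal", value: "0",
                datatype: "http://www.w3.org/2001/XMLSchema#integer" } },
    ],
  },
};

// Group the result rows by district and emit the city notation of Section 3.
function bindingsToCity(result, cityLabel) {
  const districts = new Map();
  for (const row of result.results.bindings) {
    const name = row.district.value;
    if (!districts.has(name)) districts.set(name, { Label: name, Buildings: [] });
    districts.get(name).Buildings.push({
      Label: row.building.value,
      height: Number(row.uses.value), // convert the lexical form to a number
      width: 1,
      depth: 1,
    });
  }
  return { Label: cityLabel, Districts: [...districts.values()] };
}

const cityFromSparql = bindingsToCity(sparqlResult, "UPS");
console.log(JSON.stringify(cityFromSparql, null, 2));
```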

References
 1. Auer, S., Bühmann, L., Dirschl, C., Erling, O., Hausenblas, M., Isele, R., Lehmann,
    J., Martin, M., Mendes, P.N., van Nuffelen, B., Stadler, C., Tramp, S., Williams,
    H.: Managing the life-cycle of linked data with the LOD2 stack. In: Proceedings
    of International Semantic Web Conference (ISWC 2012) (2012)
 2. Berners-Lee, T.: Linked data (2009),
    http://www.w3.org/DesignIssues/LinkedData.html
 3. Fluit, C., Sabou, M., van Harmelen, F.: Ontology-based information visualization.
    In: Visualizing the Semantic Web, pp. 36–48. Springer (2006)
 4. Furth, S., Baumeister, J.: On the semantification of 5-star technical documentation.
    In: Proceedings of the LWA 2015 Workshops: KDML, FGWM, IR, and FGDB. pp.
    264–271 (2015)
 5. Furth, S., Baumeister, J.: Semantification of large corpora of technical
    documentation. In: Atzmueller, M., Oussena, S., Roth-Berghofer, T. (eds.)
    Enterprise Big Data Engineering, Analytics, and Management. IGI Global (2016),
    http://www.igi-global.com/book/enterprise-big-data-engineering-analytics/145468
 6. Geroimenko, V., Chen, C. (eds.): Visualizing the Semantic Web. Springer, 2nd edn.
    (2006)
 7. Hatko, R., Baumeister, J., Puppe, F.: Coveragecity: Test coverage for clinical guide-
    lines. In: The 8th Workshop on Knowledge Engineering and Software Engineering
    (KESE2012) (2012), http://ceur-ws.org/Vol-949/kese8-01_02.pdf
 8. Hepp, M.: GoodRelations: An ontology for describing products and services offers
    on the web. In: Gangemi, A., Euzenat, J. (eds.) EKAW. Lecture Notes in Computer
    Science, vol. 5268, pp. 329–346. Springer (2008),
    http://dblp.uni-trier.de/db/conf/ekaw/ekaw2008.html#Hepp08
 9. Katifori, A., Halatsis, C., Lepouras, G., Vassilakis, C., Giannopoulou, E.: Ontology
    visualization methods - a survey. ACM Comput. Surv. 39(4) (Nov 2007),
    http://doi.acm.org/10.1145/1287620.1287621
10. Lin, J., Fox, M.S., Bilgic, T.: A product ontology. Enterprise Integration (1997)
11. Mader, C., Martin, M., Stadler, C.: Facilitating the exploration and visualization
    of linked data. In: Auer, S., Bryl, V., Tramp, S. (eds.) Linked Open Data -
    Creating Knowledge Out of Interlinked Data, pp. 90–107. Lecture Notes in Computer
    Science, Springer International Publishing (2014),
    http://dx.doi.org/10.1007/978-3-319-09846-3_5
12. Mohammad, N.N., Amin, I.M., Othman, R.M., Asmuni, H., Hassan, R., Kasim,
    S.: Design and implementation of product structure ontology. Ontology-Based Ap-
    plications for Enterprise Systems and Knowledge Management p. 246 (2012)
13. Oberle, D.: How ontologies benefit enterprise applications. Semantic Web 5(6),
    473–491 (2014)
14. W3C: SKOS Simple Knowledge Organization System reference:
    http://www.w3.org/TR/skos-reference (August 2009)
15. Wettel, R., Lanza, M.: Visualizing software systems as cities. In: Visualizing Soft-
    ware for Understanding and Analysis, 2007. VISSOFT 2007. pp. 92–99 (2007)