 Interoperability and Integration: An Updated Approach
  to Linked Data Publication at the Dutch Land Registry
        Alexandra Rowland1, Erwin Folmer2, Wouter Beek3, Rob Wenneker4
          1 Kadaster & University of Twente, 7500 AE Enschede, The Netherlands
                                  lexi.rowland@kadaster.nl
          2 Kadaster & University of Twente, 7500 AE Enschede, The Netherlands
                                  erwin.folmer@kadaster.nl
                 3 Kadaster & Triply, 1043 BP Amsterdam, The Netherlands
                                      wouter@triply.cc
                      4 Kadaster, 7311 KZ Apeldoorn, The Netherlands
                                 rob.wenneker@kadaster.nl



       Abstract. Kadaster, the Dutch national Land Registry and Mapping Agency,
       has been actively publishing its base registries as linked (open) spatial data
       for a number of years. To date, a number of its base registers have been
       published as linked data and are publicly available. Increasing demand for
       these services and the availability of new linked data technologies have
       highlighted the need for a new, innovative approach to linked data publication
       within the organization in order to reduce the time and costs associated with
       publication. This new approach is novel both in its approach to dataset
       modelling and in its architecture implementation, and forms part of Kadaster's
       larger vision for the development of its Knowledge Graph.

       Keywords: linked spatial data, knowledge graph, semantic technologies,
       interoperability, semantic modelling.


1      Introduction

The Dutch Cadastre, Land Registry and Mapping Agency, Kadaster
(www.kadaster.nl), is the authoritative source of information relating to
administrative and spatial data surrounding property and ownership rights in the
Netherlands. Kadaster maintains large registers, including several key registers of
the Dutch Government such as the Base Register for Addresses and Buildings (Dutch
acronym: BAG) and the Base Register Large-Scale Topography (Dutch acronym:
BGT), both of which are available as Open Data. The organisation actively publishes
and maintains some of its geospatial assets as Linked (Open) Data and, as part of
this effort and in the spirit of continuous innovation, two of these geospatial assets
have now been republished as Linked Open Data following a new approach. The
intention of this position paper is to outline the problem context which drove the
design and implementation of this approach within Kadaster, to comment briefly on
its effects on organisational resources, and to outline both the novelty and the
architecture that was used to publish these two datasets. The larger vision behind
the implementation of this approach is discussed and illustrated in Section 6.

1 Copyright © 2021 for this paper by its authors. Use permitted under Creative Commons
License Attribution 4.0 International (CC BY 4.0).


2      Problem Context

Although several of Kadaster's geospatial assets have been available as linked (open)
data for a number of years, the network effects of increased uptake of linked data
technologies on a broader scale have demanded that an updated, scalable approach to
publication be designed. Indeed, increased demand for linked data services is seen
both within Kadaster, as part of internal innovation processes, and as part of its
service delivery to other organisations, both national and European. This demand,
coupled with the increasing availability and ongoing development of linked data
technologies and standards and the potential this holds for smarter, more efficient
service delivery, has been the driving force behind the innovation of linked data
publication within the organisation.
   Several key concerns guided the choices made around the technologies, tools and
processes for this approach. Firstly, the datasets being transformed are complex and
the instance data itself voluminous. Based on user requirements, this complexity
needs to be preserved in any transformation. Secondly, the transformation of these
datasets was done in the context of a larger vision (Section 6), and time-efficiency
was a key concern. This required that the transformation happen both quickly and
correctly, based on a validation technique. Thirdly, this concern for time-efficiency
required that, wherever possible, existing tools, libraries and standards be used; a
practice in line with general architectural principles. The approach outlined in this
paper meets these demands and requirements in several ways.
   Firstly, this approach makes use of existing community libraries, building on top of
open source projects and therefore circumventing the need to develop custom, in-house
solutions. The use of SHACL, as outlined more extensively below, highlights this reuse
of existing community standards. Along similar lines, this approach makes use of
existing commercial products where they are available, in the interest of reducing
maintenance costs. Secondly, this approach applies a configuration-over-code
principle, which ensures that the same pipeline is applied to all linked data
publication projects, with components configured only where necessary. Lastly, the
implementation of all the relevant components in this design is done with a streaming
approach in mind. In practice, this means that all linked data models are kept as close
to the source model as possible and that the selected sources are able to support
streaming functionality in the interest of real-time data delivery.
   To ground where the implementation of this approach saw concrete improvements
over existing approaches, it is important to note that the BAG and BGT registries were
delivered by a small internal team within Kadaster in 9 and 5 weeks, respectively.
These are relatively large and complex linked datasets with regard to their data
models, containing some 800 million and over 1 billion triples, respectively. Where
previous approaches could be lengthy, this approach highlights improved cost- and
resource-effectiveness, strengthening the business case for linked data within an
organisation such as Kadaster [1]. The sections that follow outline the concepts and
architecture which support this updated approach to linked data publication within
Kadaster, including the standards, technologies and choices made with regard to these
during the recent publication of two geospatial datasets.


3      Native Geospatial Sources

There are currently two geospatial assets maintained by Kadaster that have been
transformed and published as Linked Open Data using the approach detailed in this
paper. Firstly, the BGT was transformed and published in November 2020 and updated
in the first quarter of 2021. This asset is a digital map of the Netherlands which
includes objects such as buildings, roads, bodies of water and railways. The modelling,
updating and maintenance of this dataset is regulated by Dutch law. Secondly, the
BAG was transformed and published in February 2021. As the dataset name implies,
it includes all buildings and addresses in the Netherlands as well as their associated
attributes, including house numbers, designations, and main and side addresses. This
dataset has a counterpart dataset, namely INSPIRE Addresses2, that is published based
on INSPIRE compliance requirements. Both base registries, including information
regarding API availability and querying possibilities, are available in the triple store
managed by Kadaster's Data Science Team (https://data.labs.kadaster.nl).

2 https://www.pdok.nl/introductie/-/article/adressen-inspire-geharmoniseerd-
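
   As a minimal sketch of how these published registries can be queried, the
following Python snippet sends a SPARQL query to the BAG service using the
SPARQLWrapper library. The endpoint URL is an assumption based on the catalogue
at data.labs.kadaster.nl and the query is deliberately vocabulary-agnostic; the
exact service location should be taken from the triple store itself.

```python
# A minimal sketch, assuming the SPARQLWrapper library and an illustrative
# endpoint URL; the exact service location may differ from the one below.
from SPARQLWrapper import SPARQLWrapper, JSON

ENDPOINT = "https://api.labs.kadaster.nl/datasets/kadaster/bag/services/bag/sparql"

# Vocabulary-agnostic query: list the ten most frequent classes in the dataset.
QUERY = """
SELECT ?class (COUNT(?instance) AS ?instances)
WHERE { ?instance a ?class }
GROUP BY ?class
ORDER BY DESC(?instances)
LIMIT 10
"""

sparql = SPARQLWrapper(ENDPOINT)
sparql.setQuery(QUERY)
sparql.setReturnFormat(JSON)

for row in sparql.query().convert()["results"]["bindings"]:
    print(row["class"]["value"], row["instances"]["value"])
```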


4      Knowledge Model vs. Information Model

   The first of the new additions to Kadaster's publication of linked data is the explicit
distinction between the Knowledge Model and the Information Model, which together
compose the larger linked data model. This separation reflects the fact that a linked
data model must be able to describe the meaning of the data to the outside world
(Knowledge Model), while at the same time describing the organisation-specific
aspects (Information Model). This separation allows the Information Model to be
optimised towards the organisation's internal requirements (including models and
processes relating to an asset). At the same time, it allows the Knowledge Model to
be optimised towards efficiently supporting the external, community standards of
publication required for discoverability and interoperability purposes [2]. Since the
internal and external aspects are both important for Kadaster's efforts in data
publication, this new approach is better able to implement the organisational
requirements for linked datasets.
   Firstly, an Information Model for a given asset contains the specific and internal
information. This includes the properties of the current information systems being
used in and around the asset, the organisation-specific rules relating to the asset, as
well as the asset's technical details. The Information Model is represented using the
Shapes Constraint Language (SHACL), which serves to constrain a given model based
on internally-defined rules and relationships for the asset. Secondly, a Knowledge
Model for the same asset defines any generic and interchangeable knowledge that is
both important to retain within the organisation and should also be shared with
others. The Knowledge Model also makes it easier to reuse external linked data
models within an organisation-specific context.
   As illustrated in Figures 1A and 1B, the Information and Knowledge Models are not
entirely independent of one another and are in fact mapped to each other when
defining and transforming the model. There are two variants of this mapping process:
one maps object properties across the two models, and the other maps datatype
properties. Both variants should be completed over the course of a data model
transformation into linked data.




    Fig. 1. Model mapping implementation of both object (A) and datatype (B) properties.

   The two variants are largely identical. In the first (Figure 1A), the SHACL node and
property shapes defined for object properties are mapped to the relevant OWL classes
and object properties defined in the Knowledge Model. In the second (Figure 1B),
datatype properties are defined by mapping the relevant SHACL node and property
shapes for each datatype to the relevant OWL class, datatype property and RDFS
literal defined in the Knowledge Model.
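   A minimal sketch of what such a mapping can look like in practice is given below,
using rdflib and hypothetical def:/shp: namespaces (the actual Kadaster IRIs differ):
an Information Model shape is tied to Knowledge Model terms via sh:targetClass and
sh:path, covering both the object-property and the datatype-property variant.

```python
# A minimal sketch of the shape-to-ontology mapping, using rdflib and
# hypothetical def:/shp: namespaces; not Kadaster's actual models.
from rdflib import Graph

MAPPING_TTL = """
@prefix sh:  <http://www.w3.org/ns/shacl#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix def: <https://example.org/def/bag#> .  # Knowledge Model (hypothetical)
@prefix shp: <https://example.org/shp/bag#> .  # Information Model (hypothetical)

# Knowledge Model: generic, shareable semantics.
def:Building         a owl:Class .
def:hasAddress       a owl:ObjectProperty .
def:constructionYear a owl:DatatypeProperty .

# Information Model: organisation-specific constraints, mapped onto the
# Knowledge Model via sh:targetClass and sh:path.
shp:Building
    a sh:NodeShape ;
    sh:targetClass def:Building ;
    # Variant A: object property mapped to an OWL object property.
    sh:property [ sh:path def:hasAddress ; sh:class def:Address ] ;
    # Variant B: datatype property mapped to an OWL datatype property
    # and an RDFS literal (here typed as xsd:gYear).
    sh:property [ sh:path def:constructionYear ; sh:datatype xsd:gYear ] .
"""

graph = Graph()
graph.parse(data=MAPPING_TTL, format="turtle")
print(f"Mapping parsed: {len(graph)} triples")
```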
   In an effort to support better validation of the resultant model, a key requirement in
the design of this approach, a SHACL validation step has also been applied to the
modelling process. This step ensures that the shapes for each object and datatype
property in the data model validate completely against the instance data, and it
includes a number of self-defined best practices with regard to the modelling of the
Information Model, centred on the use of closed node shapes. These best practices are
based only on experience with the models being transformed in the context of
Kadaster, but they might be points of note for other applications of this approach.
They keep the model as specific as necessary for meaningful validation while still
allowing correct, but rare, data instances to validate.
   Firstly, the use of appropriate regular expressions is required to ensure that, for
example, properties of type string are not returned empty, that values allow for
non-English characters, and that they have a reasonable number of display characters
(see XML Schema Datatypes). Secondly, the definition of a label as a node shape (see
skos:prefLabel) might be needed to ensure that a label is returned in a number of
languages, for example where a city has both a Dutch and an English name. Where
this is the case, the shape should be closed such that it is mandatory for both
languages to appear in the instance data for the model to validate. Lastly, for
streaming ETL and validation purposes, a self-contained record should include an
extra triple which relates the SHACL path to the SHACL node shape.
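
   A minimal sketch of the first two practices, using the pyshacl library and
hypothetical namespaces: a closed node shape enforcing a non-empty string of
bounded length via sh:pattern, and a label that must be present in both Dutch and
English via sh:languageIn.

```python
# A minimal sketch of the validation step with the pyshacl library; the
# shapes and namespaces are illustrative, not Kadaster's actual models.
from pyshacl import validate
from rdflib import Graph

SHAPES_TTL = """
@prefix sh:   <http://www.w3.org/ns/shacl#> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .
@prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix shp:  <https://example.org/shp/> .
@prefix def:  <https://example.org/def/> .

shp:City a sh:NodeShape ;
    sh:targetClass def:City ;
    sh:closed true ;
    sh:ignoredProperties ( rdf:type ) ;
    # Non-empty string of reasonable length, allowing non-English characters.
    sh:property [ sh:path def:name ; sh:datatype xsd:string ;
                  sh:pattern "^.{1,200}$" ; sh:minCount 1 ] ;
    # A label must be present in both Dutch and English.
    sh:property [ sh:path skos:prefLabel ;
                  sh:languageIn ( "nl" "en" ) ; sh:minCount 2 ] .
"""

DATA_TTL = """
@prefix def:  <https://example.org/def/> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .

def:denHaag a def:City ;
    def:name "’s-Gravenhage" ;
    skos:prefLabel "Den Haag"@nl, "The Hague"@en .
"""

conforms, _, report = validate(
    Graph().parse(data=DATA_TTL, format="turtle"),
    shacl_graph=Graph().parse(data=SHAPES_TTL, format="turtle"),
)
print(conforms, report)
```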


5        Design and Development of Supporting Architecture

   The process of converting relational data to linked data for an asset is completed in
a number of steps during the Extract, Transform and Load (ETL) process. This process
is illustrated in the architecture outlined in the figure below (Figure 2). The first
step loads the relational data from the source into a PostgreSQL database following a
Geography Markup Language (GML) indexing step. A GraphQL endpoint is then
used to access the data delivered through this API, following the delivery and
validation of the data model from the end user. In practice, this step is done by
extending the typedefs such that the objects in a data model are correctly described in
GraphQL, expanding the resolvers in order to allow objects to be queried with the
right parameters and, finally, adding the required SQL queries to the relevant
resolvers. Note that the approach is not inherently limited to relational data sources,
as a GraphQL endpoint may also be able to deliver from other source types.
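   The paper does not prescribe a specific GraphQL implementation; as a minimal
sketch of the typedef/resolver extension step, the following assumes the Ariadne
GraphQL library for Python, a hypothetical "pand" (building) table, and SQLite as a
stand-in for the PostgreSQL driver.

```python
# A minimal sketch of the typedef/resolver extension step, assuming the
# Ariadne GraphQL library and a hypothetical "pand" (building) table.
import sqlite3  # stand-in for the PostgreSQL driver

from ariadne import QueryType, gql, make_executable_schema

# 1. Extend the typedefs so the object is correctly described in GraphQL.
type_defs = gql("""
    type Pand { identificatie: ID!  bouwjaar: Int  status: String }
    type Query { panden(limit: Int = 100, offset: Int = 0): [Pand!]! }
""")

query = QueryType()

# 2. Expand the resolver to accept the right (pagination) parameters, and
# 3. attach the required SQL query to it.
@query.field("panden")
def resolve_panden(_, info, limit=100, offset=0):
    conn = sqlite3.connect("bag.db")
    conn.row_factory = sqlite3.Row
    rows = conn.execute(
        "SELECT identificatie, bouwjaar, status FROM pand LIMIT ? OFFSET ?",
        (limit, offset),
    ).fetchall()
    return [dict(r) for r in rows]

schema = make_executable_schema(type_defs, query)
```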
   Once the model is available as a GraphQL endpoint, it can be queried by the
Enhancer through a configuration process. Firstly, the Enhancer has a set of
predefined queries with specific time and/or pagination parameters for each object,
such that the object is delivered as an endpoint which the microservice can access for
delivery of the JSON-LD results. Secondly, a reference to the relevant location of the
JSON-LD context should be defined; this was done for each new dataset that went
through this ETL process. As is evident from the validation discussion in the previous
section, SHACL can be used both for the validation of the data model using example
data and for the validation of the transformed instance data using the data model.
Within this architecture, a SHACL validation step is required to ensure the data
delivered from the Enhancer is valid.
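   A minimal sketch of such a JSON-LD context is given below; the term mappings and
IRIs are hypothetical, as the actual contexts are defined per dataset during
configuration.

```python
# A minimal sketch of a JSON-LD context that maps GraphQL field names to
# linked data terms; the IRIs are hypothetical, not Kadaster's actual ones.
import json

context = {
    "@context": {
        "def": "https://example.org/def/bag#",
        "identificatie": "@id",
        "bouwjaar": {"@id": "def:bouwjaar",
                     "@type": "http://www.w3.org/2001/XMLSchema#gYear"},
        "status": "def:status",
    }
}

# The Enhancer output couples each page of GraphQL results to this context.
record = {"identificatie": "https://example.org/id/pand/0001",
          "bouwjaar": "1998", "status": "Pand in gebruik"}
print(json.dumps({**context, **record}, indent=2))
```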
   Finally, the approach makes use of Apache Airflow3 as a 'handler' which guides
the data through the entire ETL process. The microservice fetches data from the
Enhancer and repeats this until all data is retrieved as JSON-LD. When all data has
been validated and loaded into the triple store, which in this case is an instance of
TriplyDB4, various services can be instantiated, including ElasticSearch, a data
browser, and a SPARQL endpoint for use in data stories. These can be instantiated
within the interface of the triple store itself. In the interest of better accessibility of
the linked data models, the data models for each base registry are also visualised
using the Weaver5 tool.

3 https://airflow.apache.org/
4 https://triplydb.com
5 https://kadaster.wvr.io/bag2-0/home
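
   A minimal sketch of the 'handler' role, assuming Apache Airflow 2.x and
hypothetical task callables; the actual DAG and service endpoints are internal to
Kadaster.

```python
# A minimal sketch of the ETL handler, assuming Apache Airflow 2.x;
# the task bodies are placeholders for Kadaster-internal logic.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def fetch_from_enhancer(**_):
    """Page through the Enhancer endpoint until all records have been
    retrieved as JSON-LD (placeholder)."""


def validate_with_shacl(**_):
    """Validate the retrieved JSON-LD against the SHACL shapes (placeholder)."""


def load_into_triple_store(**_):
    """Upload the validated triples to the TriplyDB instance (placeholder)."""


with DAG(
    dag_id="bag_linked_data_etl",
    start_date=datetime(2021, 1, 1),
    schedule_interval=None,  # triggered per publication run
    catchup=False,
) as dag:
    extract = PythonOperator(task_id="extract", python_callable=fetch_from_enhancer)
    validate = PythonOperator(task_id="validate", python_callable=validate_with_shacl)
    load = PythonOperator(task_id="load", python_callable=load_into_triple_store)

    extract >> validate >> load
```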




Fig. 2. Architecture supporting the ETL Process which delivers linked (spatial) data.


6        Vision for Geospatial Data Integration

While advancements in linked data technologies and standards, as well as increased
demand for these services, initiated the need for an updated approach to Kadaster's
delivery of linked geospatial data, this approach is now also at the centre of
Kadaster's ambition to deliver a Knowledge Graph (KG) [3]. The contents of the KG
are the linked base registries, the digital cadastral map, as well as other datasets
centred around the theme of a building. The overall vision is to transform these
datasets to their linked data versions, keeping the linked data registration as 'close to
the source' as possible in terms of the data model itself while also supporting the
improved reuse and findability of Kadaster's geospatial data.
   The KG is delivered by creating a further implementation layer on top of these LD
registrations, based on the schema.org specification relating to buildings, for two
reasons. Firstly, by layering the vision in this way, the provenance of the original
datasets remains available to the end user of the KG if necessary. Secondly, making
use of the schema.org specifications is done in the interest of reusing existing
community standards as well as of supporting external discoverability and
interoperability. Access to the KG is planned to be delivered through REST,
GraphQL, GeoSPARQL and ElasticSearch services, wherein third-party applications
make use of these in delivering geoinformation to the end user [4].
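   A minimal sketch of this layering idea, using rdflib; the def:/id: namespaces and the
schema.org typing chosen here are illustrative assumptions, not the KG's actual
modelling.

```python
# A minimal sketch of the schema.org layering over a source registration;
# the def:/id: namespaces are hypothetical, not Kadaster's actual IRIs.
from rdflib import Graph

LAYER_TTL = """
@prefix schema: <https://schema.org/> .
@prefix def:    <https://example.org/def/bag#> .
@prefix id:     <https://example.org/id/pand/> .

# Source registration (BAG linked data), kept close to the source model.
id:0001 a def:Pand ;
    def:bouwjaar "1998" ;
    def:status "Pand in gebruik" .

# KG layer: the same resource re-expressed in schema.org terms. Because the
# two layers share one IRI, provenance back to the source registration is kept.
id:0001 a schema:Place ;
    schema:additionalType def:Pand .
"""

kg = Graph()
kg.parse(data=LAYER_TTL, format="turtle")
print(kg.serialize(format="turtle"))
```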


7      Conclusion

Kadaster, the Dutch National Land Registry, has recently implemented an updated
approach to the linked data publication of its geospatial assets in response to growing
demand and the pressing need to innovate existing approaches to meet scalability
requirements. Building on existing experience with the publication of its base
registries as linked data, Kadaster has made use of existing community technologies
and standards, as well as available commercial products, to define an approach which
delivers LD assets in a timely, cost-efficient manner and with increased reusability
across projects. This approach forms part of a larger vision to deliver a knowledge
graph centred around the schema.org 'Building' theme, where both this larger vision
and the principles applied to this central approach to the transformation of the base
registries serve the interest of better geospatial data integration, interoperability and
discovery. Although innovative, Kadaster's efforts to improve geospatial findability
and linkability are not made in isolation and highlight a general need for better spatial
interoperability between (European) countries [5] and for the reusability of this data
in various contexts [6].


References
 1. Folmer, E., Ronzhin, S., Van Hillegersberg, J., Beek, W., Lemmens, R.: Business Rationale
    for Linked Data at Governments: A Case Study at the Netherlands' Kadaster Data
    Platform. IEEE Access 8, 70822-70835 (2020).
 2. World Wide Web Consortium: Best Practices for Publishing Linked Data. W3C Working
    Group Note. http://hdl.handle.net/10421/7479 (2014).
 3. Ronzhin, S., Folmer, E., Lemmens, R., Mellum, R., von Brasch, T. E., Martin, E., Romero,
    E. L., Kytö, S., Hietanen, E., Latvala, P.: Next generation of spatial data infrastructure:
    lessons from linked data implementations across Europe. International Journal of Spatial
    Data Infrastructures Research 14, 83-107 (2019).
 4. Rowland, A., Folmer, E., Beek, W.: Towards Self-Service GIS: Combining the Best of the
    Semantic Web and GIS. ISPRS International Journal of Geo-Information 9(12) (2020).
 5. Ronzhin, S., Folmer, E., Maria, P., Brattinga, M., Beek, W., Lemmens, R., van 't Heer, R.:
    Kadaster Knowledge Graph: Beyond the Fifth Star of Open Data. Information 10(10)
    (2019).
 6. Bucher, B., Folmer, E., Brennan, R., Beek, W., Hbeich, E., Würriehausen, F., Rowland, L.,
    Maturana, R. A., Alvarado, E., Buyle, R.: Spatial Linked Data in Europe: Report from
    Spatial Linked Data Session at Knowledge Graph in Action (2021).