<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Linked Environment Data</article-title>
        <subtitle>Getting Things Connected</subtitle>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Thomas Bandholtz</string-name>
          <email>thomas.bandholtz@innoq.com</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Joachim Fock</string-name>
          <email>joachim.fock@uba.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Federal Environment Agency (UBA)</institution>
          ,
          <addr-line>Dessau-Roßlau</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>innoQ Deutschland GmbH</institution>
          ,
          <addr-line>Monheim am Rhein</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>After three years of discussion and early prototypes, the Federal Environment Agency (UBA), Germany, has now launched a two-year research &amp; development project on Linked Environment Data (LED) with innoQ Deutschland GmbH as contractor. This project will set up a core cloud of environment data with a well-elaborated domain terminology as its semantic backbone. Data will be taken from the “Environmental Specimen Bank”, the “German Metadata Portal on Soil” and further databases such as the “Joint Substance Data Pool of the German Federal Government and the German Federal States”, as well as the environmental library and research databases. The infrastructure will support a sustainable process of keeping the data permanently up to date, and there will be a dynamic and intuitive user interface. All the work will be fully Semantic Web compliant, based on vocabularies such as SKOS, SCOVO or Data Cube, and Dublin Core.</p>
      </abstract>
      <kwd-group>
        <kwd>Environmental protection</kwd>
        <kwd>domain terminology</kwd>
        <kwd>observation data</kwd>
        <kwd>linking open data</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <p>Linking comprehensive observation data with domain terminology has been a basic concern of the UBA since the 1990s, across several project generations (named UMPLIS, UDK, GEIN, SNS and PortalU). All of these implementations share two weaknesses:</p>
      <list list-type="bullet">
        <list-item><p>The links established by these systems connect data containers (databases, information systems, complex Web pages) but not individual data records.</p></list-item>
        <list-item><p>There was no shared data structure that could be accessed and exploited, so every link ended, so to speak, at the front door of the referenced database, at best on a Web page describing the respective data access.</p></list-item>
      </list>
      <p>
        Linked Data, however, stands for linking individual data records that can easily be dereferenced. Tim Berners-Lee summarized its four principles as early as 2006 [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]:
      </p>
    </sec>
    <sec id="sec-2">
      <list list-type="order">
        <list-item><p>“Use URIs as names for things.</p></list-item>
        <list-item><p>Use HTTP URIs so that people can look up those names.</p></list-item>
        <list-item><p>When someone looks up a URI, provide useful information, using the standards (RDF*, SPARQL).</p></list-item>
        <list-item><p>Include links to other URIs, so that they can discover more things.”</p></list-item>
      </list>
      <p>In 2009 he added a “5 star rating” to make this clearer and to acknowledge the Linking Open Data movement:</p>
      <list list-type="simple">
        <list-item><label>*</label><p>“Available on the web (whatever format) but with an open license, to be Open Data</p></list-item>
        <list-item><label>**</label><p>Available as machine-readable structured data (e.g. excel instead of image scan of a table)</p></list-item>
        <list-item><label>***</label><p>as (2) plus non-proprietary format (e.g. CSV instead of excel)</p></list-item>
        <list-item><label>****</label><p>All the above plus, Use open standards from W3C (RDF and SPARQL) to identify things, so that people can point at your stuff</p></list-item>
        <list-item><label>*****</label><p>All the above, plus: Link your data to other people’s data to provide context”</p></list-item>
      </list>
      <p>This shows that Linked Data was originally envisioned without an explicit requirement of “openness”; indeed, Linked Data can be applied perfectly well within closed communities.</p>
      <p>
        The environmental authorities in Europe have a strong tradition of publishing open data, as expressed by the Aarhus Convention [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] of 1998 and Directive 2003/4/EC on public access to environmental information [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] of 2003. As a result, these authorities have been providing 1-star and, in part, 2-star and 3-star data for years. While some discussion about the legal limits of this openness certainly remains, the real novelty in this domain is the “Linked” aspect, which was described in more depth by Tom Heath and Chris Bizer in 2010 [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ].
      </p>
      <p>
        The vision of Linked Environment Data came up at the eTerminology workshop
[
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] at the e-Envi conference in Prague in March 2009 and was elaborated during the
5th Ecoterm meeting [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] in Rome in October of the same year.
      </p>
      <p>In 2010 the European Environment Agency made the General Environmental Multilingual Thesaurus (GEMET)<sup>1</sup> and the European Nature Information System (EUNIS)<sup>2</sup> available as Linked Open Data, followed by the Environmental Applications Reference Thesaurus (EARTh)<sup>3</sup> provided by the Istituto Inquinamento Atmosferico in Italy.</p>
      <p>
        <fn><label>1</label><p>http://www.eionet.europa.eu/gemet/</p></fn>
        <fn><label>2</label><p>http://eunis.eea.europa.eu</p></fn>
        <fn><label>3</label><p>http://uta.iia.cnr.it/earth_eng.htm</p></fn>
      </p>
    </sec>
    <sec id="sec-3">
      <p>In December 2010 there was a 2-day Ecoinformatics International Webinar on Linked Open Data<sup>4</sup>. LED was also discussed by the W3C eGovernment Interest Group<sup>5</sup> and was the topic of several conference contributions.</p>
      <p>In 2011, the German “Umwelt-Thesaurus” (UMTHES)<sup>6</sup> was published as Linked Data as well, along with a (strictly non-open) species taxonomy used in the context of substance approval. There was also an early (open) Linked Data test-bed of the German Environmental Specimen Bank (ESB)<sup>7</sup>, which was not deployed into production. The annual EnviroInfo<sup>8</sup> conference hosted a full-day session on “Linked Open Data, Semantic Search and Interoperability”, and there will be a follow-up in 2012: “Linked Environment Data – Getting Things Connected”.</p>
      <p>However, these early implementations have been rather scattered and focus predominantly on domain terminology rather than on observation data. The “Use Case Crosslinking Environment Data and the Library”<sup>9</sup> says about the German contributions: “The most prominent obstacle is the lack of a dedicated funding for this initiative. There are some projects of the participating systems that draw up some of their budget for pieces of the puzzle, but there is no overall plan of the agency so far.”</p>
      <p>This use case outlines a scenario in which observation and library data are cross-linked with each other and with the domain terminology. The scenario has been taken up by the Linked Environment Data research &amp; development project (UFOPLAN 3712 12 100), which the German agency has finally launched at the time of writing.</p>
      <p>Strategic Issues of the LED Project</p>
      <sec id="sec-3-1">
        <title>Master Plan and Project Portfolio</title>
        <p>By the end of 2012 there will be a master plan, coordinated with all stakeholders, which provides a strategic foundation beyond the limits of the two-year project. There will be prioritised work packages, some of which may already be implemented in 2012.</p>
        <p>The overall portfolio will depend heavily on how far the corresponding projects can work on their interfaces themselves or have to delegate this to the LED project. At present we cannot make firm assumptions about this.</p>
        <p>In any case we aim for a more or less comprehensive pilot system (or pilot cloud) that turns the intended “added information value through interlinked data” into a real experience. Moreover, there must be a demonstration of how the standardised RDF interfaces and the LED workbench simplify the integration of further data.</p>
        <p>
          <fn><label>4</label><p>http://projects.eionet.europa.eu/ecoinformatics/library/ecoinformatics_indicator/meeting_67122010</p></fn>
          <fn><label>5</label><p>http://www.w3.org/egov/wiki/Linked_Environment_Data</p></fn>
          <fn><label>6</label><p>http://data.uba.de</p></fn>
          <fn><label>7</label><p>http://umweltprobenbank.de</p></fn>
          <fn><label>8</label><p>http://www.ec-gis.org/Workshops/EnviroInfo2011</p></fn>
          <fn><label>9</label><p>http://www.w3.org/2005/Incubator/lld/wiki/Use_Case_Crosslinking_Environment_Data_and_the_Library</p></fn>
        </p>
      </sec>
      <sec id="sec-3-2">
        <title>Project Infrastructure</title>
        <p>During the first month we will decide on the project infrastructure together with the computer centre of the agency. It will consist of:</p>
        <list list-type="bullet">
          <list-item><p>Production system with human/machine interface (content negotiation)</p></list-item>
          <list-item><p>Triple store</p></list-item>
          <list-item><p>Registry based on the Vocabulary of Interlinked Datasets (VoID)<sup>10</sup></p></list-item>
          <list-item><p>Cross-database data retrieval client</p></list-item>
          <list-item><p>(Geo-)graphic visualisation services</p></list-item>
          <list-item><p>Workbench with tools enabling RDF interfaces and data linking</p></list-item>
        </list>
        <p>One special part of this infrastructure is iQvoc<sup>11</sup>, an open source terminology management tool that we have developed jointly over the last two years.</p>
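        <p>The content negotiation mentioned for the production system can be illustrated with a minimal Python sketch. It is not the LED implementation: the function names, the list of offered media types and the simplified matching (which ignores the specificity rules of the HTTP specification) are assumptions made for illustration only.</p>
        <preformat preformat-type="code">
```python
# Minimal content-negotiation sketch (illustrative, not the LED server).
# Media types offered by this hypothetical server, in order of server preference.
OFFERED = ["text/html", "text/turtle", "application/rdf+xml"]


def parse_accept(header):
    """Parse an HTTP Accept header into (media_type, q) pairs."""
    entries = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_type = fields[0].lower()
        if not media_type:
            continue
        q = 1.0  # default quality when no q parameter is given
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        entries.append((media_type, q))
    return entries


def choose_representation(accept_header, offered=OFFERED):
    """Return the offered media type with the highest client q value, or None."""
    best_type, best_q = None, 0.0
    for media_type, q in parse_accept(accept_header):
        for candidate in offered:
            matches = (
                media_type == candidate
                or media_type == "*/*"
                or (media_type.endswith("/*")
                    and candidate.startswith(media_type[:-1]))
            )
            # Ties are resolved in favour of the earlier entry.
            if matches and q > best_q:
                best_type, best_q = candidate, q
    return best_type


# A browser-like client gets HTML, an RDF client gets Turtle.
print(choose_representation("text/html,application/xhtml+xml,*/*;q=0.8"))
print(choose_representation("text/turtle, text/html;q=0.5"))
```
        </preformat>
        <p>In such a setup the same HTTP URI would serve a Web page to a browser and an RDF serialisation to a Linked Data client, which is the sense in which the production system acts as a human/machine interface.</p>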
        <p>All this is glued together by a careful selection and extension of standardised RDF vocabularies such as VoID, SKOS<sup>12</sup>, SCOVO<sup>13</sup> or Data Cube<sup>14</sup>, which are “understood” and interpreted by the machine.</p>
        <p>The registry will know which participant uses which standard and can even describe local extensions, so that code extensions are not necessary. Of course, the freedom for such extensions is limited, and its limits need to be defined and communicated.</p>
      </sec>
      <sec id="sec-3-3">
        <title>Integration and Extension of Existing Approaches</title>
        <p>The existing LED prototypes of the agency have to be aligned with the LED master plan. They all include native methods for RDF data rendering and can synchronise with a triple store incrementally. However, these methods were developed as prototypes and need to be revisited, refactored, and extended. The same applies to the RDF formats and the linkage.</p>
      </sec>
      <sec id="sec-3-4">
        <title>Environmental Specimen Bank (ESB)</title>
        <p>The Environmental Specimen Bank records the accumulation of (harmful) substances in defined samples at certain locations and times. However, the ESB itself is not responsible for the comprehensive description of all relevant elements, so specialized information should be referenced instead. For substances such data is provided by GSBL (the Joint Substance Data Pool), for species there is EUNIS, and for locations and times there are the geo thesaurus and the environmental chronicle of SNS, respectively. The environmental thesaurus (UMTHES) provides an overarching envelope, which is in turn linked with the international GEMET.</p>
        <p>
          <fn><label>10</label><p>http://www.w3.org/TR/void/</p></fn>
          <fn><label>11</label><p>https://github.com/innoq/iqvoc</p></fn>
          <fn><label>12</label><p>http://www.w3.org/2004/02/skos/</p></fn>
          <fn><label>13</label><p>http://vocab.deri.ie/scovo</p></fn>
          <fn><label>14</label><p>http://publishing-statistical-data.googlecode.com/svn/trunk/specs/src/main/html/cube.html</p></fn>
        </p>
        <p>In the early test-bed the ESB data model was represented in SCOVO, but today we are considering the Data Cube vocabulary instead; this still needs to be decided. Some extensions are required to represent the domain-specific dimensions (specimen, analyte, location). Each record in the ESB can link directly to the information from those specialized systems. Ideally, those systems provide a back-reference, enabling two-way navigation.</p>
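        <p>How an ESB record might link out through its domain-specific dimensions can be sketched in Python. This is an illustration under stated assumptions, not the ESB data model: only the Data Cube and RDF namespaces are real, while all example.org and example.net URIs, the property names and the sample value are hypothetical placeholders.</p>
        <preformat preformat-type="code">
```python
# Sketch only: the qb: and rdf: namespaces are real; every example.org /
# example.net URI and every property name below is a hypothetical placeholder.
QB = "http://purl.org/linked-data/cube#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
EX = "http://example.org/esb/"  # assumed base URI of the ESB data set


def observation_triples(obs_id, specimen, analyte, location, value):
    """Describe one measurement as (subject, predicate, object) triples."""
    subject = EX + "observation/" + obs_id
    return [
        (subject, RDF_TYPE, QB + "Observation"),
        (subject, QB + "dataSet", EX + "dataset/measurements"),
        # Domain-specific dimensions point at records in other systems.
        (subject, EX + "dimension/specimen", specimen),
        (subject, EX + "dimension/analyte", analyte),
        (subject, EX + "dimension/location", location),
        # The measured value stays a literal.
        (subject, EX + "measure/concentration", value),
    ]


def outgoing_links(triples, own_prefix=EX):
    """Return the URIs that lead out of our own namespace (type statements excluded)."""
    return sorted(
        obj
        for _, pred, obj in triples
        if pred != RDF_TYPE
        and isinstance(obj, str)
        and obj.startswith("http")
        and not obj.startswith(own_prefix)
    )


record = observation_triples(
    "42",
    specimen="http://example.net/species/1",
    analyte="http://example.net/substance/2",
    location="http://example.net/place/3",
    value=0.42,
)
for link in outgoing_links(record):
    print(link)
```
        </preformat>
        <p>The three printed URIs stand for the references to the species, substance and location systems; a back-reference would be the same kind of statement published by the referenced system.</p>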
        <p>In addition to the information systems mentioned so far, there are numerous specialized systems operated independently of governmental agencies, e.g. Chemical Entities of Biological Interest (ChEBI)<sup>15</sup> or GeoNames<sup>16</sup>. Whether those should be referenced is purely a matter of policy; technically it is possible.</p>
      </sec>
      <sec id="sec-3-5">
        <title>Semantic Network Service (SNS)</title>
        <p>SNS<sup>17</sup> has been developed since 2001 based on ISO Topic Maps<sup>18</sup> and the XML Topic Maps interface. Unfortunately, the Topic Maps community has rejected a fusion with the Semantic Web, which means we have to abandon that paradigm.</p>
        <p>SNS includes a thesaurus, a gazetteer, and a chronicle. The thesaurus has already been implemented based on iQvoc, the Simple Knowledge Organization System (SKOS) and the complementing “eXtension for Labels” (SKOS-XL). The gazetteer is currently being implemented in a similar way, combining SKOS and the GeoNames Ontology. The chronicle will have to follow, based on SKOS and the Linked Events Ontology<sup>19</sup>.</p>
      </sec>
      <sec id="sec-3-6">
        <title>Data Lifting</title>
        <p>Most databases at the Agency are not able to render RDF, and many of them do not even have a defined interface such as a Web Service or CSV export. We will pick some examples and look for reference solutions for typical cases.</p>
        <p>One example is the library metadata system, which is also used to describe research projects. This legacy system is no longer maintained and may be replaced in the future, possibly based on an RDF representation of the data. It provides a classical OPAC interface, and this may be the key to accessing the data from outside.</p>
        <p>Another example is the already mentioned GSBL, which has a Web Service interface that supplies its Web client with data and that may serve LED as well.</p>
        <p>Currently under development is the Soil Metadata Portal, which will include an INSPIRE<sup>20</sup>-compliant Catalogue Service for the Web (CSW). This year’s INSPIRE conference, which will take place in Istanbul at the end of June, will host a tutorial on Geographical Linked Data<sup>21</sup>, and we will carefully observe the patterns presented there, as implementing INSPIRE through Linked Data is not yet regulated (and in INSPIRE everything has to be regulated).</p>
        <p>
          <fn><label>15</label><p>http://www.ebi.ac.uk/chebi/</p></fn>
          <fn><label>16</label><p>http://www.geonames.org/</p></fn>
          <fn><label>17</label><p>http://www.semantic-network.de</p></fn>
          <fn><label>18</label><p>http://isotopicmaps.org/</p></fn>
          <fn><label>19</label><p>http://linkedevents.org/ontology/</p></fn>
          <fn><label>20</label><p>http://inspire.jrc.ec.europa.eu/</p></fn>
          <fn><label>21</label><p>http://datalift.org/en/node/21</p></fn>
        </p>
        <p>If there is absolutely no existing data interface, we have to go down to the physical data model and use D2RQ<sup>22</sup>; however, most of the legacy data models are poorly documented and rather cryptic.</p>
      </sec>
      <sec id="sec-3-7">
        <title>Front End</title>
        <p>At this point we will have millions (or even billions) of HTTP URIs that can be resolved to RDF, we have links to be followed, and we have a SPARQL endpoint. This is not enough to convince humans (and especially decision makers) of any added information value. We need a human-oriented interface so that they can explore the data and visualize the results in tables, diagrams, and maps. This should be generic enough to work on any data that conforms to the supported standards (SKOS, SCOVO, etc.), but should also be specific enough to compete with the native user interfaces of the integrated systems.</p>
        <p>Some of these systems have very elaborate interfaces dealing with all the subtleties of their respective individual model, and we will have to leave some of this to them. We cannot go into every individual detail, but we offer a transparent integration point for all.</p>
        <p>As the registry knows all the properties, and notably which properties link between databases, it should be possible to demonstrate walk-throughs such as starting with a specimen in the ESB, looking up the characteristics of the observed substance in the GSBL, and then retrieving all the soil observation programs that deal with the same substance and perhaps share a location with the ESB specimen. Decision makers have envisioned this many times, but it has never come true.</p>
      </sec>
      <sec id="sec-3-8">
        <title>Sustainability</title>
        <p>In the domain of environmental protection, sustainability is a strategic asset, and this should also hold for information systems. Many of the systems we are talking about have been in operation for 10 years or more, and the outcome of LED should be able to do the same.</p>
        <p>In part this is an organisational matter that cannot be regulated by the LED project, but the implementation can support easy continuation and evolution.</p>
        <p>Linked Data contributions that make data available once and then move on to the next node will not survive. So the key issue is implementing self-updating interfaces, either by direct live access to the native production data or by continuous incremental one-way synchronisation into the LED triple store.</p>
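        <p>The incremental one-way synchronisation can be illustrated as a set difference between the statements currently exported by a source system and those already held in the store. The following Python sketch rests on assumptions: the function names are hypothetical, triples are plain tuples, and a real deployment would apply the computed delta to an actual triple store rather than to an in-memory set.</p>
        <preformat preformat-type="code">
```python
def sync_delta(source_triples, store_triples):
    """Compute what must change in the store so that it mirrors the source."""
    source, store = set(source_triples), set(store_triples)
    to_add = source - store      # statements that are new or changed in the source
    to_remove = store - source   # statements that have vanished from the source
    return to_add, to_remove


def apply_delta(store_triples, to_add, to_remove):
    """Return the updated store content; the source is never modified (one-way)."""
    return (set(store_triples) - to_remove) | to_add


# Hypothetical example: one value changed in the source since the last run.
store = {("obs1", "value", "0.40"), ("obs2", "value", "0.10")}
source = {("obs1", "value", "0.42"), ("obs2", "value", "0.10")}

added, removed = sync_delta(source, store)
store = apply_delta(store, added, removed)
print(len(added), len(removed), store == source)
```
        </preformat>
        <p>Only the changed statement is transferred, which is what keeps repeated runs cheap compared with a full reload of the data set.</p>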
        <p>The second key issue is the transparency of the integration workbench, so that further systems can be integrated easily even after the LED project has been completed.</p>
        <p>
          <fn><label>22</label><p>http://d2rq.org/</p></fn>
        </p>
        <p>The launch of a dedicated R&amp;D project by the German agency will raise the previous LED initiatives to a new level by:</p>
        <list list-type="bullet">
          <list-item><p>implementing a national core cloud with links to the EEA terminology and nature information system;</p></list-item>
          <list-item><p>developing sustainable integration patterns and tools;</p></list-item>
          <list-item><p>producing reusable software components that may be adopted by others;</p></list-item>
          <list-item><p>establishing a comprehensive reference terminology on the national level;</p></list-item>
          <list-item><p>providing an intuitive user interface on top of the most convenient RDF standards (SKOS, SCOVO …);</p></list-item>
          <list-item><p>generating added information value by cross-database walk-through patterns.</p></list-item>
        </list>
        <p>As usual in research, we cannot anticipate the outcome in detail, and unforeseen ideas may arise at any time during the contract period. In any case, as the data is provided by a governmental agency, LED will provide a reliable, always current information source to the public.</p>
        <p>See also the Web links in the footnotes.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Berners-Lee</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>Linked Data. W3C Design Issues</article-title>
          . (
          <year>2006</year>
          /9). http://www.w3.org/DesignIssues/LinkedData.html
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <article-title>Convention on Access to Information, Public Participation in Decision-making and Access to Justice in Environmental Matters</article-title>
          , by the United Nations Economic Commission for Europe (UNECE). http://www.unece.org/fileadmin/DAM/env/pp/documents/cep43e.pdf
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <article-title>Directive 2003/4/EC of the European Parliament and of the Council of 28 January 2003 on public access to environmental information and repealing Council Directive 90/313/EEC</article-title>
          . http://europa.eu/legislation_summaries/environment/general_provisions/l28091_en.htm
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Heath</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bizer</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>Linked Data: Evolving the Web into a Global Data Space</article-title>
          (1st edition).
          <source>Synthesis Lectures on the Semantic Web: Theory and Technology</source>
          ,
          <volume>1</volume>
          :
          <issue>1</issue>
          ,
          <fpage>1</fpage>
          -
          <lpage>136</lpage>
          .
          <publisher-name>Morgan &amp; Claypool</publisher-name>
          (
          <year>2011</year>
          ). http://linkeddatabook.com/editions/1.0/
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Bandholtz</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schleidt</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>Summary of W4 eEnvironment Terminology</article-title>
          . In:
          <string-name><surname>Hřebíček</surname>, <given-names>J.</given-names></string-name>,
          <string-name><surname>Hradec</surname>, <given-names>J.</given-names></string-name>,
          <string-name><surname>Pelikán</surname>, <given-names>E.</given-names></string-name>,
          <string-name><surname>Mírovský</surname>, <given-names>O.</given-names></string-name>,
          <string-name><surname>Pillmann</surname>, <given-names>W.</given-names></string-name>,
          <string-name><surname>Holoubek</surname>, <given-names>I.</given-names></string-name>,
          <string-name><surname>Bandholtz</surname>, <given-names>T.</given-names></string-name>
          (Eds.):
          <source>Proceedings of the European conference of the Czech Presidency of the Council of the EU TOWARDS eENVIRONMENT. Opportunities of SEIS and SISE: Integrating Environmental Knowledge in Europe</source>
          . Prague (
          <year>2009</year>
          ). http://www.e-envi2009.org/SummaryTerminologyW4.pdf
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Hodge</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          :
          <article-title>Report on the Outcome of the Ecoterm V Workshop, U.N. Food and Agriculture Organization, Rome, 5-6 October 2009</article-title>
          . (
          <year>2010</year>
          ) http://projects.eionet.europa.eu/ecoinformatics/library/ecoinformatics_indicator/ecoterm_5-6102009
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>