<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Towards Green Linked Data</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Julia Hoxha</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Anisa Rula</string-name>
          <email>anisa.rula@disco.unimib.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Basil Ell</string-name>
          <email>basil.ell@kit.edu</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Dipartimento di Informatica Sistemistica e Comunicazione, Università degli Studi di Milano-Bicocca</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Institute AIFB, Karlsruhe Institute of Technology</institution>
        </aff>
      </contrib-group>
      <abstract>
        <p>We present a vision of what needs to be addressed when designing and publishing linked data on the Web. Our approach aims at reducing the amount of incorrect, irrelevant, or redundant content (which can also be seen as pollution in the Web of Data) when publishing linked data. At the foundation lie design principles adapted from green engineering. We envision a holistic framework that evaluates datasets from publishers along these principles and their respective assessment metrics, and that allows the configuration of new validation tools.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>The rapid growth of the Web of Data has contributed to the creation of large
amounts of linked data, which often results in low-quality content. For this reason,
it is important to investigate the problem of pollution from which the linked data
environment may suffer. Pollution refers in our case to incorrect, irrelevant, or
redundant content, which adds little value for the users and services that
consume these data. Examples include broken links, ambiguous use of owl:sameAs,
redundant definition of vocabularies, multiple URIs for the same resource in a
dataset, complex vocabularies that cannot be efficiently reused,
incomprehensible data, inaccessible data, non-maintained data, etc.</p>
      <p>
        We turn to the field of green engineering, since it has long been involved
with quality assurance when designing materials, processes and systems that are
benign to the environment. Moreover, this field offers an ecological perspective
when discussing the problems encountered in linked data publishing. At the
foundation of our approach lie the fundamental principles of green engineering [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ],
which we adapt for the linked data setting. We aim at providing a vision of what
needs to be addressed when designing and publishing linked data, in order to
minimize pollution in the Web of Data, increase reuse and achieve sustainability.
To concretize this vision, we introduce a framework that applies the principles
with measurable aspects to evaluate how green the datasets from publishers are.
      </p>
      <p>
        The issue of quality on the Web of Data has been addressed along aspects
such as syntax errors and inconsistencies in datasets [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], link discovery and
maintenance [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], quality and trustworthiness assessment based on provenance
information [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], etc. Our approach does not merely complement these works; rather, it
encompasses them and aligns them with our principles. The framework that we introduce
is holistic and based on green engineering aspects, which can be extended with
new measures and validation tools for higher quality of the published data.
      </p>
    </sec>
    <sec id="sec-2">
      <title>Green Linked Data Principles</title>
      <p>
        We introduce each principle with a short description and assessment measures,
which are partly contributions of the Web community [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Basic resources related to
the Web of Data include vocabularies, datasets, RDF links, and URIs. The
principles are non-orthogonal; therefore, some measures occur under different principles.
      </p>
      <sec id="sec-2-1">
        <title>Principle 1. Inherent rather than circumstantial</title>
        <p>Ensure that data are as inherently benign as possible.
Benign refers to data that maximize the qualities in which publishers and
consumers are interested. Publishers are interested in having their data consumed,
i.e. the data are 1) accessible, 2) understandable by consumers, and 3) meet consumers' demand.</p>
        <p>[Table: dimensions and measures for Principle 1]</p>
      </sec>
      <sec id="sec-2-2">
        <title>Principle 2. Prevention Instead of Treatment</title>
        <p>It is better to prevent waste than to treat or clean up after it is formed.
Publishers should strive to produce data with "zero waste". In the Web
of Data, waste results from the lack of use or consumption, i.e. consumers (humans and
machines) are unable to effectively exploit published data for beneficial use.</p>
        <p>[Table: dimensions and measures for Principle 2]</p>
      </sec>
      <sec id="sec-2-3">
        <title>Principle 4. Design for Separation</title>
        <p>Modularization operations should be a component of the design process.
Engineering large monolithic ontologies leads to artifacts that can rarely be
reused, because they are fitted to their original design requirements. Modularization helps solve
this challenge by using a set of micro-ontologies instead, thereby increasing
opportunities for the reuse of the developed artifacts.</p>
        <p>[Table: dimensions and measures for Principle 4]</p>
      </sec>
      <sec id="sec-2-4">
        <title>Principle 5. Maximize Efficiency</title>
        <p>Design datasets in order to maximize efficient exploitation.
Published linked data should allow consumers to search, query and browse them,
achieving the required results with minimum effort and time.</p>
        <p>[Table: dimensions and measures for Principle 5]</p>
      </sec>
      <sec id="sec-2-5">
        <title>Principle 6. Output-Pulled Versus Input-Pushed</title>
        <p>Bring content and publishing rate in line with demand.
Publishers should have possible consumers in mind when designing their data. To
this end, they should cover user needs by providing only the necessary resources.</p>
        <p>[Table: dimensions and measures for Principle 6]</p>
      </sec>
      <sec id="sec-2-6">
        <title>Principle 7. Conserve Complexity</title>
        <p>When making design choices, publishers should strive to reuse a complex
ontology or dataset as it is, instead of recycling it, i.e. extracting parts of it and
modifying them for further use. Complexity should be viewed as an investment
for reuse.</p>
        <p>[Table: dimensions and measures for Principle 7]</p>
      </sec>
      <sec id="sec-2-7">
        <title>Principle 8. Meet Need, Minimize Excess</title>
        <p>Design for unnecessary capability or capacity solutions should be considered a
design flaw.
Publishers should try to provide datasets that meet the necessary capabilities,
with no excessive details, while "one size fits all" solutions are a design flaw.</p>
        <p>[Table: dimensions and measures for Principle 8]</p>
      </sec>
      <sec id="sec-2-8">
        <title>Principle 9. Design for Afterlife</title>
        <p>Design for performance in a commercial afterlife.
It is necessary to provide updates and maintenance after the planned end of life
of the data. To reduce waste, components that remain functional and valuable
can be recovered for reuse and/or reconfiguration.</p>
        <p>[Table: dimensions and measures for Principle 9]</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Green Linked Data Framework</title>
      <p>In this section, we introduce an envisioned framework, which is a Web platform
addressing three main groups of visitors: 1) those who want to learn about linked
data and the green approach, 2) publishers who wish to check their linked data
before publishing them online, and 3) software developers who can contribute
validators that check particular measures pertaining to the principles.</p>
      <p>We have initiated the implementation of this framework online (http://www.greenlinkeddata.org), aiming to
make it a future point of reference for the users of the Web of Data. Through the
introduction of the green principles and the dimensions in which they expand,
as well as via the further enrichment of the website with materials and related
links, we aim to raise awareness among these users of the importance of
the quality of linked data published online.</p>
      <p>Besides its informative nature, the platform aims at enabling users to make
conscious decisions about the data they need to publish and, most importantly,
at helping them evaluate how these data conform to the green principles. Therefore,
the framework offers publishers the possibility to automatically check the
datasets or vocabularies they want to publish against the defined measures.
A publisher may choose to check its data against one or several principles and
dimensions (Fig. 1).</p>
      <p>
        The evaluation of the data will be done through validators, which will consist
of open source or off-the-shelf algorithms offered by the Web of Data community,
as well as new validators (e.g. to check comprehensibility) that we are
implementing. A further interesting feature of the framework is the possibility
for software developers to submit their own validators, for example as Web services.
One measure for the comprehensibility of a dataset is the
labeling completeness metric LC<sub>lp</sub>, where lp is a set of labeling properties such as
rdfs:label. This metric evaluates the ratio of non-information resources for
which at least one label is defined [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
      </p>
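      <p>As an illustration of what such a validator might compute, the following is a minimal sketch of the labeling completeness metric described above. It is not the implementation used by the framework. It assumes that triples are given as plain string tuples and that the set of non-information resources has already been determined elsewhere; the function name, the example.org URIs and the toy data are hypothetical.</p>
      <preformat preformat-type="code"><![CDATA[
```python
from typing import Iterable, Set, Tuple

# A triple is (subject, predicate, object), each given as a plain string.
Triple = Tuple[str, str, str]

RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"


def labeling_completeness(
    triples: Iterable[Triple],
    resources: Set[str],
    labeling_properties: Set[str],
) -> float:
    """Return the share of `resources` that have at least one labeling triple."""
    if not resources:
        # Assumption of this sketch: an empty resource set counts as complete.
        return 1.0
    labeled = {
        subject
        for subject, predicate, _obj in triples
        if subject in resources and predicate in labeling_properties
    }
    return len(labeled) / len(resources)


# Hypothetical toy dataset: four resources, two of which carry an rdfs:label.
EX = "http://example.org/"
TRIPLES = [
    (EX + "alice", RDFS_LABEL, "Alice"),
    (EX + "alice", EX + "knows", EX + "bob"),
    (EX + "bob", RDFS_LABEL, "Bob"),
    (EX + "bob", RDFS_LABEL, "Robert"),
    (EX + "carol", EX + "knows", EX + "alice"),
]
RESOURCES = {EX + "alice", EX + "bob", EX + "carol", EX + "dave"}

score = labeling_completeness(TRIPLES, RESOURCES, {RDFS_LABEL})
print(score)  # 0.5: alice and bob are labeled, carol and dave are not
```
]]></preformat>
      <p>Passing a larger set of labeling properties (for instance, adding a vocabulary-specific naming property) can only keep or raise the score, which is why the metric is parameterized by lp rather than fixed to rdfs:label.</p>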
      <p>There is also the possibility to suggest new dimensions and to
contribute appropriate validators for them. Thus, our goal is to provide an open
framework, where Web users contribute not only validators, but also new
ideas, materials and tools. Furthermore, the platform allows users to add comments to each
principle on the website; these may consist of, but are not restricted to,
best practices, benefits or even difficulties they have had when dealing with those
aspects. Users may also contribute suggestions on how to extend the dimensions
and measures of that principle.</p>
    </sec>
    <sec id="sec-4">
      <title>Discussion and Conclusion</title>
      <p>At the foundation of this approach lie green engineering principles, which we
have transferred to linked data publishing. In contrast to the physical artifacts
addressed in the original approach, we deal with data that represent immaterial
artifacts. The fundamental differences between these two types of artifacts have
necessarily been taken into account.</p>
      <p>Physical artifacts are subject to decay and abrasion as a consequence of usage.
They cannot easily be duplicated or distributed, and possess the property of
excludability. Since material goods are naturally scarce, this can lead to rivalry.
In contrast, data are immaterial and can thus be easily duplicated and distributed
without being subject to decay. While porting the principles to the linked data
setting, we have extracted only 9 of the original 12 principles, excluding e.g.
those dealing with renewable types of resources, the infrastructure used to create and
provide the data, or the discussion of green energy consumption.</p>
      <p>In our future work, we will focus on extending the principles with further
measures and on bringing the envisioned framework to life through further development.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <article-title>Quality criteria for linked data sources</article-title>
          ,
          <uri>http://sourceforge.net/apps/mediawiki/trdf/index.php?title=quality-criteria-for-linked-data-sources</uri>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>P.</given-names>
            <surname>Anastas</surname>
          </string-name>
          and
          <string-name>
            <given-names>J.</given-names>
            <surname>Zimmerman</surname>
          </string-name>
          ,
          <article-title>Design through the 12 principles of green engineering</article-title>
          , Engineering Management Review, IEEE,
          <volume>35</volume>
          (
          <year>2007</year>
          ), p.
          <fpage>16</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>B.</given-names>
            <surname>Ell</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Vrandečić</surname>
          </string-name>
          , and
          <string-name>
            <given-names>E.</given-names>
            <surname>Simperl</surname>
          </string-name>
          ,
          <article-title>Labels in the Web of Data</article-title>
          ,
          <source>in Proceedings of the 10th International Semantic Web Conference (ISWC2011), Lecture Notes in Computer Science</source>
          , Berlin / Heidelberg, 2011, Springer.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>H.-J.</given-names>
            <surname>Happel</surname>
          </string-name>
          ,
          <article-title>Semantic need: guiding metadata annotations by questions people #ask</article-title>
          ,
          <source>in Proceedings of ISWC'10 - Volume Part I</source>
          , Berlin, Heidelberg,
          <year>2010</year>
          , Springer-Verlag, pp.
          <fpage>321</fpage>
          –
          <lpage>336</lpage>
          .
        </mixed-citation>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>O.</given-names>
            <surname>Hartig</surname>
          </string-name>
          ,
          <source>Provenance Information in the Web of Data</source>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>A.</given-names>
            <surname>Hogan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Harth</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Passant</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Decker</surname>
          </string-name>
          ,
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>Polleres</surname>
          </string-name>
          ,
          <article-title>Weaving the pedantic web</article-title>
          ,
          <source>in 3rd International Workshop on Linked Data on the Web (LDOW2010)</source>
          ,
          <source>in conjunction with 19th International World Wide Web Conference</source>
          , CEUR,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <given-names>P.</given-names>
            <surname>Mika</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Meij</surname>
          </string-name>
          , and
          <string-name>
            <given-names>H.</given-names>
            <surname>Zaragoza</surname>
          </string-name>
          ,
          <article-title>Investigating the semantic gap through query log analysis</article-title>
          ,
          <source>in Proceedings of the 8th International Semantic Web Conference, ISWC '09</source>
          , Berlin, Heidelberg,
          <year>2009</year>
          , Springer-Verlag, pp.
          <fpage>441</fpage>
          –
          <lpage>455</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <given-names>J.</given-names>
            <surname>Volz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Bizer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Gaedke</surname>
          </string-name>
          , and
          <string-name>
            <given-names>G.</given-names>
            <surname>Kobilarov</surname>
          </string-name>
          ,
          <article-title>Discovering and Maintaining Links on the Web of Data</article-title>
          ,
          <source>in The Semantic Web - ISWC 2009</source>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Bernstein</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. R.</given-names>
            <surname>Karger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Heath</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Feigenbaum</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Maynard</surname>
          </string-name>
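          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Motta</surname>
          </string-name>
          , and
          <string-name>
            <given-names>K.</given-names>
            <surname>Thirunarayan</surname>
          </string-name>
          , eds., vol.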
          <volume>5823</volume>
          , Springer Berlin Heidelberg, Berlin, Heidelberg,
          <year>2009</year>
          , ch. 41, pp.
          <fpage>650</fpage>
          –
          <lpage>665</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>