<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Linked Open Data Infrastructure for Public Sector Information: Example from Serbia</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Valentina Janev</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Uroš Milošević</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Mirko Spasić</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jelena Milojković</string-name>
          <email>jelena.milojkovic@stat.gov.rs</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sanja Vraneš</string-name>
          <email>sanja.vranes@pupin.rs</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Mihailo Pupin Institute, University of Belgrade</institution>
          ,
          <addr-line>Belgrade</addr-line>
          ,
          <country country="RS">Serbia</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Statistical Office of the Republic of Serbia</institution>
          ,
          <addr-line>Belgrade</addr-line>
          ,
          <country country="RS">Serbia</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2012</year>
      </pub-date>
      <fpage>26</fpage>
      <lpage>30</lpage>
      <abstract>
        <p>To improve transparency and public service delivery, national, regional and local governmental bodies need to consider new strategies for opening up their data. We approach the problem of creating a more scalable and interoperable Open Government Data ecosystem by considering the latest advances in Linked Open Data. More precisely, we showcase how an integrated and coherent collection of aligned state-of-the-art software tools, the LOD2 Stack, can be used to deliver trusted, open and rich collections of interlinked datasets to the public. The use of the LOD2 Stack is demonstrated on the case of one of the largest data providers in the Republic of Serbia, its Statistical Office.</p>
      </abstract>
      <kwd-group>
        <kwd>linked open data</kwd>
        <kwd>open government data</kwd>
        <kwd>infrastructure</kwd>
        <kwd>tools</kwd>
        <kwd>public sector</kwd>
        <kwd>Serbia</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>To improve efficiency in the provision of public services, to increase
transparency and interaction with citizens and society as a whole, and also to create new businesses
and job opportunities, both local and national governments need to find better
strategies for delivering large amounts of trusted data to the public. The fact that the
European Commission is investing considerable resources to address this
problem is a strong indicator of its significance. For example, the ISA
(Interoperability Solutions for European Public Administrations) program for the
period 2010-2015 has been assigned a budget of 164.1 million euros. The
program enables “the delivery of electronic public services and ensures the
availability, interoperability, re-use and sharing of common solutions”. To make government
data truly open (for use and re-use) and to increase transparency, the data need to be
published in a non-proprietary, machine-readable format (e.g. RDF,
http://www.w3.org/TR/2004/REC-rdf-syntax-grammar-20040210).</p>
      <p>In this paper, we will show why Linked Data is considered a promising approach
to the above problem, and how the LOD2 Stack, a powerful set of software tools and
components, can be used to lower the cost of addressing the challenges of publishing
and integrating Open Government Data (OGD). Section 2 evaluates the tools used in
the National Statistical Office use case workflow (see Fig. 1). Section 3 discusses
the results achieved in integrating Serbian public data into the LOD cloud, with
special attention to one of the largest data providers in the Republic of Serbia,
its Statistical Office (SORS).</p>
      <sec id="sec-1-1">
        <title>LOD2: The Project and the OGD Use Case</title>
        <p>
          In the last few years, the Linked Data paradigm has evolved into a powerful enabler for
the transition of the current document-oriented Web into a Web of interlinked data
and, ultimately, into the Semantic Web. To speed up this process, the partners of the LOD2
project ("Creating knowledge out of interlinked data", http://lod2.eu) have
delivered the LOD2 Stack, “an integrated collection of aligned state of the art
software components that enable corporations, organizations and individuals to employ
Linked Data technologies with minimal initial investments” [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ].
        </p>
        <p>One of the LOD2 objectives is to showcase the wide applicability of the LOD2
Stack for building public services for ordinary citizens of the European Union. As
an LOD2 project partner, the Mihailo Pupin Institute’s team established the
Serbian CKAN, the first catalogue of this kind in the West Balkan countries. CKAN is a
data catalogue system used by various institutions and communities to manage open data;
the Serbian instance aims to become an essential tool for fostering business ventures
based on open data in this region. The RDF datasets catalogued with the Serbian CKAN
(rs.ckan.net) are periodically harvested, synchronized at an international level with
the PublicData.eu portal (itself developed as part of the LOD2 project), and integrated
into the LOD cloud.</p>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>Evaluation of LOD Tools and Technologies</title>
      <p>
        The LOD2 Stack was evaluated with respect to how well it allows governments and governmental agencies
to publish their data based on open standards. Requirements identified for the
National Statistical Office scenario [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] were grouped into the following types: Data
extraction and transformation, Domain-specific modeling, Data enrichment and
interlinking, Data storage, Exploration and analysis, and Data and Service administration.
Table 1 shows how the LOD2 Stack responds to these requirements.
      </p>
      <p>
        The vocabularies suitable for modeling statistical data in RDF are the RDF Data
Cube vocabulary [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], which is fully compatible with the cube model that underlies
SDMX (Statistical Data and Metadata eXchange,
http://code.google.com/p/publishing-statistical-data/wiki/Documentation), and VoID
(Vocabulary of Interlinked Datasets, http://www.w3.org/TR/void/), an RDF-based schema
used to describe linked datasets.
      </p>
      <p>
        To adopt the LOD2 Stack for the Statistical Office of the Republic of
Serbia, over 100 datasets were extracted from the central statistics database
(http://webrzs.stat.gov.rs/WebSite/public/ReportView.aspx), transformed into RDF,
stored as RDF dump files on a local server (http://elpo.stat.gov.rs/lod2/), and
registered with the Serbian CKAN. The data cover statistics from the Prices, National
accounts, Usage of Information and Communication Technologies, and Science,
Technology and Innovation domains (see [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] for more details). The activities performed
can be summarized as follows.
      </p>
      <p>
        Metadata Management. The statistics published by National Statistical Offices or
Eurostat are organized by theme and presented in aggregate form using a wide range
of standard metadata (code lists). In the SORS use case, a knowledge model was built
in which standard code lists were modeled using the SKOS vocabulary [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. The model
(http://lod2.poolparty.biz/) currently incorporates 12 concept schemes, including
NACE (Revision 1 and Revision 2), COICOP and SITC (Revision 4), as well as other
schemes used in SORS statistical publications, such as geographical, time and
statistical area code lists. To formalize the conceptualization of the National
accounts domain, for instance, ESA 95 (European System of Accounts,
http://circa.europa.eu/irc/dsis/nfaccount/info/data/ESA95/en/titelen.htm) was used. In
governmental organizations, the metadata management activity is carried out by users
with administration permissions (see Fig. 1). Using Silk and LODGrefine
(http://code.zemanta.com/sparkica/), some of the code lists were interlinked with
DBpedia and Eurostat code lists.
      </p>
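      <p>Silk links datasets by evaluating declarative linkage rules, and those rules are not reproduced here. As a rough, standard-library-only sketch of the underlying idea, the snippet below compares the skos:prefLabel values of a local code list with those of an external one and proposes a skos:exactMatch link for each local concept whose best-matching label reaches a similarity threshold. All concept IRIs under example.org, the choice of skos:exactMatch, and the 0.9 threshold are hypothetical, and difflib's ratio merely stands in for whatever comparison metric an actual Silk rule would use.</p>

```python
from difflib import SequenceMatcher

SKOS = "http://www.w3.org/2004/02/skos/core#"

# Hypothetical local code list (NACE-like sections): concept IRI -> skos:prefLabel
LOCAL = {
    "http://example.org/sors/nace2/A": "Agriculture, forestry and fishing",
    "http://example.org/sors/nace2/B": "Mining and quarrying",
    "http://example.org/sors/nace2/C": "Manufacturing",
    "http://example.org/sors/nace2/D": "Electricity, gas, steam and air conditioning supply",
}

# Hypothetical external code list whose labels differ only in case and spacing
EXTERNAL = {
    "http://example.org/ext/nace_r2/A": "Agriculture, Forestry and Fishing",
    "http://example.org/ext/nace_r2/B": "Mining and  quarrying",
    "http://example.org/ext/nace_r2/C": "MANUFACTURING",
}


def normalise(label):
    """Lower-case the label and collapse runs of whitespace."""
    return " ".join(label.lower().split())


def similarity(a, b):
    """Similarity of two labels in [0, 1], using difflib's ratio."""
    return SequenceMatcher(None, normalise(a), normalise(b)).ratio()


def propose_links(source, target, threshold=0.9):
    """Link each source concept to its best-matching target concept,
    keeping only pairs whose label similarity reaches the threshold."""
    links = []
    for s_iri, s_label in source.items():
        best_iri, best_score = None, 0.0
        for t_iri, t_label in target.items():
            score = similarity(s_label, t_label)
            if score > best_score:
                best_iri, best_score = t_iri, score
        if best_iri is not None and best_score >= threshold:
            links.append((s_iri, SKOS + "exactMatch", best_iri))
    return links


for s, p, o in propose_links(LOCAL, EXTERNAL):
    print(s, "skos:exactMatch", o)
```

      <p>With these inputs, sections A, B and C are linked, while D stays unlinked because no external label comes close; lowering the threshold would admit looser matches at the cost of more false links to review.</p>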
      <p>The Serbian CKAN. The Serbian CKAN portal is deployed on a server with the
following characteristics: Intel® Xeon® CPU 5140 (dual-core, 2.33 GHz), 8 GB
RAM, Ubuntu 11.04 with kernel version 2.6.38-12. The CKAN software was fully
translated into Serbian, supporting both Serbian scripts (Latin and Cyrillic).
Furthermore, a large number of dataset relationships have been defined, making
browsing and navigation in CKAN more convenient. The Serbian CKAN
is currently maintained by the Mihailo Pupin Institute’s team.</p>
      <p>The SORS LOD Cloud. The SORS statistical data in XML form were passed as input
to an XSLT processor and transformed into RDF using the aforementioned
vocabularies (RDF Data Cube, SDMX-RDF, SKOS, Dublin Core Terms, VoID) and the
developed concept schemes. The VoID definition of the SORS LOD dataset is given in
Fig. 2. The SORS dataset (87,968 triples, see http://stats.lod2.eu/serbia) was also
uploaded to the LOD Cloud Cluster knowledge store under the graph name
http://elpo.stat.gov.rs/lod2/.</p>
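      <p>In the SORS use case this mapping was done with an XSLT processor. The sketch below mirrors the same idea in Python's xml.etree.ElementTree purely for illustration: each row of a statistical report becomes one qb:Observation linked to a qb:DataSet, and a few VoID statistics (void:triples, void:distinctSubjects, void:properties) are then computed over the result, which is how a triple count like the one above could be derived. The input element and attribute names, the example.org base IRI and the row values are hypothetical placeholders; the actual SORS XML schema and stylesheets are not reproduced here.</p>

```python
import xml.etree.ElementTree as ET

QB = "http://purl.org/linked-data/cube#"
VOID = "http://rdfs.org/ns/void#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
BASE = "http://example.org/sors/"  # hypothetical base IRI for generated resources


def sample_report():
    """Build a small hypothetical report; the real SORS XML schema differs."""
    report = ET.Element("report", {"id": "prices"})
    # Placeholder rows (area code, reference period, value), not published figures
    for area, period, value in [("RS", "2010", "100.0"), ("RS", "2011", "100.0")]:
        ET.SubElement(report, "row", {"area": area, "period": period, "value": value})
    return report


def report_to_triples(report):
    """Map each row element to one qb:Observation, much as an XSLT template would.
    Objects are plain strings here; a real serializer would distinguish IRIs
    from typed literals."""
    dataset = BASE + "dataset/" + report.get("id")
    triples = [(dataset, RDF_TYPE, QB + "DataSet")]
    for i, row in enumerate(report.iter("row"), start=1):
        obs = dataset + "/obs" + str(i)
        triples += [
            (obs, RDF_TYPE, QB + "Observation"),
            (obs, QB + "dataSet", dataset),
            (obs, BASE + "refArea", BASE + "area/" + row.get("area")),  # dimension
            (obs, BASE + "refPeriod", row.get("period")),  # dimension
            (obs, BASE + "value", row.get("value")),  # measure
        ]
    return triples


def void_statistics(void_iri, triples):
    """A few VoID statistics describing the generated triples."""
    return [
        (void_iri, RDF_TYPE, VOID + "Dataset"),
        (void_iri, VOID + "triples", len(triples)),
        (void_iri, VOID + "distinctSubjects", len({s for s, _, _ in triples})),
        (void_iri, VOID + "properties", len({p for _, p, _ in triples})),
    ]


data = report_to_triples(sample_report())
for _, prop, value in void_statistics(BASE + "void", data):
    # Print the local name of each VoID property together with its value
    print(prop.rsplit("#", 1)[-1], value)
```

      <p>For the two placeholder rows this yields 11 triples, 3 distinct subjects and 5 properties; a real export would also carry the data structure definition, links to the SKOS code lists and typed literals.</p>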
    </sec>
    <sec id="sec-3">
      <title>Conclusion and Outlook</title>
      <p>This paper contributes to the understanding of the LOD2 tools and technologies and
discusses their use for publishing and consuming public sector information through
the SORS use case. The main lessons learnt from this study are:</p>
      <p>The Data Cube RDF vocabulary is mature enough to be used for publishing
statistical data as it improves interoperability and allows comparison of data from
different statistical sources.</p>
      <p>The LOD2 Stack provides a wide range of data transformation, enrichment and
exploitation tools. However, advanced tools for analysis and visualization of
statistical data are still under development.</p>
      <p>For publishers who currently only offer static files, Linked Data offers a flexible,
non-proprietary, machine-readable means of publication that supports an
out-of-the-box web API for programmatic access.</p>
      <p>The Serbian CKAN increases the visibility and accessibility of Serbian public
sector data.</p>
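      <p>The point about a web API for programmatic access can be made concrete, assuming the data are loaded into a store that exposes a SPARQL endpoint: the SPARQL 1.1 Protocol lets a client send a query as the query parameter of an HTTP GET request and ask for results in the SPARQL JSON results format. The sketch below only builds such a request with Python's standard library and parses a canned response in that format; the endpoint URL and the sample binding are hypothetical.</p>

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

ENDPOINT = "http://example.org/sparql"  # hypothetical SPARQL endpoint URL


def sparql_get_request(query, endpoint=ENDPOINT):
    """Build (without sending) a SPARQL Protocol query-via-GET request."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


def bindings(results_json, var):
    """Values bound to one variable in a SPARQL JSON results document."""
    doc = json.loads(results_json)
    return [row[var]["value"] for row in doc["results"]["bindings"] if var in row]


query = "SELECT DISTINCT ?type WHERE { ?s a ?type } LIMIT 10"
request = sparql_get_request(query)
print(request.full_url)

# Canned response in the SPARQL 1.1 JSON results format (hypothetical content)
sample = json.dumps({
    "head": {"vars": ["type"]},
    "results": {"bindings": [
        {"type": {"type": "uri", "value": "http://purl.org/linked-data/cube#Observation"}},
    ]},
})
print(bindings(sample, "type"))
# urllib.request.urlopen(request) would send the request to a live endpoint.
```

      <p>Against a live endpoint, the body returned by urlopen could be passed to bindings() unchanged, since the client relies only on the standard results format and not on a particular store.</p>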
      <p>We conclude that the adoption of LOD2 tools and technologies leads to the establishment of an
interoperable Open Government Data ecosystem. Future work will include an analysis of
the LOD2 Stack components for building custom applications for different LOD
stakeholders.</p>
      <p>Acknowledgements. The research presented in this paper is partly financed by the
European Union (FP7 LOD2 project, Pr. No: 257943), and partly by the Ministry of
Science and Technological Development of the Republic of Serbia (SOFIA project, Pr.
No: TR-32010). The Linked Open Data example was realized through close
cooperation with the Statistical Office of the Republic of Serbia.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Auer</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Martin</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Frischmuth</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Deblieck</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>Facilitating the publication of Open Governmental Data with the LOD2 Stack</article-title>
          . Share-PSI workshop, Brussels. Retrieved from http://share-psi.eu/papers/LOD2.pdf (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Vraneš</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Janev</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Spasić</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Milošević</surname>
            ,
            <given-names>U.</given-names>
          </string-name>
          :
          <article-title>Establishment of the Serbian CKAN</article-title>
          .
          <source>LOD2 Deliverable 9.5.1</source>
          ,
          <publisher-name>Institute Mihajlo Pupin</publisher-name>
          (
          <year>2012</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Cyganiak</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Reynolds</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tennison</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <source>The RDF Data Cube vocabulary</source>
          (July 14,
          <year>2010</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Williams</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Boncz</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tummarello</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Auer</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>50 Billion plus Triple LOD Cloud Hosted on the LOD2 Knowledge Store Cluster</article-title>
          .
          <source>LOD2 Deliverable 2.1.3</source>
          (
          <year>2012</year>
          ).
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>