<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Using the EU Big Data Test Infrastructure to Publish MITOS Public Service Descriptions as Linked Open Data</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Efthimios Tambouris</string-name>
          <email>tambouris@uom.edu.gr</email>
          <xref ref-type="aff" rid="aff4">4</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Dimitris Zeginis</string-name>
          <email>zeginis@uom.edu.gr</email>
          <xref ref-type="aff" rid="aff4">4</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Georgios Matziaras</string-name>
          <xref ref-type="aff" rid="aff4">4</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Nikolaos Stefanidis</string-name>
          <xref ref-type="aff" rid="aff4">4</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Rafail Promikyridis</string-name>
          <email>r.promikyridis@uom.edu.gr</email>
          <xref ref-type="aff" rid="aff4">4</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Konstantinos Tarabanis</string-name>
          <xref ref-type="aff" rid="aff4">4</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Konstantinos Doutsos Oikonomou</string-name>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Iraklis Varlamis</string-name>
          <email>varlamis@admin.grnet.gr</email>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Maria Claudia Bodino</string-name>
          <email>Mariaclaudia.BODINO@ec.europa.eu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>European Commission</institution>
          ,
          <addr-line>Brussels</addr-line>
          ,
          <country country="BE">Belgium</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>GRNET S.A. - National Infrastructures for Research and Technology</institution>
          ,
          <addr-line>7 Kifisias Street, 11523, Athens</addr-line>
          ,
          <country country="GR">Greece</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Harokopio University of Athens</institution>
          ,
          <addr-line>70 Eleftheriou Venizelou Street, 17676, Athens</addr-line>
          ,
          <country country="GR">Greece</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>Hellenic Open University</institution>
          ,
          <country country="GR">Greece</country>
        </aff>
        <aff id="aff4">
          <label>4</label>
          <institution>University of Macedonia</institution>
          ,
          <addr-line>156 Egnatia Street, 54636, Thessaloniki</addr-line>
          ,
          <country country="GR">Greece</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>The provision of public services is one of the main goals of public authorities worldwide. In this context, making structured public service descriptions openly available is important because it can improve transparency and trust. The CPSV-AP specification has been proposed as a standard model for publishing public service descriptions as Linked Open Data (LOD), thereby enhancing their interoperability and integration with other data sources. The aim of this project is to publish the public service descriptions of the National Registry of Administrative Procedures (MITOS) in Greece as LOD based on CPSV-AP by leveraging the EU Big Data Test Infrastructure (BDTI). The methodology involves understanding the context, modelling public service descriptions, generating LOD, and publishing the data. The project architecture uses Apache Airflow for orchestration and OpenLink Virtuoso for data storage and retrieval. Through several usage scenarios, the paper highlights the potential benefits of publishing public service descriptions as LOD.</p>
      </abstract>
      <kwd-group>
        <kwd>BDTI</kwd>
        <kwd>MITOS</kwd>
        <kwd>LOD</kwd>
        <kwd>CPSV-AP</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>The provision of public services is one of the main goals of public authorities worldwide.
An important part of public service provision is the publishing of structured public service
descriptions through Public Service Description Catalogues (PSDCs) that can empower citizens,
contribute to open government, and promote trust between public administrations and citizens.</p>
      <p>
        Public service descriptions are usually based on underlying models [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] that enhance semantic
interoperability, transparency, accessibility, and efficiency in public service delivery. A public
service model developed by the EU is the Core Public Service Vocabulary Application Profile
(CPSV-AP) [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], which aims to be a de facto standard model for publishing public service
descriptions as Linked Open Data, thereby enhancing their interoperability and integration with other
data sources (e.g., DBpedia).
      </p>
      <p>In Greece, the government has launched MITOS<sup>1</sup> as the official National Registry of
Administrative Procedures, which currently contains descriptions of almost 4000 public services. The
data model underlying MITOS is compatible with CPSV-AP. MITOS provides access to its data
through an API; however, its data have not yet been published as Linked Open Data.</p>
      <p>The aim of the MITOS_LOD project presented in this paper is to publish MITOS public
service descriptions as Linked Open Data. Towards this direction we: i) apply an existing
framework for publishing PSDCs as LOD based on CPSV-AP and ii) leverage the EU Big Data
Test Infrastructure (BDTI) to pilot the developed solution.</p>
      <p>The remainder of this paper is structured as follows. Section 2 provides background
information on MITOS, LOD, and BDTI. Section 3 presents the project approach. Section 4 presents the
project architecture, Section 5 provides a discussion, and Section 6 presents the main conclusions along
with directions for future research.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Background Information</title>
      <p>This section presents relevant background information on i) Public Service Description
Catalogues (PSDCs), including MITOS, i.e., the official Greek PSDC, ii) Linked Open Data (LOD)
and CPSV-AP, which provide principles and vocabularies for publishing and connecting
structured public service data, and iii) the EU Big Data Test Infrastructure (BDTI), which provides
cloud-based services for big data (including LOD) use cases.</p>
      <sec id="sec-2-1">
        <title>2.1. Public Service Description Catalogues and MITOS</title>
        <p>Public Service Description Catalogues (PSDCs) are comprehensive repositories of standardized
descriptions of public services provided by governments or public agencies. These catalogues
contain detailed information about the services offered, including their scope, objectives,
procedures, eligibility criteria, and any associated costs or fees. The purpose of PSDCs is to enhance
transparency, accessibility, and efficiency in public service delivery by providing consistent
information about available services. PSDCs enable users to easily find and understand the
services they need and facilitate interactions with government agencies.</p>
        <p>
          MITOS is the official Greek National Registry of Administrative Procedures [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ]. It allows the
modelling of existing administrative procedures of public services, as well as the documentation of the
required supporting documents in collaboration with the competent units of the entities. It
incorporates a mechanism for continuously updating the procedures to support the legislative
goal of continuous compliance and exclusive information on changes after any simplification.
The data model underlying MITOS is compatible with CPSV-AP, and most of the main entities
have URIs; however, the data are not yet published according to the LOD principles. MITOS
also offers a publicly available API that provides data in JSON format.
        </p>
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Linked Open Data and CPSV-AP</title>
        <p>
          Linked Open Data (LOD) [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ] is an approach to publishing and connecting structured data on
the web, making it interlinked and easily accessible. It follows principles such as using URIs
to identify entities, employing RDF (Resource Description Framework) for representing data,
and utilizing HTTP for accessing data and metadata. LOD aims to enable data interoperability,
integration, and reuse across various domains and applications [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ].
        </p>
        <p>
          CPSV-AP (Core Public Service Vocabulary Application Profile) is a specification developed
by the European Commission for describing public services in a standardized manner as LOD.
LOD and CPSV-AP offer a powerful framework for representing, sharing, and integrating
public service-related data in a linked and interoperable manner. By leveraging LOD principles
and adhering to the CPSV-AP specification, organizations can enhance the discoverability,
accessibility, and usability of public service information [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ].
        </p>
        <p>
          In this frame, the literature has proposed a relevant framework [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ] for publishing public
service descriptions as LOD using CPSV-AP. The framework was piloted based on data from the
Region of Epirus in Greece. A similar approach applied at the European level [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ] transformed
information from European government portals into LOD. The authors mapped data from various
portals using a single model but faced challenges with manual effort and with poorly
formatted data. Their experience highlights the importance of designing scalable solutions for
data transformation that prioritize interoperability across systems.
        </p>
      </sec>
      <sec id="sec-2-3">
        <title>2.3. The EU Big Data Test Infrastructure (BDTI)</title>
        <p>The Big Data Test Infrastructure (BDTI)<sup>2</sup> is a ready-to-use, free analytics cloud platform
that aims to help European public administrations experiment with data and derive insights
that support their decision-making. To achieve this goal, it provides open-source data
science tools, fully deployed in the cloud, to support public administrations throughout their data
journey, from data collection through data analysis and data orchestration to data visualization.</p>
        <p>
          More concretely, public administrations can easily apply for the initiative and benefit from a
free-of-charge cloud-based analytics infrastructure to experiment with and prototype solutions before
deploying them in the production environment on their own premises. The test environment
provided by the BDTI consists of several integrated open-source solutions and the required
cloud infrastructure, which includes virtual machines, analytics clusters, storage, and networking
facilities. Among others, BDTI provides services related to LOD management (namely Virtuoso)
and automation (namely Airflow). Many pilots have successfully helped national, regional, and
local administrations to optimize their processes [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ], to design better services and increase
transparency, improving the availability, quality and usability of public sector information.
        </p>
        <p>
          <fn id="fn2">
            <label>2</label>
            <p>https://big-data-test-infrastructure.ec.europa.eu/</p>
          </fn>
        </p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. The MITOS_LOD Project Approach</title>
      <p>
        The aim of the MITOS_LOD project is to publish MITOS public service descriptions as LOD
using CPSV-AP. The process followed is adopted from [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] and comprises the following steps.
      </p>
      <p>Step 1: Understanding the context, preparation, and Public Service (PS) selection. This step
involves grasping the context, including the geographical scope and current PS models. This
information is presented in Section 2. No PS selection took place in this project;
instead, it was decided that all PS in MITOS would be transformed to LOD.</p>
      <p>Step 2: Reuse or definition of a URI design policy. In this regard, the EU advocates
adopting the following URI structure: http://domain/type/concept/reference. In
this project we used the domain "mitos.gov.gr:8890"; an example of a full URI is
https://mitos.gov.gr:8890/PublicServices/id/evidence/evidence14.</p>
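      <p>A minimal sketch of how URIs following this policy could be minted is given below. It is an illustration rather than the project code: the function name mint_uri is hypothetical, and treating "PublicServices" as a fixed path prefix with "id" as the type segment is an assumption read off the single example URI above.</p>
      <preformat><![CDATA[
```python
from urllib.parse import quote

# Domain and path prefix copied from the example URI in Step 2.
# Treating "PublicServices" as a fixed prefix is an assumption.
BASE = "https://mitos.gov.gr:8890/PublicServices"


def mint_uri(concept: str, reference: str, resource_type: str = "id") -> str:
    """Build a URI of the form {base}/{type}/{concept}/{reference}."""
    parts = [resource_type, concept, reference]
    if not all(parts):
        raise ValueError("type, concept and reference must be non-empty")
    # Percent-encode every segment so that a space or slash in a local
    # identifier cannot alter the path structure.
    return "/".join([BASE] + [quote(part, safe="") for part in parts])


evidence_uri = mint_uri("evidence", "evidence14")
print(evidence_uri)
```
]]></preformat>
      <p>Keeping URI construction in one function makes it easier to apply the same policy to every entity type that is published.</p>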
      <p>Step 3: PS descriptions modelling. In this step the Public Service descriptions of MITOS are
mapped to CPSV-AP concepts. Figure 1 below shows the subset of CPSV-AP that is used for
pilot purposes.</p>
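      <p>The idea of a field-to-property mapping can be sketched as follows. The MITOS field names ("id", "title", "description") are hypothetical placeholders, since the paper does not list the API fields, and only three CPSV-AP properties of a public service are covered; the actual mapping used in the project is broader.</p>
      <preformat><![CDATA[
```python
DCT = "http://purl.org/dc/terms/"
CPSV = "http://purl.org/vocab/cpsv#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Hypothetical MITOS JSON field names (keys) mapped to the properties
# that CPSV-AP uses for a public service (values).
FIELD_TO_PROPERTY = {
    "id": DCT + "identifier",
    "title": DCT + "title",
    "description": DCT + "description",
}


def map_record(subject_uri, record):
    """Return (subject, predicate, object) triples for one service record."""
    triples = [(subject_uri, RDF_TYPE, CPSV + "PublicService")]
    for field, prop in FIELD_TO_PROPERTY.items():
        value = record.get(field)
        # Skip absent or empty fields; unmapped fields are ignored.
        if value not in (None, ""):
            triples.append((subject_uri, prop, str(value)))
    return triples


sample = {"id": "ps-001", "title": "Example service", "description": "", "unmapped": 1}
sample_triples = map_record("https://example.org/id/ps/ps-001", sample)
for triple in sample_triples:
    print(triple)
```
]]></preformat>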
      <p>Step 4: Linked data generation (and interlinking). This step involves converting CPSV-AP
compliant Public Service descriptions into linked data. The linked data generation is performed
by collecting MITOS data through an API and transforming them to RDF through a mapping
implemented in Python. More details are provided in Section 4. An excerpt of the result is
depicted in Figure 2.</p>
      <p>Step 5: Linked data validation. During this stage, the generated linked data undergo validation
to identify any syntax errors and ensure alignment with the CPSV-AP schema.</p>
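      <p>As a simplified illustration of schema alignment checking, the sketch below reports public services that lack required properties. It is a stand-in for a full validation (for example with SHACL shapes), not the validator used in the project, and the set of required properties shown here is an assumption; the authoritative constraints are those of the CPSV-AP specification.</p>
      <preformat><![CDATA[
```python
from collections import defaultdict

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
PUBLIC_SERVICE = "http://purl.org/vocab/cpsv#PublicService"
DCT = "http://purl.org/dc/terms/"

# Assumed minimal set of required properties, for illustration only.
REQUIRED = {DCT + "identifier", DCT + "title", DCT + "description"}


def find_missing(triples):
    """Map each public service URI to the required properties it lacks."""
    properties = defaultdict(set)
    services = set()
    for subject, predicate, obj in triples:
        properties[subject].add(predicate)
        if predicate == RDF_TYPE and obj == PUBLIC_SERVICE:
            services.add(subject)
    report = {}
    for service in sorted(services):
        missing = REQUIRED - properties[service]
        if missing:
            report[service] = sorted(missing)
    return report


data = [
    ("urn:ps:1", RDF_TYPE, PUBLIC_SERVICE),
    ("urn:ps:1", DCT + "identifier", "1"),
    ("urn:ps:1", DCT + "title", "Complete service"),
    ("urn:ps:1", DCT + "description", "Has every required property."),
    ("urn:ps:2", RDF_TYPE, PUBLIC_SERVICE),
    ("urn:ps:2", DCT + "identifier", "2"),
]
validation_report = find_missing(data)
print(validation_report)
```
]]></preformat>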
      <p>Step 6: Linked data publication (and interlinking). The linked dataset, such as the RDF/Turtle
file, is then uploaded to an RDF store (Virtuoso), making it accessible as an RDF graph via a
SPARQL endpoint. The Virtuoso RDF store is provided through BDTI.</p>
      <p>Step 7: Linked data exploitation. The published linked dataset serves multiple purposes. It
aids in acquiring valuable insights into essential public service characteristics or aggregated
statistics related to public services. Some example exploration SPARQL queries are presented in
Figure 3.</p>
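      <p>To illustrate the kind of exploration query meant in Step 7, the sketch below prepares a SPARQL request that lists ten public services by title. The query is an example written for this illustration, not one of the queries in Figure 3. The endpoint address is an assumption: it combines the host and port of the URI example in Step 2 with the /sparql path that Virtuoso uses by default. No request is sent.</p>
      <preformat><![CDATA[
```python
from urllib.parse import urlencode

# Assumed endpoint location (host and port from Step 2, default Virtuoso path).
ENDPOINT = "https://mitos.gov.gr:8890/sparql"

QUERY = """
PREFIX cpsv: <http://purl.org/vocab/cpsv#>
PREFIX dct: <http://purl.org/dc/terms/>

SELECT ?service ?title
WHERE {
  ?service a cpsv:PublicService ;
           dct:title ?title .
}
ORDER BY ?title
LIMIT 10
"""


def build_request_url(endpoint, query):
    """Return the GET URL for a SPARQL query asking for JSON results."""
    params = {
        "query": query.strip(),
        "format": "application/sparql-results+json",
    }
    return endpoint + "?" + urlencode(params)


request_url = build_request_url(ENDPOINT, QUERY)
print(request_url)
```
]]></preformat>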
    </sec>
    <sec id="sec-4">
      <title>4. Architecture</title>
      <p>The project is built on a distributed architecture (Figure 4) that uses several
technologies for data acquisition, transformation, storage, and publishing. The main
architecture components are as follows:</p>
      <p>Big Data Test Infrastructure (BDTI): The BDTI plays a crucial role in this project by
providing a cloud-based environment and resources specifically designed for experimenting
with open-source big data tools and technologies, including Airflow and Virtuoso. This readily
available infrastructure eliminates the need for extensive setup and configuration, allowing for
rapid development and testing.</p>
      <p>Apache Airflow Orchestration: An instance of Apache Airflow is deployed within the
BDTI environment in order to orchestrate the entire data processing pipeline. It offers a
user-friendly web interface for workflow visualization, allowing for real-time monitoring of task
progress.</p>
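      <p>The role of the orchestrator can be illustrated with a dependency graph of the pipeline steps listed later in this section. The sketch below uses the Python standard library rather than the Airflow API, so it only shows the ordering logic that an Airflow DAG would encode; the task names are hypothetical.</p>
      <preformat><![CDATA[
```python
from graphlib import TopologicalSorter

# Each task maps to the set of tasks that must finish before it starts.
# Task names are hypothetical labels for the pipeline steps.
DEPENDENCIES = {
    "retrieve_ids": set(),
    "retrieve_descriptions": {"retrieve_ids"},
    "write_csv": {"retrieve_descriptions"},
    "transform_to_turtle": {"write_csv"},
    "import_to_virtuoso": {"transform_to_turtle"},
}

# static_order() yields every task after all of its prerequisites.
execution_order = list(TopologicalSorter(DEPENDENCIES).static_order())
print(execution_order)
```
]]></preformat>
      <p>In an orchestrator, the same dependencies additionally drive scheduling, retries, and the monitoring view mentioned above.</p>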
      <p>MITOS: The data acquisition process begins with a call to the open API of MITOS.
The API provides easy access to the available data in a structured,
machine-readable way.</p>
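      <p>A hedged sketch of concurrent retrieval with a bounded number of simultaneous requests is shown below. The function fake_fetch is a stand-in that performs no network access; in practice it would be replaced by an HTTP GET against the MITOS API, whose endpoints and response fields are not specified here and are therefore not reproduced. The concurrency limit is an illustrative choice.</p>
      <preformat><![CDATA[
```python
import asyncio


async def fetch_all(ids, fetch_one, max_concurrency=5):
    """Fetch one description per ID with at most max_concurrency in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def guarded(service_id):
        # The semaphore caps the number of simultaneous requests so that
        # the source API is not flooded.
        async with semaphore:
            return service_id, await fetch_one(service_id)

    pairs = await asyncio.gather(*(guarded(i) for i in ids))
    return dict(pairs)


async def fake_fetch(service_id):
    """Stand-in for an HTTP GET against the API (no network access)."""
    await asyncio.sleep(0)  # yield control, as a real request would
    return {"id": service_id, "title": f"Service {service_id}"}


descriptions = asyncio.run(fetch_all([101, 102, 103], fake_fetch))
print(descriptions)
```
]]></preformat>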
      <p>CPSV-AP-Transformation to LOD: The core of the project is a series of Python scripts
that run through Airflow to transform the data retrieved through the API, based on a mapping,
to a format compliant with CPSV-AP. The transformation process results in RDF data using the
Turtle syntax.</p>
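      <p>The transformation idea can be sketched with the standard library alone, as below. This is not the project's script: the CSV column names ("id", "title") and the base URI are hypothetical, only two properties are emitted, and a production pipeline would more likely rely on an RDF library for serialization.</p>
      <preformat><![CDATA[
```python
import csv
import io

PREFIXES = (
    "@prefix cpsv: <http://purl.org/vocab/cpsv#> .\n"
    "@prefix dct: <http://purl.org/dc/terms/> .\n"
)


def escape_literal(text):
    """Escape characters that are special in a Turtle double-quoted string."""
    return (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def rows_to_turtle(rows, base_uri):
    """Serialize rows with 'id' and 'title' columns as public services.

    The 'id' values are assumed to be safe for direct use inside an IRI.
    """
    blocks = [PREFIXES]
    for row in rows:
        blocks.append(
            f"<{base_uri}{row['id']}> a cpsv:PublicService ;\n"
            f'    dct:identifier "{escape_literal(row["id"])}" ;\n'
            f'    dct:title "{escape_literal(row["title"])}" .\n'
        )
    return "\n".join(blocks)


# Hypothetical CSV content; the doubled quotes are CSV escaping.
SAMPLE_CSV = 'id,title\nps1,"Issue of a ""sample"" certificate"\n'
turtle = rows_to_turtle(
    csv.DictReader(io.StringIO(SAMPLE_CSV)), "https://example.org/id/ps/"
)
print(turtle)
```
]]></preformat>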
      <p>OpenLink Virtuoso: The LOD data created are stored in the OpenLink Virtuoso
instance deployed on the BDTI, which also offers a publicly available SPARQL endpoint.</p>
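      <p>The paper does not state which Virtuoso interface is used for loading. One option is an HTTP upload in the style of the SPARQL Graph Store protocol, sketched below. The endpoint path, the graph-uri parameter name, and the graph name are assumptions that should be checked against the deployed instance; authentication is omitted and the request is only prepared, not sent.</p>
      <preformat><![CDATA[
```python
import urllib.request
from urllib.parse import urlencode


def build_upload_request(endpoint, graph_uri, turtle_text):
    """Prepare (but do not send) a POST that adds Turtle data to a graph."""
    # The "graph-uri" parameter name is an assumption to verify.
    url = endpoint + "?" + urlencode({"graph-uri": graph_uri})
    return urllib.request.Request(
        url,
        data=turtle_text.encode("utf-8"),
        method="POST",
        headers={"Content-Type": "text/turtle"},
    )


upload_request = build_upload_request(
    "https://example.org/sparql-graph-crud-auth",  # hypothetical endpoint
    "https://example.org/graph/mitos",  # hypothetical graph name
    "@prefix dct: <http://purl.org/dc/terms/> .\n",
)
print(upload_request.get_method(), upload_request.full_url)
```
]]></preformat>
      <p>Sending the request would be a call to urllib.request.urlopen with suitable credentials, which is deliberately left out of this sketch.</p>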
      <p>The pipeline for producing and publishing linked data contains the following steps:</p>
      <list list-type="order">
        <list-item><p>Airflow orchestrator: The process begins with Apache Airflow orchestrating the entire
workflow by first calling the services responsible for data retrieval.</p></list-item>
        <list-item><p>Retrieve Public Services IDs: The Python scripts send asynchronous HTTP GET
requests to the mitos.gov.gr API to retrieve a list of the IDs of the available public services.</p></list-item>
        <list-item><p>Retrieve Public Services Descriptions: Using the retrieved IDs, the Python script
requests the full details of all public services through the MITOS API, using a separate
request for each ID.</p></list-item>
        <list-item><p>Processor function: As the details are received, they are processed, structured, and
stored in CSV files organized by groups of information, i.e., the classes of CPSV-AP. For
example, the groups include General information, Evidences, and Requirements.</p></list-item>
        <list-item><p>Transformation: The data read from the CSV files are converted into RDF format, based
on a mapping to CPSV-AP, using the Turtle syntax.</p></list-item>
        <list-item><p>Importing to Virtuoso: With the data ready and correctly formatted, Apache Airflow is
set up to use the available Virtuoso API to import the data into its RDF store.</p></list-item>
        <list-item><p>Publish LOD: The data are published as Linked Open Data (LOD) for use on the web.</p></list-item>
      </list>
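      <p>The processor step of the pipeline above can be sketched as follows. The structure of the input records and the column names are hypothetical, since the MITOS response format is not described here, and the CSV text is kept in memory instead of being written to files so that the example is self-contained.</p>
      <preformat><![CDATA[
```python
import csv
import io


def split_into_groups(services):
    """Flatten nested service records into one row list per group."""
    groups = {"general": [], "evidences": [], "requirements": []}
    for service in services:
        sid = service["id"]
        groups["general"].append(
            {"service_id": sid, "title": service.get("title", "")}
        )
        for evidence in service.get("evidences", []):
            groups["evidences"].append({"service_id": sid, "evidence": evidence})
        for requirement in service.get("requirements", []):
            groups["requirements"].append(
                {"service_id": sid, "requirement": requirement}
            )
    return groups


def to_csv(rows):
    """Render a list of uniform dicts as CSV text ('' when there are no rows)."""
    if not rows:
        return ""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0]), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical record shape, for illustration only.
SAMPLE = [{"id": "ps1", "title": "A", "evidences": ["e1", "e2"], "requirements": []}]
csv_files = {name: to_csv(rows) for name, rows in split_into_groups(SAMPLE).items()}
print(csv_files["evidences"])
```
]]></preformat>
      <p>Keeping one table per CPSV-AP class means that each table can later be mapped to RDF independently of the others.</p>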
    </sec>
    <sec id="sec-5">
      <title>5. Discussion</title>
      <p>This section describes some usage scenarios of the produced LOD for MITOS that highlight the
prospective benefits. The usage scenarios include:</p>
      <list list-type="order">
        <list-item><p>Publishing PS descriptions as LOD provides citizens and businesses with structured and
standardized information about PS offerings, potentially improving discoverability and
streamlining the application process. This could ultimately enhance user experience and
promote the use of PS catalogues, leading to resource savings (e.g., time, cost).</p></list-item>
        <list-item><p>Using LOD, service providers could create personalized recommendations for relevant
PSs based on individual user profiles and needs, leading to a more efficient and targeted
service experience.</p></list-item>
        <list-item><p>LOD representations of PS descriptions empower policymakers to conduct in-depth
analyses and evaluate the potential impact of PS redesign or improvement opportunities.
This enables them to address complex questions, such as the effects of abolishing specific
document requirements on a PS.</p></list-item>
        <list-item><p>Publishing PS descriptions as LOD fosters transparency and accountability by making
PS-related information publicly accessible, open, shareable, and reusable. This approach
also facilitates more effective and efficient PS portfolio management, allowing for better
resource allocation and service optimization. Furthermore, it enables public organizations
to participate more readily in PS description federation, thereby enhancing interoperability
and collaboration with other public entities.</p></list-item>
        <list-item><p>By leveraging LOD, public organizations can track and measure the performance of PSs,
enabling them to identify areas for improvement and optimize service delivery based on
data-driven insights.</p></list-item>
        <list-item><p>The availability of LOD datasets containing PS descriptions presents an opportunity for
the IT industry to develop new value-added services and applications that take advantage
of these data.</p></list-item>
      </list>
    </sec>
    <sec id="sec-6">
      <title>6. Conclusions and Future Work</title>
      <p>This paper demonstrates the feasibility and advantages of using the EU Big Data Test
Infrastructure to publish Public Service Descriptions from MITOS as Linked Open Data leveraging
the CPSV-AP specification. It can serve as a valuable reference for other public entities by
showcasing the practical application of BDTI for hosting public projects. Additionally, our
experience with BDTI fostered a collaborative relationship that allowed us to provide
feedback on the platform, supporting its continuous improvement for future users.</p>
      <p>The proposed approach (including the model, process and architecture) to create LOD from
MITOS can easily be adapted and adopted by other public administrations that want to transform
and publish their PSDCs as LOD.</p>
      <p>Going forward, an important area for future development is linking MITOS linked data
with other data in the LOD cloud as well as creating user-friendly visualization tools for these
data. These tools can transform complex LOD datasets into readily understandable formats,
empowering a wider range of users to interact with and leverage public service data. Additionally,
customizable dashboards could be developed to provide policymakers with real-time insights
into service usage and performance metrics for data-driven decision-making. Another area for
future development is optimizing the LOD update process. A notable performance
improvement could be achieved by minimizing the number of calls to the public service description
source (here, the MITOS API). This would reduce the load placed on the source system and enhance overall efficiency. One potential
approach involves implementing an incremental update strategy. This strategy would focus
on identifying and updating only the portions of the LOD dataset that have actually changed,
rather than retrieving and reprocessing the entire dataset on each update cycle.</p>
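      <p>One possible way to detect changed records for such an incremental strategy is to compare content fingerprints between runs, as sketched below. This is an illustration of the idea and not part of the implemented pipeline; the function names and the record shape are hypothetical.</p>
      <preformat><![CDATA[
```python
import hashlib
import json


def fingerprint(record):
    """Return a stable SHA-256 digest of a JSON-serializable record."""
    canonical = json.dumps(record, sort_keys=True, ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def diff_snapshots(previous, current):
    """Compare the last run's {id: fingerprint} with this run's {id: record}.

    Returns the IDs that are new or modified and the IDs that disappeared.
    """
    changed = sorted(
        i for i, record in current.items() if previous.get(i) != fingerprint(record)
    )
    removed = sorted(i for i in previous if i not in current)
    return changed, removed


previous_run = {
    "a": fingerprint({"title": "One"}),
    "b": fingerprint({"title": "Two"}),
    "c": fingerprint({"title": "Three"}),
}
current_run = {
    "a": {"title": "One"},
    "b": {"title": "Two (revised)"},
    "d": {"title": "Four"},
}
changed_ids, removed_ids = diff_snapshots(previous_run, current_run)
print(changed_ids, removed_ids)
```
]]></preformat>
      <p>Note that this sketch still downloads every record in order to hash it, so it only limits re-transformation and re-import. Reducing the number of calls themselves would require the source to expose change information, such as a modification timestamp, which is not discussed in this paper.</p>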
    </sec>
    <sec id="sec-7">
      <title>Acknowledgments</title>
      <p>The ready-to-use, free-of-charge analytics cloud stack infrastructure used to implement the pilot
was provisioned by the Big Data Test Infrastructure Support Team of the European
Commission (Directorate-General for Digital Services, Unit B1, Data, Artificial Intelligence
&amp; Web), which provided guidance and assistance throughout the project. The
Big Data Test Infrastructure programme is part of the Digital Europe Programme (DEP). This
work has been partially supported by the GR digiGOV-innoHUB project (Grant Agreement no.
101083646), which is co-funded by the European Union under the Competitiveness Programme
(ESPA 2021-2027).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>A.</given-names>
            <surname>Gerontas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Peristeras</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Tambouris</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Kaliva</surname>
          </string-name>
          , I. Magnisalis,
          <string-name>
            <given-names>K.</given-names>
            <surname>Tarabanis</surname>
          </string-name>
          ,
          <article-title>Public service models: a systematic literature review and synthesis</article-title>
          ,
          <source>IEEE Transactions on Emerging Topics in Computing</source>
          <volume>9</volume>
          (
          <year>2019</year>
          )
          <fpage>637</fpage>
          -
          <lpage>648</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <collab>European Commission</collab>
          ,
          <source>Core public service vocabulary application profile 3.2.0</source>
          ,
          <year>2024</year>
          . URL: https://github.com/SEMICeu/CPSV-AP/tree/master/releases/3.2.0.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3] OPSI,
          <article-title>MITOS - the national registry of administrative procedures</article-title>
          ,
          <source>Observatory of Public Sector Innovation</source>
          ,
          <year>2024</year>
          . URL: https://oecd-opsi.org/innovations/mitos-the-national-registry-of-administrative-procedures/.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>C.</given-names>
            <surname>Bizer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Heath</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Berners-Lee</surname>
          </string-name>
          ,
          <article-title>Linked data: The story so far</article-title>
          , in:
          <source>Semantic Services, Interoperability and Web Applications: Emerging Concepts</source>
          ,
          <publisher-name>IGI Global</publisher-name>
          ,
          <year>2011</year>
          , pp.
          <fpage>205</fpage>
          -
          <lpage>227</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>L.</given-names>
            <surname>Ding</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Peristeras</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Hausenblas</surname>
          </string-name>
          ,
          <article-title>Linked open government data [guest editors' introduction]</article-title>
          ,
          <source>IEEE Intelligent Systems</source>
          <volume>27</volume>
          (
          <year>2012</year>
          )
          <fpage>11</fpage>
          -
          <lpage>15</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>D.</given-names>
            <surname>Zeginis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Tarabanis</surname>
          </string-name>
          ,
          <article-title>An event-centric knowledge graph approach for public administration as an enabler for data analytics</article-title>
          ,
          <source>Computers</source>
          <volume>13</volume>
          (
          <year>2024</year>
          )
          <fpage>17</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>A.</given-names>
            <surname>Gerontas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Tambouris</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Lazopoulou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Tarabanis</surname>
          </string-name>
          ,
          <article-title>On using CPSV-AP to publish public service descriptions as linked open data</article-title>
          ,
          <source>Service Oriented Computing and Applications</source>
          <volume>16</volume>
          (
          <year>2022</year>
          )
          <fpage>231</fpage>
          -
          <lpage>261</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>P.</given-names>
            <surname>Fragkou</surname>
          </string-name>
          , L. Maglaras,
          <article-title>Transforming points of single contact data into linked data</article-title>
          ,
          <source>Computers</source>
          <volume>11</volume>
          (
          <year>2022</year>
          )
          <fpage>122</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <collab>BDTI</collab>
          ,
          <article-title>BDTI success stories</article-title>
          ,
          <year>2024</year>
          . URL: https://big-data-test-infrastructure.ec.europa.eu/success-stories_en.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>