<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>A Semantic Catalogue for the Data Market Austria</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Bernd-Peter Ivanschitz</string-name>
          <email>bernd.ivanschitz@researchstudio.at</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Thomas J. Lampoltshammer</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Victor Mireles</string-name>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Artem Revenko</string-name>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sven Schlarb</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Lőrinc Thurnay</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>AIT Austrian Institute of Technology</institution>
          ,
          <addr-line>Giefinggasse 4, Vienna</addr-line>
          ,
          <country country="AT">Austria</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Danube University Krems</institution>
          ,
          <addr-line>Dr.-Karl-Dorrek-Str. 30, Krems an der Donau</addr-line>
          ,
          <country country="AT">Austria</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Research Studios Austria</institution>
          ,
          <addr-line>Thurngasse 8/16, Vienna</addr-line>
          ,
          <country country="AT">Austria</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>Semantic Web Company</institution>
          ,
          <addr-line>Neubaugasse 1, Vienna</addr-line>
          ,
          <country country="AT">Austria</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>The Data Market Austria (DMA) is an ecosystem of federated data and service infrastructures. It aims at making data from various data providers accessible and interoperable by allowing the submission, storage, management and dissemination of static datasets or streaming data services. By creating a metadata vocabulary, standardizing the ingest of data and ensuring the quality and completeness of metadata, it lays the ground to enable participants to share or consume datasets residing in different infrastructures. This demo focuses on the mapping services used in the DMA to standardize data from different sources using a modified version of the DCAT metadata schema. We present tools that enable inter-organizational integration of datasets, in a manner that is both user-friendly and powerful enough to handle vast amounts of data.</p>
      </abstract>
      <kwd-group>
        <kwd>Metadata mapping</kwd>
        <kwd>semantic enrichment</kwd>
        <kwd>RDF</kwd>
        <kwd>distributed systems</kwd>
        <kwd>RML</kwd>
        <kwd>Metadata catalogue</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        The amount of data produced every day is growing at breathtaking speed, and data
has become an important asset in nearly every industry sector worldwide [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. A healthy data economy and a successfully
functioning data-services ecosystem therefore enable and ensure sustainable employment
and growth, and thereby societal stability and well-being [
        <xref ref-type="bibr" rid="ref4 ref5">4,5</xref>
        ]. Several issues have
been identified as hindering the data economy in the Austrian case [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], among
them the lack of interconnection between the different infrastructures hosting data
and data-related services.
      </p>
      <p>The Data Market Austria (DMA)<sup>5</sup> project addresses these problems by
developing the technological, infrastructural, regulatory, and economic
foundations for a comprehensive, innovation-supporting, sustainable Austrian
data-services ecosystem. The technological foundation includes Blockchain technology
for provenance, smart contracts and security, interconnected clouds, data access,
constraint-preserving processing and analysis algorithms, semi-automated data
quality improvement, and recommender-based brokerage technology.
Additionally, two pilots in the areas of ICT for Mobility and ICT for Earth Observation
are being developed to demonstrate the first usage scenarios of the DMA.</p>
    </sec>
    <sec id="sec-2">
      <title>5 https://datamarket.at/</title>
      <p>The DMA is a network of participating (or member) organizations that
contribute to the data market by offering their products, in the form of datasets or
services, to customers of the DMA. Each participating node must implement a
defined set of services and mandatory standard interfaces. These include, for
example, instances of a Data Crawler, a Metadata Mapper, a Blockchain peer, and
Data Management and Storage components. Together with a common
conceptual model, these standard interfaces form the basis of interoperability for
the use of datasets in the DMA.</p>
      <p>The gateway to this network of nodes containing data and providing services
is the DMA portal which, while not hosting any data or providing major services,
collects information from all nodes to keep an up-to-date catalogue of available
datasets. The focus of this demo is the design and implementation of this unified
catalogue.</p>
      <p>2 A Semantic Catalogue for a Data Market</p>
      <p>Since the data in the DMA lies in a set of distributed repositories, it is necessary
to build a unified catalogue to enable end users to search all available datasets
and services. Furthermore, a single catalogue can be exploited for
recommendation, deduplication, and various metadata quality measures. In the DMA, the
creation of this unified catalogue is approached by creating i) a single metadata
standard for the unified representation of datasets, including standardized
vocabularies for describing resources, ii) tools that help bring existing
metadata into compliance with this standard, and iii) the technological foundation for
building and maintaining the catalogue itself.</p>
      <sec id="sec-2-1">
        <title>Metadata standard</title>
        <p>The DMA metadata catalogue is based on DCAT-AP, the DCAT application
profile for data portals in Europe<sup>6</sup>, and extends the schema for DMA use cases.
This standardization enables future cooperation with international data portals
and ensures that the DMA is easily accessible for cooperating companies with
a certain data quality standard. The DMA extension of DCAT-AP, the
Data Market Core Vocabulary (DMAV), provides additional classes and properties for
describing datasets and services that are accessible on the DMA. The extension
focuses on the business use case of the DMA and adds predicates covering topics
such as price modeling and dataset exchange, which are not present in the original
DCAT-AP catalogue. The dmav:priceModel predicate, for example, allows us to handle
the transaction fees for commercial datasets that are being made available in
the DMA. The dmav:SLA (Service Level Agreement) class makes it possible to model the
conditions of a service contract in more detail.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>6 https://joinup.ec.europa.eu/release/dcat-ap-v11</title>
      <p>In the DMA metadata catalogue, every dataset constitutes an RDF<sup>7</sup> resource.
A set of predicates links every resource to the values of its metadata fields.
These values can be of two types:
i) literals, as in the case of dcat:description or owl:versionInfo, or ii) elements of
a controlled vocabulary, as in the case of Language or License. These controlled
vocabularies, which are managed by PoolParty Semantic Suite<sup>8</sup>, enable
accurate search, filtering and linking of different datasets. Additionally, the DMA
includes a series of semantic enrichment services which automatically annotate
free-text fields (such as dcat:description or dcat:title) with elements of controlled
vocabularies.</p>
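      <p>The two kinds of metadata values described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the DMA implementation: the dataset URI, the license URI and the use of Dublin Core property URIs are placeholders chosen for the example, and the triples are written out as N-Triples by hand so that the sketch needs no third-party library.</p>
      <preformat><![CDATA[
```python
"""Sketch: one dataset as an RDF resource with literal and vocabulary values."""

# Assumption: all URIs below are illustrative, not actual DMA identifiers.
DCT = "http://purl.org/dc/terms/"
DATASET = "https://example.org/dma/dataset/42"


def literal(value, lang=None):
    """Format a string as an N-Triples literal, escaping special characters."""
    escaped = (
        value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    )
    return f'"{escaped}"' + (f"@{lang}" if lang else "")


def uri(value):
    """Format a URI reference in N-Triples syntax."""
    return f"<{value}>"


triples = [
    # i) free-text fields carry literal values
    (uri(DATASET), uri(DCT + "title"), literal("Vienna traffic counts", "en")),
    (uri(DATASET), uri(DCT + "description"),
     literal("Hourly vehicle counts at 50 stations.", "en")),
    # ii) controlled fields point to concepts of a vocabulary
    (uri(DATASET), uri(DCT + "language"),
     uri("http://publications.europa.eu/resource/authority/language/DEU")),
    (uri(DATASET), uri(DCT + "license"),
     uri("https://example.org/vocab/license/cc-by-4.0")),
]


def to_ntriples(rows):
    """Serialize (subject, predicate, object) rows, one triple per line."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in rows)


print(to_ntriples(triples))
```
]]></preformat>
      <p>Because the controlled fields hold URIs rather than strings, two datasets that share a language or license point to the same node in the graph, which is what makes exact filtering and linking possible.</p>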
      <sec id="sec-3-1">
        <title>Tools for adoption of the metadata standards</title>
        <p>Since the DMA aims at making available data which was not originally produced
for commercialization, we must assume that the metadata describing it does not
comply with any particular standard. This is especially true because the data in
each node is managed by a different organization. Therefore, the conversion to
the unified metadata standard described above must be treated on a case-by-case
basis.</p>
        <p>
          The DMA provides two tools to facilitate this. The first is a UI component
in which a node's administrator can upload a sample (in XML or JSON) of the
metadata they wish to make available in the DMA. They are then prompted to
select, for each of the metadata fields required by the DMA, which fields of their
metadata schema should be used. This UI tool, called the Metadata Mapping
Builder, is in essence a user-friendly way to generate XPath and JSONPath
expressions. Once these expressions have been generated, they are arranged into
an RML [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ] file, which is then used to produce RDF from similarly structured
XML or JSON files.
        </p>
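        <p>The idea of pairing each required metadata field with a path expression can be sketched as follows. This is only an illustration under stated assumptions: the sample record, its element names and the field-to-path table are hypothetical, and Python's standard ElementTree module (which supports a subset of XPath) stands in for a full RML processor.</p>
        <preformat><![CDATA[
```python
"""Sketch: applying field-to-XPath mappings to a sample metadata record."""
import xml.etree.ElementTree as ET

# Hypothetical provider metadata; the element names are made up.
SAMPLE_XML = """
<record>
  <info>
    <name>Vienna traffic counts</name>
    <summary>Hourly vehicle counts at 50 stations.</summary>
  </info>
  <legal><licence>CC-BY-4.0</licence></legal>
</record>
"""

# What a mapping builder might emit: target field -> path into the source.
FIELD_PATHS = {
    "title": "./info/name",
    "description": "./info/summary",
    "license": "./legal/licence",
}


def apply_mapping(xml_text, field_paths):
    """Return {target_field: text} for every path that matches the record."""
    root = ET.fromstring(xml_text.strip())
    mapped = {}
    for field, path in field_paths.items():
        node = root.find(path)
        if node is not None and node.text:
            mapped[field] = node.text.strip()
    return mapped


if __name__ == "__main__":
    for field, value in apply_mapping(SAMPLE_XML, FIELD_PATHS).items():
        print(f"{field}: {value}")
```
]]></preformat>
        <p>Once such a table exists, the same expressions can be reused for every record that shares the structure of the uploaded sample, so the administrator only has to define the mapping once per metadata schema.</p>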
      </sec>
      <sec id="sec-3-2">
        <title>Catalogue compilation and maintenance</title>
        <p>Each node in the DMA that wishes to make a series of datasets available must
implement the following workflow. First, the Data Harvesting Component, which
must be configured by the node's administrator to find the different datasets
within the node, sends the corresponding metadata files to the Metadata
Mapping Service. This service uses the mapping file created as described above to generate,
for each dataset, a set of RDF triples (serialized in Turtle format).</p>
        <p>Afterwards, the dataset, its original metadata, and the corresponding RDF
are ingested into the Data Management component, which takes care of the
packaging, versioning and assignment of unique identifiers to all datasets, whose
hashes are furthermore registered in the Blockchain.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>7 https://www.w3.org/RDF/</title>
    </sec>
    <sec id="sec-5">
      <title>8 https://www.poolparty.biz/</title>
      <p>Next, the node's Data
Management component publishes, through a ResourceSync<sup>9</sup> interface, links to
metadata files in RDF format of recently added or updated datasets. This way, the
node's metadata management is decoupled from the process of incorporating
metadata into the DMA catalogue.</p>
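      <p>How a node might announce new or updated metadata files can be sketched as below. The sketch makes several assumptions that are not taken from the DMA implementation: the function name, the example URL and the use of SHA-256 are illustrative, and the generated document is a simplified ResourceSync-style change list (a Sitemap document with rs:md elements), so the specification should be consulted for the full set of required attributes.</p>
      <preformat><![CDATA[
```python
"""Sketch: a node announcing new or updated metadata files in a change list."""
import hashlib
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
RS_NS = "http://www.openarchives.org/rs/terms/"
ET.register_namespace("", SITEMAP_NS)
ET.register_namespace("rs", RS_NS)


def build_change_list(entries, from_time):
    """Build a simplified change list.

    entries: iterable of (url, lastmod, change, content_bytes), where
    change is "created", "updated" or "deleted".
    """
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    ET.SubElement(
        urlset, f"{{{RS_NS}}}md",
        {"capability": "changelist", "from": from_time},
    )
    for url, lastmod, change, content in entries:
        node = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(node, f"{{{SITEMAP_NS}}}loc").text = url
        ET.SubElement(node, f"{{{SITEMAP_NS}}}lastmod").text = lastmod
        # Assumption: SHA-256 is used as the content fingerprint.
        digest = hashlib.sha256(content).hexdigest()
        ET.SubElement(
            node, f"{{{RS_NS}}}md",
            {"change": change, "hash": f"sha-256:{digest}"},
        )
    return ET.tostring(urlset, encoding="unicode")


example = build_change_list(
    [(
        "https://node.example.org/metadata/dataset-42.ttl",  # placeholder URL
        "2018-06-01T12:00:00Z",
        "created",
        b"placeholder Turtle content of the metadata file",
    )],
    from_time="2018-06-01T00:00:00Z",
)
print(example)
```
]]></preformat>
      <p>Publishing only links and fingerprints keeps the node in control of its files: a consumer decides when to fetch them and can verify what it fetched against the announced hash.</p>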
      <p>
        In the DMA's central node, the Metadata Ingestion component constantly
polls the ResourceSync interfaces of all registered nodes and, when new datasets
are reported, harvests their RDF metadata, which already complies
with the DMA metadata vocabulary. This metadata is then enriched
semantically. The enrichment is based on EuroVoc<sup>10</sup>, which is used in the DMA as the main
thesaurus. The NLP Interchange Format [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] is used for annotations, which are
done in stand-off mode. The mapped and enriched metadata is then ingested
into the Search and Recommendation Services. The high quality of the
metadata and its compliance with the chosen scheme ensure that the datasets and
services are discoverable by the users of the DMA.
      </p>
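      <p>Stand-off annotation means that the text itself is left untouched and each annotation is stored separately as a pair of character offsets plus a reference to a concept. The following sketch illustrates the principle with a toy thesaurus. It is not the DMA enrichment service: the concept URIs are placeholders rather than real EuroVoc identifiers, simple case-insensitive label matching stands in for actual concept extraction, and the result is kept as plain tuples instead of NLP Interchange Format triples.</p>
      <preformat><![CDATA[
```python
"""Sketch: stand-off annotation of free text with thesaurus concepts."""
import re

# Toy thesaurus: label -> concept URI (placeholders, not EuroVoc URIs).
THESAURUS = {
    "air quality": "https://example.org/concept/air-quality",
    "transport": "https://example.org/concept/transport",
}


def annotate(text, thesaurus):
    """Return sorted (begin, end, concept_uri) tuples; text is not modified."""
    annotations = []
    for label, concept in thesaurus.items():
        pattern = re.compile(r"\b" + re.escape(label) + r"\b", re.IGNORECASE)
        for match in pattern.finditer(text):
            annotations.append((match.start(), match.end(), concept))
    return sorted(annotations)


description = "Air quality measurements near public transport hubs."
for begin, end, concept in annotate(description, THESAURUS):
    print(begin, end, repr(description[begin:end]), concept)
```
]]></preformat>
      <p>Keeping the offsets apart from the text allows several annotation layers to coexist over the same description and lets the original metadata value be indexed unchanged.</p>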
      <p>With small variations, the processes described above are also used for
ingesting publicly available data from government portals as well as for ingesting small
amounts of data that an individual would like to make available in the DMA.</p>
      <p>Acknowledgements. The Data Market Austria project is funded by the "ICT
of the Future" program of the Austrian Research Promotion Agency (FFG) and
the Federal Ministry of Transport, Innovation and Technology (BMVIT) under
grant no. 855404.</p>
    </sec>
    <sec id="sec-6">
      <title>9 http://www.openarchives.org/rs/1.1/resourcesync</title>
      <p>10 http://eurovoc.europa.eu/</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name><surname>Dimou</surname>, <given-names>A.</given-names></string-name>,
          <string-name><surname>Vander Sande</surname>, <given-names>M.</given-names></string-name>,
          <string-name><surname>Colpaert</surname>, <given-names>P.</given-names></string-name>,
          <string-name><surname>Verborgh</surname>, <given-names>R.</given-names></string-name>,
          <string-name><surname>Mannens</surname>, <given-names>E.</given-names></string-name>,
          <string-name><surname>Van de Walle</surname>, <given-names>R.</given-names></string-name>:
          <article-title>RML: A generic language for integrated RDF mappings of heterogeneous data</article-title>.
          In: <source>LDOW</source>
          (<year>2014</year>)
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name><surname>Fernández García</surname>, <given-names>J.D.</given-names></string-name>,
          <string-name><surname>Kiesling</surname>, <given-names>E.</given-names></string-name>,
          <string-name><surname>Kirrane</surname>, <given-names>S.</given-names></string-name>,
          <string-name><surname>Neuschmid</surname>, <given-names>J.</given-names></string-name>,
          <string-name><surname>Mizerski</surname>, <given-names>N.</given-names></string-name>,
          <string-name><surname>Polleres</surname>, <given-names>A.</given-names></string-name>,
          <string-name><surname>Sabou</surname>, <given-names>M.</given-names></string-name>,
          <string-name><surname>Thurner</surname>, <given-names>T.</given-names></string-name>,
          <string-name><surname>Wetz</surname>, <given-names>P.</given-names></string-name>:
          <article-title>Propelling the potential of enterprise linked data in Austria</article-title>.
          <source>Roadmap and report</source>
          (<year>2016</year>)
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name><surname>Hellmann</surname>, <given-names>S.</given-names></string-name>,
          <string-name><surname>Lehmann</surname>, <given-names>J.</given-names></string-name>,
          <string-name><surname>Auer</surname>, <given-names>S.</given-names></string-name>,
          <string-name><surname>Brümmer</surname>, <given-names>M.</given-names></string-name>:
          <article-title>Integrating NLP using linked data</article-title>.
          In: <source>International Semantic Web Conference</source>.
          pp. <fpage>98</fpage>–<lpage>113</lpage>.
          <publisher-name>Springer</publisher-name>
          (<year>2013</year>)
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4. Hochtl, J.,
          <string-name>
            <surname>Lampoltshammer</surname>
          </string-name>
          , T.J.:
          <article-title>Social Implications of a Data Market</article-title>
          . In: CeDEM17 - Conference for E-Democracy and Open Government. pp.
          <volume>171</volume>
          {
          <fpage>175</fpage>
          .
          <string-name>
            <surname>Edition</surname>
          </string-name>
          Donau-Universitat
          <string-name>
            <surname>Krems</surname>
          </string-name>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name><surname>Lampoltshammer</surname>, <given-names>T.J.</given-names></string-name>,
          <string-name><surname>Scholz</surname>, <given-names>J.</given-names></string-name>:
          <article-title>Open Data as Social Capital in a Digital Society</article-title>.
          In:
          <string-name><surname>Kapferer</surname>, <given-names>E.</given-names></string-name>,
          <string-name><surname>Gstach</surname>, <given-names>I.</given-names></string-name>,
          <string-name><surname>Koch</surname>, <given-names>A.</given-names></string-name>,
          <string-name><surname>Sedmak</surname>, <given-names>C.</given-names></string-name> (eds.)
          <source>Rethinking Social Capital: Global Contributions from Theory and Practice</source>,
          pp. <fpage>137</fpage>–<lpage>150</lpage>.
          <publisher-name>Cambridge Scholars Publishing</publisher-name>,
          <publisher-loc>Newcastle upon Tyne</publisher-loc>
          (<year>2017</year>)
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Manyika</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chui</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Brown</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bughin</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dobbs</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Roxburgh</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Byers</surname>
            ,
            <given-names>A.H.</given-names>
          </string-name>
          :
          <article-title>Big data: The next frontier for innovation, competition, and productivity</article-title>
          (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>