<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">A Semantic Catalogue for the Data Market Austria</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author role="corresp">
							<persName><forename type="first">Bernd-Peter</forename><surname>Ivanschitz</surname></persName>
							<email>bernd.ivanschitz@researchstudio.at</email>
							<affiliation key="aff0">
								<orgName type="institution">Research Studios Austria</orgName>
								<address>
									<addrLine>Thurngasse 8/16</addrLine>
									<settlement>Vienna</settlement>
									<country key="AT">Austria</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Thomas</forename><forename type="middle">J</forename><surname>Lampoltshammer</surname></persName>
							<affiliation key="aff1">
								<orgName type="institution">Danube University Krems</orgName>
								<address>
									<addrLine>Dr.-Karl-Dorrek-Str. 30</addrLine>
									<settlement>Krems an der Donau</settlement>
									<country key="AT">Austria</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Victor</forename><surname>Mireles</surname></persName>
							<affiliation key="aff2">
								<orgName type="institution">Semantic Web Company</orgName>
								<address>
									<addrLine>Neubaugasse 1</addrLine>
									<settlement>Vienna</settlement>
									<country key="AT">Austria</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Artem</forename><surname>Revenko</surname></persName>
							<affiliation key="aff2">
								<orgName type="institution">Semantic Web Company</orgName>
								<address>
									<addrLine>Neubaugasse 1</addrLine>
									<settlement>Vienna</settlement>
									<country key="AT">Austria</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Sven</forename><surname>Schlarb</surname></persName>
							<affiliation key="aff3">
								<orgName type="institution">AIT Austrian Institute of Technology</orgName>
								<address>
									<addrLine>Giefinggasse 4</addrLine>
									<settlement>Vienna</settlement>
									<country key="AT">Austria</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Lőrinc</forename><surname>Thurnay</surname></persName>
							<affiliation key="aff1">
								<orgName type="institution">Danube University Krems</orgName>
								<address>
									<addrLine>Dr.-Karl-Dorrek-Str. 30</addrLine>
									<settlement>Krems an der Donau</settlement>
									<country key="AT">Austria</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">A Semantic Catalogue for the Data Market Austria</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">78009B36514BA6AD0B6AF4251F35D55C</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-25T08:04+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>Metadata mapping</term>
					<term>semantic enrichment</term>
					<term>RDF</term>
					<term>distributed systems</term>
					<term>RML</term>
					<term>Metadata catalogue</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>The Data Market Austria (DMA) is an ecosystem of federated data and service infrastructures. It aims to make data from various data providers accessible and interoperable by allowing the submission, storage, management and dissemination of static datasets or streaming data services. By creating a metadata vocabulary, standardizing the ingest of data and ensuring the quality and completeness of metadata, it lays the groundwork for participants to share or consume datasets residing in different infrastructures. This demo focuses on the mapping services used in the DMA to standardize data from different sources using a modified version of the DCAT metadata schema. We present tools that enable inter-organizational integration of datasets in a manner that is both user-friendly and powerful enough to handle vast amounts of data.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">Introduction</head><p>The amount of data produced every day is growing at breathtaking speed, and data has become an important asset in nearly every industry sector worldwide <ref type="bibr" target="#b5">[6]</ref>. A healthy data economy and a well-functioning data-services ecosystem therefore support sustainable employment and growth, and thereby societal stability and well-being <ref type="bibr" target="#b3">[4,</ref><ref type="bibr" target="#b4">5]</ref>. Several issues have been identified as hindering the data economy in the Austrian case <ref type="bibr" target="#b1">[2]</ref>, among them the lack of interconnection between the different infrastructures that host data and data-related services.</p><p>The Data Market Austria (DMA) <ref type="foot" target="#foot_0">5</ref> project addresses these problems by developing the technological, infrastructural, regulatory, and economic foundations for a comprehensive, innovation-supporting, sustainable Austrian data-services ecosystem. The technological foundation includes Blockchain technology for provenance, smart contracts and security, interconnected clouds, data access, constraint-preserving processing and analysis algorithms, semi-automated data quality improvement, and recommender-based brokerage technology. Additionally, two pilots in the areas of ICT for Mobility and ICT for Earth Observation are being developed to demonstrate the first usage scenarios of the DMA.</p><p>The DMA is a network of participating (or member) organizations that contribute to the data market by offering their products in the form of datasets or services to customers of the DMA. Each participating node must implement a defined set of services and mandatory standard interfaces. These include, for example, instances of a Data Crawler, a Metadata Mapper, a Blockchain peer, and Data Management and Storage components. 
Together with a common conceptual model, these standard interfaces form the basis of interoperability for the use of datasets in the DMA.</p><p>The gateway to this network of nodes containing data and providing services is the DMA portal which, while not hosting any data or providing major services, collects information from all nodes to keep an up-to-date catalogue of available datasets. The focus of this demo is the design and implementation of this unified catalogue.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2">A Semantic Catalogue for a Data Market</head><p>Since the data in the DMA lies in a set of distributed repositories, a unified catalogue is needed to enable end users to search all available datasets and services. Furthermore, a single catalogue can be exploited for recommendation, deduplication, and various metadata quality measures. In the DMA, this unified catalogue is built on i) a single metadata standard for the unified representation of datasets, including standardized vocabularies for describing resources, ii) tools that help bring existing metadata into compliance with this standard, and iii) the technological foundation for building and maintaining the catalogue itself.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Metadata standard</head><p>The DMA metadata catalogue is based on DCAT-AP, the DCAT application profile for data portals in Europe<ref type="foot" target="#foot_1">6</ref>, and extends the schema for DMA use cases. This standardization enables future cooperation with international data portals and ensures that the DMA is easily accessible for cooperating companies with a certain data quality standard. The DMA extension of DCAT-AP, the Data Market Core Vocabulary (DMAV), provides additional classes and properties for describing the datasets and services that are accessible on the DMA. The extension focuses on the business use case of the DMA and adds predicates covering topics such as price modeling and dataset exchange, which are not present in the original DCAT-AP. The dmav:priceModel predicate, for example, allows us to handle the transaction fees for commercial datasets that are being made available in the DMA. The dmav:SLA (Service Level Agreement) class allows the conditions of a service contract to be modeled in more detail.</p><p>In the DMA metadata catalogue, every dataset constitutes an RDF<ref type="foot" target="#foot_2">7</ref> resource. A set of predicates links every resource to the values of its metadata fields. These values can be of two types: i) literals, as in the case of dcat:description or owl:versionInfo, or ii) elements of a controlled vocabulary, as in the case of Language or License. These controlled vocabularies, which are managed by the PoolParty Semantic Suite<ref type="foot" target="#foot_3">8</ref>, enable accurate search, filtering and linking of different datasets. Additionally, the DMA includes a series of semantic enrichment services which automatically annotate free-text fields (such as dcat:description or dcat:title) with elements of controlled vocabularies.</p></div>
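<div xmlns="http://www.tei-c.org/ns/1.0"><p>As an illustration of the two value types described above, the following Python sketch models a catalogue entry as a resource with free-text literals and controlled-vocabulary concepts, and prints it in a Turtle-style layout. It is a minimal sketch and not the DMA implementation: the resource name ex:dataset-1, the concept identifiers lang:en and lic:cc-by, and the use of dct:language and dct:license for the Language and License fields are assumptions made for illustration, and prefix declarations are omitted.</p>
<p>
```python
from dataclasses import dataclass, field


@dataclass
class DatasetRecord:
    """One catalogue entry: a resource plus its metadata values."""

    resource: str  # hypothetical prefixed name, e.g. "ex:dataset-1"
    literals: dict = field(default_factory=dict)  # predicate -> free text
    concepts: dict = field(default_factory=dict)  # predicate -> vocabulary concept

    def to_turtle(self) -> str:
        """Render the entry as a Turtle-style block (prefixes omitted)."""
        parts = []
        for predicate, text in self.literals.items():
            # Escape backslashes and double quotes inside the literal.
            # Line breaks in the text are not handled by this sketch.
            escaped = text.replace("\\", "\\\\").replace('"', '\\"')
            parts.append(f'{predicate} "{escaped}"')
        for predicate, concept in self.concepts.items():
            # Controlled-vocabulary values are written as resources, not strings.
            parts.append(f"{predicate} {concept}")
        if not parts:
            raise ValueError("a record needs at least one metadata value")
        body = " ;\n    ".join(parts)
        return f"{self.resource}\n    {body} ."


# Illustrative values only; none of these identifiers come from the DMA.
record = DatasetRecord(
    resource="ex:dataset-1",
    literals={"dcat:description": "Hourly traffic counts", "owl:versionInfo": "1.0"},
    concepts={"dct:language": "lang:en", "dct:license": "lic:cc-by"},
)
print(record.to_turtle())
```
</p>
<p>Keeping literals and vocabulary concepts in separate maps mirrors the distinction made above: only the literal values are candidates for the semantic enrichment step, while the concept values can be used directly for filtering and linking.</p></div>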
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Tools for adoption of the metadata standards</head><p>Since the DMA aims at making available data which was not originally produced for commercialization, we must assume that the metadata describing it does not comply with any particular standard. This is especially true because the data in each node is managed by a different organization. Therefore, the conversion to the unified metadata standard described above must be handled on a case-by-case basis.</p><p>The DMA provides two tools to facilitate this. The first is a UI component in which a node's administrator can upload a sample (in XML or JSON) of the metadata they wish to make available in the DMA. They are then prompted to select, for each of the metadata fields required by the DMA, which fields of their metadata schema should be used. This UI tool, called the Metadata Mapping Builder, is in essence a user-friendly way to generate XPath and JSONPath expressions. Once these expressions have been generated, they are arranged into an RML <ref type="bibr" target="#b0">[1]</ref> file, which is then used to produce RDF from similarly structured XML or JSON files.</p></div>
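<div xmlns="http://www.tei-c.org/ns/1.0"><p>The idea behind such a mapping builder can be sketched in a few lines of Python for the JSON case. The sketch lists a JSONPath-style expression for every leaf of an uploaded sample, so that a user could pick one per required field, and then applies the chosen expressions to a similarly structured document. The function names, the sample document and the field selection are hypothetical; only a restricted subset of JSONPath (member names and array indices) is supported, and the generation of the RML file is not shown.</p>
<p>
```python
import json
import re


def leaf_paths(node, prefix="$"):
    """Yield a JSONPath-style expression for every leaf value in node."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield from leaf_paths(value, f"{prefix}.{key}")
    elif isinstance(node, list):
        for index, value in enumerate(node):
            yield from leaf_paths(value, f"{prefix}[{index}]")
    else:
        yield prefix


# One step is either ".name" or "[n]"; other JSONPath syntax is rejected.
_STEP = re.compile(r"\.([A-Za-z_][A-Za-z0-9_]*)|\[(\d+)\]")


def evaluate(path, document):
    """Evaluate an expression of the restricted form produced by leaf_paths."""
    if not path.startswith("$"):
        raise ValueError(f"not a JSONPath expression: {path!r}")
    current = document
    position = 1
    while position != len(path):
        match = _STEP.match(path, position)
        if match is None:
            raise ValueError(f"unsupported step in {path!r}")
        key, index = match.groups()
        current = current[key] if key is not None else current[int(index)]
        position = match.end()
    return current


def apply_mapping(mapping, document):
    """Return the value selected for each target metadata field."""
    return {target: evaluate(path, document) for target, path in mapping.items()}


# Hypothetical sample uploaded by a node administrator.
sample = json.loads('{"name": "Traffic counts", "meta": {"langs": ["de", "en"]}}')
print(list(leaf_paths(sample)))

# Hypothetical selection made in the UI: one expression per required field.
mapping = {"dcat:title": "$.name", "dct:language": "$.meta.langs[0]"}
print(apply_mapping(mapping, sample))
```
</p>
<p>Because the selection is stored as expressions rather than as extracted values, the same mapping can be reused for every metadata file that shares the structure of the sample, which is what makes a single uploaded sample sufficient.</p></div>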
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Catalogue compilation and maintenance</head><p>Each node in the DMA that wishes to make a series of datasets available must implement the following workflow. First, the Data Harvesting Component, which must be configured by the node's administrator to find the different datasets within the node, sends the corresponding metadata files to the Metadata Mapping Service, which uses the mapping file created as described above to generate, for each dataset, a set of RDF triples (serialized in Turtle format).</p><p>Afterwards, the dataset, its original metadata, and the corresponding RDF are ingested into the Data Management component, which takes care of the packaging, versioning and assignment of unique identifiers to all datasets, whose hashes are furthermore registered in the Blockchain. Next, the node's Data Management component publishes, through a ResourceSync interface, links to metadata files in RDF format of recently added or updated datasets. This way, the node's metadata management is decoupled from the process of incorporating metadata into the DMA catalogue.</p><p>In the DMA's central node, the Metadata Ingestion component constantly polls the ResourceSync interfaces of all registered nodes and, when new datasets are reported, harvests their RDF metadata which, let us recall, already complies with the DMA metadata vocabulary. This metadata is then enriched semantically. The enrichment is based on EuroVoc, which is used in the DMA as the main thesaurus. The NLP Interchange Format <ref type="bibr" target="#b2">[3]</ref> is used for annotations, which are done in stand-off mode. The mapped and enriched metadata is then ingested into the Search and Recommendation Services. 
The high quality of the metadata and its compliance with the chosen schema ensure that the datasets and services are discoverable by the users of the DMA.</p><p>With small variations, the processes described above are also used for ingesting publicly available data from government portals as well as for ingesting small amounts of data that an individual would like to make available in the DMA.</p></div>			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="5" xml:id="foot_0">https://datamarket.at/</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="6" xml:id="foot_1">https://joinup.ec.europa.eu/release/dcat-ap-v11</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="7" xml:id="foot_2">https://www.w3.org/RDF/</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="8" xml:id="foot_3">https://www.poolparty.biz/</note>
		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Acknowledgements</head><p>The Data Market Austria project is funded by the "ICT of the Future" program of the Austrian Research Promotion Agency (FFG) and the Federal Ministry of Transport, Innovation and Technology (BMVIT) under grant no. 855404.</p></div>
			</div>

			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<monogr>
		<title level="m" type="main">RML: A generic language for integrated RDF mappings of heterogeneous data</title>
		<author>
			<persName><forename type="first">A</forename><surname>Dimou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Vander Sande</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Colpaert</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Verborgh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Mannens</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Van De Walle</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2014">2014</date>
			<publisher>LDOW</publisher>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<monogr>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">D</forename><surname>Fernandez Garcia</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Kiesling</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Kirrane</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Neuschmid</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Mizerski</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Polleres</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Sabou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Thurner</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Wetz</surname></persName>
		</author>
		<title level="m">Propelling the potential of enterprise linked data in Austria</title>
				<imprint>
			<date type="published" when="2016">2016</date>
		</imprint>
	</monogr>
	<note type="report_type">roadmap and report</note>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Integrating NLP using linked data</title>
		<author>
			<persName><forename type="first">S</forename><surname>Hellmann</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Lehmann</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Auer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Brümmer</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">International semantic web conference</title>
				<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2013">2013</date>
			<biblScope unit="page" from="98" to="113" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Social Implications of a Data Market</title>
		<author>
			<persName><forename type="first">J</forename><surname>Höchtl</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><forename type="middle">J</forename><surname>Lampoltshammer</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">CeDEM17 - Conference for E-Democracy and Open Government</title>
				<imprint>
			<publisher>Edition Donau-Universität Krems</publisher>
			<date type="published" when="2017">2017</date>
			<biblScope unit="page" from="171" to="175" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Open Data as Social Capital in a Digital Society</title>
		<author>
			<persName><forename type="first">T</forename><forename type="middle">J</forename><surname>Lampoltshammer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Scholz</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Rethinking Social Capital: Global Contributions from Theory and Practice</title>
				<editor>
			<persName><forename type="first">E</forename><surname>Kapferer</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">I</forename><surname>Gstach</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">A</forename><surname>Koch</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">C</forename><surname>Sedmak</surname></persName>
		</editor>
		<meeting><address><addrLine>Newcastle upon Tyne</addrLine></address></meeting>
		<imprint>
			<publisher>Cambridge Scholars Publishing</publisher>
			<date type="published" when="2017">2017</date>
			<biblScope unit="page" from="137" to="150" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<monogr>
		<author>
			<persName><forename type="first">J</forename><surname>Manyika</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Chui</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Brown</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Bughin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Dobbs</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Roxburgh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">H</forename><surname>Byers</surname></persName>
		</author>
		<title level="m">Big data: The next frontier for innovation, competition, and productivity</title>
				<imprint>
			<date type="published" when="2011">2011</date>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
