
LOD4STAT: a scenario and requirements

Pavel Shvaiko

Michele Mostarda

Marco Amadori

Claudio Giuliano

TasLab, Informatica Trentina S.p.A., Trento, Italy

Fondazione Bruno Kessler - IRST, Trento, Italy

In this short paper we present a scenario and requirements for ontology matching posed by a statistical eGovernment application that aims to publish its data (also) as linked open data.

Introduction. Our application domain is eGovernment. By eGovernment we mean the application of information technologies to modernize public administration by optimizing the work of public institutions and by providing citizens and businesses with new and better services. More specifically, we focus on statistical applications for eGovernment. The driving idea is to capitalize on statistical information in order to increase knowledge of the Trentino region. Releasing statistical data (with disclosure control) as linked open data aims at simplifying access to resources in digital formats, at increasing the transparency and efficiency of eGovernment services, etc. The main challenge is the realization of a knowledge base that is natively enabled to work with RDBMS tables. Although this approach has been tailored specifically to the statistical database domain, there is substantial room for generalization. In this view, a number of initiatives that released governmental data as linked open data have to be taken into account: in GovWILD [1], links were established automatically with specifically developed similarity measures, while in [2] the alignment was done semi-automatically with Google Refine. The currently available matching techniques can be used to automate this process [3].

Scenario. Figure 1 shows the key component of the LOD4STAT system-to-be, called the Statistical Knowledge Base (SKB). The SKB aims at enabling its users to query statistical data, metadata and the relations across them without requiring specific knowledge of the underlying database. Users can issue queries such as "find all data related to population age and employment for the municipality of Trento". Specifically, the user query is analyzed in order to extract concepts out of labels. Then, these concepts are matched at run time against the SKB. For the example query, the term "population age" is connected to the Registry Office, while "employment" is connected to Social Security. The system returns a set of tables, metadata and entities from the Registry Office (with information about population and age) and from Social Security (with information about employment) that contain data for the city of Trento, and suggests possible joins between columns.

The SKB is an interconnected aggregation of ontologies (interpreted in a loose sense), such as WordNet, DBpedia and ESMS¹, which allows both multi-classification and multiple views on data. These ontologies have to be matched with one another to enable navigation across them through the respective correspondences. The SKB is also able to export query results in several formats, such as RDF Data Cube and JSON-Stat.

The SKB is represented by three (horizontal) layers. The upper layer is a collection of ontologies specific to the statistics domain, e.g., ESMS. The middle layer is composed

¹ http://epp.eurostat.ec.europa.eu/portal/page/portal/statistics/metadata
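
The query scenario described above (extract concepts from the query, connect them to data sources, return the corresponding tables and suggest joins) can be illustrated with a minimal Python sketch. It is not the LOD4STAT implementation: the concept-to-source dictionary, the table names and the column names are hypothetical, and a plain substring lookup stands in for the run-time ontology matching that the SKB would perform.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Table:
    name: str
    source: str      # data provider, e.g. "Registry Office"
    columns: tuple   # column names of the underlying RDBMS table


# Hypothetical toy knowledge base: concept label -> data source.
# In the real SKB this link would come from matching against ontologies.
CONCEPT_TO_SOURCE = {
    "population age": "Registry Office",
    "employment": "Social Security",
}

# Hypothetical tables; names and columns are invented for illustration.
TABLES = [
    Table("residents_by_age", "Registry Office",
          ("municipality", "age_class", "residents")),
    Table("employed_persons", "Social Security",
          ("municipality", "sector", "employed")),
]


def extract_concepts(query):
    """Return the known concept labels that occur in the query text."""
    text = query.lower()
    return [label for label in CONCEPT_TO_SOURCE if label in text]


def answer(query):
    """Map query concepts to sources, collect tables, suggest joins."""
    concepts = extract_concepts(query)
    sources = {CONCEPT_TO_SOURCE[c] for c in concepts}
    tables = [t for t in TABLES if t.source in sources]
    # Suggest a join wherever two returned tables share a column name.
    joins = []
    for left, right in combinations(tables, 2):
        for column in left.columns:
            if column in right.columns:
                joins.append((left.name, right.name, column))
    return concepts, tables, joins


QUERY = ("find all data related to population age and employment "
         "for the municipality of Trento")
concepts, tables, joins = answer(QUERY)
print(concepts)
print([t.name for t in tables])
print(joins)
```

In this toy setting the suggested join is on the shared `municipality` column. A realistic system would replace the exact-name comparison with a matcher that also finds columns whose names differ but whose meanings correspond.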

Acknowledgments. The work has been supported by the Autonomous Province of Trento, Italy.

1. C. Böhm, M. Freitag, A. Heise, C. Lehmann, A. Mascher, F. Naumann, V. Ercegovac, M. Hernández, P. Haase, and M. Schmidt. GovWILD: integrating open government data for transparency. In Proceedings of WWW, pages 321-324, 2012.

2. F. Maali, R. Cyganiak, and V. Peristeras. A publishing pipeline for linked government data. In Proceedings of ESWC, pages 778-792, 2012.

3. P. Shvaiko and J. Euzenat. Ontology matching: state of the art and future challenges. TKDE, 25(1):158-176, 2013.