=Paper= {{Paper |id=Vol-1690/paper56 |storemode=property |title=Generating Conference Linked Open Data in One Click |pdfUrl=https://ceur-ws.org/Vol-1690/paper56.pdf |volume=Vol-1690 |authors=Andrea Giovanni Nuzzolese,Anna Lisa Gentile,Valentina Presutti,Aldo Gangemi |dblpUrl=https://dblp.org/rec/conf/semweb/NuzzoleseGPG16 }} ==Generating Conference Linked Open Data in One Click== https://ceur-ws.org/Vol-1690/paper56.pdf
     Generating Conference Linked Open Data
                  in One Click

               Andrea Giovanni Nuzzolese1 , Anna Lisa Gentile2 ,
                  Valentina Presutti1 , and Aldo Gangemi1
                  1
                   Semantic Technology Lab, ISTC-CNR. Italy
                          2
                            University of Mannheim
    andrea.nuzzolese@istc.cnr.it, annalisa@informatik.uni-mannheim.de,
              valentina.presutti@cnr.it, aldo.gangemi@cnr.it




      Abstract. In this paper we describe cLODg2 (conference Linked Open
      Data generator - version 2), a tool to collect, refine and produce Linked
      Data about scientific conferences with their associated publications, par-
      ticipants and events. Conference metadata collected from different un-
      structured and semi-structured resources must be expressed with appro-
      priate vocabularies to be exposed as Linked Data. cLODg2 facilitates this
      task by providing a one-click workflow to generate data which is ready
      to be integrated in the ScholarlyData.org dataset. cLODg2 is an open
      source project, which has the aim to foster the publication of scholarly
      Linked Open Data and encourage collaborative efforts in this direction
      between researchers and publishers.



1   Introduction

Scholarlydata [4] is the evolution of the Semantic Web Dog Food (SWDF)
dataset3 . The SWDF corpus was the first considerable effort to offer comprehen-
sive semantic descriptions of conference events [3], collecting linked data about
papers, people, organizations, and events related to academic conferences.
    A comprehensive description of Scholarlydata can be found in [4], while in
this paper we provide technical details about cLODg2, the Open Source tool4
that supports data generation for Scholarlydata. cLODg2 (conference Linked
Open Data generator - version 2) provides a one click process for the conference
metadata publication workflow. cLODg2 has been used to refactor the SWDF
dataset and to gather and publish new conference metadata5 . The tool provides
an easy process to generate Linked Data which can be directly added to the
ScholarlyData dataset.
3
  SWDF: http://data.semanticweb.org
4
  https://github.com/anuzzolese/cLODg2
5
  Amongst other it has been used for ESWC conference since 2014 http://2016.
  eswc-conferences.org
2   cLODg2 - publishing Conference Semantic Data

The main goal of cLODg2 is to facilitate the generation of conference Linked
Data which can be readily integrated in the Scholarlydata6 dataset. Scholarly-
data [4] is the evolution of the SWDF dataset [3] based on an improvement of
the Semantic Web Conference (SWC) Ontology7 , the Conference Ontology8 [5],
which improves SWC adopting best ontology design practices. The necessary
steps to add conference data to Scholarlydata are: (i) Data acquisition, (ii)
Linked Data generation, (iii) Linked Data enrichment and (iv) Linked Data
Publication.
    The Data acquisition step, to be done by the user, consists of acquiring meta-
data about the conference, generally exported from a conference management
system. We currently support data acquisition from CSV files9 . Additionally,
Linked Data represented with the SWC ontology can be used as initial input10 .
    Starting from provided input cLODg2 performs two sequential steps: Linked
Data generation and data enrichment. Figure 1 shows the system architecture,
including all accessed services and technologies, modelled as an UML activity
diagram. The initialisation step merely consists of configuring a property file to
point to (i) the collected CSV files containing the input data and (ii) the D2RQ
mapping that will serve for converting CSV files to RDF. A D2RQ mapping for
dealing with easychair data is provided by default, but expert users can change
this to import ad hoc CSV files.
    The Linked Data generation activity is composed of the following steps:

 – Data gathering. This action merely represents the system fetching data
   from the specified location. We remark that for the sake of simplicity we fix
   the easychair model for the input data, but that this can be easily configured
   for multiple data gathering support.
 – RDB population. This action aims at populating a relational database
   (RDB) from the CSV files gathered from the previous action. The RDB is
   based on HyperSQL (HSQLDB)11 , which is a lightweight open-source Java
   database;
 – D2R conversion. The previous action, i.e., RDB population, is preparatory
   to this step. In fact, cLODg2 relies on the D2R framework [1] to perform the
   conversion of a non-RDF source to RDF. The conversion is guided by the
   mapping provided as input. This mapping is described by using the D2RQ
6
   http://w3id.org/scholarlydata
7
   http://data.semanticweb.org/ns/swc/swc_2009-05-09.html
 8
   Refer to http://w3id.org/scholarlydata/ontology/conference-ontology.owl
   to obtain the OWL source code and to http://goo.gl/4lOHSk to obtain the HTML
   documentation of the Conference Ontology.
 9
   A simplified example of such data, exported from easychair.org can be found at
   https://github.com/anuzzolese/cLODg2/tree/master/csv_samples
10
   Example dump at https://github.com/AnLiGentile/cLODg/tree/master/
   resources/swdf_samples
11
   http://hsqldb.org
        Fig. 1. cLODg2 architecture represented as an UML activity diagram.


     mapping language [2]. cLODg2 is released along with a default mapping for
     easychair data and targets two distinct alternative datasets: the SWDF and
     Scholarlydata.
The Linked Data enrichment activity is composed of the following actions:
 – Reasoning-based alignment. Input of this action are the RDF triples pro-
   duced by the Linked Data generation activity. The output is the materialisa-
   tion of a set of RDF triples that enable the alignment to other ontologies and
   vocabularies, i.e., the SWDF ontology, SPAR12 , Dolce D013 , the Organiza-
   tion Ontology14 , FOAF, SKOS, icatzd , and the Collections Ontology15 . The
   alignment triples are materialised by means of OWL-DL reasoning, which is
   enabled by the Apache Jena inference layer;
 – Linking to other Linked Datasets. This action is aimed at producing in-
   stance level alignments, expressed via owl:sameAs axioms. The target linked
   datasets are ORCID16 and DOI17 . ORCID provides persistent digital identi-
   fiers for scientific researchers and academic authors. A digital object identifier
12
   http://www.sparontologies.net
13
   http://www.ontologydesignpatterns.org/ont/dul/d0.owl
14
   https://www.w3.org/TR/vocab-org
15
   http://purl.org/co
16
   http://orcid.org
17
   https://www.doi.org
      (DOI) is a serial code used to uniquely identify digital objects, particularly
      used for electronic documents. The alignments to ORCID are produced by
      relying on the public API provided by ORCID18 . The references to DOI are
      produced by relying on the API provided by Crossref19 , performing a search
      on each article title.
The Linked Data Publication step, which is the last action in the cLODg2 work-
flow, has to be done by the user and consists of submitting produced data to
Scholarlydata.org.


3      Conclusions
This paper describes cLODg2, a tool to collect, refine and produce Linked Data
to describe scientific conferences and their publications, participants and events.
The main contribution of this work is an open source tool to support the pro-
duction of metadata for conferences and scholarly data which is ready to be
integrate in the ScholarlyData dataset, with minimal user effort. Future work
will be mainly focused at addressing data quality and reduce duplications and
misspelling in the data.


References
1. C. Bizer and R. Cyganiak. D2R Server - Publishing Relational Databases on the
   Semantic Web. In Proc. of ISWC2006 Poster&Demo, 2006.
2. C. Bizer and A. Seaborne. D2RQ - Treating Non-RDF Databases as Virtual RDF
   Graphs. In Proc. of ISWC2004 posters, 2004.
3. K. Möller, T. Heath, S. Handschuh, and J. Domingue. Recipes for semantic web
   dog food: The eswc and iswc metadata projects. In Proc. of ISWC’07/ASWC’07,
   pages 802–815, Berlin, Heidelberg, 2007. Springer-Verlag.
4. A. G. Nuzzolese, A. L. Gentile, V. Presutti, and A. Gangemi. Conference Linked
   Data Our Web Dog Food has gone gourmet. In Proc. of ISWC2016 Resource Track,
   page to appear, 2016.
5. A. G. Nuzzolese, A. L. Gentile, V. Presutti, and A. Gangemi. Semantic web confer-
   ence ontology - a refactoring solution. In The Semantic Web: ESWC 2016 Satellite
   Events, page to appear. Springer, 2016.




18
     http://members.orcid.org/api/introduction-orcid-public-api
19
     http://www.crossref.org/guestquery