<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>The OBDA-based “Observatory of Research and Innovation” of the Tuscany Region</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Alessandro Mosca</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Bernardo Rondelli</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Guillem Rull</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>SIRIS Lab, Research division of SIRIS Academic</institution>
          ,
          <addr-line>Barcelona</addr-line>
          ,
          <country country="ES">Spain</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Tuscany's Observatory of Research and Innovation portal is an instrument for promoting more transparent and inclusive governance in the region. We present its interactive dashboard and the underlying SPARQL endpoint, powered by SIRIS Academic's UNiCS platform, which integrates Open Data on the Higher Education &amp; Research field following the Ontology-Based Data Access approach.</p>
      </abstract>
      <kwd-group>
        <kwd>OBDA</kwd>
        <kwd>Higher Education &amp; Research</kwd>
        <kwd>Data-driven policies</kwd>
        <kwd>Interactive Data Visualisation</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>In line with the orientation adopted by the EU in its Europe 2020 strategy, the Italian
region of Tuscany has defined a set of policies aimed at supporting the Higher Education
and Research (HE&amp;R) system, and promoting innovation in the Tuscan territory.
Tuscany has set up the Regional Research and Innovation Observatory as a tool to
support the implementation of the Regional Development Program (PRS 2016/2020)<sup>1</sup>. The
PRS is the cornerstone of regional policies: “[A] tool that expresses a vision for the
future of Tuscany and proposes constructive dialogue with the actors of the territory” (E.
Rossi, Region’s President, PRS introductory speech). The Observatory aims to
communicate and enhance the strengths of the research system, and to host information on
research, innovation and higher education.</p>
      <p>
        Within the Observatory, and in support of the PRS, the Tuscany Region has decided
to set up an information dashboard capable of integrating HE&amp;R data, keeping them up
to date, and supporting policy makers in designing their policies. This paper presents the
Observatory’s interactive dashboard, currently located at toscanaopenresearch.it<sup>2</sup>,
and the underlying SPARQL endpoint, which is powered by SIRIS Academic’s UNiCS
platform [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], a system that integrates Open Data on the HE&amp;R field and makes them
accessible to users through a unified domain ontology, following the so-called
Ontology-Based Data Access (OBDA) approach [
        <xref ref-type="bibr" rid="ref3 ref6">6,3</xref>
        ].
      </p>
      <sec id="sec-1-1">
        <title><sup>1</sup> https://goo.gl/6VY4Co</title>
        <p><sup>2</sup> Guest users can log in with username/password: dao2017/dao2017</p>
        <p>[Figure 1. Architecture diagram with three layers: DATA SOURCEs (HE&amp;R Open Data, Tuscany internal data); OBDA LAYER (Ontology, Mappings, Federation Layer); DATA VISUALISATIONs, QUERY SYSTEM (KPIs Explorer; full-fledged access point: compliant SPARQL protocol service). END-USERs: Citizens, Regional Government (Department, Directorate), Local Governments &amp; Public Administration.]</p>
        <p>The paper is structured as follows. Section 2 introduces the underlying
platform that provides data to the Observatory, and Section 3 describes the Observatory’s
interactive dashboard.</p>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>2. The UNiCS Platform</title>
      <p>
        University Analytics (UNiCS) integrates open data repositories about HE&amp;R in Europe
and makes them available via a dedicated SPARQL endpoint [
        <xref ref-type="bibr" rid="ref8 ref9">9,8</xref>
        ]. Queries are posed in
terms of a domain ontology that provides a homogeneous view of the otherwise disparate
integrated datasets. While data are originally stored in relational databases (DBs), UNiCS
users see them as RDF data, the standard data model in Linked Data<sup>3</sup>. This is made
possible by -ontop- [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], an OBDA system that allows querying relational DBs as virtual
RDF graphs using SPARQL. Given the domain ontology, and an R2RML [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] mapping
description that connects the ontology with the underlying DBs, -ontop- translates the
users’ SPARQL queries into SQL ones that are then run on the federated DB.
      </p>
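      <p>The role of the mapping can be illustrated with a deliberately simplified Python sketch. It is not the actual rewriting algorithm of -ontop-, and the "ex:" vocabulary, the "researcher" table and its columns are hypothetical names chosen only for illustration. Each ontology property is mapped to a table and two columns, a single triple pattern is rewritten into SQL, and the resulting query is run on an in-memory SQLite DB.</p>
      <preformat><![CDATA[
```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class PropertyMapping:
    """Toy stand-in for a mapping entry: one ontology property -> one table."""
    table: str
    subject_column: str  # column whose values identify the subject
    object_column: str   # column holding the property value


# Hypothetical vocabulary and relational schema (assumptions, not from the paper).
MAPPINGS = {
    "ex:hasGender": PropertyMapping("researcher", "id", "gender"),
    "ex:worksAt": PropertyMapping("researcher", "id", "university_id"),
}


def triple_pattern_to_sql(predicate: str) -> str:
    """Rewrite the pattern '?s predicate ?o' into SQL over the mapped table."""
    m = MAPPINGS[predicate]
    return (
        f"SELECT {m.subject_column} AS s, {m.object_column} AS o "
        f"FROM {m.table} WHERE {m.object_column} IS NOT NULL"
    )


# Tiny in-memory relational source with made-up rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE researcher (id INTEGER, gender TEXT, university_id INTEGER)")
conn.executemany(
    "INSERT INTO researcher VALUES (?, ?, ?)",
    [(1, "F", 10), (2, None, 10)],
)

sql = triple_pattern_to_sql("ex:hasGender")
rows = conn.execute(sql).fetchall()
print(sql)
print(rows)  # rows with a NULL gender yield no triple and are filtered out
```
]]></preformat>
      <p>The point of the sketch is that no RDF triples are materialised: the data stay in the relational source, and the mapping alone determines which SQL is generated for a given ontology term.</p>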
      <p>
        In the context of Tuscany’s Observatory, the core architecture of UNiCS includes
a relational DB into which the different Italian and European open data repositories have
been integrated, as per the Data Exchange [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] approach. These datasets
have been copied into a single relational DB because they are not available as proper,
queryable DBs, but only as downloadable CSV files, so they must be moved into more suitable
storage. The datasets include official Italian student and researcher data<sup>4</sup>
coming from the MIUR (Ministero dell’Istruzione, dell’Università e della Ricerca), and
European data on FP7 and H2020 research projects [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. An extension to this architecture
is planned to incorporate internal data managed by the Tuscany Region, which will be
federated with the UNiCS DB, and mapped into the UNiCS domain ontology. The
architecture is depicted in Figure 1. (Being able to integrate not just relational sources but
also datasets that are behind given SPARQL endpoints is an -ontop- extension we are
currently working on in collaboration with the KRDB Research Centre<sup>5</sup>.)
      </p>
      <sec id="sec-2-1">
        <title><sup>3</sup> https://www.w3.org/DesignIssues/LinkedData.html</title>
        <p><sup>4</sup> Ministero dell’Istruzione, dell’Università e della Ricerca: (i) Anagrafe nazionale studenti:
anagrafe.miur.it; (ii) Cerca università: cercauniversita.cineca.it</p>
      </sec>
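      <p>The step of copying downloadable CSV datasets into a single relational DB, described above, could look roughly like the following Python sketch. It is only an illustration under stated assumptions: the table name, the column names and the sample values are invented, SQLite stands in for whatever DBMS UNiCS actually uses, and every column is loaded as text for simplicity.</p>
      <preformat><![CDATA[
```python
import csv
import io
import sqlite3

# Made-up sample standing in for a downloaded open-data CSV file.
SAMPLE_CSV = """university,year,graduates
University A,2015,120
University B,2015,80
"""


def load_csv(conn: sqlite3.Connection, table: str, csv_text: str) -> int:
    """Create `table` from the CSV header and insert every data row.

    Returns the number of rows inserted. All columns are typed as TEXT.
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    columns = ", ".join(f'"{name}" TEXT' for name in header)
    conn.execute(f'CREATE TABLE "{table}" ({columns})')
    placeholders = ", ".join("?" for _ in header)
    rows = list(reader)
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', rows)
    return len(rows)


conn = sqlite3.connect(":memory:")
loaded = load_csv(conn, "graduates_by_university", SAMPLE_CSV)

# Once loaded, the data can be queried like any other relational table.
total = conn.execute(
    "SELECT SUM(CAST(graduates AS INTEGER)) FROM graduates_by_university"
).fetchone()[0]
print(loaded, total)
```
]]></preformat>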
    </sec>
    <sec id="sec-3">
      <title>3. Data visualisation</title>
      <p>Currently, the Observatory consists mainly of an interactive dashboard hosting data
visualisations (co-designed with the relevant stakeholders) fed by the underlying UNiCS
SPARQL endpoint. The visualisations are generated in real time by JavaScript code
running in the user’s browser, which queries the endpoint for the necessary data.
Visualisations are interactive: the user can click on different components to drill down into
the results being displayed, and apply filters to focus on a particular data subset.
When the user selects a specific item, a pop-up window displays additional information that
the visual representation of the data does not show.</p>
      <p>As an example, Figure 2 shows the distribution of graduates per Italian
bachelor faculty, with colours representing the overall number of years
they spent at university. When the user hovers over the graphic of a given faculty, a
window appears with a summary of the underlying data, showing numbers that are
not visible in the visualisation itself. Users can either download the data behind each
visualisation or copy and paste the queries that generate those data (see Figure 3), and
execute them, possibly modified according to new specific needs.</p>
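      <p>The exchange between a client and the endpoint can be sketched as follows. The dashboard itself uses JavaScript in the browser; this Python version only illustrates the same protocol steps. The endpoint URL, the "ex:" vocabulary and the sample response are assumptions made for the example. The sketch builds a SPARQL 1.1 Protocol query request and flattens a response in the SPARQL JSON results format into plain rows, using a hand-written response so that it runs without network access.</p>
      <preformat><![CDATA[
```python
import json
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical endpoint and vocabulary; the real ones are not given in the text.
ENDPOINT = "https://example.org/sparql"
QUERY = """
PREFIX ex: <http://example.org/ontology#>
SELECT ?faculty (COUNT(?student) AS ?graduates)
WHERE { ?student a ex:Graduate ; ex:faculty ?faculty . }
GROUP BY ?faculty
"""


def build_request(endpoint: str, query: str) -> Request:
    """Build a 'query via GET' request that asks for JSON results."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


def bindings_to_rows(results_json: str) -> list:
    """Flatten SPARQL JSON results into one {variable: value} dict per solution."""
    doc = json.loads(results_json)
    return [
        {var: cell["value"] for var, cell in binding.items()}
        for binding in doc["results"]["bindings"]
    ]


# Hand-written response in the SPARQL JSON results format (made-up values).
SAMPLE_RESPONSE = json.dumps({
    "head": {"vars": ["faculty", "graduates"]},
    "results": {"bindings": [
        {"faculty": {"type": "literal", "value": "Engineering"},
         "graduates": {"type": "literal", "value": "42"}},
    ]},
})

request = build_request(ENDPOINT, QUERY)
rows = bindings_to_rows(SAMPLE_RESPONSE)
print(request.full_url[:40])
print(rows)
```
]]></preformat>
      <p>Against a live endpoint, the request object would be passed to urllib.request.urlopen and the response body handed to bindings_to_rows; a visualisation layer would then consume the resulting rows.</p>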
      <p>The dashboard is currently divided into four main sections: Teachers &amp;
researchers (distributions by gender, age, and disciplinary sector), Teaching
(student data, with provenance and success rates for bachelor and master degrees),
Research at universities (EU-funded projects and their relative/absolute budgets, targeted at
public organisations), and Research at private companies (EU- and regionally funded projects
targeted at private organisations).</p>
      <p>The Observatory’s portal also includes a dedicated SPARQL endpoint and the
LODE-powered documentation of the corresponding domain ontology<sup>6</sup>. The endpoint includes
a library of predefined queries that either back the dashboard visualisations or have
been specified jointly with the regional managers to satisfy specific needs
and strategic demands. Users who are not familiar with SPARQL can take advantage of the
library, modifying existing queries and executing them. Policy makers can use the portal’s
visualisations and the SPARQL endpoint to get a better understanding of the
current situation, to monitor the effectiveness of recent policies, and
to design new policies based on evidence rather than intuition.</p>
      <p><sup>5</sup> http://www.inf.unibz.it/krdb/, Free University of Bozen-Bolzano, Italy.</p>
      <p><sup>6</sup> http://34.250.237.252/toscana/sparql/docs/index.html</p>
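      <p>One way to think about a library of predefined, user-adjustable queries is as a set of named templates with a few parameters. The following Python sketch is a hypothetical illustration, not the Observatory’s implementation: the query name, the "ex:" vocabulary and the "sector" parameter are invented, and the parameter check is a minimal guard rather than complete input validation.</p>
      <preformat><![CDATA[
```python
from string import Template

# Hypothetical library entry; names, prefix and properties are illustrative.
QUERY_LIBRARY = {
    "researchers_by_gender": Template("""
PREFIX ex: <http://example.org/ontology#>
SELECT ?gender (COUNT(?r) AS ?n)
WHERE { ?r a ex:Researcher ; ex:gender ?gender ; ex:sector "$sector" . }
GROUP BY ?gender
"""),
}


def instantiate(name: str, **params: str) -> str:
    """Fill a library query with user-supplied parameter values.

    Values containing a double quote, backslash or newline are rejected,
    since they could break out of the string literal in the query.
    """
    for value in params.values():
        if any(ch in value for ch in '"\\\n'):
            raise ValueError("unsafe character in query parameter")
    return QUERY_LIBRARY[name].substitute(**params)


query = instantiate("researchers_by_gender", sector="Physics")
print(query)
```
]]></preformat>
      <p>A user who does not know SPARQL would only choose the query name and the parameter value; a more experienced user could copy the instantiated text and edit it freely before sending it to the endpoint.</p>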
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <surname>Calvanese</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cogrel</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Komla-Ebri</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kontchakov</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lanti</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rezk</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rodriguez-Muro</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Xiao</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          :
          <article-title>Ontop: Answering SPARQL queries over relational databases</article-title>
          .
          <source>Semantic Web</source>
          <volume>8</volume>
          (
          <issue>3</issue>
          ),
          <fpage>471</fpage>
          -
          <lpage>487</lpage>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <surname>Fagin</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kolaitis</surname>
            ,
            <given-names>P.G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Miller</surname>
            ,
            <given-names>R.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Popa</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>Data exchange: semantics and query answering</article-title>
          .
          <source>Theor. Comput. Sci</source>
          .
          <volume>336</volume>
          (
          <issue>1</issue>
          ),
          <fpage>89</fpage>
          -
          <lpage>124</lpage>
          (
          <year>2005</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <surname>Kontchakov</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lutz</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Toman</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wolter</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zakharyaschev</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>The combined approach to query answering in DL-Lite</article-title>
          . pp.
          <fpage>247</fpage>
          -
          <lpage>257</lpage>
          . KR'10, AAAI Press (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          EU Publications Office:
          <article-title>CORDIS</article-title>
          . http://cordis.europa.eu/ Accessed 12 Sept 2017
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          SIRIS Academic:
          <article-title>UNiCS</article-title>
          . http://university-analytics.com/ Accessed 12 Sept 2017
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <surname>Poggi</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lembo</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Calvanese</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>De Giacomo</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lenzerini</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rosati</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          :
          <article-title>Linking data to ontologies</article-title>
          . In:
          <source>Journal on Data Semantics X</source>
          , pp.
          <fpage>133</fpage>
          -
          <lpage>173</lpage>
          . Springer-Verlag, Berlin, Heidelberg (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7] W3C:
          <article-title>R2RML: RDB to RDF Mapping Language</article-title>
          . https://www.w3.org/TR/r2rml/ Accessed 12 Sept 2017
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8] W3C:
          <article-title>SPARQL 1.1 Protocol</article-title>
          . https://www.w3.org/TR/sparql11-protocol/ Accessed 12 Sept 2017
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9] W3C:
          <article-title>SPARQL 1.1 Query Language</article-title>
          . https://www.w3.org/TR/sparql11-query/ Accessed 12 Sept 2017
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>