<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">A Visual Summary for Linked Open Data sources</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Fabio</forename><surname>Benedetti</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">Dipartimento di Ingegneria "Enzo Ferrari"</orgName>
								<orgName type="institution" key="instit1">Università di Modena e Reggio Emilia</orgName>
								<address>
									<country key="IT">Italy</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Sonia</forename><surname>Bergamaschi</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">Dipartimento di Ingegneria "Enzo Ferrari"</orgName>
								<orgName type="institution" key="instit1">Università di Modena e Reggio Emilia</orgName>
								<address>
									<country key="IT">Italy</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Laura</forename><surname>Po</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">Dipartimento di Ingegneria "Enzo Ferrari"</orgName>
								<orgName type="institution" key="instit1">Università di Modena e Reggio Emilia</orgName>
								<address>
									<country key="IT">Italy</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">A Visual Summary for Linked Open Data sources</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">2800E1B3B2EFA7E66ECBF6F05789C380</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-24T20:11+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>In this paper we propose LODeX, a tool that produces a representative summary of a Linked Open Data (LOD) source starting from scratch, thus supporting users in exploring and understanding the contents of a dataset. The tool takes as input the URL of a SPARQL endpoint and launches a set of predefined SPARQL queries; from the query results it generates a visual summary of the source. The summary reports statistical and structural information about the LOD dataset and can be browsed to focus on particular classes or to explore their properties and their use. LODeX was tested on the 137 public SPARQL endpoints contained in Data Hub (formerly CKAN), one of the main Open Data catalogues. The extraction of statistical and structural information was successfully performed on 107 sources; the most significant of these are included in the online version of the tool.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">Introduction</head><p>The RDF Data Model plays a key role in the birth and continuous expansion of the Web of Data, since it allows structured and semi-structured data to be represented. However, while the LOD cloud is still growing, there is a lack of tools able to produce a meaningful, high-level representation of these datasets.</p><p>Many portals catalog datasets that are available as LOD on the Web and let users perform keyword searches over their list of sources. Nevertheless, when a user starts exploring an unknown LOD dataset in detail, several issues arise: (1) the difficulty of finding documentation and, in particular, a high-level description of the classes and properties of the dataset; (2) the complexity of understanding the schema of the source, since there are no fixed modeling rules; (3) the effort required to explore a source with a high number of instances; (4) the impossibility, for non-skilled users, of writing specific SPARQL queries in order to explore the content of the dataset.</p><p>To overcome the above problems, we devised LODeX, a tool able to automatically provide a high-level summarization of a LOD dataset, including its inferred schema. It is composed of several algorithms that discern between intensional and extensional knowledge. Moreover, it handles the problem of long-running queries, which are subject to timeout failures, by generating a pool of low-complexity queries able to return the same information.</p><p>As presented in <ref type="bibr" target="#b2">[3]</ref>, most tools for data visualization are not able to provide a synthetic view of the data (instances) contained in a single source. 
Payola<ref type="foot" target="#foot_0">3</ref>  <ref type="bibr" target="#b3">[4]</ref> and LOD Visualization<ref type="foot" target="#foot_1">4</ref>  <ref type="bibr" target="#b1">[2]</ref> are two recent tools that exploit analysis functionalities to guide the visualization process. However, these tools always need some querying parameters to start the analysis of a LOD dataset. Conversely, LODeX neither requires any a priori knowledge of the dataset nor asks users to set any parameters; it focuses on extracting the schema from a LOD endpoint and producing a summarized view of the concepts contained in the dataset.</p><p>The paper is structured as follows. Section 2 describes the architecture of LODeX, while a use case and demonstration scenario are described in Section 3. Conclusions and some ideas for future work are given in Section 4.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2">LODeX - Overview</head><p>LODeX aims to be totally automatic in the production of the schema summary.</p><p>Figure <ref type="figure" target="#fig_0">1</ref> depicts the architecture of LODeX. The tool is composed of three main processes: Index Extraction, Post-processing and Visualization. The goal of the first two steps is to automatically extract from a SPARQL endpoint the information needed to produce its schema summary, while the third step aims to produce a navigable view of the schema summary for the users. For easy reuse, all the contents extracted and processed by LODeX are stored in a NoSQL document database, since it allows a flexible representation of the indexes. The Index Extraction (IE) takes as input the URL of a SPARQL endpoint and generates the queries needed to extract structural and statistical information about the source. Further details about the IE process can be found in <ref type="bibr" target="#b0">[1]</ref>. The IE component has been designed to maximize compatibility with LOD sources and to minimize costs in terms of time and computational complexity. Intensional and extensional knowledge is extracted and collected in a set of statistical indexes, stored in the NoSQL database.</p><p>The Post-processing (PP) combines the information contained in the statistical indexes to produce the schema summary of a specific dataset. The summary is induced from the distribution of the instances in the dataset. The PP also collects synthetic information regarding the endpoint. The schema summary is also stored in the NoSQL database.</p><p>The Visualization of the schema summary is performed through a web application written in Python that uses the NoSQL database as its backend. 
We used Data Driven Documents<ref type="foot" target="#foot_2">5</ref> to create a visual representation of the dataset with which the user can interact to navigate the schema and discover the information that he or she is looking for.</p><p>The tool has been tested on the entire set of sources described in SPARQL Endpoint Status (SPARQLES)<ref type="foot" target="#foot_3">6</ref>, a specialized application that recursively monitors the availability of public SPARQL endpoints contained in DataHub. At the time of our evaluation (May 2014), SPARQLES indicated that 52% of the SPARQL endpoints (244/469) were available and only 13% of the endpoints provided documentation, i.e. VoID and/or Service descriptions. LODeX was able to complete the extraction phase, thus building the visual summaries, for 107 LOD sources (78% of the 137 datasets that were compliant with the necessary SPARQL operators), which are now collected and shown in the online demo.</p><p>Fig. <ref type="figure">2</ref>. Visual Summary of the Linked Clean Energy Data source and a detail of the "Sector" property.</p></div>
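<div xmlns="http://www.tei-c.org/ns/1.0"><p>As an illustration of how a single long-running aggregate query can be replaced by a pool of low-complexity queries returning the same information, consider the following minimal Python sketch. The query texts and the names HEAVY_QUERY, CLASS_LIST_QUERY, count_query_pool and merge_counts are assumptions made for this example only; they are not the queries issued by LODeX, which are described in <ref type="bibr" target="#b0">[1]</ref>. PREFIX declarations are omitted for brevity.</p><p>
```python
# Hypothetical sketch: these are NOT the actual LODeX queries.

# One aggregate query over all classes; on a large endpoint it may time out.
HEAVY_QUERY = "SELECT ?c (COUNT(?s) AS ?n) WHERE { ?s a ?c } GROUP BY ?c"

# First member of the pool: only list the instantiated classes.
CLASS_LIST_QUERY = "SELECT DISTINCT ?c WHERE { ?s a ?c }"


def count_query_pool(class_curies):
    """Return one small COUNT query per class.

    class_curies is a list of prefixed names (for example "foaf:Person"),
    assumed to come from the results of CLASS_LIST_QUERY.
    """
    template = "SELECT (COUNT(?s) AS ?n) WHERE {{ ?s a {cls} }}"
    return [template.format(cls=cls) for cls in class_curies]


def merge_counts(class_curies, counts):
    """Rebuild the class-to-count mapping that HEAVY_QUERY would return."""
    return dict(zip(class_curies, counts))


if __name__ == "__main__":
    classes = ["foaf:Person", "skos:Concept"]
    for query in count_query_pool(classes):
        print(query)
    # The counts below are made-up placeholders, not real endpoint results.
    print(merge_counts(classes, [10, 25]))
```
</p><p>Each query in the pool touches a single class, so a timeout affects only that class and the query can be retried in isolation; the trade-off is a larger number of requests to the endpoint.</p></div>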
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3">Use Case and Demonstration Scenario</head><p>We refer to a hypothetical use case involving a company in the clean energy sector. The company has its own products and services and attempts to discover new information on renewable energy and energy efficiency in the country where it is located. While searching for the key datasets in the energy field, the company will likely find the Linked Clean Energy Data dataset<ref type="foot" target="#foot_4">7</ref>. This dataset, composed of 60140 triples, is described as a "Comprehensive set of linked clean energy data including: policy and regulatory country profiles, key stakeholders, project outcome documents and a thesaurus on renewable, energy efficiency and climate change for public re-use".</p><p>By using our application to explore this dataset (see Figure <ref type="figure">2</ref>)<ref type="foot" target="#foot_5">8</ref>, the user can, at a glance, get an overview of all the instantiated classes (the nodes in the graph) and the connections among them (the arcs), as well as the number of instances defined for each class (reflected in the size of the node). From the color of the nodes in the graph, a user can understand which classes are defined by the provider of the source and which are taken from external vocabularies (in this case we can see that some of the class definitions are acquired from FOAF, Geonames.org and SKOS). When the mouse is positioned on a node, more information about the class is shown (as depicted in Figure <ref type="figure">2</ref> on the left). Since classes are linked to each other by properties, it is possible to explore the property details. Thus, clicking on a property shows another visual representation of the intensional knowledge (see the right part of Figure <ref type="figure">2</ref>).</p></div>
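<div xmlns="http://www.tei-c.org/ns/1.0"><p>The visual summary described above can be thought of as a node-link structure in which node size reflects the number of instances and node color distinguishes provider-defined classes from external vocabularies. The following Python sketch shows one possible way to build such a structure. The function name build_summary_graph, the parameter provider_ns, the square-root radius scaling and the example URIs are all assumptions introduced for illustration; the data model actually used by LODeX is not described in this paper.</p><p>
```python
import json
import math


def build_summary_graph(class_counts, links, provider_ns):
    """Build a node-link dictionary for a schema summary (hypothetical model).

    class_counts: dict mapping a class URI to its number of instances.
    links: list of (source_class, property, target_class) tuples.
    provider_ns: namespace prefix of the classes defined by the provider.
    """
    nodes = []
    for uri, count in sorted(class_counts.items()):
        nodes.append({
            "id": uri,
            "instances": count,
            # Assumed scaling: radius grows with the square root of the count.
            "radius": round(5 + math.sqrt(count), 2),
            # Two color groups: provider-defined versus external vocabulary.
            "group": "provider" if uri.startswith(provider_ns) else "external",
        })
    # Keep only the links whose endpoints are both known classes.
    edges = [
        {"source": s, "property": p, "target": t}
        for (s, p, t) in links
        if s in class_counts and t in class_counts
    ]
    return {"nodes": nodes, "links": edges}


if __name__ == "__main__":
    # Made-up data; not taken from the Linked Clean Energy Data source.
    counts = {
        "http://example.org/ns#Project": 100,
        "http://xmlns.com/foaf/0.1/Organization": 16,
    }
    links = [(
        "http://example.org/ns#Project",
        "http://example.org/ns#runBy",
        "http://xmlns.com/foaf/0.1/Organization",
    )]
    graph = build_summary_graph(counts, links, "http://example.org/ns#")
    print(json.dumps(graph, indent=2))
```
</p><p>A dictionary with "nodes" and "links" keys serializes directly to JSON, a shape commonly consumed by D3 force-directed layouts; whether LODeX uses this exact shape is not stated in the paper.</p></div>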
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4">Conclusions and Future Work</head><p>This paper has shown how LODeX is able to provide a visual and navigable summary of a LOD dataset, including its inferred schema, starting from the URL of a SPARQL endpoint. The results produced by LODeX could also be useful to enrich the documentation of LOD sources, since the schema summary can be easily translated with respect to a vocabulary and inserted into the LOD source. LODeX is currently limited to displaying the contents of a source as a graph. However, new developments are being implemented in order to facilitate query definition by exploiting the visual summary.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Fig. 1 .</head><label>1</label><figDesc>Fig. 1. LODeX Architecture</figDesc><graphic coords="2,200.26,409.43,212.72,72.56" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0"><head></head><label></label><figDesc></figDesc><graphic coords="3,136.47,326.45,340.20,177.86" type="bitmap" /></figure>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="3" xml:id="foot_0">http://live.payola.cz/</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="4" xml:id="foot_1">http://lodvisualization.appspot.com/</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="5" xml:id="foot_2">http://d3js.org/</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="6" xml:id="foot_3">http://sparqles.okfn.org/</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="7" xml:id="foot_4">http://data.reegle.info/</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="8" xml:id="foot_5">The visual summary of this source is available at http://dbgroup.unimo.it/lodex/157</note>
		</body>
		<back>

			<div type="funding">
<div xmlns="http://www.tei-c.org/ns/1.0"><p>This work has been accomplished in the framework of a PhD program organized by the Global Grant Spinner 2013, and funded by the European Social Fund and the Emilia Romagna Region.</p></div>
			</div>

			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Online index extraction from linked open data sources</title>
		<author>
			<persName><forename type="first">F</forename><surname>Benedetti</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Bergamaschi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Po</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Linked Data for Information Extraction (LD4IE) Workshop held at International Semantic Web Conference</title>
				<imprint>
			<date type="published" when="2014">2014</date>
		</imprint>
	</monogr>
	<note>To appear</note>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">The linked data visualization model</title>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">M</forename><surname>Brunetti</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Auer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>García</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">International Semantic Web Conference (Posters &amp; Demos)</title>
				<imprint>
			<date type="published" when="2012">2012</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Approaches to visualising linked data: A survey</title>
		<author>
			<persName><forename type="first">A.-S</forename><surname>Dadzie</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Rowe</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Semantic Web</title>
		<imprint>
			<biblScope unit="volume">2</biblScope>
			<biblScope unit="issue">2</biblScope>
			<biblScope unit="page" from="89" to="124" />
			<date type="published" when="2011">2011</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Payola: Collaborative linked data analysis and visualization framework</title>
		<author>
			<persName><forename type="first">J</forename><surname>Klímek</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Helmich</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Nečaský</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">The Semantic Web: ESWC 2013 Satellite Events</title>
				<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2013">2013</date>
			<biblScope unit="page" from="147" to="151" />
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
