<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">openChart: Charting Quantitative Properties in LOD</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Filip</forename><surname>Zembowicz</surname></persName>
						</author>
						<author>
							<persName><forename type="first">David</forename><surname>Opolon</surname></persName>
							<email>opolon@alum.mit.edu</email>
						</author>
						<author>
							<persName><forename type="first">Stephen</forename><surname>Miles</surname></persName>
							<email>s_miles@mit.edu</email>
						</author>
						<author>
							<affiliation key="aff0">
								<orgName type="institution">Harvard University</orgName>
								<address>
									<addrLine>414 Quincy Mailing Center</addrLine>
									<postCode>02138</postCode>
									<settlement>Cambridge</settlement>
									<region>MA</region>
								</address>
							</affiliation>
						</author>
						<author>
							<affiliation key="aff1">
								<orgName type="institution">MIT ESD</orgName>
								<address>
									<addrLine>77 Massachusetts Avenue, E40-286</addrLine>
									<postCode>02139</postCode>
									<settlement>Cambridge</settlement>
									<region>MA</region>
								</address>
							</affiliation>
						</author>
						<author>
							<affiliation key="aff2">
								<orgName type="institution">MIT AutoID Labs</orgName>
								<address>
									<addrLine>77 Massachusetts Avenue, 35-014</addrLine>
									<postCode>02139</postCode>
									<settlement>Cambridge</settlement>
									<region>MA</region>
								</address>
							</affiliation>
						</author>
						<author>
							<affiliation key="aff3">
								<address>
									<addrLine>LDOW2010, April 27, 2010</addrLine>
									<settlement>Raleigh</settlement>
									<region>North Carolina</region>
									<country key="US">USA</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">openChart: Charting Quantitative Properties in LOD</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">0F035A6114A3EB190C5D9B73D9C41F15</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-24T17:51+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>H.3.4 [Semantic Web]: Visualization</term>
					<term>charting</term>
					<term>search</term>
					<term>Measurement</term>
					<term>Documentation</term>
					<term>Design</term>
					<term>Human Factors</term>
					<term>Linked Open Data</term>
					<term>Visualization</term>
					<term>Charting</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>In this paper, we discuss the development of openChart, a tool for charting quantitative Linked Open Data. It targets novice semantic web users by generating the SPARQL queries needed to find and compare quantitative properties of entities. We also describe the problems encountered during development and suggest improvements.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">INTRODUCTION</head><p>The Linked Open Data cloud (hereafter LOD) holds a large and growing body of information, enabling comparisons between previously isolated datasets <ref type="bibr" target="#b3">[3]</ref>. However, exploring the LOD cloud is difficult for users unfamiliar with semantic web technologies such as SPARQL and RDF. Web-based visualization tools such as IBM's Many Eyes have shown promise in supporting collaborative data exploration <ref type="bibr" target="#b7">[7]</ref>. We have developed a tool that similarly lets users plot quantitative data found in the LOD cloud, with minimal knowledge of semantic web syntax. The tool enables users to explore, share, and expand upon the data found in the LOD cloud.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">STRUCTURE</head><p>Finding data on the LOD cloud with openChart consists of three steps: identifying an entity of interest, choosing two of its quantitative properties, and selecting a peer group against which to compare their values. To provide an entry point into the semantic web, we use Wikipedia's autosuggest API to determine an entity's Wikipedia address. This address is then matched with the corresponding semantic web resource using a SPARQL query against DBpedia, a central hub of the LOD cloud with many owl:sameAs links to other data sources <ref type="bibr" target="#b2">[2]</ref>. Other endpoints could be used within the openChart framework, but DBpedia's large number of links to other LOD sources makes it well suited to a general-purpose tool.</p><p>Once an entity of interest has been identified, for example Bangladesh, we extract its quantitative properties from the RDF resource, using regular expressions to discard non-quantitative values. The user selects two of these, for example hdi and population density. Peer groups are then found through a SPARQL query that looks for distinct rdf:type values whose instances have both quantitative properties. A peer group may or may not contain the user's original search term, but selecting one, for example Country, displays a scatter plot of the two variables across its members. The peer group feature is an important aspect of our application because it lets the user branch out while navigating information on the semantic web, rather than fixating on answering one particular question.</p><p>At all stages of the exploration, data is cached locally in a MySQL database. The frontend is written in JavaScript using the jQuery library, while the backend is written in PHP 5 using the ARC2 library. Plotting is done with the Protovis JavaScript library.</p></div>
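<div xmlns="http://www.tei-c.org/ns/1.0"><p>The three query-building steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the openChart implementation (which used PHP 5 and ARC2): the helper names are hypothetical, the lookup assumes DBpedia links each resource to its Wikipedia article through foaf:isPrimaryTopicOf, and the numeric pattern is only one possible choice of regular expression. The functions build query strings and filter result bindings; sending the queries to an endpoint is left out.</p><p><![CDATA[
```python
import re

# Hypothetical helpers illustrating the openChart steps; the original
# system was written in PHP 5 with the ARC2 library.

# Accepts integers, decimals and exponent notation such as "1.5e6".
NUMERIC = re.compile(r"^[+-]?(\d+(\.\d*)?|\.\d+)([eE][+-]?\d+)?$")


def resource_query(wikipedia_title: str) -> str:
    """Build a query mapping a Wikipedia article title to its DBpedia resource.

    Assumes DBpedia links resources to articles via foaf:isPrimaryTopicOf.
    """
    page = "http://en.wikipedia.org/wiki/" + wikipedia_title.replace(" ", "_")
    return (
        "PREFIX foaf: <http://xmlns.com/foaf/0.1/>\n"
        f"SELECT ?resource WHERE {{ ?resource foaf:isPrimaryTopicOf <{page}> . }}"
    )


def quantitative_properties(bindings) -> dict:
    """Keep only (property, literal) pairs whose literal looks numeric."""
    result = {}
    for prop, literal in bindings:
        text = literal.replace(",", "").strip()  # drop thousands separators
        if NUMERIC.match(text):
            result[prop] = float(text)
    return result


def peer_group_query(prop_a: str, prop_b: str) -> str:
    """Find distinct rdf:type values whose instances carry both properties."""
    return (
        "SELECT DISTINCT ?type WHERE {\n"
        f"  ?s a ?type ; <{prop_a}> ?a ; <{prop_b}> ?b .\n"
        "}"
    )


def scatter_points(instances: dict, prop_a: str, prop_b: str) -> list:
    """Pair the two chosen values for every peer that has both of them."""
    return [
        (name, props[prop_a], props[prop_b])
        for name, props in instances.items()
        if prop_a in props and prop_b in props
    ]
```
]]></p><p>Filtering on the client with a regular expression keeps the queries simple at the cost of transferring non-numeric values as well, which is the overhead discussed in Section 4.1.</p></div>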
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">RESULTS</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1">Easy Exploration with Structured Queries</head><p>A focus on a simple user interface has made openChart an easy-to-use introduction to Linked Open Data for Web users unfamiliar with it. By presenting peer groups, and not only information directly relevant to a user's query, openChart encourages broad exploration of the available data rather than merely answering a specific question. openChart also includes a social component through which users can share interesting relationships between concepts. These shared charts constitute new knowledge, which we plan to integrate into the LOD cloud itself by defining such charts as RDF objects.</p></div>
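<div xmlns="http://www.tei-c.org/ns/1.0"><p>As a rough illustration of defining a shared chart as an RDF object, the sketch below serializes a chart's choices (focus entity, two properties, peer group) as Turtle. openChart did not define such a vocabulary, so the ex: namespace and its terms (ex:Chart, ex:focus, ex:xProperty, ex:yProperty, ex:peerGroup) are hypothetical placeholders; only dcterms:title comes from an existing vocabulary, Dublin Core terms.</p><p><![CDATA[
```python
from dataclasses import dataclass

# Placeholder vocabulary: openChart defined none, so ex: terms are hypothetical.
PREFIXES = (
    "@prefix ex: <http://example.org/openchart#> .\n"
    "@prefix dcterms: <http://purl.org/dc/terms/> .\n"
)


def turtle_string(text: str) -> str:
    """Quote a plain string as a Turtle literal, escaping backslashes and quotes."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


@dataclass(frozen=True)
class SharedChart:
    chart_iri: str    # IRI minted for the shared chart (hypothetical scheme)
    focus: str        # entity the user searched for
    x_property: str   # first quantitative property
    y_property: str   # second quantitative property
    peer_group: str   # rdf:type used as the peer group
    title: str

    def to_turtle(self) -> str:
        """Serialize the chart definition as a small Turtle document."""
        return PREFIXES + (
            f"<{self.chart_iri}> a ex:Chart ;\n"
            f"    dcterms:title {turtle_string(self.title)} ;\n"
            f"    ex:focus <{self.focus}> ;\n"
            f"    ex:xProperty <{self.x_property}> ;\n"
            f"    ex:yProperty <{self.y_property}> ;\n"
            f"    ex:peerGroup <{self.peer_group}> .\n"
        )
```
]]></p><p>Because the chart records only IRIs of the entity, properties, and peer group, anyone who dereferences it can re-run the same queries against the current data rather than relying on a stored snapshot.</p></div>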
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2">Identification of Errors in the Data</head><p>An additional benefit of displaying data visually in openChart is the ability to quickly identify errors in the data contained in the LOD cloud. Viewed in isolation, errors of scale and similar mistakes are often hard to spot; plotted against a peer group, they appear as outliers and can be identified rapidly. These data points can then be flagged for review in order to improve the quality of the source data, or of any scripts used to convert the data into RDF in the first place. Such flagging could be achieved by defining a quality ontology and publishing triples describing user-identified errors.</p></div>
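<div xmlns="http://www.tei-c.org/ns/1.0"><p>One way to surface scale errors automatically, sketched below under our own assumptions, is a modified z-score based on the median absolute deviation, computed on log10 values so that a factor-of-1000 error becomes a fixed shift. The 0.6745 constant and the 3.5 cutoff are common conventions for this score, not values taken from openChart, and the q: quality vocabulary used for flagging is hypothetical.</p><p><![CDATA[
```python
import math
from statistics import median

MAD_SCALE = 0.6745  # makes the MAD comparable to a standard deviation


def flag_outliers(values: dict, threshold: float = 3.5) -> list:
    """Return the keys whose log10 value has a modified z-score above threshold.

    On a log scale a factor-of-1000 scale error becomes a shift of 3,
    so it stands out regardless of the magnitude of the data.
    Non-positive values cannot be logged and are skipped.
    """
    logs = {k: math.log10(v) for k, v in values.items() if v > 0}
    if len(logs) < 3:
        return []  # too few peers to judge
    centre = median(logs.values())
    mad = median(abs(x - centre) for x in logs.values())
    if mad == 0:
        return []  # at least half the values coincide; the score is undefined
    return sorted(
        k for k, x in logs.items()
        if MAD_SCALE * abs(x - centre) / mad > threshold
    )


def flag_triples(subject: str, prop: str, reporter: str) -> str:
    """Describe a user-flagged value in Turtle using a hypothetical q: vocabulary."""
    return (
        "@prefix q: <http://example.org/quality#> .\n"
        "[] a q:SuspectValue ;\n"
        f"   q:subject <{subject}> ;\n"
        f"   q:property <{prop}> ;\n"
        f"   q:reportedBy <{reporter}> .\n"
    )
```
]]></p><p>The median-based score is used instead of a plain mean and standard deviation because a single large error would otherwise inflate the spread and hide itself.</p></div>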
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">PROBLEMS ENCOUNTERED</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1">Lack of Range Descriptors</head><p>When searching the LOD cloud, it would be efficient to restrict SPARQL queries to properties whose ranges limit them to numerical values. However, we found that many properties lack associated rdfs:range and/or rdfs:domain values. We therefore had to retrieve all results and then filter them with regular expressions, increasing the application's overhead. We suggest that RDF authors take the time to specify rdfs:domain values, and rdfs:range values such as xsd:integer and xsd:decimal, to facilitate statistical work with Linked Open Data.</p></div>
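<div xmlns="http://www.tei-c.org/ns/1.0"><p>If authors did declare numeric ranges, the filtering could move into the query itself. The sketch below builds two SPARQL 1.1 queries: one relying on a declared rdfs:range, and a fallback that checks each literal's datatype. The list of numeric XSD types is an illustrative subset and the function names are hypothetical. Neither query catches numbers stored as untyped strings, which is the case the regular-expression filtering described above has to handle.</p><p><![CDATA[
```python
# Illustrative subset of numeric XSD datatypes.
NUMERIC_TYPES = ("xsd:integer", "xsd:decimal", "xsd:double", "xsd:float")

PREFIXES = (
    "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
    "PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>\n"
)


def numeric_by_range(resource: str) -> str:
    """Select only properties whose declared rdfs:range is a numeric XSD type."""
    types = ", ".join(NUMERIC_TYPES)
    return PREFIXES + (
        "SELECT ?p ?v WHERE {\n"
        f"  <{resource}> ?p ?v .\n"
        "  ?p rdfs:range ?range .\n"
        f"  FILTER (?range IN ({types}))\n"
        "}"
    )


def numeric_by_datatype(resource: str) -> str:
    """Fallback when rdfs:range is missing: test each literal's own datatype."""
    types = ", ".join(NUMERIC_TYPES)
    return PREFIXES + (
        "SELECT ?p ?v WHERE {\n"
        f"  <{resource}> ?p ?v .\n"
        # isLiteral guards datatype(), which is an error on IRIs and blank nodes.
        f"  FILTER (isLiteral(?v) && datatype(?v) IN ({types}))\n"
        "}"
    )
```
]]></p></div>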
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2">Lack of Unit Descriptors</head><p>Another element often missing from data sources, especially DBpedia, is the unit of measure. Particularly when comparing across endpoints, the units of measurement must be known in order to prevent scaling errors when data from different sources is combined. We suggest that creators of RDF data include unit specifications, either through ontologies such as Quantities, Units, Dimensions and Data Types in OWL and XML <ref type="bibr" target="#b6">[6]</ref>, or by agreeing on standardized unit abbreviations and distributing unit-aware parsers.</p></div>
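<div xmlns="http://www.tei-c.org/ns/1.0"><p>A unit-aware parser of the kind suggested above might, in its simplest form, map unit abbreviations to a dimension and a conversion factor, and refuse to compare values of different dimensions. The abbreviations, the choice of km and km² as canonical units, and the hard-coded table are assumptions for illustration; an ontology such as QUDT could supply the factors instead.</p><p><![CDATA[
```python
from typing import NamedTuple

# Illustrative table, not taken from openChart: unit abbreviation ->
# (dimension, factor converting to the canonical unit of that dimension).
# Canonical units here are km for length and km2 for area.
TO_CANONICAL = {
    "km": ("length", 1.0),
    "m": ("length", 0.001),
    "mi": ("length", 1.609344),
    "km2": ("area", 1.0),
    "mi2": ("area", 2.589988110336),
}


class Quantity(NamedTuple):
    value: float
    unit: str


def normalise(q: Quantity) -> tuple:
    """Return (dimension, value in the canonical unit) for a quantity."""
    try:
        dimension, factor = TO_CANONICAL[q.unit]
    except KeyError:
        # Refuse to guess: an unknown unit is exactly how scale errors creep in.
        raise ValueError(f"unknown unit {q.unit!r}") from None
    return dimension, q.value * factor


def comparable(a: Quantity, b: Quantity) -> bool:
    """Values may share a chart axis only if they measure the same dimension."""
    return normalise(a)[0] == normalise(b)[0]
```
]]></p></div>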
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.">FUTURE DEVELOPMENT</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.1">Automated Provenance</head><p>Because the data in openChart comes from multiple sources, tracking where a chart's data originates is important if the charts are to be used in research. We therefore plan to implement a feature that displays the origins of the data contained in a chart alongside the chart itself.</p><p>Although RDF quads (such as N-Quads <ref type="bibr" target="#b4">[4]</ref>) would make this easy to implement, methods that determine authorship from the characteristics of the endpoint from which the data was retrieved could already be implemented today.</p></div>
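<div xmlns="http://www.tei-c.org/ns/1.0"><p>The quad-based approach could look roughly like the sketch below: each fetched triple is stored with a fourth element naming its source, and a chart's provenance is the set of sources behind its plotted values. Using the endpoint URL as the graph name is our assumption, not part of the N-Quads format; the serialization follows the N-Quads line syntax of subject, predicate, object, graph, and a closing period. All function names are hypothetical.</p><p><![CDATA[
```python
from typing import NamedTuple

XSD_DECIMAL = "http://www.w3.org/2001/XMLSchema#decimal"


class Quad(NamedTuple):
    subject: str
    predicate: str
    obj: str    # an already-serialized term, e.g. from typed_literal()
    graph: str  # assumption: the URL of the endpoint the triple came from


def typed_literal(value, datatype: str = XSD_DECIMAL) -> str:
    """Serialize a typed literal, escaping backslashes and quotes."""
    text = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{text}"^^<{datatype}>'


def to_nquads(quads) -> str:
    """One N-Quads statement per line: subject predicate object graph ."""
    return "".join(
        f"<{q.subject}> <{q.predicate}> {q.obj} <{q.graph}> .\n" for q in quads
    )


def chart_sources(quads, subjects, predicates) -> list:
    """List the sources that supplied any value plotted on a chart."""
    return sorted({
        q.graph for q in quads
        if q.subject in subjects and q.predicate in predicates
    })
```
]]></p></div>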
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.2">Integration with Existing LOD Browsers</head><p>Many browsers for semantic web data already exist, such as Tabulator, which offer capabilities similar to those of our system <ref type="bibr" target="#b1">[1]</ref>.</p><p>openChart is easier to use than these programs because it permits only a restricted set of queries. We are therefore working to enable switching back and forth between Tabulator and openChart, so that more technical users can use openChart as a starting point and then experience the full potential of the semantic web.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.3">Publishing of Results</head><p>As mentioned previously, the information gleaned from openChart can be published for others to access. Statistical relationships can be described using the SCOVO ontology, which allows statistics to be specified with reference to a particular dataset over a range of time <ref type="bibr" target="#b5">[5]</ref>. Care must be taken regarding the completeness of the data, however, since the generated statistics represent only the data published to the LOD cloud. Two groups of statistics are generated: one describing the local cloud itself, such as the number of triples, and another describing the data contained therein.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Figure 1.</head><label>1</label><figDesc>Figure 1. The openChart workflow</figDesc><graphic coords="1,317.88,310.74,239.82,108.30" type="bitmap" /></figure>
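<div xmlns="http://www.tei-c.org/ns/1.0"><p>The two groups of statistics described in Section 5.3 might be emitted as SCOVO roughly as follows, based on our reading of the vocabulary (scovo:Dataset, scovo:Item, scovo:Dimension, scovo:dataset, scovo:dimension, with the value attached via rdf:value). The base IRIs, the use of a time period as the only dimension, and the use of a simple mean as the data statistic are illustrative assumptions. As noted above, such figures describe only what was cached from the LOD cloud.</p><p><![CDATA[
```python
from statistics import mean

PREFIXES = (
    "@prefix scovo: <http://purl.org/NET/scovo#> .\n"
    "@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .\n"
)


def scovo_item(item: str, dataset: str, value, dimension: str) -> str:
    """One scovo:Item holding a value, linked to its dataset and a dimension."""
    return (
        f"<{item}> a scovo:Item ;\n"
        f"    scovo:dataset <{dataset}> ;\n"
        f"    scovo:dimension <{dimension}> ;\n"
        f"    rdf:value {value} .\n"
    )


def publish(base: str, period: str, triple_count: int, values: dict) -> str:
    """Emit both groups: a statistic about the cache and one about the data."""
    dataset = f"{base}/dataset"
    parts = [
        PREFIXES,
        f"<{dataset}> a scovo:Dataset .\n",
        f"<{period}> a scovo:Dimension .\n",
        # Group 1: describes the locally cached cloud itself.
        scovo_item(f"{base}/item/tripleCount", dataset, triple_count, period),
        # Group 2: describes the data; a mean is used purely as an example.
        scovo_item(f"{base}/item/mean", dataset, mean(list(values.values())), period),
    ]
    return "".join(parts)
```
]]></p></div>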
		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.">ACKNOWLEDGMENTS</head><p>We would like to thank Tim Berners-Lee, K. Krasnow Waterman, Reed Stuyvesant, Ian Jacobi, Oshani Seneviratne, and everyone else who participated in and organized MIT's Linked Data week in January of 2010.</p></div>
			</div>

			<div type="references">

				<listBibl>


<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Tabulator: Exploring and Analyzing linked data on the Semantic Web</title>
		<author>
			<persName><surname>Berners-Lee</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 3rd International Semantic Web User Interaction Workshop (SWUI06)</title>
		<meeting>3rd International Semantic Web User Interaction Workshop (SWUI06)<address><addrLine>Athens, Georgia</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2006-11-06">6 Nov 2006</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">DBpedia - A Crystallization Point for the Web of Data</title>
		<author>
			<persName><forename type="first">C</forename><surname>Bizer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Lehmann</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Kobilarov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Auer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Becker</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Cyganiak</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Hellmann</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of Web Semantics: Science, Services and Agents on the World Wide Web</title>
		<imprint>
			<biblScope unit="volume">7</biblScope>
			<biblScope unit="page" from="154" to="165" />
			<date type="published" when="2009">2009</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Linked Data - The Story So Far</title>
		<author>
			<persName><forename type="first">C</forename><surname>Bizer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Heath</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Berners-Lee</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">International Journal on Semantic Web and Information Systems</title>
		<imprint/>
	</monogr>
	<note>in press. Special Issue on Linked Data</note>
</biblStruct>

<biblStruct xml:id="b4">
	<monogr>
		<title level="m" type="main">N-Quads: Extending N-Triples with Context</title>
		<author>
			<persName><forename type="first">R</forename><surname>Cyganiak</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Harth</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Hogan</surname></persName>
		</author>
		<ptr target="http://sw.deri.org/2008/07/n-quads/" />
		<imprint/>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<monogr>
		<title level="m" type="main">SCOVO: Using Statistics on the Web of Data</title>
		<author>
			<persName><forename type="first">M</forename><surname>Hausenblas</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Halb</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Raimond</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Feigenbaum</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Ayers</surname></persName>
		</author>
		<imprint/>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<monogr>
		<author>
			<persName><forename type="first">J</forename><surname>Masters</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Hodgson</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Keller</surname></persName>
		</author>
		<ptr target="http://qudt.org/" />
		<title level="m">Quantities, Units, Dimensions and Data Types in OWL and XML</title>
		<imprint/>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">Many Eyes: A Site for Visualization at Internet Scale</title>
		<author>
			<persName><forename type="first">F</forename><forename type="middle">B</forename><surname>Viégas</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Wattenberg</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Van Ham</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Kriss</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>McKeon</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">InfoVis</title>
		<imprint>
			<date type="published" when="2007">2007</date>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
