=Paper=
{{Paper
|id=None
|storemode=property
|title=OpenChart: Charting Quantitative Properties in LOD
|pdfUrl=https://ceur-ws.org/Vol-628/ldow2010_paper06.pdf
|volume=Vol-628
|dblpUrl=https://dblp.org/rec/conf/www/ZembowiczOM10
}}
==OpenChart: Charting Quantitative Properties in LOD==
openChart: Charting Quantitative Properties in LOD
Filip Zembowicz, Harvard University, 414 Quincy Mailing Center, Cambridge, MA 02138, fzembow@fas.harvard.edu
David Opolon, MIT ESD, 77 Massachusetts Avenue, E40-286, Cambridge, MA 02139, opolon@alum.mit.edu
Stephen Miles, MIT AutoID Labs, 77 Massachusetts Avenue, 35-014, Cambridge, MA 02139, s_miles@mit.edu
ABSTRACT
In this paper, we discuss the development of openChart, a quantitative Linked Open Data charting tool. It targets novice semantic web users by generating SPARQL queries to present interesting information. We also acknowledge the problems encountered in development and suggest improvements.
Categories and Subject Descriptors
H.3.4 [Semantic Web]: Visualization, charting, search.
General Terms
Measurement, Documentation, Design, Human Factors
Keywords
Linked Open Data, Visualization, Charting
Copyright is held by the author/owner(s). LDOW2010, April 27, 2010, Raleigh, North Carolina, USA.
1. INTRODUCTION
The wealth of information in the Linked Open Data cloud (hereafter LOD) is large and growing, enabling comparisons between previously isolated datasets [3]. However, exploring the linked data cloud is difficult for users unfamiliar with semantic web concepts such as SPARQL and RDF. Web-based visualization tools such as IBM’s Many Eyes have shown promise in allowing collaborative data exploration [7]. We have developed a tool that similarly allows users to plot quantitative data found on the LOD cloud with minimal knowledge of semantic web syntax. This tool enables users to explore, share, and expand upon the data found in the LOD cloud.
2. STRUCTURE
Finding data on the LOD cloud using openChart (a demonstration can be found at http://openchart.mit.edu) consists of identifying an entity of interest, choosing two of its quantitative properties, and selecting a peer group with which to compare values. To enable entry into the semantic web, we use Wikipedia’s autosuggest API to determine an entity’s Wikipedia address. This is then matched with a corresponding semantic web resource using a SPARQL query on the DBpedia database, which is a central hub of the LOD cloud with many owl:sameAs linkages to other sources of data [2]. While other endpoints could be used with the openChart framework, DBpedia has a high number of links to other LOD sources, making it useful for a general-purpose tool.
Following the identification of an entity of interest, for example Bangladesh, we find the quantitative properties from the RDF resource by using regular expressions to remove non-quantitative information. Two of these are selected by the user, for example hdi and population density. Then, peer groups are found through a SPARQL query that looks for the distinct rdf:type values whose instances have both of the quantitative properties. These peer groups may or may not contain the user’s original search term, but the selection of one, for example Country, will display a scatter plot of the two variables. This peer group feature is an important aspect of our application because it allows the user to branch out when navigating information on the semantic web rather than fixate on answering one question in particular.
Figure 1. The openChart workflow
At all levels of the exploration, the data is locally cached in a MySQL server. The frontend is written using JavaScript and the jQuery library, while the backend is written in PHP 5 and the ARC2 library. The plotting is done using the Protovis JavaScript library.
3. RESULTS
3.1 Easy Exploration with Structured Queries
A focus on a simple user interface has made openChart an easy-to-use introduction for WWW users unfamiliar with Linked Open Data. By focusing on peer groups and not only information directly relevant to a user’s query, the openChart tool emphasizes a broad exploration of available data rather than merely answering a specific question. Additionally, we incorporate a social component into openChart, where interesting relationships between concepts can be shared. This is new knowledge that is being created, and it will eventually be integrated into the LOD cloud itself by defining such shared charts as RDF objects.
3.2 Identification of Errors in the Data
An additional benefit of displaying data visually in openChart is the ability to quickly identify errors within the data contained in the LOD cloud (an example may be seen at http://hcs.harvard.edu/datavis/linkeddata/gallery/index.php?chart=19). In isolation, it is often difficult to see errors in
scale or other such mistakes, but displaying them as outliers enables mistakes to be rapidly identified. These data points can then be flagged for review in order to improve the quality of the source data, or of any scripts that are used to parse the data into the RDF format in the first place. Such flagging could be achieved by defining a quality ontology and publishing triples for user-identified errors.
4. PROBLEMS ENCOUNTERED
4.1 Lack of Range Descriptors
When searching the LOD cloud through a SPARQL query, it would be economical to restrict SPARQL queries to retrieve only properties with ranges limiting them to numerical values. However, we found that many of the properties lack associated rdfs:range and/or rdfs:domain values. This resulted in a need to retrieve all results and then parse them using regular expressions, increasing the overhead of the application. Thus, we suggest that RDF authors take the time to specify rdfs:domain values and rdfs:range values such as xsd:integer and xsd:decimal to facilitate statistical work using Linked Open Data.
4.2 Lack of Unit Descriptors
Another aspect often missing from data sources, especially from DBpedia, is units of measure. Particularly when comparing across endpoints, it is imperative that the units of measurement are understood, in order to prevent scaling errors when comparing data from different sources. We suggest that creators of RDF data take the time to include unit specifications, either through ontologies such as Quantities, Units, Dimensions and Data Types in OWL and XML [6], or by agreeing on standardized unit abbreviations and distributing unit-aware parsers.
5. FUTURE DEVELOPMENT
5.1 Automated Provenance
Since the data in openChart comes from multiple sources, tracking the sources of a chart’s data would be important in enabling the use of the charts in research. As a result, we plan to implement a feature by which the origins of the data contained within a chart will be displayed concurrently with the chart. Although RDF quadruples (such as [4]) would allow this to be easily implemented, methods that determine authorship based on particular endpoint characteristics could be implemented currently.
5.2 Integration with Existing LOD Browsers
There are many existing browsers of semantic web data, such as Tabulator, which offer capabilities similar to our system [1]. Although the restrictive nature of the queries permitted on our system makes openChart easier to use than these programs, we are working to enable switching back and forth between Tabulator and openChart, so that more technical users can experience the full potential of the semantic web using openChart as a starting point.
5.3 Publishing of Results
As mentioned previously, the information gleaned from openChart can be published for others to access. Statistical relationships can be described using the SCOVO ontology, which allows the specification of statistics with reference to a particular dataset over a range of time [5]. Care must be taken to ensure the completeness of the data, however, since the statistics generated only represent the data published to the LOD cloud. Two groups of statistics are generated: one describing the local cloud itself, such as the number of triples, and another describing the data contained therein.
6. ACKNOWLEDGMENTS
We would like to thank Tim Berners-Lee, K. Krasnow Waterman, Reed Stuyvesant, Ian Jacobi, Oshani Seneviratne, and everyone else who participated in and organized MIT’s Linked Data week in January of 2010.
7. REFERENCES
[1] Berners-Lee et al. Tabulator: Exploring and Analyzing linked data on the Semantic Web. Proceedings of the 3rd International Semantic Web User Interaction Workshop (SWUI06), Athens, Georgia, 6 Nov 2006.
[2] Bizer, C., Lehmann, J., Kobilarov, G., Auer, S., Becker, C., Cyganiak, R., and Hellmann, S. DBpedia – A Crystallization Point for the Web of Data. Journal of Web Semantics: Science, Services and Agents on the World Wide Web, Issue 7, Pages 154–165, 2009.
[3] Bizer, C., Heath, T., and Berners-Lee, T. Linked Data - The Story So Far (in press). International Journal on Semantic Web and Information Systems, Special Issue on Linked Data.
[4] Cyganiak, R., Harth, A., and Hogan, A. N-Quads: Extending N-Triples with Context. (http://sw.deri.org/2008/07/n-quads/)
[5] Hausenblas, M., Halb, W., Raimond, Y., Feigenbaum, L., and Ayers, D. SCOVO: Using Statistics on the Web of Data.
[6] Masters, J., Hodgson, R., and Keller, P. Quantities, Units, Dimensions and Data Types in OWL and XML. (http://qudt.org/)
[7] Viégas, F.B., Wattenberg, M., van Ham, F., Kriss, J., and McKeon, M. Many Eyes: A Site for Visualization at Internet Scale. Infovis, 2007.
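As an illustration of the workflow in Section 2, the two central steps (filtering an entity's properties down to the quantitative ones with a regular expression, then searching for peer groups with a SPARQL query) can be sketched in Python. This is a minimal sketch under stated assumptions, not the openChart implementation, which is written in PHP 5 with ARC2. The function names, the regular expression, and the shape of the generated query are all hypothetical; the paper does not publish its pattern or its queries.

```python
import re

# Hypothetical pattern (the paper does not give its regex): an optional sign,
# digits with optional thousands separators, and an optional decimal part.
_NUMBER = re.compile(r"[-+]?\d[\d,]*(?:\.\d+)?")


def extract_number(raw):
    """Return the first number found in a literal as a float, or None."""
    match = _NUMBER.search(raw)
    if match is None:
        return None
    # Drop thousands separators before converting, e.g. "1,234.5" -> 1234.5.
    return float(match.group(0).replace(",", ""))


def quantitative_properties(properties):
    """Keep only the properties whose literal value contains a number."""
    numeric = {}
    for name, raw in properties.items():
        value = extract_number(raw)
        if value is not None:
            numeric[name] = value
    return numeric


def peer_group_query(prop_x, prop_y, limit=50):
    """Build a SPARQL query for the distinct rdf:type values whose
    instances carry both of the chosen properties (assumed query shape)."""
    return (
        "PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>\n"
        "SELECT DISTINCT ?type WHERE {\n"
        "  ?entity rdf:type ?type ;\n"
        f"          <{prop_x}> ?x ;\n"
        f"          <{prop_y}> ?y .\n"
        "}\n"
        f"LIMIT {limit}"
    )
```

Given a property map with a purely textual value such as a capital city name alongside numeric literals, `quantitative_properties` would keep only the numeric entries. The string returned by `peer_group_query` could then be sent to an endpoint such as DBpedia with any SPARQL client. Note that this simple pattern also accepts the leading digits of dates and identifiers, so a production filter would need stricter rules.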