=Paper= {{Paper |id=None |storemode=property |title=OpenChart: Charting Quantitative Properties in LOD |pdfUrl=https://ceur-ws.org/Vol-628/ldow2010_paper06.pdf |volume=Vol-628 |dblpUrl=https://dblp.org/rec/conf/www/ZembowiczOM10 }} ==OpenChart: Charting Quantitative Properties in LOD== https://ceur-ws.org/Vol-628/ldow2010_paper06.pdf
openChart: Charting Quantitative Properties in LOD

Filip Zembowicz
Harvard University
414 Quincy Mailing Center
Cambridge, MA 02138
fzembow@fas.harvard.edu

David Opolon
MIT ESD
77 Massachusetts Avenue, E40-286
Cambridge, MA 02139
opolon@alum.mit.edu

Stephen Miles
MIT AutoID Labs
77 Massachusetts Avenue, 35-014
Cambridge, MA 02139
s_miles@mit.edu


ABSTRACT
In this paper, we discuss the development of openChart, a quantitative
Linked Open Data charting tool. It targets novice semantic web users by
generating SPARQL queries to present interesting information. We also
acknowledge the problems encountered in development and suggest
improvements.

Categories and Subject Descriptors
H.3.4 [Semantic Web]: Visualization, charting, search.

General Terms
Measurement, Documentation, Design, Human Factors

Keywords
Linked Open Data, Visualization, Charting

1. INTRODUCTION
The wealth of information in the Linked Open Data cloud (hereafter LOD)
is large and growing, enabling comparisons between previously isolated
datasets [3]. However, exploring the linked data cloud is difficult for
users unfamiliar with semantic web concepts such as SPARQL and RDF.
Web-based visualization tools such as IBM’s Many Eyes have shown promise
in allowing collaborative data exploration [7]. We have developed a tool
that similarly allows users to plot quantitative data found on the LOD
cloud, with minimal knowledge of semantic web syntax. This tool enables
users to explore, share, and expand upon the data found in the LOD cloud.

2. STRUCTURE
Finding data on the LOD cloud using openChart (footnote 1) consists of
identifying an entity of interest, choosing two of its quantitative
properties, and selecting a peer group with which to compare values. To
enable entry into the semantic web, we use Wikipedia’s autosuggest API to
determine an entity’s Wikipedia address. This is then matched with a
corresponding semantic web resource using a SPARQL query on the DBpedia
database, which is a central hub of the LOD cloud with many owl:sameAs
linkages to other sources of data [2]. While other endpoints could be
used with the openChart framework, DBpedia has a high number of links to
other LOD sources, making it useful for a general-purpose tool.

Following the identification of an entity of interest, for example
Bangladesh, we find the quantitative properties from the RDF resource by
using regular expressions to remove non-quantitative information. Two of
these are selected by the user, for example hdi and population density.
Then, peer groups are found through a SPARQL query that looks for the
distinct rdf:type values whose instances carry both of the quantitative
properties. These peer groups may or may not contain the user’s original
search term, but the selection of one, for example Country, will display
a scatter plot of the two variables. This peer group feature is an
important aspect of our application because it allows the user to branch
out when navigating information on the semantic web rather than fixate
on answering one question in particular.

Figure 1. The openChart workflow

At all levels of the exploration, the data is locally cached in a MySQL
server. The frontend is written using JavaScript and the jQuery library,
while the backend is written in PHP 5 with the ARC2 library. The plotting
is done using the Protovis JavaScript library.

3. RESULTS
3.1 Easy Exploration with Structured Queries
A focus on a simple user interface has made openChart an easy-to-use
introduction for WWW users unfamiliar with Linked Open Data. By focusing
on peer groups and not only information directly relevant to a user’s
query, the openChart tool emphasizes a broad exploration of available
data rather than merely answering a specific question. Additionally, we
incorporate a social component into openChart, where interesting
relationships between concepts can be shared. Shared charts constitute
newly created knowledge, which will eventually be integrated into the
LOD cloud itself by defining such charts as RDF objects.

3.2 Identification of Errors in the Data (footnote 2)
An additional benefit of displaying data visually in openChart is the
ability to quickly identify errors within the data contained in the LOD
cloud. In isolation, it is often difficult to see errors in
scale or other such mistakes; displaying the erroneous values as outliers
enables them to be rapidly identified. These data points can then be
flagged for review in order to improve the quality of the source data, or
of any scripts that are used to parse the data into the RDF format in the
first place. Such flagging could be achieved by defining a quality
ontology and publishing triples for user-identified errors.

___________________
1 The demonstration can be found at http://openchart.mit.edu
2 An example may be seen at
  http://hcs.harvard.edu/datavis/linkeddata/gallery/index.php?chart=19

Copyright is held by the author/owner(s).
LDOW2010, April 27, 2010, Raleigh, North Carolina, USA.

4. PROBLEMS ENCOUNTERED
4.1 Lack of Range Descriptors
When searching the LOD cloud through a SPARQL query, it would be
economical to restrict the query to retrieve only properties whose ranges
limit them to numerical values. However, we found that many of the
properties lack associated rdfs:range and/or rdfs:domain values. This
resulted in a need to retrieve all results and then parse them using
regular expressions, increasing the overhead of the application. Thus, we
suggest that RDF authors take the time to specify rdfs:domain values and
rdfs:range values, such as xsd:integer and xsd:decimal for numeric
properties, to facilitate statistical work using Linked Open Data.

4.2 Lack of Unit Descriptors
Another aspect often missing from data sources, especially from DBpedia,
is units of measure. Particularly when comparing across endpoints, it is
imperative that the units of measurement are understood, in order to
prevent scaling errors when comparing data from different sources. We
suggest that creators of RDF data take the time to include unit
specifications, either through ontologies such as Quantities, Units,
Dimensions and Data Types in OWL and XML [6], or by agreeing on
standardized unit abbreviations and distributing unit-aware parsers.

5. FUTURE DEVELOPMENT
5.1 Automated Provenance
Since the data in openChart comes from multiple sources, tracking the
sources of a chart’s data is important for enabling the use of the charts
in research. As a result, we plan to implement a feature by which the
origins of the data contained within a chart will be displayed alongside
the chart. Although RDF quadruples (such as [4]) would allow this to be
easily implemented, methods that determine authorship based on particular
endpoint characteristics could be implemented today.

5.2 Integration with Existing LOD Browsers
Many browsers of semantic web data already exist, such as Tabulator,
which offer capabilities similar to our system [1]. Although openChart is
easier to use than these programs, due to the restrictive nature of the
queries permitted on our system, we are working to enable switching back
and forth between Tabulator and openChart, to allow more technical users
to experience the full potential of the semantic web, using openChart as
a starting point.

5.3 Publishing of Results
As mentioned previously, the information gleaned from openChart can be
published for others to access. Statistical relationships can be
described using the SCOVO ontology, which allows the specification of
statistics with reference to a particular dataset over a range of time
[5]. Care must be taken to ensure the completeness of the data, however,
since the statistics generated only represent the data published to the
LOD cloud. Two groups of statistics are generated: one describing the
local cloud itself, such as the number of triples, and another describing
the data contained therein.

6. ACKNOWLEDGMENTS
We would like to thank Tim Berners-Lee, K. Krasnow Waterman, Reed
Stuyvesant, Ian Jacobi, Oshani Seneviratne, and everyone else who
participated in and organized MIT’s Linked Data week in January of 2010.

7. REFERENCES
[1] Berners-Lee et al., Tabulator: Exploring and Analyzing Linked Data on
    the Semantic Web. Proceedings of the 3rd International Semantic Web
    User Interaction Workshop (SWUI06), Athens, Georgia, 6 Nov 2006.
[2] Bizer, C., Lehmann, J., Kobilarov, G., Auer, S., Becker, C.,
    Cyganiak, R., and Hellmann, S. DBpedia – A Crystallization Point for
    the Web of Data. Journal of Web Semantics: Science, Services and
    Agents on the World Wide Web, Issue 7, Pages 154–165, 2009.
[3] Bizer, C., Heath, T., and Berners-Lee, T. Linked Data - The Story So
    Far (in press). International Journal on Semantic Web and Information
    Systems, Special Issue on Linked Data.
[4] Cyganiak, R., Harth, A., and Hogan, A. N-Quads: Extending N-Triples
    with Context. (http://sw.deri.org/2008/07/n-quads/)
[5] Hausenblas, M., Halb, W., Raimond, Y., Feigenbaum, L., and Ayers, D.
    SCOVO: Using Statistics on the Web of Data.
[6] Masters, J., Hodgson, R., and Keller, P. Quantities, Units, Dimensions
    and Data Types in OWL and XML (http://qudt.org/)
[7] Viégas, F.B., Wattenberg, M., van Ham, F., Kriss, J., and McKeon, M.
    Many Eyes: A Site for Visualization at Internet Scale. Infovis, 2007.
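
As a rough illustration of two steps in the workflow of Section 2, the
sketch below pulls a number out of a literal with a regular expression and
builds a SPARQL query that lists candidate peer groups. It is not the
authors' implementation (the openChart backend is PHP with ARC2). The
function names, the regular expression, and the example property URIs are
all assumptions made for illustration.

```python
import re

# Hypothetical pattern (not taken from openChart): optional sign, digits
# with optional thousands separators, and an optional decimal part.
_NUMBER = re.compile(r"[-+]?\d[\d,]*(?:\.\d+)?")


def extract_number(literal):
    """Return the first number found in a literal string, or None."""
    match = _NUMBER.search(literal)
    if match is None:
        return None
    # Drop thousands separators before converting to float.
    return float(match.group(0).replace(",", ""))


def build_peer_group_query(prop_a, prop_b):
    """Build a SPARQL query for the distinct rdf:type values of resources
    that carry both properties, i.e. the candidate peer groups."""
    return (
        "PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>\n"
        "SELECT DISTINCT ?type WHERE {\n"
        "  ?s rdf:type ?type .\n"
        f"  ?s <{prop_a}> ?a .\n"
        f"  ?s <{prop_b}> ?b .\n"
        "}"
    )


if __name__ == "__main__":
    # Illustrative inputs; the property URIs are assumed, not verified.
    print(extract_number("1,099.3 people per km2"))
    print(build_peer_group_query(
        "http://dbpedia.org/property/hdi",
        "http://dbpedia.org/property/populationDensity",
    ))
```

A query of this shape returns every type shared by resources with both
properties, which is why a peer group need not contain the entity the user
originally searched for.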