<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Sparqlines: SPARQL to Sparkline</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Sarven Capadisli</string-name>
          <email>info@csarven.ca</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Enterprise Information Systems Department, University of Bonn</institution>
          ,
          <addr-line>Bonn</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>This article presents sparqlines: statistical observations fetched from SPARQL endpoints and displayed as inline-charts. An inline-chart, also known as a sparkline, is concise and located where it is discussed in the text, complementing the supporting text without breaking the reader's flow. For example, the GDP per capita growth (annual %) [Canada] claimed by the World Bank Linked Dataspace. We demonstrate an implementation which allows scientists or authors to easily enhance their work with sparklines generated from their own or public statistical linked datasets. This article includes an active demonstration accessible at http://csarven.ca/sparqlines-sparql-to-sparkline.</p>
      </abstract>
      <kwd-group>
        <kwd>Linked Data</kwd>
        <kwd>Semantic publishing</kwd>
        <kwd>Sparkline</kwd>
        <kwd>SPARQL</kwd>
        <kwd>Statistics</kwd>
        <kwd>User interface</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1    Introduction</title>
      <p>In this article we introduce sparqlines, an integration of statistical data retrieval using SPARQL with displaying observations in the form of word-size graphics: sparklines. We describe an implementation which is part of a Web-based authoring tool (dokieli). We cover how the data is modelled and exposed in order to be suitable for embedding; demonstrate how to embed data as both static and dynamic sparklines and discuss the technical requirements of each; and walk through the user interactions to do so.</p>
      <p>Our contribution is the generation of a well-established visual aid to reading statistical data (the sparkline) directly from the dataset itself, at the time of authoring the supporting text as part of the writing workflow. This enables authors who are already publishing data to use it directly, encourages them to make their data available for others to use, and offers an easy way to help readers better understand the information.</p>
      <p>We conclude with a discussion, including design considerations. The code of
our implementation is open source, and we invite you to try it out and make
requests for more advanced features: https://github.com/linkeddata/dokieli.</p>
    </sec>
    <sec id="sec-3">
      <title>2    Related Work</title>
      <sec id="sec-3-1">
        <title>2.1   Sparklines</title>
        <p>
          The earliest known implementation of an inline-chart was designed and programmed by Peter Zelchenko and Mike Medved to represent historical charts efficiently in the QuoteTracker software in early 1999 [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ]. Sparklines are “datawords”, carrying dense information with the resolution of typography, particularly useful in places where the available screen real estate is minimal. Edward Tufte describes a sparkline as a “small intense, simple, word-sized graphic with typographic resolution” [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ]. Sparklines are designed to be included anywhere, for example embedded in a sentence, table or even a map, within the relevant context. When embedded in a sentence, they support the text, allowing continuous reading without the need to refer to a figure disjoint from the original context, whilst still providing an opportunity for the reader to investigate further by clicking on the data line to access the source of each data point, per Figure 1.
        </p>
        <p>Sparkline graphics typically have a variable long dimension and a constrained short dimension. In the case of a typographic line, the constraint can be fixed to the height of the font-size of the encapsulating component. For example, the computed CSS height value of the embed HTML element that contains the sparkline on the current viewing device is 20px, so the sparkline embedded in this paragraph is drawn to fit that height.</p>
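        <p>The height constraint can be illustrated with a minimal sketch that scales a series into an SVG polyline of fixed height. This is not the dokieli implementation: the function name, the 4px horizontal step and the styling attributes are assumptions made for illustration, and only the 20px height comes from the example above.</p>
        <preformat><![CDATA[
```python
import xml.etree.ElementTree as ET


def sparkline_svg(values, height=20.0, step=4.0):
    """Return an SVG string holding one polyline scaled to the given height."""
    if len(values) == 0:
        raise ValueError("at least one value is required")
    low, high = min(values), max(values)
    span = (high - low) or 1.0  # avoid dividing by zero for a flat series
    points = []
    for i, value in enumerate(values):
        x = i * step
        # The SVG y axis grows downwards, so the largest value maps to y = 0.
        y = height - (value - low) / span * height
        points.append(f"{x:.1f},{y:.1f}")
    width = max(len(values) - 1, 1) * step
    svg = ET.Element("svg", {
        "xmlns": "http://www.w3.org/2000/svg",
        "width": f"{width:g}",
        "height": f"{height:g}",
        "viewBox": f"0 0 {width:g} {height:g}",
    })
    ET.SubElement(svg, "polyline", {
        "points": " ".join(points),
        "fill": "none",
        "stroke": "currentColor",
    })
    return ET.tostring(svg, encoding="unicode")
```
]]></preformat>
        <p>The long dimension grows with the number of observations while the short dimension stays fixed, which matches the variable and constrained dimensions described above.</p>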
        <p>
          Sparklines appear in many places where small datafeeds are useful: programmatic insertion in text editors and spreadsheets, fitness feeds from wearable watches, social media analytics, streaming real-time quotes, electroencephalograms, system dashboards and trays, and temperature and stock activity, to name a few. Studies show that novice and experienced investors using stock reports with sparklines experience reduced cognitive load [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ].
        </p>
        <p>
          Sparklines in line, bar, column or win/loss graphs can be programmatically included in Google Drive documents by including data from an embedded table or sequence of numbers via the Google Charts API [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ].
        </p>
        <p>The Wayback Machine uses sparklines to show the snapshots of a URL through time.</p>
        <p>There are sparkline implementations in JavaScript libraries like d3.js and
jQuery. Sparkline implementations also exist for command-line interfaces.
These tools tend to take input data in tabular form (CSV). Sparklines can
also be created by simple use of Unicode characters: ▂▁▄▃▆▅█▇.</p>
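        <p>As a minimal sketch of the Unicode approach, the following function maps each value to one of the eight block characters by its relative height. The function name and the linear scaling are illustrative assumptions and are not taken from any of the tools mentioned above.</p>
        <preformat><![CDATA[
```python
BLOCKS = "▁▂▃▄▅▆▇█"


def unicode_sparkline(values):
    """Map each value to one of eight block characters by relative height."""
    if not values:
        return ""
    low, high = min(values), max(values)
    span = high - low
    if span == 0:
        # A flat series carries no shape, so draw it at the lowest level.
        return BLOCKS[0] * len(values)
    top = len(BLOCKS) - 1
    return "".join(BLOCKS[round((v - low) / span * top)] for v in values)
```
]]></preformat>
        <p>For instance, calling unicode_sparkline([1, 2, 3, 4, 5, 6, 7, 8]) yields one character per level, from the lowest block to the full block.</p>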
      </sec>
      <sec id="sec-3-2">
        <title>2.2   RDF Data Cube and SPARQL</title>
        <p>
          Statistical data that is modelled with the RDF Data Cube vocabulary [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ]
          makes it possible to discover and identify artefacts in a uniform way. This is in contrast to writing applications to consume data from endpoints with heterogeneous data models. For front-end Web applications, data can be fetched, explored, and filtered from statistical linked dataspaces with SPARQL endpoints, e.g., http://270a.info/. Utilising this method of access from within various types of articles on the Web makes it possible to build applications which put more focus on user interfaces rather than handling different data models case by case, or burdensome data integration tasks. Furthermore, having easy access to highly structured multidimensional data, essentially through an HTTP GET request, makes it desirable to create static and real-time visualisations.
        </p>
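        <p>To illustrate access through an HTTP GET request, the sketch below encodes a query in the query parameter defined by the SPARQL 1.1 Protocol and asks for results in the SPARQL JSON results format. The function names are hypothetical, and the caller supplies the endpoint address, since no specific endpoint URL is given here.</p>
        <preformat><![CDATA[
```python
import json
import urllib.parse
import urllib.request


def sparql_get_url(endpoint, query):
    """Encode a SPARQL query as the `query` parameter of a GET request."""
    return endpoint + "?" + urllib.parse.urlencode({"query": query})


def run_select(endpoint, query, timeout=10):
    """Send a SELECT query and return the list of result bindings."""
    request = urllib.request.Request(
        sparql_get_url(endpoint, query),
        headers={"Accept": "application/sparql-results+json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)["results"]["bindings"]
```
]]></preformat>
        <p>Because the whole request is a single URL, the same query can be issued from a browser, a script or an authoring tool without any client library.</p>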
        <p>
          Sgvizler is a SPARQL result set visualisation JavaScript library that uses
Google Charts API to create sparkline images [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ]. These are block-level raster
images.
        </p>
        <p>
          The analysis and visualisation of piracy reports has been investigated through endpoint querying with a SPARQL client for R [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ].
        </p>
        <p>
          CubeViz was developed to visualise multidimensional statistical data. It is a
faceted browser, which utilizes the RDF Data Cube vocabulary, with a chart
visualisation component [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ].
        </p>
        <p>
          Linked Statistical Data Analysis [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ] presents a way to reuse data through federated SPARQL queries, and to generate statistical analyses and scatter plots. The stats.270a.info service stores the computed analyses and makes them available for future discovery.
        </p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>3    Data Provision</title>
      <p>In order to use sparqlines, data has to be both well-formed and available over
a SPARQL endpoint. Here we briefly discuss both of these requirements.</p>
      <p>The RDF Data Cube vocabulary is used to describe multidimensional statistical data. It makes it possible to represent significant amounts of heterogeneous statistical data as Linked Data which can be discovered and identified in a uniform way. To qualify for consumption as a sparqline, the data must conform to some of the integrity constraints of the RDF Data Cube model, e.g., IC-1 (Unique DataSet), IC-11 (All dimensions required), IC-12 (No duplicate observations), IC-14 (All measures present).</p>
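      <p>The intent of two of these constraints can be sketched over observations that have already been fetched into memory. This is only an illustration in the spirit of IC-11 and IC-12, not the normative SPARQL checks of the RDF Data Cube specification; the function names, the dictionary representation and the assumption that all observations belong to a single dataset are ours.</p>
      <preformat><![CDATA[
```python
from collections import Counter


def missing_dimensions(observations, dimensions):
    """IC-11 style check: list (index, dimension) pairs that have no value."""
    return [
        (i, d)
        for i, obs in enumerate(observations)
        for d in dimensions
        if obs.get(d) is None
    ]


def duplicate_keys(observations, dimensions):
    """IC-12 style check: dimension-value tuples shared by several observations."""
    counts = Counter(tuple(obs.get(d) for d in dimensions) for obs in observations)
    return [key for key, n in counts.items() if n > 1]
```
]]></preformat>
      <p>A series with a missing dimension value or with two observations for the same point cannot be drawn unambiguously as a single line, which is why constraints of this kind matter for sparqlines.</p>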
      <p>Additional enrichments on the data cubes can improve their discovery and reuse. Examples include, but are not limited to, providing human-readable labels for the datasets (with language tags), classifications, and data structure definitions, as well as provenance-level data such as the license and the date of the last update.</p>
      <p>In order to allow user interfaces which can utilise a group of observations in a dataset, slices should be made available in the data. This enables consuming applications to dissect datasets (through SPARQL queries) for arbitrary subsets of observations. For example, while it is possible to construct a general query to get all of the observations in a dataset which have a particular dimension, it may be preferable to only query for such subsets provided that their structures can be identified and externally referenced. In the case of sparklines, one common use case for slices is to present data in time-series.</p>
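      <p>A query for the time-series held by one slice might look like the sketch below. It assumes that the slice links to its observations with qb:observation and that the data uses the SDMX refPeriod dimension and obsValue measure; a given dataspace may use other properties. The function name and the simple IRI check are hypothetical.</p>
      <preformat><![CDATA[
```python
SLICE_QUERY = """\
PREFIX qb: <http://purl.org/linked-data/cube#>
PREFIX sdmx-dimension: <http://purl.org/linked-data/sdmx/2009/dimension#>
PREFIX sdmx-measure: <http://purl.org/linked-data/sdmx/2009/measure#>

SELECT ?refPeriod ?obsValue
WHERE {
  <%(slice)s> a qb:Slice ;
      qb:observation ?observation .
  ?observation sdmx-dimension:refPeriod ?refPeriod ;
      sdmx-measure:obsValue ?obsValue .
}
ORDER BY ?refPeriod
"""


def slice_time_series_query(slice_uri):
    """Return a SELECT query for the time-series held by one qb:Slice."""
    # Reject characters that may not appear inside an IRI reference.
    if any(c in slice_uri for c in '<>"{}|^`\\ '):
        raise ValueError("not a safe IRI: %r" % slice_uri)
    return SLICE_QUERY % {"slice": slice_uri}
```
]]></preformat>
      <p>Ordering by the reference period returns the observations in the sequence in which they would be drawn.</p>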
      <p>SPARQL queries are used to filter for graph patterns in the RDF Data Cube datasets. Depending on the user interface application, there may be multiple queries made to the SPARQL endpoints in order to filter the data based on user input. For example, an initial query may be a cursory inspection to discover suitable datasets with given parameters, e.g., what the dataset is about, the type of dimensions and their values, and subsequent queries may be to retrieve the matching datasets or slices with observations and their measure values.</p>
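      <p>The initial discovery query could be sketched as follows: it looks for datasets whose label contains a search term and which have at least one observation for a given reference area. The use of rdfs:label and the SDMX refArea dimension, the function name, and the use of json.dumps to quote the search term as a string literal are assumptions made for illustration.</p>
      <preformat><![CDATA[
```python
import json

DISCOVERY_QUERY = """\
PREFIX qb: <http://purl.org/linked-data/cube#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX sdmx-dimension: <http://purl.org/linked-data/sdmx/2009/dimension#>

SELECT DISTINCT ?dataset ?label
WHERE {
  ?dataset a qb:DataSet ;
      rdfs:label ?label .
  FILTER (CONTAINS(LCASE(STR(?label)), LCASE(%(concept)s)))
  FILTER EXISTS {
    ?observation qb:dataSet ?dataset ;
        sdmx-dimension:refArea <%(area)s> .
  }
}
ORDER BY ?label
"""


def dataset_discovery_query(concept, area_uri):
    """Return a query for datasets about `concept` with data for `area_uri`."""
    # Reject characters that may not appear inside an IRI reference.
    if any(c in area_uri for c in '<>"{}|^`\\ '):
        raise ValueError("not a safe IRI: %r" % area_uri)
    # json.dumps yields a double-quoted literal with backslash escapes.
    return DISCOVERY_QUERY % {"concept": json.dumps(concept), "area": area_uri}
```
]]></preformat>
      <p>A subsequent query would then retrieve the observations and measure values of whichever dataset the user picks from the result.</p>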
    </sec>
    <sec id="sec-5">
      <title>4    Static and Dynamic Sparqlines</title>
      <p>The data behind a sparqline can be static: a fixed historical set to which no new points are added; or dynamic: subject to change as new data is gathered. Both of these cases are accommodated by our implementation.</p>
      <p>Our implementation allows authors to select text they have written which describes the data they want to visualise; it searches available datasets for those relevant to the text, and lets the user choose the most appropriate if there is more than one. The sparqline is inserted along with a reference to the source.</p>
      <p>A specific example workflow is demonstrated when this article is viewed in a Web browser (at its canonical URL: http://csarven.ca/sparqlines-sparql-to-sparkline). Enable the Edit mode from the ☰ menu and highlight the text GDP of Canada. What occurs is as follows:</p>
      <list list-type="order">
        <list-item><p>User enters text in a sentence, e.g., GDP of Canada.</p></list-item>
        <list-item><p>User selects the text GDP of Canada with their mouse or keyboard.</p></list-item>
        <list-item><p>User selects the “sparkline” option from the presented authoring toolbar.</p></list-item>
        <list-item><p>The input text is split into two segments: 1) GDP and 2) Canada, whereby the first term is the concept and the second is a reference area. Reference areas are disambiguated against an internal dictionary.</p></list-item>
        <list-item><p>System constructs a SPARQL query URL and sends it to the World Bank Linked Dataspace endpoint, looking for a graph pattern that matches datasets whose labels contain “GDP” and which have at least one observation for the reference area “Canada”.</p></list-item>
        <list-item><p>User is given a list of datasets which match the above criteria, and selects the desired dataset.</p></list-item>
        <list-item><p>System sends a SPARQL query to get the observations of the selected dataset for Canada.</p></list-item>
        <list-item><p>A sparkline is created and displayed for the user, also indicating the number of observations it has.</p></list-item>
        <list-item><p>If the user is happy with this visualisation, they include it in the text. A hyperlink to the dataset and a sparkline SVG are inserted back into the sentence, replacing GDP of Canada with GDP per capita growth (annual %).</p></list-item>
      </list>
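      <p>The splitting and disambiguation of the selected text can be sketched as follows. The use of the word “of” as the separator, the contents of the dictionary and the two-letter codes it returns are assumptions for illustration; the internal dictionary used by the implementation is not described here.</p>
      <preformat><![CDATA[
```python
import re

# Hypothetical dictionary of reference areas, keyed by case-folded name.
REFERENCE_AREAS = {
    "canada": "CA",
    "germany": "DE",
}


def parse_selection(text, areas=REFERENCE_AREAS):
    """Split 'CONCEPT of AREA' into a concept and a reference-area code."""
    match = re.fullmatch(r"\s*(.+?)\s+of\s+(.+?)\s*", text, flags=re.IGNORECASE)
    if match is None:
        raise ValueError("expected text of the form 'concept of area'")
    concept, area = match.group(1), match.group(2)
    code = areas.get(area.casefold())
    if code is None:
        raise LookupError("unknown reference area: " + area)
    return concept, code
```
]]></preformat>
      <p>With this sketch, parse_selection("GDP of Canada") returns the pair ("GDP", "CA"), which provides the search term and the reference area for the dataset lookup.</p>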
    </sec>
    <sec id="sec-6">
      <title>6    Semantic Publishing</title>
      <p>Our implementation in dokieli automatically includes semantic annotations within the embedded sparqlines. The sparqline resource has its own URI that can be used for global referencing. The RDF statements are represented using the HTML+RDFa syntax, and they preserve the following information:</p>
      <list list-type="bullet">
        <list-item><p>The part of the document to which the sparqline belongs (rel="schema:hasPart").</p></list-item>
        <list-item><p>The human-readable name for the figure (based on the dataset used), where it was derived from (the qb:DataSet instance), and the generated SVG.</p></list-item>
        <list-item><p>The SVG resource has statements to indicate:</p>
          <list list-type="bullet">
            <list-item><p>the linked statistical dataset which was used (rel="prov:wasDerivedFrom");</p></list-item>
            <list-item><p>the human-readable name of the dataset (property="schema:name");</p></list-item>
            <list-item><p>the license for the generated SVG (rel="schema:license");</p></list-item>
            <list-item><p>further information for each qb:Observation (rel="rdfs:seeAlso").</p></list-item>
          </list>
        </list-item>
      </list>
      <p>This information can be discovered and parsed as RDF, making it easy for third-party applications to access and reuse. For example, another author can cite or include these sparqlines in their work.</p>
    </sec>
    <sec id="sec-7">
      <title>7    Discussion and Conclusions</title>
      <p>We have presented a preliminary implementation of sparklines generated from SPARQL endpoints and embedded directly through an authoring tool. This allows authors to visualise their data without breaking their workflow. However, there is a lot of scope for future work in this area. We now discuss some areas for further development.</p>
      <p>
        Design principles: Tufte makes recommendations on readability, as well as on applying Cleveland’s analytical method of choosing aspect ratios, banking to 45° [
        <xref ref-type="bibr" rid="ref10 ref11 ref2">2, 10, 11</xref>
        ]. Cleveland’s method has been extended to generate banked sparklines by providing the vertical dimension to fit a typographical line. These approaches help maximize the clarity of the line segments [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]. Applying these methods in dokieli is left for future work (issue 159).
      </p>
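      <p>As a rough illustration of banking with a fixed vertical dimension, the sketch below uses the median-absolute-slope heuristic: it picks the width at which the median absolute slope of the line segments is drawn at 45°. This is one simple variant and not the multi-scale method; the function name and the 20px default height are assumptions.</p>
      <preformat><![CDATA[
```python
from statistics import median


def banked_width(xs, ys, height=20.0):
    """Return the width at which the median absolute slope is drawn at 45 degrees."""
    if len(xs) != len(ys) or len(xs) in (0, 1):
        raise ValueError("xs and ys need the same length and at least two points")
    x_range = max(xs) - min(xs)
    y_range = max(ys) - min(ys)
    points = list(zip(xs, ys))
    slopes = [
        abs((y1 - y0) / (x1 - x0))
        for (x0, y0), (x1, y1) in zip(points, points[1:])
        if x1 != x0
    ]
    if not slopes or y_range == 0 or median(slopes) == 0:
        raise ValueError("series is too flat to bank")
    # A data-space slope s is drawn with slope
    # s * (x_range / y_range) * (height / width).
    # Setting the median drawn slope to 1 gives the width / height ratio.
    aspect = median(slopes) * x_range / y_range
    return height * aspect
```
]]></preformat>
      <p>For a zigzag series with xs = [0, 1, 2, 3] and ys = [0, 2, 0, 2] at a height of 20px, every segment spans 20px horizontally and 20px vertically when the width is 60px, which is the value this sketch returns.</p>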
      <p>Dataset interaction: Building on existing work in faceted searching and browsing of RDF data, authors can explore suitable datasets with a combination of searching using natural language and filtering through available datasets and dimensions of interest. This approach is convenient for datasets in RDF Data Cubes since they are highly structured and classified. Further work is needed to improve the process for disambiguation of the author’s input in natural language in order to discover appropriate URIs in the dataset.</p>
      <p>Privacy considerations: Many researchers collect experimental data
which has sensitive or identifiable information. This information should not be
exposed through public SPARQL endpoints. Measures such as access control
lists can allow researchers to generate sparqlines over sensitive data.</p>
      <p>Data availability: SPARQL endpoints are notoriously unreliable and they may have high setup costs for new datasets. Applications which rely on endpoints to generate sparqlines with dynamic data may want to initially include in the article a locally cached copy of the data from the last access. The application can then asynchronously fetch or subscribe for new updates.</p>
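      <p>The cached-copy approach can be sketched as follows: the cached observations are rendered immediately, and the display is updated only if a background fetch succeeds and returns different data. The names refresh, fetch and render are hypothetical, and treating any OSError (which covers network errors and timeouts from the Python standard library) as an unavailable endpoint is a simplifying assumption. The sketch relies on asyncio.to_thread, which requires Python 3.9 or later.</p>
      <preformat><![CDATA[
```python
import asyncio


async def refresh(cached, fetch, render):
    """Render the cached observations, then try to replace them with fresh ones."""
    render(cached)  # show the cached copy straight away
    try:
        # Run the blocking fetch in a worker thread so the caller is not blocked.
        fresh = await asyncio.to_thread(fetch)
    except OSError:
        return cached  # endpoint unavailable: keep the cached copy
    if fresh != cached:
        render(fresh)
    return fresh
```
]]></preformat>
      <p>The reader always sees a sparqline, even when the endpoint is down, and sees the newest observations whenever the endpoint responds.</p>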
    </sec>
    <sec id="sec-8">
      <title>Acknowledgements</title>
      <p>The motivation and work on sparqlines were inspired by Edward Tufte’s education and popularisation of sparklines. Special thanks to Amy Guy and Ilaria Liccardi for their great support and tireless nagging to get this written up, as well as Jindřich Mynarz for help with SPARQL query optimisations.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Zelchenko</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Medved</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          : QuoteTracker, http://pete.zelchenko.com/portfolio/screen/2gk.htm
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Tufte</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          : Beautiful Evidence, Graphics Press,
          <year>2006</year>
          , ISBN 9781930824164, http://www.worldcat.org/title/beautiful-evidence/oclc/70203994&amp;referer=brief_results
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Meharia</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>Use of Visualization in Digital Financial Reporting: The effect of Sparkline</article-title>
          (
          <year>2012</year>
          ), Theses and Dissertations--Business Administration, Paper 1, http://uknowledge.uky.edu/cgi/viewcontent.cgi?article=1000&amp;context=busadmin_etds
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>4. Google Docs Sparklines, https://support.google.com/docs/answer/3093289?hl=en</mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Cyganiak</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Reynolds</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>The RDF Data Cube vocabulary</article-title>
          ,
          <source>W3C Recommendation</source>
          ,
          <year>2014</year>
          , https://www.w3.org/TR/vocab-data-cube/
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Skjaeveland</surname>
            ,
            <given-names>M. G.</given-names>
          </string-name>
          :
          <article-title>Sgvizler: A JavaScript Wrapper for Easy Visualization of SPARQL Result Sets</article-title>
          ,
          <year>2012</year>
          , http://2012.eswc-conferences.org/sites/default/files/eswc2012_submission_303.pdf
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Hage</surname>
            ,
            <given-names>W. R. v.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Erp</surname>
            ,
            <given-names>M. v.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Malaisé</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          :
          <article-title>Linked Open Piracy: A story about e-Science, Linked Data, and statistics</article-title>
          (
          <year>2012</year>
          ), http://www.few.vu.nl/~wrvhage/papers/LOP_JoDS_2012.pdf
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Salas</surname>
            ,
            <given-names>P. E. R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mota</surname>
            ,
            <given-names>F. M. D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Martin</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Auer</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Breitman</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Casanova</surname>
            ,
            <given-names>M. A.</given-names>
          </string-name>
          :
          <article-title>Publishing Statistical Data on the Web</article-title>
          , ISWC (
          <year>2012</year>
          ), http://svn.aksw.org/papers/2012/ESWC_PublishingStatisticData/public.pdf
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Capadisli</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Auer</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Riedl</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          :
          <article-title>Linked Statistical Data Analysis</article-title>
          ,
          <source>ISWC SemStats</source>
          (
          <year>2013</year>
          ), http://csarven.ca/linked-statistical-data-analysis
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Tufte</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          :
          <article-title>Sparkline theory and practice</article-title>
          ,
          <source>Edward Tufte forum</source>
          , http://www.edwardtufte.com/bboard/q-and-a-fetch-msg?msg_id=0001OR&amp;topic_id=1
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Cleveland</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          : Visualizing Data, Hobart Press,
          <year>1993</year>
          , ISBN 9780963488404, http://dl.acm.org/citation.cfm?id=529269
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Heer</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Agrawala</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Multi-Scale Banking to 45°</article-title>
          ,
          <source>IEEE Transactions on Visualization and Computer Graphics</source>
          , Vol.
          <volume>12</volume>
          , No.
          <issue>5</issue>
          ,
          <year>2006</year>
          , http://vis.berkeley.edu/papers/banking/2006-Banking-InfoVis.pdf
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>