=Paper= {{Paper |id=Vol-1515/demo5 |storemode=property |title=EDN-LD: A simple linked data tool |pdfUrl=https://ceur-ws.org/Vol-1515/demo5.pdf |volume=Vol-1515 |dblpUrl=https://dblp.org/rec/conf/icbo/Overton15 }} ==EDN-LD: A simple linked data tool== https://ceur-ws.org/Vol-1515/demo5.pdf
EDN-LD: A simple linked data tool

James A. Overton¹,∗
¹ Knocean, Toronto, Ontario, Canada
∗ To whom correspondence should be addressed: james@overton.ca

ABSTRACT
EDN-LD is a set of conventions for representing linked data using Extensible Data Notation (EDN) and a library for conveniently working with those representations in the Clojure programming language. It provides a lightweight alternative to existing linked data tools for many common use cases, much in the spirit of JSON-LD. We present the motivation and design of EDN-LD, and demonstrate how it can clearly and concisely transform tables into triples.

1 INTRODUCTION
EDN-LD is a set of conventions for representing linked data using Extensible Data Notation (EDN),¹ and a library for conveniently working with those representations in the Clojure programming language.² Clojure is a modern Lisp that runs on the Java Virtual Machine (JVM) and has full access to the vast ecosystem of Java libraries. Since many linked data libraries and tools also target the JVM, Clojure is a tempting alternative to Java for working with linked data. Tawny-OWL is another example of a linked data tool written in Clojure (Lord, 2013); however, it is focused on ontology development and takes quite a different approach from EDN-LD. Our goal with this project is to provide a lightweight alternative to existing linked data tools for many common use cases, much in the spirit of JSON-LD.³ In this presentation we discuss the motivation and design of EDN-LD, and demonstrate how it can clearly and concisely transform tables into triples.

EDN-LD is open source software, published under a BSD license. The source code is written in a literate style, with extensive unit tests. It is available on GitHub⁴ with a tutorial that also serves as an automated integration test. Our interactive online tutorial can be used without needing to install Clojure.⁵ Feedback and contributions are welcome on our GitHub site.

2 JSON-LD
EDN-LD shares many of the motivations and goals of JSON-LD. First we will discuss the benefits and shortcomings of JSON-LD, then show how EDN-LD improves on it in several respects.

JavaScript Object Notation (JSON)⁶ is a subset of the JavaScript programming language that is widely used for expressing literal data within JavaScript programs. JSON’s elements are: null, booleans, strings, and numbers. Elements can be combined into arrays and objects, where the latter are effectively maps from strings to other values. These simple elements are common to virtually every programming language, and JSON is now widely used for data transfer between programs. It has replaced heavier formats such as XML in many applications.

JSON has many limitations, including a lack of comments, ambiguous numbers, and the lack of any mechanism for extending its types. In practice, strings are used to represent most types of data, but since it is difficult to attach type information to aid in their interpretation, this can quickly lead to ambiguity.

The ubiquity of JSON was one motivation for the JSON-LD W3C Recommendation: “A JSON-based Serialization for Linked Data”.⁷ In JSON-LD strings are used to represent IRIs (and compact IRIs) for resources, plain literals can be strings, and typed literals are objects (maps) with special @value, @type, and @language keys. Graphs and datasets are represented as nested objects (maps) and sets are represented by arrays, with details depending on the chosen “Document Form”.

The core of JSON-LD is the @context map, which can be specified inside a JSON record, externally using a link, or provided by the consuming application. The context allows for strings to be interpreted as IRIs, for compact IRI strings to be expanded, and for types to be attached to literals. Since the context can be supplied externally, existing JSON data can be reinterpreted as JSON-LD by providing an appropriate context.

JSON-LD is an exciting addition to the ecosystem of linked data tools, but it is constrained by the limitations of the JSON format. The heavy use of strings, in particular, can make it difficult to distinguish between a literal string, a compact IRI, or a fully resolved IRI. The complex context processing⁸ and expansion algorithms⁹ are indicative of this problem, as is the need for several similar-but-different “Document Forms”. EDN-LD uses the richer elements and structures available in EDN to reduce these problems.

3 EXTENSIBLE DATA NOTATION
Like JSON for JavaScript, Extensible Data Notation (EDN) is a data format at the core of Clojure. The basic EDN elements are: nil, booleans, strings, characters, symbols, keywords, integers, and floating point numbers. These can be combined into lists, vectors, maps, and sets. Any element can serve as the key or value of a map. EDN is extensible in the sense that it allows for tagged elements, indicated by a special tag followed by an EDN element. EDN also allows two kinds of comments. Multiple alternatives to strings (i.e. keywords and symbols), more carefully defined numbers, sets, and more flexible maps all make it easier to express complex data efficiently and unambiguously in EDN than in JSON.
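
As a brief illustration of the EDN elements described in Section 3, the following sketch reads a small EDN document with Clojure's standard clojure.edn reader. The record contents (the :ex/Homer keyword, the tags, and the date) are invented for illustration and are not taken from EDN-LD.

```clojure
(require '[clojure.edn :as edn])

;; An illustrative EDN document, held in a string. It uses:
;;   - a line comment (;) and a discard comment (#_), the two comment kinds
;;   - a keyword (:ex/Homer) alongside a plain string ("Homer")
;;   - a set (#{...}) and a vector used as a map key
;;   - a tagged element (#inst), which the default readers turn into a date
(def edn-text
  "; a line comment runs to the end of the line
{:id    :ex/Homer
 :label \"Homer\"
 :tags  #{:poet :epic}
 [1 2]  :vector-key
 :added #inst \"2015-07-27T00:00:00Z\"
 :count #_ 99 2}")

(def record (edn/read-string edn-text))

;; Keywords and strings are distinct types, so an identifier cannot be
;; mistaken for literal text.
(keyword? (:id record))
(string? (:label record))

;; The discarded 99 never reaches the data; :count maps to 2.
(:count record)
```

Because keywords, sets, and tagged elements are distinct from strings at the reader level, a consumer can tell them apart without any external context.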
1 https://github.com/edn-format/edn
2 http://clojure.org
3 http://json-ld.org
4 https://github.com/ontodev/edn-ld
5 http://try.edn-ld.com
6 http://json.org
7 http://www.w3.org/TR/json-ld/
8 http://www.w3.org/TR/json-ld-api/#context-processing-algorithms
9 http://www.w3.org/TR/json-ld-api/#expansion-algorithms



Copyright © 2015 for this paper by its authors. Copying permitted for private and academic purposes.



EDN does not have a type system and does not include schemas. However, several schema systems have been created for validating EDN data structures. EDN-LD uses Prismatic’s Schema library¹⁰ to specify the required “shapes” for various elements.

4 EDN FOR LINKED DATA
In EDN-LD, as in JSON-LD, IRIs and blank node identifiers are represented by strings. IRIs can be contracted to keywords using a context: a map from keywords to IRIs or other contractions. Contractions can be expanded to IRIs using the same context. Literals are always represented as maps with a special :value key for the lexical value, and optional :type and :lang keys. Discrete triples and quads are represented with vectors. Graphs and datasets are represented as nested maps from graph IRI to subject IRI to predicate IRI, ending with a set of objects. These two “document forms” have very different shapes, suited to different processing goals, e.g. sequences of triples for streaming and filtering, and nested maps for sorting and selecting. EDN-LD uses Apache Jena¹¹ to read from and write to a wide range of linked data formats.

Figure 1 shows an EDN-LD context. It includes a :dc prefix for Dublin Core metadata elements, and an :ex prefix for the example domain. The nil key indicates that its value :ex is the default prefix. The :title and :author contractions expand (recursively) to Dublin Core IRIs.

(def context
  {:dc     "http://purl.org/dc/elements/1.1/"
   :ex     "http://example.com/"
   nil     :ex
   :title  :dc:title
   :author :dc:author})
(expand context :title)
; "http://purl.org/dc/elements/1.1/title"

Fig. 1. An example of an EDN-LD context, showing an expand function call on the :title contraction, and the expanded IRI that is returned

Figure 2 shows an example of a simple data conversion pipeline using EDN-LD. First we define a map from names (strings) to contracted resource IRIs and merge our context with the default prefixes. The ->> is a “threading macro” that inserts the first value as the last argument to the second function, and so on, letting deeply nested function calls be clearly expressed as “pipelines”. Here “books.tsv” is the name of a file in tab-separated values format and read-tsv is a function that returns a sequence of maps, one per row, each with column names as keys. The assign-subject-iri function is called on each of the maps to add a :subject-iri key with an appropriate value. Then triplify is used to convert the maps to triples, represented as vectors: subject keyword, predicate keyword, and object keyword or literal map as determined by the resources map. The keywords represent contracted IRIs, and the expand-all function converts them to full IRI strings. Finally the write-triples function writes the results to the “books.ttl” Turtle file using our specified prefixes. By using keywords to distinguish contracted IRIs from full IRIs and literal data, and consistently using maps for literal data, we gain more control over the interpretation of strings than JSON-LD, without loss of concision.

(def resources
  {"Homer" :Homer})
(def prefixes
  (merge
    default-prefixes
    context))
(->> "books.tsv"
     read-tsv
     (map assign-subject-iri)
     (mapcat #(triplify resources %))
     (map #(expand-all context %))
     (write-triples "books.ttl"
                    prefixes))

Fig. 2. An example of an EDN-LD conversion pipeline

5 FUTURE WORK
EDN-LD is still in development, but available for use. We plan to implement convenient syntax for RDF collections (linked lists), and for various OWL constructs including annotation axioms and class expressions. We are also considering a ClojureScript implementation of EDN-LD. ClojureScript is a language that is closely related to Clojure, compiling to JavaScript rather than JVM bytecode. Dual Clojure and ClojureScript libraries are becoming increasingly common. However, a ClojureScript version of EDN-LD could not use Jena, and would need alternative methods for reading and writing linked data files.

6 DEMONSTRATION
At ICBO we plan to demonstrate the use of EDN-LD for transforming tables to triples, and for efficiently filtering large linked data files to specified subsets.

7 CONCLUSION
EDN-LD was developed for the Immune Epitope Database (IEDB), and was preceded by several related systems for working with linked data and ontologies using Clojure. These techniques have proved valuable for rapid development of data processing workflows, merging disparate sources of biological data. EDN-LD improves on JSON-LD in several respects, and is well suited to working with linked data in Clojure.

ACKNOWLEDGEMENTS
The author was supported in this work by the Immune Epitope Database and Analysis Project, funded by the National Institutes of Health [HHSN272201200010C].

REFERENCES
Lord, P. (2013). The Semantic Web takes wing: Programming ontologies with Tawny-OWL. OWLED 2013.

10 https://github.com/Prismatic/schema
11 http://jena.apache.org
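
The recursive expansion described in Section 4 and shown in Fig. 1 can be sketched in a few lines of plain Clojure. This is a minimal sketch under stated assumptions, not the EDN-LD implementation: the name expand-sketch is hypothetical, it assumes the context contains no cycles, and it does no validation of unknown prefixes.

```clojure
(require '[clojure.string :as string])

;; The context from Fig. 1.
(def context
  {:dc     "http://purl.org/dc/elements/1.1/"
   :ex     "http://example.com/"
   nil     :ex
   :title  :dc:title
   :author :dc:author})

(defn expand-sketch
  "Hypothetical illustration of expansion (not the library's own code).
   Strings are returned unchanged. A keyword found in the context is
   replaced by its entry and expanded again. A keyword of the form
   :prefix:local is split and its prefix expanded. Any other keyword is
   resolved against the default prefix stored under the nil key."
  [context x]
  (cond
    (string? x)           x
    (not (keyword? x))    nil
    (contains? context x) (expand-sketch context (get context x))
    :else
    (let [[prefix local] (string/split (name x) #":" 2)]
      (if local
        (str (expand-sketch context (keyword prefix)) local)
        (str (expand-sketch context (get context nil)) prefix)))))

(expand-sketch context :title)   ; follows :title -> :dc:title -> :dc
(expand-sketch context :Homer)   ; uses the default prefix :ex
(expand-sketch context "Homer")  ; a string is left alone
```

The last two calls show the distinction the paper relies on: the keyword :Homer is treated as a contracted IRI, while the string "Homer" passes through untouched.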



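
The two shapes described in Section 4, sequences of triple vectors and nested maps ending in sets of objects, can be related with a short fold. The sketch below is illustrative only: the sample triples and the helper name subjectify-sketch are assumptions made for this example, and it covers only the subject-to-predicate-to-objects part of the nesting (no graph IRI level).

```clojure
;; Triples as vectors: [subject predicate object]. Resources are keywords
;; (contracted IRIs); literals are maps with a :value key and optional :lang.
(def triples
  [[:the-iliad   :title  {:value "The Iliad" :lang "en"}]
   [:the-iliad   :author :Homer]
   [:the-odyssey :title  {:value "The Odyssey" :lang "en"}]
   [:the-odyssey :author :Homer]])

(defn subjectify-sketch
  "Hypothetical helper: fold a sequence of triples into nested maps from
   subject to predicate to a set of objects."
  [triples]
  (reduce
    (fn [graph [s p o]]
      ;; fnil supplies an empty set the first time a subject/predicate
      ;; pair is seen, so conj always adds to a set.
      (update-in graph [s p] (fnil conj #{}) o))
    {}
    triples))

(def nested (subjectify-sketch triples))

;; The sequence form suits filtering. ->> passes each result as the LAST
;; argument of the next form, as in the pipeline of Fig. 2.
(def works-by-homer
  (->> triples
       (filter (fn [[_ p o]] (and (= p :author) (= o :Homer))))
       (map first)
       set))

;; The nested form suits selection by subject and predicate.
(get-in nested [:the-iliad :title])
```

Keeping objects in sets means repeated triples collapse naturally, while the vector form preserves order for streaming.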