=Paper=
{{Paper
|id=Vol-1515/demo5
|storemode=property
|title=EDN-LD: A simple linked data tool
|pdfUrl=https://ceur-ws.org/Vol-1515/demo5.pdf
|volume=Vol-1515
|dblpUrl=https://dblp.org/rec/conf/icbo/Overton15
}}
==EDN-LD: A simple linked data tool==
James A. Overton<sup>1*</sup>

<sup>1</sup> Knocean, Toronto, Ontario, Canada

<sup>*</sup> To whom correspondence should be addressed: james@overton.ca

Copyright © 2015 for this paper by its authors. Copying permitted for private and academic purposes.

===ABSTRACT===
EDN-LD is a set of conventions for representing linked data using Extensible Data Notation (EDN) and a library for conveniently working with those representations in the Clojure programming language. It provides a lightweight alternative to existing linked data tools for many common use cases, much in the spirit of JSON-LD. We present the motivation and design of EDN-LD, and demonstrate how it can clearly and concisely transform tables into triples.

===1 INTRODUCTION===
EDN-LD is a set of conventions for representing linked data using Extensible Data Notation (EDN),<sup>1</sup> and a library for conveniently working with those representations in the Clojure programming language.<sup>2</sup> Clojure is a modern Lisp that runs on the Java Virtual Machine (JVM) and has full access to the vast ecosystem of Java libraries. Since many linked data libraries and tools also target the JVM, Clojure is a tempting alternative to Java for working with linked data. Tawny-OWL is another example of a linked data tool written in Clojure (Lord, 2013); however, it is focused on ontology development and takes quite a different approach from EDN-LD. With this project our goal is to provide a lightweight alternative to existing linked data tools for many common use cases, much in the spirit of JSON-LD.<sup>3</sup> In this presentation we discuss the motivation and design of EDN-LD, and demonstrate how it can clearly and concisely transform tables into triples.

EDN-LD is open source software, published under a BSD license. The source code is written in a literate style, with extensive unit tests. It is available on GitHub<sup>4</sup> with a tutorial that also serves as an automated integration test. Our interactive online tutorial can be used without needing to install Clojure.<sup>5</sup> Feedback and contributions are welcome on our GitHub site.

===2 JSON-LD===
EDN-LD shares many of the motivations and goals of JSON-LD. First we will discuss the benefits and shortcomings of JSON-LD, then show how EDN-LD improves on it in several respects.

JavaScript Object Notation (JSON)<sup>6</sup> is a subset of the JavaScript programming language that is widely used for expressing literal data within JavaScript programs. JSON's elements are: null, booleans, strings, and numbers. Elements can be combined into arrays and objects, where the latter are effectively maps from strings to other values. These simple elements are common to virtually every programming language, and JSON is now widely used for data transfer between programs. It has replaced heavier formats such as XML in many applications.

JSON has many limitations, including a lack of comments, ambiguous numbers, and the lack of any mechanism for extending its types. In practice, strings are used to represent most types of data, but since it is difficult to attach type information to aid in their interpretation, this can quickly lead to ambiguity.

The ubiquity of JSON was one motivation for the JSON-LD W3C Recommendation: "A JSON-based Serialization for Linked Data".<sup>7</sup> In JSON-LD, strings are used to represent IRIs (and compact IRIs) for resources, plain literals can be strings, and typed literals are objects (maps) with special @value, @type, and @language keys. Graphs and datasets are represented as nested objects (maps) and sets are represented by arrays, with details depending on the chosen "Document Form".

The core of JSON-LD is the @context map, which can be specified inside a JSON record, externally using a link, or provided by the consuming application. The context allows for strings to be interpreted as IRIs, for compact IRI strings to be expanded, and for types to be attached to literals. Since the context can be supplied externally, existing JSON data can be reinterpreted as JSON-LD by providing an appropriate context.

JSON-LD is an exciting addition to the ecosystem of linked data tools, but it is constrained by the limitations of the JSON format. The heavy use of strings, in particular, can make it difficult to distinguish between a literal string, a compact IRI, and a fully resolved IRI. The complex context processing<sup>8</sup> and expansion algorithms<sup>9</sup> are indicative of this problem, as is the need for several similar-but-different "Document Forms". EDN-LD uses the richer elements and structures available in EDN to reduce these problems.

===3 EXTENSIBLE DATA NOTATION===
Just as JSON is the data format at the core of JavaScript, Extensible Data Notation (EDN) is the data format at the core of Clojure. The basic EDN elements are: nil, booleans, strings, characters, symbols, keywords, integers, and floating point numbers. These can be combined into lists, vectors, maps, and sets. Any element can serve as the key or value of a map. EDN is extensible in the sense that it allows for tagged elements, indicated by a special tag followed by an EDN element. EDN also allows two kinds of comments. Multiple alternatives to strings (i.e. keywords and symbols), more carefully defined numbers, sets, and more flexible maps all make it easier to express complex data efficiently and unambiguously in EDN than in JSON.

EDN does not have a type system and does not include schemas. However, several schema systems have been created for validating EDN data structures. EDN-LD uses Prismatic's Schema library<sup>10</sup> to specify the required "shapes" for various elements.

===4 EDN FOR LINKED DATA===
In EDN-LD as in JSON-LD, IRIs and blank node identifiers are represented by strings. IRIs can be contracted to keywords using a context: a map from keywords to IRIs or other contractions. Contractions can be expanded to IRIs using the same context. Literals are always represented as maps with a special :value key for the lexical value, and optional :type and :lang keys. Discrete triples and quads are represented with vectors. Graphs and datasets are represented as nested maps from graph IRI to subject IRI to predicate IRI, ending with a set of objects. These two "document forms" have very different shapes, suited to different processing goals, e.g. sequences of triples for streaming and filtering, and nested maps for sorting and selecting. EDN-LD uses Apache Jena<sup>11</sup> to read from and write to a wide range of linked data formats.

Figure 1 shows an EDN-LD context. It includes a :dc prefix for Dublin Core metadata elements, and an :ex prefix for the example domain. The nil key indicates that its value :ex is the default prefix. The :title and :author contractions expand (recursively) to Dublin Core IRIs.

<pre>
(def context
  {:dc "http://purl.org/dc/elements/1.1/"
   :ex "http://example.com/"
   nil :ex
   :title :dc:title
   :author :dc:author})
(expand context :title)
; "http://purl.org/dc/elements/1.1/title"
</pre>

Fig. 1. An example of an EDN-LD context, showing an expand function call on the :title contraction, and the expanded IRI that is returned.

Figure 2 shows an example of a simple data conversion pipeline using EDN-LD. First we define a map from names (strings) to contracted resource IRIs and merge our context with the default prefixes. The ->> is a "threading macro" that inserts the first value as the last argument to the second function, and so on, letting deeply nested function calls be clearly expressed as "pipelines". Here "books.tsv" is the name of a file in tab-separated values format and read-tsv is a function that returns a sequence of maps, one per row, each with column names as keys. The assign-subject-iri function is called on each of the maps to add a :subject-iri key with an appropriate value. Then triplify is used to convert the maps to triples, represented as vectors: subject keyword, predicate keyword, and object keyword or literal map as determined by the resources map. The keywords represent contracted IRIs, and the expand-all function converts them to full IRI strings. Finally the write-triples function writes the results to the "books.ttl" Turtle file using our specified prefixes. By using keywords to distinguish contracted IRIs from full IRIs and literal data, and consistently using maps for literal data, we gain more control over the interpretation of strings than JSON-LD, without loss of concision.

<pre>
(def resources
  {"Homer" :Homer})
(def prefixes
  (merge
    default-prefixes
    context))
(->> "books.tsv"
     read-tsv
     (map assign-subject-iri)
     (mapcat #(triplify resources %))
     (map #(expand-all context %))
     (write-triples "books.ttl" prefixes))
</pre>

Fig. 2. An example of an EDN-LD conversion pipeline.

===5 FUTURE WORK===
EDN-LD is still in development, but available for use. We plan to implement convenient syntax for RDF collections (linked lists), and for various OWL constructs including annotation axioms and class expressions. We are also considering a ClojureScript implementation of EDN-LD. ClojureScript is a language that is closely related to Clojure, compiling to JavaScript rather than JVM bytecode. Dual Clojure and ClojureScript libraries are becoming increasingly common. However, a ClojureScript version of EDN-LD could not use Jena, and would need alternative methods for reading and writing linked data files.

===6 DEMONSTRATION===
At ICBO we plan to demonstrate the use of EDN-LD for transforming tables to triples, and for efficiently filtering large linked data files to specified subsets.

===7 CONCLUSION===
EDN-LD was developed for the Immune Epitope Database (IEDB), and was preceded by several related systems for working with linked data and ontologies using Clojure. These techniques have proved valuable for rapid development of data processing workflows, merging disparate sources of biological data. EDN-LD improves on JSON-LD in several respects, and is well suited to working with linked data in Clojure.

===ACKNOWLEDGEMENTS===
The author was supported in this work by the Immune Epitope Database and Analysis Project, funded by the National Institutes of Health [HHSN272201200010C].

===REFERENCES===
Lord, P. (2013). The Semantic Web takes wing: Programming ontologies with Tawny-OWL. OWLED 2013.

===FOOTNOTES===
# https://github.com/edn-format/edn
# http://clojure.org
# http://json-ld.org
# https://github.com/ontodev/edn-ld
# http://try.edn-ld.com
# http://json.org
# http://www.w3.org/TR/json-ld/
# http://www.w3.org/TR/json-ld-api/#context-processing-algorithms
# http://www.w3.org/TR/json-ld-api/#expansion-algorithms
# https://github.com/Prismatic/schema
# http://jena.apache.org
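
The conventions described in Section 4 (contexts, contractions, literal maps, triple vectors, and nested maps) can be illustrated with a minimal sketch in plain Clojure. It does not use the EDN-LD library and should not be read as its implementation: the names expand-sketch, expand-triple, and triples->subject-map are hypothetical helpers introduced here, the example triples are invented for illustration, and the handling of unknown prefixes (throwing an exception) is an assumption rather than documented EDN-LD behaviour.

```clojure
(require '[clojure.string :as string])

;; The context from Fig. 1: prefixes, a default prefix under the nil key,
;; and two contractions that point at other contractions.
(def context
  {:dc     "http://purl.org/dc/elements/1.1/"
   :ex     "http://example.com/"
   nil     :ex
   :title  :dc:title
   :author :dc:author})

(defn expand-sketch
  "Hypothetical helper: expand a contraction to a full IRI string.
  Strings are treated as IRIs already; keys found in the context are
  expanded recursively; other keywords are split into prefix and local
  name, falling back to the default (nil) prefix when there is no colon."
  [context x]
  (cond
    (string? x) x
    (contains? context x) (expand-sketch context (get context x))
    (keyword? x)
    (let [[prefix local] (string/split (name x) #":" 2)
          base-key       (when local (keyword prefix))]
      ;; Assumption: an unknown prefix is an error rather than a guess.
      (when-not (contains? context base-key)
        (throw (ex-info "No matching prefix in context" {:value x})))
      (str (expand-sketch context base-key) (or local prefix)))
    :else (throw (ex-info "Cannot expand value" {:value x}))))

;; Triples are vectors; literals are maps with :value and optional
;; :type / :lang keys. These example triples are illustrative only.
(def triples
  [[:Iliad :title  {:value "The Iliad" :lang "en"}]
   [:Iliad :author :Homer]])

(defn expand-triple
  "Hypothetical helper: expand subject, predicate, and (non-literal) object."
  [context [s p o]]
  [(expand-sketch context s)
   (expand-sketch context p)
   (if (map? o) o (expand-sketch context o))])

(defn triples->subject-map
  "Hypothetical helper: fold a sequence of triples into the nested form,
  subject -> predicate -> set of objects."
  [triples]
  (reduce (fn [m [s p o]] (update-in m [s p] (fnil conj #{}) o))
          {}
          triples))

(expand-sketch context :title)
;; => "http://purl.org/dc/elements/1.1/title"

(->> triples
     (map #(expand-triple context %))
     triples->subject-map)
;; => {"http://example.com/Iliad"
;;     {"http://purl.org/dc/elements/1.1/title"  #{{:value "The Iliad" :lang "en"}}
;;      "http://purl.org/dc/elements/1.1/author" #{"http://example.com/Homer"}}}
```

Because literals are always maps and contractions are always keywords, the sketch can decide how to treat each object with a single map? or keyword? check, which is the disambiguation that JSON-LD has to recover from strings through its context processing.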