Programmable Analytics for Linked Open Data

Bo Hu (Fujitsu Laboratories of Europe, Middlesex, UK; bo.hu@uk.fujitsu.com)
Eduarda Mendes Rodrigues (Fujitsu Laboratories of Europe, Middlesex, UK)
Emeric Viel (Fujitsu Laboratories Ltd, Kawasaki, Japan; emeric.viel@jp.fujitsu.com)

Copyright held by the author/owner(s).
LDOW2014, April 8, 2014, Seoul, Korea.
ABSTRACT
The LOD initiative has made a major impact on data provision. Thus far, more than 800 datasets have been published, containing tens of billions of RDF triples. The sheer size of the data has not, however, resulted in a significant increase of data consumption. We contend that a new programming paradigm is necessary to simplify the utilisation of LOD data. This paper reports early-phase development towards a programmable web of LOD data. We propose to tap into a distributed computing environment underpinning the popular statistical toolkit R. Where possible, native R operators and functions are used in our approach so as to lower the learning curve. The crux of our future work lies in the full implementation and evaluation.

Categories and Subject Descriptors
H.4 [Information Systems Applications]: Miscellaneous; D.2.12 [Interoperability]: Data mapping

Keywords
Linked Open Data, RDF, R, Programmability

1.  INTRODUCTION
As of mid 2013, a total of 870 datasets had been published as part of the Linked Open Data (LOD) cloud, exposing nearly 62 billion RDF triples in a computer-readable representation format^1. These numbers are still growing rapidly, largely attributable to open governmental data initiatives and "online" high-throughput scientific instruments. As greater amounts of data become available through the LOD cloud, the expected virtuous cycle - more data leading to more consumption and thus encouraging further data publication - has not been clearly witnessed. On the contrary, it has been observed that, on many occasions, after the initial spark of interest and test applications, data use at many linked data hosting sites declined significantly [3]. Some critics believe that the massive amounts of semantic-rich data accumulated so far have actually driven away potential users. On the one hand, the added layers of abstraction and conceptualisation make the data unsuitable for toolkits tuned to data represented in tabular format. On the other hand, the sheer volume of data renders many semantic web tools less productive. We contend that a major obstacle preventing ordinary users from tapping into the LOD cloud is the lack of a mechanism that allows people to make "sense" out of the overwhelming amount of data. More specifically, in order to facilitate the general uptake of LOD by research communities and practitioners, simply making the data available is not sufficient. It is essential to offer, alongside the data, a means of utilising such resources in a way that is comprehensible to users with a wide range of backgrounds and potentially limited knowledge of semantic technologies.

^1 http://stats.lod2.eu. Accessed: January 2014.

In this paper, we propose a solution that tightly integrates linked data computing with the popular statistical programming platform R. This brings together two well-established efforts and thus two large user bases: R offers a declarative and well-formed programming language for mining and analysing LOD datasets, while the LOD Cloud paves the way to instantaneous access to a large amount of structured data on which existing R functions and packages can be applied.

1.1  Programmability of LOD
Thus far, data available through the LOD Cloud are accessed primarily using SPARQL. Typically, this is done by submitting query scripts to a SPARQL endpoint and, based on the query results, filtering/joining/aggregating (available from SPARQL 1.1) candidate results either on the server side or at the local client. SPARQL is based on set algebra. This is both an advantage and a disadvantage. It resembles the prevailing SQL for RDBs; people familiar with the latter can, therefore, enjoy a fast learning curve when making the paradigm shift. On the other hand, SPARQL is mainly a query language and thus does not stand out for post-query data processing. In many cases, the results of SPARQL queries are extracted and converted into the native data structures of other programming languages (e.g. Java) for further manipulation.

Equipping and/or enhancing LOD with high programmability beyond SPARQL has been investigated previously. The (dis)similarity between RDF as the underlying data structure of LOD and the general object-oriented methodology inspired ActiveRDF [5], where semantic data are exposed through a declarative scripting language. In the same direction, RDFReactor [9] packs RDF resources as Java objects, where instances are objects and properties are accessed through Java methods.

Unfortunately, the above integrations have not lowered the threshold to fully exploiting the LOD cloud. Among other reasons, the most prominent ones include the following. It is very difficult for such approaches to deal with missing values and sparse structures, which abound in uncurated or automatically produced collections. The size and quality of the LOD cloud lend themselves to statistical data analysis, yet performing such analysis using SPARQL queries can become cumbersome in many cases, requiring recursive SPARQL queries and multiple join operations. Moreover, neither SPARQL nor the integrated frameworks enjoy native support for matrix operations and solving linear equations, while such characteristics become increasingly critical in processing large amounts of data.

R, as a dynamic and functional language, offers good capacity to enhance the programmability of LOD and remedy the shortcomings of existing approaches.

1.2  Why R?
R is a programming language and a software toolkit for data science. Though not stated outright, R is designed for domain experts rather than conventional software programmers. It focuses on tasks that are more familiar to the former, e.g. organising data, manipulating spreadsheets and visualising data. R is open source, with over 2,000 packages/libraries for a wide variety of data analytics^2. The most distinctive feature of R is its native support for vector arithmetic. In addition, versatile graphics and data visualisation packages, as well as easy access to a large number of specialist machine learning and predictive algorithms, make R a widely adopted computing environment in scientific communities (cf. [2]). R is essentially single-threaded. Scaling R for Big Data analysis can be achieved with RHadoop^3. In this paper, we focus on adapting R to the LOD data structure.

^2 http://www.r-project.org. Accessed: January 2014.
^3 https://github.com/RevolutionAnalytics/RHadoop/wiki

Integrating R and LOD has been investigated previously. The SPARQL R library [8] aims to expose RDF data and wrap SPARQL endpoints with a black-box style connector library. Largely in the same vein, the most recent effort, the rrdf library [10], allows loading and updating RDF files through manually crafted RDF-R mappings. The in-memory RDF models can then be queried using SPARQL. We see the following issues with SPARQL-based integration. Firstly, SPARQL queries and the target RDF datasets are not transparent to R users, making it difficult to validate and optimise the processes. Arbitrary SPARQL queries can incur global scans that drastically impede system performance. Secondly, the R environment loses regulatory control over SPARQL queries. Such blindness subjects the system to safety and security concerns. Finally, domain experts and statisticians are required to manually compose the SPARQL queries. This means learning the fundamentals of RDF and a new query language.
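To make the last point concrete, a minimal sketch of the SPARQL-centric workflow, using the SPARQL package of [8] against a public endpoint, might look as follows; the endpoint and query are illustrative only, and any filtering or aggregation beyond this point has to be hand-written by the user.

# Sketch of today's SPARQL-centric access from R, using the SPARQL
# package [8]; endpoint and query are illustrative only.
library(SPARQL)

endpoint <- "http://dbpedia.org/sparql"
query <- paste(
  "PREFIX dbo: <http://dbpedia.org/ontology/>",
  "SELECT ?person ?birth WHERE {",
  "  ?person a dbo:Scientist ;",
  "          dbo:birthDate ?birth .",
  "} LIMIT 100")

res <- SPARQL(endpoint, query)
head(res$results)    # a plain data frame; all further filtering,
                     # joining and aggregation is left to the user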

2.  PROGRAMMABLE LOD
The LOD Cloud provides a framework to access and navigate through apparently unrelated data, with conceptual models capable of explicating hidden knowledge. The logic-based axioms (underpinning RDF) are in many cases not powerful enough to capture all the regularities in the data. We envision that a programming language aiming to utilise and interact with the LOD cloud (and the datasets therein) should preferably present the following characteristics.

Native support for the LOD data structure. The underlying data structure of LOD is RDF triples, which essentially compose a directed, labelled graph. SPARQL, the standard RDF query language, transforms data into tabular form for better alignment with RDB conventions. This extra formatting layer is not always necessary when the underlying data structure can be accessed with native graph operators.

Native support for data analysis. The better data accessibility inherent to LOD presents itself as both an opportunity and a challenge. With better access, an LOD data consumer is exposed to data linked in through semantic relations, most of which he or she may not be aware of. More data is not always a merit: the consumer is likely to be overwhelmed by data with different formats and different semantics, making analysis a struggle. A programming platform capable of dynamically handling different formats becomes desirable.

Ready for distributed processing. Applications accessing the LOD Cloud can easily be exposed to billions of triples, tantamount to terabyte-grade data transactions. Single-machine, single-threaded statistical offerings will find themselves struggling in such situations. The programming platform should offer parallelisation capacity for good scalability.

Inspecting R within the scope of the above requirements, we can make the following observations. Firstly, R is a functional language with lazy evaluation, wherein functions are lifted to become first-class citizens. Also, R has a dynamic type system. These fit well with RDF's idiosyncrasies. Secondly, R is designed for statistical computing: missing-value support and sparse matrix handling permeate all R functions and operations. Finally, though R is single-threaded, for many machine learning tasks it is possible to distribute the underlying R data structures and facilitate process distribution over a layer of data abstraction.

3.  SYSTEM ARCHITECTURE
The concept of programmable LOD is being prototyped on the BigGraph platform, denoted BGR. BigGraph aims at a generic distributed graph storage with a RESTful interface. Figure 1 illustrates the main building blocks of BGR. At the top, there is the user interface. A BGR user programs using R primitives together with dedicated functions that facilitate the RDF-to-R data type mapping. BGR programs are submitted to a master node as the main entry point through which the user interacts with the system. The runtime at the master is responsible for the following tasks: 1) interpreting BGR programs; 2) interacting with the in-memory graph model for graph transactions; and 3) deciding which data server/worker it should directly query.
[Figure 1: System architecture. A program (extended R) is submitted to the Master Node, which holds the in-memory Graph Model; R systems with storage drivers run on the data servers N0 ... Nn over the physical storage.]

The runtime on each data server mainly consists of two key components: the R environment and the storage driver. Each local R installation executes statistical analysis directly or exposes such analytical capacity through the in-memory graph model. A storage driver is responsible for I/O with the underlying storage unit.

3.1  Mapping RDF resources to R variables
The fundamental data structure for storing data in R is the vector, where a single integer, for example, is seen as a vector of length one. Variations and extensions of the vector data type include matrices, arrays and data frames. Though RDF graphs can easily be stored as adjacency matrices or adjacency lists, we opt against a full conversion of the LOD cloud, which would add extra computing expense. Rather, a direct one-to-one mapping between RDF resources (being classes and instances) and R variables can provide a seamless and smooth integration while at the same time ensuring the integrity of the original data. For instance, an RDF instance becomes an R dataframe consisting of single-element vectors. Similarly, an RDF class can be assigned to a two-dimensional dataframe with rows corresponding to instances and columns to properties. Instance values can be loaded either column-wise or row-wise depending on the analytical and performance requirements. In the following example, column-based initialisation is conducted.

> s <- data.frame(name=av, age=bv, email=cv)
> s
     name   age   email
P1   foo    5     foo@bar.com
P2   john   6     john@bar.com
...

Note that in this example, a class resource is extensionally represented by the set of its instances at the snapshot of data loading.
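For completeness, a row-wise (instance-level) initialisation can be sketched as follows; the property values are illustrative and the single-element vectors mirror the mapping described above.

# Sketch of row-wise initialisation: one RDF instance becomes a one-row
# data frame of single-element vectors (values are illustrative).
p1 <- data.frame(name="foo", age=5, email="foo@bar.com",
                 row.names="P1")

# appending further instances row by row reproduces the class-level
# data frame shown above
s <- rbind(p1,
           data.frame(name="john", age=6, email="john@bar.com",
                      row.names="P2"))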
3.2  Mapping to the underlying storage
In order to accommodate the sheer size of the LOD Cloud and leverage parallel data loading, a distributed storage is necessary. We opt for an edge-based storage solution that fits nicely with the principles of a Key-Value Store (KVS) [4]. The KVS plays a key role in our approach to scaling out RDF graphs. RDF triples are, however, not KVS-ready. The first and foremost step is therefore to define the key-value tuples that a standard KVS can conveniently consume. In BGR, the components of a triple are concatenated together and encoded as a UUID, which is then treated as the key, while the value part of the KVS is reserved for other purposes, e.g. named graph, provenance, and access control.

Each RDF triple is indexed three times. Even though this presents a replication factor of at least three, the design is motivated by query performance and fault recovery. Loading RDF data into R variables normally takes the form of localised range queries, fixing either the subject or the object of the triples and replacing the rest with wildcards. For instance, graph.find(s, null, null) retrieves all the triples of a resource, while graph.find(null, p, o) performs an inverse traversal from object o. By replicating triples, data can be sorted according to not only subjects but also predicates and objects. This improves query execution.
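As an illustration only (not the exact BGR encoding), the three index orderings could be derived from a triple along the following lines; the hash-based key and the separator are assumptions.

# Illustrative sketch of the triple-to-key mapping (not the exact BGR
# encoding): each triple yields three keys, one per ordering, so that
# subject-, predicate- and object-anchored scans all hit sorted keys.
library(digest)

triple_keys <- function(s, p, o) {
  c(spo = digest(paste(s, p, o, sep="|")),
    pos = digest(paste(p, o, s, sep="|")),
    osp = digest(paste(o, s, p, sep="|")))
}

triple_keys("ex:P1", "foaf:name", "foo")
# each of the three keys is written once to the KVS; the value slot
# remains free for named graph, provenance and access control data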
3.3  Loading graph
For performance, LOD datasets are treated in the following ways. For datasets with a RESTful API (e.g. DBpedia), the RDF resource to R variable mapping can be realised straightforwardly. Some datasets expose only SPARQL endpoints; SPARQL queries then become necessary, with the restriction that only local scans (e.g. ⟨s, ∗, ∗⟩ or ⟨∗, ∗, o⟩) are permitted. Ideally, the results of a scan are used to construct a local data graph. In the long run, on-demand data crawling can maintain local copies of frequently used datasets, helping to ensure data quality and manage mappings through local data curation.
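For endpoints that only expose SPARQL, such a restricted loader could be sketched as below; the use of the SPARQL package [8] and the endpoint URL are assumptions, and in BGR the returned bindings would feed the local data graph rather than stay in a data frame.

# Sketch of a loader restricted to local scans over a SPARQL endpoint;
# the SPARQL package [8] and the endpoint URL are assumptions.
library(SPARQL)

local_scan <- function(endpoint, s) {
  # only <s, *, *> patterns are issued, never unconstrained queries
  q <- sprintf("SELECT ?p ?o WHERE { <%s> ?p ?o }", s)
  SPARQL(endpoint, q)$results
}

berlin <- local_scan("http://dbpedia.org/sparql",
                     "http://dbpedia.org/resource/Berlin")
# in BGR the bindings would be inserted into the local data graph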
3.4  Processing data
R is inherently a single-threaded application, though parallelisation has been implemented in the snow and snowfall packages [6]. The use of the LOD Cloud falls into the following categories, for which we propose solutions to achieve good scalability.

3.4.1  Bulky processing
This OLAP-like data processing aims to surface patterns (such as hidden semantic relationships and semantic data clusters) out of data held in the LOD Cloud. Such a process is normally performed on preloaded data and is not time-critical. While a plethora of R packages can be leveraged for data mining, the main difficulty lies in populating R dataframes with LOD data in a form that R functions can work on. By encoding each RDF resource as one R variable, it is easy to construct matrices that fit special purposes. For many predictive machine learning tasks, voting-based aggregation (e.g. bagging [1]) can distribute the overall learning task to carefully sampled subsets of the target datasets. This can be easily achieved and managed by traversing the graph to the selected subsets of concept instances.

Example. Given a dataset with patient data, the following code fragment splits the set of patient instances into 10 subsets^4. Traversal with named vertices and edges can be carried out along both inbound and outbound directions.

^4 Based on the literature, bagging should take a fraction between 1/2 and 1/50 depending on the size of the sample data.

patient_v <- graph_get_vertex("Patient")
all_patients <- graph_get(patient_v, edge="rdf:type")
for(i in 1:10) {
  vname <- paste("p_set", i, sep="")
  assign(vname,
         sample(all_patients, length(all_patients)/10))
  saveRDS(get(vname), file="...")
}

Here, we assume that the entire set of patient instances will be loaded into memory. Alternatively, a partial loading can be executed to lower the demand for computing resources and latency. In the following example, the edges of the patient resource are indexed, sampling is conducted against the index, and only the selected instances are loaded.

patient_v <- graph_get_vertex("Patient")
patient_size <- graph_get_edge_count(patient_v,
                                     edge="rdf:type")
patient_index <- graph_edge_index(patient_v,
                                  edge="rdf:type")
n <- 1:patient_size
ns <- sample(n, patient_size/10)
for (i in ns) {
  ins <- graph_traverse(patient_v, edge=patient_index[i])
  saveRDS(ins, file="...")
}

The following code fragment constructs a random-forest-based prognosis model (line 11) for a certain disease based on a patient's gender and age. The patient data are loaded with a graph traverse transaction over the given patient instance vertices and the given outgoing edges (lines 3-6, where the wildcard indicates all outgoing edges). Missing values are set to a default (i.e. age = 75) for simplicity (line 9).

 1: p_partition <- readRDS(file="...")
 2: patients <- data.frame()
 3: for(i in p_partition) {
 4:   p_data <- graph_traverse(vertex=i, out_edge="*")
 5:   patients <- rbind(patients, p_data)
 6: }
 7: size <- nrow(patients)
 8: training_set <- data.frame(
                        age=patients$has_age,
                        gender=patients$has_gender, ...)
 9: training_set$age[is.na(training_set$age)] <- 75
10: labels <- as.factor(patients$status)
11: rfp <- randomForest(training_set, labels)

In this example, we assume that the patient data partitions are passed using a data file residing on disk (line 1). This is for illustrative purposes only and does not exclude shared-memory or message-passing based solutions.
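The per-subset models can then be trained in parallel and combined by voting, in the spirit of bagging [1]. The following sketch assumes the partition files saved above, the availability of the BGR accessors on each worker, and the snowfall [6] and randomForest packages; file names and column names are illustrative.

# Sketch: train one forest per partition (in parallel via snowfall [6])
# and merge them by voting with randomForest::combine. File names,
# column names and the presence of the BGR accessors on each worker
# are assumptions.
library(randomForest)
library(snowfall)

sfInit(parallel=TRUE, cpus=4)
sfLibrary(randomForest)

train_one <- function(k) {
  part <- readRDS(file=paste0("p_set", k, ".rds"))   # hypothetical name
  patients <- data.frame()
  for (i in part)
    patients <- rbind(patients, graph_traverse(vertex=i, out_edge="*"))
  ts <- data.frame(age=patients$has_age, gender=patients$has_gender)
  ts$age[is.na(ts$age)] <- 75
  randomForest(ts, as.factor(patients$status))
}

forests <- sfLapply(1:10, train_one)
sfStop()
rfp_all <- do.call(combine, forests)   # single voting ensemble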
3.4.2  Incremental processing
OLTP-like realtime data processing is supported through an event-driven mechanism that applies the classifiers (obtained as in the previous section) to data in an incremental fashion. This incremental characteristic is two-fold. Firstly, the system should detect the difference between existing classified data and new inputs so as to isolate the changes and restrict re-classification to those differences. Secondly, the system should update only those classifiers whose input data have changed since the most recent retraining. BGR accommodates both requirements through distributed logging of graph structural changes and localised event propagation observing graph structures. For instance, an "OutEdgeCreatedEvent" is issued by the storage listener if an edge is inserted. This event instance carries information such as the edge (in triple form) and the vertex (v_s) on which the edge is created. Events propagate along paths that originate from v_s to avoid global scans. As a result, affected classifiers along the propagation routes are scheduled for update. Note that some machine learning algorithms can be easily adapted to fulfil these requirements (cf. random forest [?]).
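On the R side, a minimal sketch of such an event handler registry might look as follows; the event fields (type, triple, version) follow the description above, but the exact format is an assumption.

# Sketch of an event handler registry on the R side; the event fields
# (type, triple, version) follow the description above but the exact
# format is an assumption.
handlers <- list()

on_event <- function(type, f) handlers[[type]] <<- f

dispatch <- function(ev) {
  h <- handlers[[ev$type]]
  if (!is.null(h)) h(ev)
}

# schedule the affected classifier for retraining when an out-edge is
# created on a vertex along the propagation path
on_event("OutEdgeCreatedEvent", function(ev) {
  message("retraining scheduled after change to ", ev$triple)
})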
Versioning resources. An RDF resource normally consists of multiple triples jointly stating the constraints on the resource. Therefore, the event-driven incremental processing, which only has visibility of individual triples, requires a mechanism to obtain complete statements of the resource. We use versioning to ensure consistency when data are classified and when classifiers are retrained. Version information is stored in the value part of the key-value tuples, and version updates are treated as atomic operations.

Multiple threads. Multi-threaded R is not likely to be available in the near future. As spawning threads is not possible, BGR runs multiple processes communicating through sockets. For instance, one R process listens to the underlying storage driver, fetching graph structural events through a dedicated socket address. The events are then parsed to extract event types, the triples that raised the events, and the versions of those triples. Other R processes handle the events and dispatch them for further actions when necessary, again by writing to a socket address. Socket-based communication may not provide ideal performance; in many cases it becomes the main performance bottleneck. It does, however, offer the most cost-effective way to increase parallelism without dismantling R.
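A rough sketch of one such listener process, using R's built-in socketConnection and the dispatcher sketched above, is given below; the port number and the line-oriented, tab-separated protocol are assumptions.

# Sketch of one listener process reading storage events from a socket
# and handing them to the dispatcher above; port number and the
# tab-separated, line-oriented protocol are assumptions.
con <- socketConnection(host="localhost", port=6311,
                        blocking=TRUE, open="r")
repeat {
  line <- readLines(con, n=1)
  if (length(line) == 0) break           # storage driver closed the socket
  ev <- strsplit(line, "\t")[[1]]        # type, triple, version
  dispatch(list(type=ev[1], triple=ev[2], version=ev[3]))
}
close(con)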
3.5  Resource-local processors
We advocate and practise a declarative and resource-centric approach in BGR. More specifically, the expected analytics are constructed at the resource level and are associated with the target resource through RDF property declarations. For instance, the following RDF triples assign an R random-forest classifier (defined in Section 3.4.1) to a resource (i.e. the "Patient" class).

:Patient a owl:Class ;
  rdfs:subClassOf
    [ a owl:Restriction ;
      owl:onProperty :has_behaviour ;
      owl:someValuesFrom
        [ a owl:Class ;
          owl:oneOf (:new_patient_behaviour
                     :update_patient_behaviour) ] ] .
...
:new_patient_behaviour
  a            :Behaviour , owl:NamedIndividual ;
  :event       :onNewInstanceAdded ;
  :has_handler "R:rfp" .

This essentially defines how a resource (e.g. Patient) reacts (or behaves) when events occur (e.g. the :onNewInstanceAdded event), realised by the attached process (e.g. R:rfp). At the ontology class level, enumeration (owl:oneOf) is used to establish the conceptual relationship between the Patient class and the desired functionalities w.r.t. the corresponding events. The actual implementation of behaviour instances can be realised, for example, in R. Depending on the size of the compiled code, the implementation can be stored either entirely in the value part of the KV tuple of ⟨:new_patient_behaviour, :has_handler, "R:rfp"⟩ or separately, with a pointer from the value part of the tuple. When a new patient instance is asserted, an event is raised which triggers the embedded R function to react to the change in the storage.
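Resolving a :has_handler value such as "R:rfp" could then amount to looking up the named object in the worker's R environment and applying it to the freshly asserted instance, as the following sketch illustrates; the naming convention and the predict call are assumptions.

# Sketch of resolving a :has_handler value such as "R:rfp": the "R:"
# prefix selects the runtime, the remainder names an object in the
# worker's R environment (both conventions are assumptions).
run_handler <- function(handler, new_instance) {
  stopifnot(startsWith(handler, "R:"))
  model <- get(sub("^R:", "", handler))      # e.g. the rfp forest above
  predict(model, newdata=new_instance)       # classify the new instance
}

# invoked when :onNewInstanceAdded fires for a new Patient instance:
# run_handler("R:rfp", data.frame(age=63, gender="f"))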
Several advantages are evident from assigning behaviour and storing its implementation close to a resource. Firstly, for a distributed data storage, this implies close proximity of data and process locality. Secondly, behaviour supports the reactive programming principle by packing small process units against very specific data units. Thirdly, data behaviours and their implementations are conceptualised with well-formed RDFS constructs. This facilitates ontological inference when necessary, though with caveats: i) increased inference complexity and ii) anonymous resources complicating RDF query handling.

4.  PRELIMINARY RESULTS
BGR is still under development. This section reports the system design considered so far and lists potential future work.

The underlying graph storage is a distributed KVS based on HBase. HBase also handles data partitioning, locality, replication and fault tolerance. A Jena graph introduces the necessary abstraction layer for indexing and retrieving triples in the KVS. A simple graph programming interface is responsible for graph traversal and scan operations. It follows the Tinkerpop Blueprints convention^5 and currently talks to the Jena graph so as to construct resource subgraphs from the edge-based storage data structure. The use of Jena is mainly for the convenience of leveraging Jena models when in-memory ontology inference becomes necessary. In the future, direct communication between the storage and the graph API is expected to improve overall system performance, at the price of reduced ontological inference capacity.

^5 https://github.com/tinkerpop/blueprints/wiki

Both the storage and graph modules are implemented in Java. R communicates with the storage driver through an R-Java interfacing library, the rJava package [7]. Calling Java methods is straightforward, as illustrated in the following example:

.jinit()
# do something before loading the graph
g.obj <- .jnew("Graph")
# do something else
graph.find <- function(x, y) {
  .jcall(g.obj, "S", "find", x, y)
}
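Once wrapped, the Java call behaves like any other R function; a usage sketch (with purely illustrative arguments) is:

# Usage sketch: the wrapped call behaves like any other R function
# (arguments are illustrative and passed straight to Graph.find).
res <- graph.find("ex:Patient", "rdf:type")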
We intend to minimise the effort of extending R, i.e. to avoid introducing compiled R packages. This is mainly for practical considerations. It lowers the learning curve for people already familiar with R, as essentially no extra operators need to be learned. It also increases the visibility of data management with respect to the underlying data structure.

5.  CONCLUSIONS
This paper calls for user-friendly and programmable LOD by leveraging and enhancing R, a free software toolkit for statistical computing and graphics.

Note that there are a few R packages (e.g. bigmemory) that aim in particular at Big Data computing. There are also R packages (e.g. foreach, ff, etc.) for strengthening R parallelism. Our proposal is not to compete with such existing solutions but to advocate a collaboration of two independent efforts and to provide solutions that fit the vision and requirements of the linked data paradigm.

We also do not see competition with the RESTful movement, such as the Linked Data Platform (LDP, [?]), which has already gained momentum in the LOD community. LDP works at a layer lower than the proposed LOD/R integration, assisting data exposure so that the data can be consumed by the BGR functions and operators.

6.  REFERENCES
[1] E. Bauer and R. Kohavi. An empirical comparison of voting classification algorithms: Bagging, boosting, and variants. Machine Learning, 36(1-2):105-139, July 1999.
[2] B. Everitt and T. Hothorn. A Handbook of Statistical Analyses Using R. CRC Press, Boca Raton, Fla, 2010.
[3] N. C. Helbig, A. M. Cresswell, B. Burke, and L. Luna-Reyes. The dynamics of opening government data. Technical report, Nov. 2012.
[4] A. Lakshman and P. Malik. Cassandra: a decentralized structured storage system. SIGOPS Oper. Syst. Rev., 44(2):35-40, Apr. 2010.
[5] E. Oren, B. Heitmann, and S. Decker. ActiveRDF: Embedding semantic web data into object-oriented languages. Web Semant., 6(3):191-202, Sept. 2008.
[6] L. Tierney, A. J. Rossini, and N. Li. snow: A parallel computing framework for the R system. International Journal of Parallel Programming, 37(1):78-90, 2009.
[7] S. Urbanek. rJava: Low-Level R to Java Interface, 2009. R package version 0.8-1.
[8] W. R. van Hage and T. Kauppinen. SPARQL package for R, 2011. Available at http://linkedscience.org/tools/sparql-package-for-r.
[9] M. Völkel. RDFReactor - from ontologies to programmatic data access. In Proc. of the Jena User Conference 2006. HP Bristol, May 2006.
[10] E. Willighagen. Accessing biological data with semantic web technologies. http://dx.doi.org/10.7287/peerj.preprints.185v1, 2013.