=Paper= {{Paper |id=None |storemode=property |title=LEAPS: A Semantic Web and Linked Data Framework for the Algal Biomass Domain |pdfUrl=https://ceur-ws.org/Vol-1272/paper_46.pdf |volume=Vol-1272 |dblpUrl=https://dblp.org/rec/conf/semweb/Solanki14 }} ==LEAPS: A Semantic Web and Linked Data Framework for the Algal Biomass Domain== https://ceur-ws.org/Vol-1272/paper_46.pdf
     LEAPS: A Semantic Web and Linked data
     framework for the Algal Biomass Domain

                    Monika Solanki1 and Johannes Skarka2
                           1 Aston University, UK
                             m.solanki@aston.ac.uk
               2 Karlsruhe Institute of Technology, ITAS, Germany
                           johannes.skarka@kit.edu



      Abstract. In this paper we present LEAPS, a Semantic Web and
      Linked Data framework for searching and visualising datasets from the
      domain of algal biomass. LEAPS provides stakeholders in this domain
      with tailored interfaces to explore algal biomass datasets via REST
      services and a SPARQL endpoint. The rich suite of datasets includes
      data about potential algal biomass cultivation sites, sources of CO2,
      the pipelines connecting the cultivation sites to the CO2 sources,
      and a subset of the biological taxonomy of algae derived from the world’s
      largest online information source on algae.


1   Motivation

Recently, the idea that algae-biomass-based biofuels could serve as an alternative
to fossil fuels has been embraced by councils across the globe. Major companies,
government bodies and dedicated non-profit organisations such as ABO (Algal
Biomass Organisation)3 and EABA (European Algal Biomass Association)4 have
been pushing the case for research into clean energy sources, including
algae-biomass-based biofuels.
    It quickly becomes evident that, because of the extensive research being
carried out, the domain itself is a very rich source of information. Most of the
knowledge is, however, buried in images, spreadsheets, proprietary data sources
and grey literature that are not readily machine-accessible or
machine-interpretable. A critical limitation that has been identified is the
lack of a knowledge-level infrastructure capable of providing semantic
grounding to the datasets for algal biomass so that they can be interlinked,
shared and reused within the biomass community.
    Integrating algal biomass datasets to enable knowledge representation and
reasoning requires a technology infrastructure based on formalised and shared
vocabularies. In this paper, we present LEAPS 5, a Semantic Web/Linked data
framework for the representation and visualisation of knowledge in the domain
of algal biomass. One of the main goals of LEAPS is to enable the stakeholders
of the algal biomass domain to interactively explore, via linked data, potential
algal sites and sources of their consumables across NUTS (Nomenclature of Units
for Territorial Statistics)6 regions in North-Western Europe.
3
  http://www.algalbiomass.org/
4
  http://www.eaba-association.eu/
5
  http://www.semanticwebservices.org/enalgae
    Some of the objectives of LEAPS are:

 – motivating the use of Semantic Web technologies and LOD for the algal biomass
   domain.
 – laying out a set of ontological requirements for knowledge representation
   that support the publication of algal biomass data.
 – elaborating on how algal biomass datasets are transformed to their
   corresponding RDF model representation.
 – interlinking the generated RDF datasets along spatial dimensions with other
   datasets on the Web of data.
 – visualising the linked datasets via an end-user LOD REST Web service.
 – visualising the scientific classification of the algae species as large network
   graphs.


2   LEAPS Datasets

The transformation of the raw datasets to linked data takes place in two steps.
In the first step, the data processing and the potential calculation are performed
in a GIS-based model that was developed for this purpose using ArcGIS 7
(version 9.3.1). In the second step, the data are lifted from XML to RDF by
a bespoke parser that uses XPath 8 to selectively query the XML datasets
and generate linked data using the ontologies.
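
The second step can be illustrated with a minimal sketch. It is not the bespoke parser described above: the XML layout, the `http://example.org/leaps/` namespace and the property names are hypothetical, and the limited XPath subset of Python's standard library stands in for a full XPath engine. Selected XML elements are turned into N-Triples lines.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace and XML layout, for illustration only.
BASE = "http://example.org/leaps/"
XSD_DOUBLE = "http://www.w3.org/2001/XMLSchema#double"

SAMPLE_XML = """
<sites>
  <site id="S1" nuts="XX01"><name>Site A</name><area>120.5</area></site>
  <site id="S2" nuts="XX02"><name>Site B</name><area>80</area></site>
  <site id="S3"><name>Unassigned site</name><area>10</area></site>
</sites>
"""


def lift_sites(xml_text, min_area=0.0):
    """Return N-Triples lines for every <site> that carries a region code."""
    root = ET.fromstring(xml_text)
    triples = []
    # ".//site[@nuts]" selects only <site> elements that have a nuts attribute.
    for site in root.findall(".//site[@nuts]"):
        area = float(site.findtext("area"))
        if area < min_area:
            continue  # selective lifting: skip sites below the threshold
        subject = f"<{BASE}site/{site.get('id')}>"
        name = site.findtext("name")
        region = f"<{BASE}nuts/{site.get('nuts')}>"
        triples.append(f'{subject} <{BASE}ont#name> "{name}" .')
        triples.append(f'{subject} <{BASE}ont#areaHa> "{area}"^^<{XSD_DOUBLE}> .')
        triples.append(f"{subject} <{BASE}ont#inRegion> {region} .")
    return triples


if __name__ == "__main__":
    for line in lift_sites(SAMPLE_XML):
        print(line)
```

In this sketch the XPath predicate decides which records are lifted at all, while the vocabulary terms under `ont#` play the role that the ontologies play in the actual framework.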
    The transformation process yielded four datasets, which were stored in
distributed triple store repositories: biomass production sites, CO2 sources,
pipelines and region potential. We stored the datasets in separate repositories
to simulate the realistic scenario of these datasets being made available by
distinct and dedicated dataset providers in the future. While a linked data
representation of the NUTS regions data 9 was already available, there was no
SPARQL endpoint or service to query the dataset for region names. We therefore
retrieved the dataset dump and curated it in our local triple store as a
separate repository. The NUTS dataset was required to link the biomass
production sites and the CO2 sources to the regions where they would be located
and to the dataset about the region potential of biomass yields. The transformed
datasets interlink resources defining sites, CO2 sources, pipelines, regions and
NUTS data using link predicates defined in the ontology network.
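
As a rough illustration of this kind of interlinking, the sketch below joins resources to regions on a shared region code and emits one link triple per match. The URIs, the region codes and the `locatedIn` predicate are placeholders invented for the example; the actual link predicates are those defined in the LEAPS ontology network, and the real linking rules may be more elaborate.

```python
# Hypothetical link predicate; the real one comes from the ontology network.
LOCATED_IN = "http://example.org/leaps/ont#locatedIn"


def link_to_regions(resources, region_index, predicate=LOCATED_IN):
    """Link each resource to the region that shares its region code.

    resources    -- list of dicts with "uri" and "region_code" keys
    region_index -- dict mapping a region code to the region's URI
    Returns (links, unmatched): N-Triples lines and the URIs left unlinked.
    """
    links, unmatched = [], []
    for res in resources:
        region_uri = region_index.get(res["region_code"])
        if region_uri is None:
            unmatched.append(res["uri"])  # keep track of gaps for curation
            continue
        links.append(f"<{res['uri']}> <{predicate}> <{region_uri}> .")
    return links, unmatched


if __name__ == "__main__":
    regions = {"XX01": "http://example.org/nuts/XX01"}
    sites = [
        {"uri": "http://example.org/leaps/site/S1", "region_code": "XX01"},
        {"uri": "http://example.org/leaps/source/C7", "region_code": "XX99"},
    ]
    links, unmatched = link_to_regions(sites, regions)
    print("\n".join(links))
    print("unlinked:", unmatched)
```

Returning the unmatched resources separately makes it easy to spot records whose region code is missing from the locally curated region dataset.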
    Datasets about algae cultivation become more meaningful and useful to
the biomass community if they are integrated with datasets about algal strains.
6
  http://bit.ly/I7y5st
7
  http://www.esri.com/software/arcgis/index.html
8
  http://www.w3.org/TR/xpath/
9
  http://nuts.geovocab.org/
This can help plant operators make informed decisions about which
strain to cultivate at a specific geospatial location. Algaebase10 provides the
largest online database of algae information. While Algaebase does not make
RDF versions of its datasets directly available through its website, they can
be retrieved programmatically via their LSIDs (Life Science Identifiers) from
the LSID Web resolver 11 made available by the Biodiversity Information Standards
(TDWG)12 working group.
    We retrieved RDF metadata for 113061 species of algae13 and curated it in our
triple store. We then used the Semantic import plugin with Gephi to visualise
the biological taxonomy of the algae species.
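
The following sketch shows how an LSID could be validated and turned into a resolver URL. The general LSID shape (`urn:lsid:authority:namespace:object[:revision]`) is standard, but the `algaebase.org:taxname` pattern, the object identifier and the convention of appending the LSID to the resolver's base URL are assumptions made for illustration and should be checked against the resolver before use. The sketch only builds URLs and performs no network access.

```python
from urllib.parse import quote

# Resolver base URL from the text; appending the LSID to it is an assumption.
RESOLVER_BASE = "http://lsid.tdwg.org/"


def parse_lsid(lsid):
    """Split an LSID into its parts, raising ValueError if it is malformed."""
    parts = lsid.split(":")
    if len(parts) not in (5, 6):
        raise ValueError(f"unexpected number of LSID components: {lsid!r}")
    if parts[0].lower() != "urn" or parts[1].lower() != "lsid":
        raise ValueError(f"not an LSID: {lsid!r}")
    return {
        "authority": parts[2],
        "namespace": parts[3],
        "object_id": parts[4],
        "revision": parts[5] if len(parts) == 6 else None,
    }


def resolver_url(lsid, base=RESOLVER_BASE):
    """Build the URL at which the metadata for an LSID would be requested."""
    parse_lsid(lsid)  # validate before building the URL
    return base + quote(lsid, safe=":")


if __name__ == "__main__":
    # Hypothetical identifiers, for illustration only.
    for object_id in (1, 2, 3):
        print(resolver_url(f"urn:lsid:algaebase.org:taxname:{object_id}"))
```

A harvesting loop over many identifiers would call `resolver_url` for each one and store the returned RDF; with more than a hundred thousand species, throttling and retry handling would matter far more than the URL construction shown here.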


10
   http://www.algaebase.org/about/
11
   http://lsid.tdwg.org/
12
   http://www.tdwg.org/
13
   The retrieval algorithm ran on an Ubuntu server for three days.


3    System Description

LEAPS provides an integrated view over multiple heterogeneous datasets of
potential algal sites and sources of their consumables across NUTS regions in
North-Western Europe. Figure 1 illustrates the conceptual architecture of LEAPS.

                         Fig. 1. Architecture of LEAPS

The main components of the application are:
 – Parsing modules: As shown in Figure 1, the parsing modules are responsible
   for lifting the data from their original formats to RDF. The lifting process
   takes place in two stages to ensure uniformity in transformation.
 – Linking engine: The linking engine, along with the bespoke XML parser,
   is responsible for producing the linked data representation of the datasets.
   The linking engine uses ontologies, dataset-specific rules and heuristics to
   generate links between the five datasets. From the LOD cloud, we
   currently provide outgoing links to DBpedia14 and Geonames15.
 – Triple store: The linked datasets are stored in a triple store. We use
   OWLIM SE 16 (version 5.0).
 – Web services: Several REST Web services have been implemented to provide
   access to the linked datasets.
 – Ontologies: A suite of OWL ontologies for the algal biomass domain has
   been designed and made available.
 – Interfaces: The Web interface provides an interactive way to explore various
   facets of sites, sources, pipelines, regions, ontologies and SPARQL endpoints.
   The map visualisation is rendered using Google Maps. Besides the
   SPARQL endpoint and the interactive Web interface, a REST client has
   been implemented for access to the datasets. Query results are available in
   RDF/XML, JSON, Turtle and XML formats.
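
To make the choice of result format concrete, the sketch below prepares a SPARQL protocol request whose `Accept` header selects one of the four formats listed above. The endpoint URL is a placeholder, and the mapping from format names to media types is an assumption based on the standard SPARQL and RDF media types, not a description of the LEAPS REST client. The request is only constructed, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Placeholder endpoint; the real SPARQL endpoint URL is not given in the text.
ENDPOINT = "http://example.org/leaps/sparql"

# Standard media types; which ones a given endpoint honours is an assumption.
MEDIA_TYPES = {
    "json": "application/sparql-results+json",  # SELECT/ASK results
    "xml": "application/sparql-results+xml",    # SELECT/ASK results
    "rdfxml": "application/rdf+xml",            # CONSTRUCT/DESCRIBE graphs
    "turtle": "text/turtle",                    # CONSTRUCT/DESCRIBE graphs
}


def build_query_request(query, fmt="json", endpoint=ENDPOINT):
    """Return a GET request for `query` asking for the given result format."""
    if fmt not in MEDIA_TYPES:
        raise ValueError(f"unsupported format: {fmt!r}")
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": MEDIA_TYPES[fmt]})


if __name__ == "__main__":
    req = build_query_request("SELECT * WHERE { ?s ?p ?o } LIMIT 5", "json")
    print(req.get_method(), req.full_url)
    print("Accept:", req.get_header("Accept"))
    # Passing req to urllib.request.urlopen would send it to a live endpoint.
```

Selecting the format through content negotiation keeps a single query URL usable by both browsers and programmatic clients.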


4    Application access
LEAPS 17 is available on the Web. The interface currently provides visualisation
and navigation of the algae cultivation datasets in the way most intuitive for
phycologists. The application has been demonstrated to several stakeholders of
the community at various algae-related workshops and congresses. They
found the navigation very useful and made suggestions for future dataset
aggregation. At the time of writing, data retrieval is relatively slow for some
queries because of their federated nature; however, optimisation work on the
retrieval mechanism is in progress to enable faster retrieval of information.

Acknowledgments
The research described in this paper was partly supported by the Energetic Algae
project (EnAlgae), a four-year Strategic Initiative of the INTERREG IVB North West
Europe Programme. It was carried out while the first author was a researcher at BCU,
UK.


References
14
   http://dbpedia.org/About
15
   http://sws.geonames.org/
16
   http://www.ontotext.com/owlim/editions
17
   http://www.semanticwebservices.org/enalgae