Optique 1.0: Semantic Access to Big Data*
The Case of Norwegian Petroleum Directorate’s FactPages

E. Kharlamov1,**, M. Giese2, E. Jiménez-Ruiz1, M. G. Skjæveland2, A. Soylu2,
D. Zheleznyakov1, T. Bagosi3, M. Console5, P. Haase4, I. Horrocks1,
S. Marciuska3, C. Pinkel4, M. Rodriguez-Muro3, M. Ruzzi5, V. Santarelli5,
D. F. Savo5, K. Sengupta4, M. Schmidt4, E. Thorstensen2, J. Trame4, and
A. Waaler2

1 University of Oxford, UK; 2 University of Oslo, Norway;
3 Free University of Bozen-Bolzano, Italy; 4 fluid Operations AG, Germany;
5 Sapienza Università di Roma, Italy


         Abstract. The Optique project aims at developing an end-to-end system
         for semantic data access to Big Data in industries such as Statoil ASA
         and Siemens AG. In our demonstration we present the first version of the
         Optique system customised for the Norwegian Petroleum Directorate’s
         FactPages, a publicly available dataset relevant for engineers at Statoil
         ASA. The system provides several options, including a visual one, for
         formulating queries over ontologies and displaying query answers.
         Optique 1.0 offers installation wizards that allow users to extract
         ontologies from relational schemata, extract and define mappings
         connecting ontologies and schemata, and align and approximate
         ontologies. Moreover, the system offers highly optimised techniques
         for query answering.


1    Introduction
Accessing the relevant data in Big Data scenarios is increasingly difficult for
both end-users and IT experts, due to the volume, variety, velocity, and complexity
dimensions of Big Data. This brings a high cost overhead in data access for large
enterprises. For instance, in the oil and gas industry, engineers spend 30–70% of
their time gathering data and assessing its quality. The Optique project1 [1,
2] advocates a next generation of the well-known Ontology-Based Data Access
(OBDA) approach to address the data access problem. The project aims at
solutions that reduce the cost of data access dramatically. In our demonstration
we present the first version of the Optique system which we customised for the
Norwegian Petroleum Directorate’s (NPD) FactPages.2
    OBDA systems address the data access problem by presenting a general
ontology-based and end-user oriented query interface over heterogeneous data
sources. The core elements in a classical OBDA system are an ontology, describing
the application domain, and a set of mappings, relating the ontological terms
with the schemata of the underlying data sources. End-users formulate queries
using the ontological terms and thus they are not required to understand the
structure of the data sources. These queries are then automatically translated,
using the ontology and mappings, into executable code over the data sources.

[Figure 1, architecture diagram. Left: Presentation Layer (Query Formulation
Interface, Visualisation, System Interface); Application Layer (Visual Query
Formulation, SPARQL Editor, Answer Visualisation, Ontology Visualisation,
Triple Store, Ontology and Mapping Management, Reasoner, Query Answering);
Data Layer (NPD FactPages). Right: Installation Wizards. Basic: import
metadata; automatic extraction of ontology and direct mappings. Advanced:
import ontology vocabulary and metadata; semi-automatic extraction of R2RML
mappings; saturate ontology from metadata. Both: add external ontology (load
ontology, align ontology); approximate ontology.]

Fig. 1. Left: General architecture of the Optique 1.0 system; Right: installation process

 *  The research was supported by the FP7 grant Optique (n. 318338).
 ** Corresponding author: evgeny.kharlamov@cs.ox.ac.uk
 1  http://www.optique-project.eu/
 2  http://factpages.npd.no

    State-of-the-art OBDA systems, however, have shown the following
limitations, among others:
  – The usability of OBDA systems is hampered by the need to use a formal
     query language. Even if the users know the ontological vocabulary, they may
     find it difficult to formulate queries with several concepts and relationships.
  – The prerequisites of OBDA, i.e., ontology and mappings, are in practice
     expensive to obtain. Additionally, they are not static artefacts and should
     evolve according to the new end-users’ information requirements.
  – The efficiency of the translation process and the execution of the queries is
     usually not sufficiently addressed in OBDA systems.
The first version of the Optique system, i.e., Optique 1.0, aims at partially
overcoming the above limitations. Demonstration videos are available at the
following address: http://www.cs.ox.ac.uk/isg/projects/Optique/demos/iswc2013/.


2       System Overview
A general three-layer architecture of the Optique system is depicted in
Figure 1 (Left). The current version of the system offers two main functionalities:
querying and visualising data, and installing and maintaining the ontology and
the mappings. At the backend, the system also offers an efficient query
processing mechanism.
    Optique 1.0 allows users to pose queries via a visual query formulation (VQF)
interface, a SPARQL editor, or from a query catalog. VQF exploits reasoning
in order to show both explicit and implicit domain knowledge to guide the
formulation of the query.
    Queries are executed by the Query Answering module, which is based on the
Ontop system.3 Ontop provides functionalities for rewriting SPARQL queries using
the system’s ontology and mappings, syntactic and semantic query optimisation,
and query unfolding. Together, these techniques make query answering highly
efficient. Rewritten and unfolded queries are in SQL and they are executed over
the NPD FactPages data, which is stored in a relational database. The query
answers are converted into triples in order to conform to the format of the
system’s ontology, temporarily stored in the system’s triple store, and displayed
to the user in tabular form or on maps (using OpenStreetMap).
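
The rewrite-then-unfold pipeline described above can be illustrated with a small Python sketch. It is a toy under stated assumptions, not Ontop's actual algorithm or API: the class hierarchy, the `facility` table, and the mappings are all hypothetical, and the only query supported is "all instances of a class".

```python
import sqlite3

# Hypothetical toy ontology: each class lists its direct subclasses.
SUBCLASSES = {
    "Facility": ["OilFacility", "GasFacility"],
    "OilFacility": [],
    "GasFacility": [],
}

# Hypothetical mappings: class -> SQL query returning instance identifiers.
MAPPINGS = {
    "OilFacility": "SELECT name FROM facility WHERE kind = 'OIL'",
    "GasFacility": "SELECT name FROM facility WHERE kind = 'GAS'",
}


def rewrite(cls):
    """Rewriting: expand a class into itself plus all transitive subclasses."""
    result = [cls]
    for sub in SUBCLASSES.get(cls, []):
        result.extend(rewrite(sub))
    return result


def unfold(classes):
    """Unfolding: replace each mapped class by its SQL and take the union."""
    parts = [MAPPINGS[c] for c in classes if c in MAPPINGS]
    return " UNION ".join(parts)


def answer(conn, cls):
    """Answer the query '?x a cls' and return the results as triples."""
    sql = unfold(rewrite(cls))
    if not sql:  # no mapping covers the class, so there are no answers
        return []
    rows = conn.execute(sql).fetchall()
    return sorted((name, "rdf:type", cls) for (name,) in rows)


# Toy relational source standing in for the FactPages database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facility (name TEXT, kind TEXT)")
conn.executemany(
    "INSERT INTO facility VALUES (?, ?)",
    [("F1", "OIL"), ("F2", "GAS"), ("F3", "WATER")],
)
triples = answer(conn, "Facility")
```

Here `Facility` has no mapping of its own, so its answers come entirely from the subclass mappings found during rewriting. This is the kind of implicit knowledge the ontology contributes.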
    The installation and maintenance of the ontology and the mappings are done via
the Ontology and Mapping Management component. Currently, this component
includes two installation wizards: basic and advanced. Figure 1 (Right) depicts
the workflows of the wizards. The basic wizard exploits the relational database
metadata and automatically extracts an initial version of the ontology and direct
mappings4 to the ontology entities. The advanced wizard, unlike the basic one,
requires user intervention and an ontology vocabulary as input in order to
(manually) create and edit R2RML mappings.5 Both the basic and advanced
wizards provide functionalities to align the bootstrapped ontology with a
state-of-the-art domain ontology and to approximate the resulting ontology if it
is outside the desired OWL 2 QL profile.6 Alignment is performed using the
ontology matching system LogMap,7 which has been shown to work well in practice
and also includes mapping repair facilities.
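
To make the basic wizard's bootstrapping step more concrete, the following Python sketch derives class and property IRIs from relational metadata and emits triples for each row. It loosely follows the IRI shapes of the W3C Direct Mapping (table as class, `table#column` as property, `table/key=value` as row identifier). It is a simplified illustration, not the wizard's implementation: the base IRI and the `field` table are hypothetical, and foreign keys, percent-encoding, and datatypes are omitted.

```python
import sqlite3

BASE = "http://example.org/npd/"  # hypothetical base IRI


def bootstrap(conn, base=BASE):
    """Derive class and property IRIs from the relational metadata."""
    classes, properties = [], []
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        classes.append(base + table)
        # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk).
        # Table names are assumed to be simple identifiers.
        for _cid, column, *_rest in conn.execute(f"PRAGMA table_info({table})"):
            properties.append(f"{base}{table}#{column}")
    return classes, properties


def direct_triples(conn, table, key, base=BASE):
    """Emit one rdf:type triple per row and one triple per non-NULL column."""
    cur = conn.execute(f"SELECT * FROM {table}")
    columns = [d[0] for d in cur.description]
    triples = []
    for row in cur:
        record = dict(zip(columns, row))
        subject = f"{base}{table}/{key}={record[key]}"
        triples.append((subject, "rdf:type", base + table))
        for column, value in record.items():
            if value is not None:
                triples.append((subject, f"{base}{table}#{column}", value))
    return triples


# Hypothetical one-table schema standing in for the source database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE field (id INTEGER PRIMARY KEY, name TEXT, operator TEXT)")
conn.execute("INSERT INTO field VALUES (1, 'EXAMPLE FIELD', NULL)")
classes, properties = bootstrap(conn)
triples = direct_triples(conn, "field", "id")
```

An ontology bootstrapped this way mirrors the database schema closely, which is why the wizards go on to align it with a domain ontology.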
   Optique 1.0 is built on top of the Information Workbench8 (IWB), a generic
platform for semantic data management. The IWB provides a shared triple
store for managing the assets of Optique 1.0, such as ontologies, mappings,
query logs, (excerpts of) query answers, and database metadata. The IWB also
provides generic interfaces and APIs for semantic data management, e.g., ontology
processing APIs. In addition to these backend data management capabilities, the
IWB provides a flexible user interface which follows a semantic wiki approach,
based on a rich, extensible pool of widgets for visualisation, interaction, mashup,
and collaboration.
    Finally, Optique 1.0 is customised for the NPD FactPages, a public, freely
available dataset created to regulate and oversee the petroleum activities on
the Norwegian Continental Shelf (NCS). It contains information collected from a
wide range of activities on the NCS, e.g., operating companies, fields,
discoveries, facilities, pipelines, and seismic surveys, covering both historic
and current data. Its data has been converted and published as semantic web
data [3], of which parts have been fed into the Optique 1.0 system.

 3  http://ontop.inf.unibz.it/
 4  http://www.w3.org/TR/rdb-direct-mapping/
 5  http://www.w3.org/2001/sw/rdb2rdf/r2rml/
 6  http://www.w3.org/TR/owl2-profiles/
 7  http://code.google.com/p/logmap-matcher/
 8  http://www.fluidops.com/information-workbench/
           Fig. 2. Optique 1.0 System, visual query formulation component


3    Demonstration Details
During the demonstration we will describe the NPD FactPages and present the
functionalities of the Optique 1.0 system, with a focus on two aspects: query
formulation and execution, and system installation. Both will be illustrated
using the NPD FactPages data. For query formulation we will highlight our visual
query formulation tool, which currently supports the construction of
tree-shaped conjunctive SPARQL queries. The demonstrated queries will be from
the oil industry domain. An example query is: “Find all fields that are operated
by ‘Statoil Petroleum AS’ and which have a facility that produces oil”; it can be
seen in the screenshot of the VQF in Figure 2. We will run queries and present
results both in tables and on maps, e.g., the locations of “Fields” and “Oil
facilities” will be displayed on maps. Regarding the system’s installation, we
will present both the basic and advanced wizards and walk through their steps,
that is, loading metadata, extraction of an ontology and mappings, alignment with
the domain ontology, and approximation of the integrated ontology. We will also
show how to edit extracted direct mappings and define new R2RML mappings.
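
As an illustration of what a tree-shaped conjunctive query looks like, the Python sketch below represents the example query above as a tree and serialises it to SPARQL. The vocabulary (`ex:Field`, `ex:operatedBy`, `ex:hasFacility`, `ex:OilProducingFacility`, `ex:name`) and the namespace are hypothetical placeholders, not the terms of the NPD FactPages ontology, and the code is not that of the VQF component.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

PREFIX = "PREFIX ex: <http://example.org/npd#>"  # hypothetical namespace


@dataclass
class Node:
    var: str                                   # variable name without '?'
    cls: Optional[str] = None                  # optional class constraint
    literals: List[Tuple[str, str]] = field(default_factory=list)    # (property, value)
    children: List[Tuple[str, "Node"]] = field(default_factory=list)  # (property, child)


def patterns(node):
    """Collect triple patterns by a depth-first walk over the query tree."""
    out = []
    if node.cls:
        out.append(f"?{node.var} a ex:{node.cls} .")
    for prop, value in node.literals:
        out.append(f'?{node.var} ex:{prop} "{value}" .')
    for prop, child in node.children:
        out.append(f"?{node.var} ex:{prop} ?{child.var} .")
        out.extend(patterns(child))
    return out


def to_sparql(root):
    """Serialise the tree as a SELECT query projecting the root variable."""
    body = "\n".join("  " + p for p in patterns(root))
    return f"{PREFIX}\nSELECT DISTINCT ?{root.var} WHERE {{\n{body}\n}}"


# "Find all fields that are operated by 'Statoil Petroleum AS' and which
# have a facility that produces oil", with hypothetical vocabulary.
query = Node("field", "Field", children=[
    ("operatedBy", Node("company", "Company",
                        literals=[("name", "Statoil Petroleum AS")])),
    ("hasFacility", Node("facility", "OilProducingFacility")),
])
sparql = to_sparql(query)
```

Because every node has exactly one parent, each variable is introduced by a single edge and the resulting basic graph pattern contains no cycles. This is what "tree-shaped" means here.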

References
1.   M. Giese et al. “Scalable End-user Access to Big Data”. In: Big Data Computing.
     Ed. by R. Akerkar. Chapman and Hall/CRC, 2013.
2.   E. Kharlamov et al. “Optique: Towards OBDA Systems for Industry”. In: ESWC
     postproceedings volume: Best Workshop Papers. 2013.
3.   M. G. Skjæveland, E. H. Lian, and I. Horrocks. “Publishing the Norwegian Petroleum
     Directorate’s FactPages as Semantic Web Data”. In: The Semantic Web – ISWC
     2013. Ed. by H. Alani et al. Vol. 8219. LNCS. 2013.