Ontology Based Data Access: Where do the Ontologies
             and Mappings come from?

                                   Juan F. Sequeda

                                       Capsenta
                                  juan@capsenta.com


1   Introduction
We are experiencing an increase of Ontology Based Data Access (OBDA) systems be-
ing deployed in industrial applications. In the OBDA paradigm, the ontology provides
a logical abstraction, independent of how and where the data is physically stored. The
ontology serves as a business view, using business terminology, which is then connected
to data sources. Thus, providing a foundation for comfortable communication between
business users and IT developers.
     Even though OBDA has been widely researched theoretically, there is still need
to understand how to effectively implement OBDA systems in practice. Our focus is
in Business Intelligence (BI) reporting. The common definition of OBDA states that
given a source relational database, a target ontology and a mapping from the relational
database to the ontology, the goal is to answer queries over the target ontology using
these three daysonents. From a practical point of view, this begs the question: where
does the target ontology and the mappings come from?
Ontology Challenges Ontology engineering is a challenge by itself. In order to cre-
ate the target ontology, users can follow traditional ontology engineering methodolo-
gies [2, 10], using competency questions [1, 5], test driven development [4], ontology
design patterns [3], etc. Additionally, per standard practices, it is recommended to
reuse and extend existing ontologies in domains of interest such as Good Relations
for e-commerce [13], FIBO for finance [14], Gist for general business concepts [14],
Schema.org [16], etc. In OBDA, the challenge increases because the source database
schemas can be considered as additional inputs to the ontology engineering process.
Common enterprise application’s database schema commonly consist of thousands of
tables and tens of thousands of attributes. A common approach is to bootstrap ontolo-
gies derived from the source database schemas, known also putative ontologies[6, 7].
The putative ontologies can gradually be transformed into target ontologies, using ex-
isting ontology engineering methodologies.
Mapping Challenges Once the Target ontology has been created, the source databases
can be mapped. The W3C Direct Mapping standard can be used to bootstrap mappings
[11]. The declarative nature of W3C R2RML mapping language[12] enables users to
state which elements from the source database are connected to the target ontology, in-
stead of writing procedural code. Given that source database schemas are very large, the
OBDA mapping challenge is suggestive of an ontology matching problem: the putative
ontology of the source database and the target ontology. In addition to 1-1 correspon-
dences between classes and properties, mappings can be complex involving calculations
and rules that are part of business logic. For example, the notion of net sales of an order
is defined as gross sales minus taxes, discounts given, etc. The discount can be different
depending on the type of user. Therefore, a business user needs to provide these defini-
tions before hand. That is why it is hard to automate this process. Another challenge is
to create tools that can create and manage mappings [9].
    Addressing these challenges is crucial for the success of OBDA in practice. To
answer the main question of this paper: Where do the Ontologies and Mappings come
from? Our answer: from the business questions.


2   Pay-as-you-go Methodology for OBDA
We recently introduced a methodology to create the target ontology and mappings for
an OBDA system, driven by a prioritized list of business questions[8]. The objective
is to create a target ontology and mappings, that enable answers to list of business
questions, in an incremental manner. After a minimal set of business questions have
been successfully modeled, mapped, answered and made into dashboards, then the set
of business questions can be extended. The new questions, in turn, may extend the target
ontology and new mappings incrementally added. With this methodology, the target
ontology and mappings are developed in an iterative pay-as-you-go approach. Thus,
providing an agile methodology for BI using the OBDA paradigm because the focus
is to provide early and continuous delivery of answers to the business users. Figure 1
provides an overview of the methodology. We refer the reader to [8] for more details.


                     Fig. 1. The Pay-as-you-go Methodology for OBDA


    However, this leads to another question: where do the business questions come
from? We observe three common sources: 1) Questions coming directly from business
users (e.g. What is the net sales of orders group by country and month) 2) Business
application systems returning fixed reports which can not be customized or extended,
3) SQL queries that have been customized to answer specific questions. We focus on
the latter: business questions coming from SQL queries used to generate BI reports.
3   Generating Ontologies and Mappings from SQL queries

A common scenario is the following: SQL queries are initially created by developers
who are knowledgeable of the large database schema. Developers come and go within
an organization. Queries get shared, altered, extended and combined. After time, users
are executing SQL queries without any understanding of what the queries actually do.
Users rely on a description of what the SQL query is suppose to be returning.
    Our position is that we should be able to extract valuable information from a SQL
query which is being used for BI. Specifically, we observe that it is possible to generate
an Ontology and Mapping from a query. This Ontology and Mapping is the starting
point to implement an OBDA system for BI. Consider the following SQL query which
is used to return the net sales of all orders.

SELECT o.orderid, o.orderdate, o.ordertotal - ot.finaltax -
  CASE WHEN o.currencyid in (‘USD’, ‘CAD’) THEN o.shippingcost
       ELSE o.shippingcost - ot.shippingtax END AS netsales,
  o.currencyid
FROM order o, ordertax ot
WHERE o.orderid = ordertax.orderid AND o.statusid NOT IN (4, 5)

Just by analyzing the SQL query alone, without any other additional resource, we can
extrapolate the following:
Relational Schema
order(orderid, orderdate, ordertotal, currencyid, shippingcost, statusid, ...)
ordertax(orderid,finaltax,shippingtax, ...)
order.orderid is a Primary Key
ordertax.orderid is a Primary Key
ordertax.orderid is a Foreign Key referencing order.orderid
order.statusid could be a Foreign Key referencing a table containing status codes

Instances
order.currencyid = ‘USD’, ‘CAD’
order.statusid = 4, 5

Calculations
orderShippingCost = IF (currency = USD or CAD) THEN (RETURN shippingcost)
ELSE (RETURN shippingcost - shippingtax)
netsales = ordertotal - final tax - orderShippingCost

    We define the Relational Schema, Instances and Calculations derived from a query
as the components of a query. The goal is to generate an ontology and mappings from
the components of a query.
    First we specify a conceptualization: Orders, Currency, Order Status, order date,
order net sales, order total, order final tax, order shipping cost, order shipping tax. Fur-
thermore, the database can be investigated to identify other valuable information. For
example, after further investigation and discussions we learn that order status 1-3 are
active and 4-5 are inactive. We now have the enough elements to create an initial on-
tology. Focusing on the elements in the components of the query, we avoid the effort
of creating a mapping from an unknown large database schema to the target ontology.
The mapping problem has been reduced to a clear and well understood subset of the
database schema. The relationship between the SQL query and the ontological element
represents a mapping between the source database and the evolving target ontology at
the most granular level. Figure 2 shows an example ontology and mapping derived from
components of the example SQL query:


                    Fig. 2. Generate Ontology from the AAA of a query


4   Conclusion
Ontologies and mappings for OBDA can be generated from business questions. Our
focus is on business questions coming from SQL queries used to generate existing BI
reports. The first step is to extrapolate the components of a query: relational schema,
instances and calculations. The next step is to generate an ontology and mappings from
the components of a query.
    This approach is currently being successfully deployed with Capsenta’s customers.
The results are early and continuous delivery of answers to the business users. This has
not been achieved before with traditional BI methodologies.
    This is just the beginning. To the best of our knowledge, the engineering of ontology
and mappings for OBDA is still open grounds for research. In our current work, we are
in the process of formalizing this approach in order to fully understand what should
form the components of a query. There are several challenges going forward, such as:
Automation: Given a SQL query, how can we automatically generate the components
of a query? Given a components of a query, how can we automatically generate and
OWL ontology and R2RML mappings? Iteration: Manage new business questions that
extend the ontology and mappings. What happens if a new query contradicts the current
ontology and/or mappings, hence it is non-monotonic? Tools: There is a need for tools
that can manage large database schemas at scale.
References
 1. Kamal Azzaoui et al. Scientific competency questions as the basis for semantically en-
    riched open pharmacological space development. Drug Discovery Today 18(17-18): 843-852
    (2013)
 2. Oscar Corcho, Mariano Fernndez-Lpez, Asuncin Gmez-Prez. Methodologies, tools and lan-
    guages for building ontologies: Where is their meeting point? Data Knowl. Eng. 46(1): 41-64
    (2003)
 3. Pascal Hitzler, Aldo Gangemi, Krzysztof Janowicz, Adila Krisnadhi, Valentina Presutti
    (eds.), Ontology Engineering with Ontology Design Patterns: Foundations and Applications.
    Studies on the Semantic Web 25, IOS Press/AKA, 2016.
 4. C. Maria Keet, Agnieszka Lawrynowicz: Test-Driven Development of Ontologies. ESWC
    2016
 5. Yuan Ren, Artemis Parvizi, Chris Mellish, Jeff Z. Pan, Kees van Deemter, Robert Stevens.
    Towards Competency Question-Driven Ontology Authoring. ESWC 2014
 6. Juan Sequeda, Marcelo Arenas, Daniel P. Miranker. On directly mapping relational databases
    to RDF and OWL. WWW 2012
 7. Juan F. Sequeda, Syed Hamid Tirmizi, Oscar Corcho, Daniel P. Miranker. Survey of di-
    rectly mapping SQL databases to the Semantic Web. Knowledge Eng. Review 26(4): 445-
    486 (2011)
 8. Juan F. Sequeda, Daniel P. Miranker.: A Pay-As-You-Go Methodology for Ontology-Based
    Data Access. IEEE Internet Computing 21(2): 92-96 (2017)
 9. Juan F. Sequeda, Daniel P. Miranker. Ultrawrap Mapper: A Semi-Automatic Relational
    Database to RDF (RDB2RDF) Mapping Tool. ISWC Posters & Demos 2015
10. Mike Uschold, Michael Gruninger. Ontologies: principles, methods and applications.
    Knowledge Eng. Review 11(2): 93-136 (1996)
11. A Direct Mapping of Relational Data to RDF, https://www.w3.org/TR/
    rdb-direct-mapping/
12. R2RML: RDB to RDF Mapping Language https://www.w3.org/TR/r2rml/
13. Good Relations http://www.heppnetz.de/projects/goodrelations/
14. Finance      Industry    Business      Ontology      http://www.edmcouncil.org/
    financialbusiness
15. Gist https://semanticarts.com/gist/
16. Schema.org http://schema.org/