An architecture for efficient knowledge-driven
        information and data access (abstract)

                   Pablo Rubén Fillottrani1,2 and C. Maria Keet3
 1 Departamento de Ciencias e Ingeniería de la Computación, Universidad Nacional
   del Sur, Bahía Blanca, Argentina, prf@cs.uns.edu.ar
 2 Comisión de Investigaciones Científicas, Provincia de Buenos Aires, Argentina
 3 Department of Computer Science, University of Cape Town, South Africa,
   mkeet@cs.uct.ac.za

    Advanced information systems require the orchestration of many compo-
nents, including ontologies or knowledge graphs, and efficient data management,
in order to provide a means for better informed decision-making and to keep
up with new requirements in organisational needs. A major question in
delivering such systems is which components to design and put together to create
a ‘knowledge to data’ pipeline, as each component and process has trade-offs,
such as the computational complexity of the representation and query languages,
whether the reasoning services operate under an open or closed world
assumption, and maintainability. Such a combination of knowledge with data is
referred to by various
terms, including ontology-based data access (OBDA) [4], with (virtual) knowl-
edge graphs gaining in popularity in research and industry (e.g., [12, 15]). OBDA
has become synonymous with the approach of declaring a mapping layer between
knowledge represented in an OWL file and data stored in a relational database
whilst using query rewriting in answering conjunctive queries. That mapping
layer is known to be costly both computationally [7] and in design and
maintenance [10]. Also, database users may want to retain full SQL expressiveness
when querying data, and to remain within the closed world assumption they are
more familiar with, rather than the open world assumption of OBDA systems.
    In an attempt to avoid these issues, an “Abstract Relational Model” (ARM)
with special object identifiers and a strict extension to SQL for path queries
(SQLP) has been proposed [3] and experimentally shown to simplify queries
[11]. This approach avoids the costly mapping layer through transformations
and offers more than full SQL, but the ARM is not an ontology. Put differently,
ARM+SQLP lacks a knowledge layer.
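
    To illustrate why path queries can simplify query formulation, the
following minimal Python sketch expands a dotted attribute path into the chain
of joins it stands for in plain SQL. It is not the SQLP syntax or compilation
procedure of [3]; the schema REFERENCES, the key column name id, and the
function expand_path are hypothetical names introduced only for illustration.

```python
# Hypothetical schema (an assumption, not from the paper):
# table -> {reference attribute -> referenced table}.
REFERENCES = {
    "employee": {"dept": "department"},
    "department": {"manager": "employee"},
}


def expand_path(root, path, key="id"):
    """Expand a dotted attribute path into explicit SQL joins.

    `root` is the starting table and `path` a dotted path such as
    "dept.manager.name". Every step except the last must be a reference
    attribute listed in REFERENCES; the last step is an ordinary column.
    """
    *steps, column = path.split(".")
    alias, table = "t0", root
    joins = []
    for i, step in enumerate(steps, start=1):
        target = REFERENCES[table][step]  # KeyError if the step is not a reference
        new_alias = f"t{i}"
        joins.append(
            f"JOIN {target} {new_alias} ON {alias}.{step} = {new_alias}.{key}"
        )
        alias, table = new_alias, target
    return " ".join([f"SELECT {alias}.{column} FROM {root} t0", *joins])


# One path expression replaces two hand-written joins.
query = expand_path("employee", "dept.manager.name")
```

    The point of the sketch is only that the user states what to retrieve along
a path, while the joins that realise the path are derived from the schema.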
    We aim to address this limitation of the ARM+SQLP option. To this end, we
introduce a new knowledge-to-data architecture, KnowID: Knowledge-driven
Information and Data access. It pulls together recently proposed components of
the knowledge layer and, to complete the pipeline, adds novel transformation
rules between EER and the ARM, which enhance and adapt rules from the regular
EER-to/from-Relational Model (RM) transformation with important additional
expressiveness. KnowID’s components and the addition to ARM+SQLP are
visualised in Fig. 1. Regarding the four steps in the top block:
1) If the model is not in EER, one can convert it into EER by means of a meta-
model or common core (e.g., [5, 9]); 2) If the EER diagram was not formalised


Copyright © 2019 for this paper by its authors. Use permitted under Creative Commons License
Attribution 4.0 International (CC BY 4.0)
[Figure 1: KnowID architecture diagram. Top block (knowledge and information
management): conceptual data model or application ontology C; 1. Conversion to
EER (if applicable); 2. Formalisation (if applicable); 3. Classification;
4. Materialisation of deductions; EER diagram, transformed to ARM A. Bottom
block (ARM+SQLP): ARM A; transformations between ARM A and RM A'; database
schema(s) S and data D; data completion; query request Q, with query
formulation in SQLP assisted by A or C, giving q1 in SQLP; SQL evaluation of
result q1 over S+D.]

Fig. 1. Extending ARM+SQLP with a knowledge layer to obtain the Knowledge-driven
Information and Data access (KnowID) architecture.


yet, one of the logic-based reconstructions may be used (e.g., [2, 14]) that, ide-
ally, supports all that KnowID supports as modelling language features: entity
type (weak and strong), n-ary relationship (n ≥ 2), attribute, basic cardinality
constraints (0..n, 0..1, 1, 1..n) and identifier, entity type subsumption, and dis-
jointness and covering constraints; 3) inferences can be computed (e.g., with a
DL reasoner) and undesirable deductions dealt with by the modeller as usual;
and 4) materialising the deductions amounts to modifying the EER diagram by
adding the acceptable deductions to the model, in a similar fashion as used to
be possible in the earlier Protégé tool for OWL [6]. The model resulting from
completing step 4 is the one that will be closed and transformed into an ARM
by means of our proposed set of rules and then used for querying the data.
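
    Steps 3 and 4 can be pictured with a deliberately small Python sketch. It
assumes that the only deductions are entity type subsumptions implied by
transitivity, whereas a DL reasoner over a formalised EER diagram derives far
more. The function names implied_subsumptions and materialise and the example
entity types are hypothetical, and the sketch is not the authors' procedure.

```python
def implied_subsumptions(declared):
    """Return subsumptions implied by transitivity but not declared.

    `declared` is a set of (subtype, supertype) pairs. This toy closure
    stands in for step 3 (classification); a real DL reasoner does more.
    """
    closure = set(declared)
    changed = True
    while changed:
        changed = False
        for (a, b) in list(closure):
            for (c, d) in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure - set(declared)


def materialise(declared, accepted):
    """Step 4: add the deductions the modeller accepted to the model."""
    return set(declared) | set(accepted)


# Hypothetical model fragment: Manager is an Employee, Employee is a Person.
declared = {("Manager", "Employee"), ("Employee", "Person")}
deductions = implied_subsumptions(declared)

# The modeller reviews the deductions; here all of them are accepted.
model = materialise(declared, deductions)
```

    Keeping the review between the two functions mirrors the text: undesirable
deductions are dealt with by the modeller before anything is added to the model.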
    KnowID’s functionality is thus similar to OBDA systems [4] and to ‘enhanced’
databases such as OntoMinD [1]: one can reason and pose queries at the knowl-
edge layer—i.e., supporting a user in what to query, without the labour-intensive
discovery of how and where—that will be evaluated over an ‘intelligent’ database
that avails of the formally represented knowledge in the conceptual data model
or ontology. Architecturally, a distinct practical advantage is that it achieves this
through a series of automated transformations that are linear in the model’s size,
rather than (manual or automated) specifications of non-trivial mappings in a
separate mapping layer. Further advantages are the closed world assumption
commonly used in information systems and full SQL augmented with path queries.
The latter has been shown in user experiments to make query formulation faster
with at least the same level of accuracy [8, 11], and discovery
through paths is seen as essential for data integration [13].
    We are currently implementing the EER↔ARM transformation rules as a
first step toward concretely realising KnowID as a usable and scalable software
system.
References
 1. Al-Jadir, L., Parent, C., Spaccapietra, S.: Reasoning with large ontologies stored
    in relational databases: The OntoMinD approach. DKE 69, 1158–1180 (2010)
 2. Artale, A., Calvanese, D., Kontchakov, R., Ryzhikov, V., Zakharyaschev, M.: Rea-
    soning over extended ER models. In: Parent, C., Schewe, K.D., Storey, V.C., Thal-
    heim, B. (eds.) Proc. of ER’07. LNCS, vol. 4801, pp. 277–292. Springer (2007),
    Auckland, New Zealand, November 5-9, 2007
 3. Borgida, A., Toman, D., Weddell, G.E.: On referring expressions in information
    systems derived from conceptual modelling. In: Proc. of ER’16. LNCS, vol. 9974,
    pp. 183–197. Springer (2016)
 4. Calvanese, D., Cogrel, B., Komla-Ebri, S., Kontchakov, R., Lanti, D., Rezk, M.,
    Rodriguez-Muro, M., Xiao, G.: Ontop: Answering SPARQL queries over relational
    databases. Semantic Web Journal 8(3), 471–487 (2017)
 5. Fillottrani, P.R., Keet, C.M.: Evidence-based languages for conceptual data mod-
    elling profiles. In: Morzy, T., et al. (eds.) Proc. of ADBIS’15. LNCS, vol. 9282, pp.
    215–229. Springer (2015), 8-11 Sept, 2015, Poitiers, France
 6. Gennari, J.H., Musen, M.A., Fergerson, R.W., Grosso, W.E., Crubézy, M., Eriks-
    son, H., Noy, N.F., Tu, S.W.: The evolution of Protégé: an environment for
    knowledge-based systems development. International Journal of Human-Computer
    Studies 58(1), 89–123 (2003)
 7. Gottlob, G., Kikot, S., Kontchakov, R., Podolskii, V.V., Schwentick, T., Za-
    kharyaschev, M.: The price of query rewriting in ontology-based data access. Artif.
    Intell. 213, 42–59 (2014)
 8. Junkkari, M., Vainio, J., Iltanen, K., Arvola, P., Kari, H., Kekäläinen, J.:
    Path expressions in SQL: A user study on query formulation. Journal of Database
    Management 22(3), 22p (2016)
 9. Keet, C.M., Fillottrani, P.R.: An ontology-driven unifying metamodel of UML
    Class Diagrams, EER, and ORM2. DKE 98, 30–53 (2015)
10. Lubyte, L., Tessaris, S.: Automated extraction of ontologies wrapping relational
    data sources. In: Proc of DEXA’09. pp. 128–142. Springer (2009)
11. Ma, W., Keet, C.M., Oldford, W., Toman, D., Weddell, G.: The utility of the
    abstract relational model and attribute paths in SQL. In: Faron Zucker, C., Ghidini,
    C., Napoli, A., Toussaint, Y. (eds.) Proc. of EKAW’18. pp. 195–211. Springer
    (2018), 12-16 Nov. 2018, Nancy, France
12. Noy, N., Gao, Y., Jain, A., Narayanan, A., Patterson, A., Taylor, J.: Industry-scale
    knowledge graphs: Lessons and challenges. Queue 17(2), 20:48–20:75 (Apr 2019)
13. Stonebraker, M., Ilyas, I.F.: Data integration: the current status and the way forward.
    IEEE Data Engineering 41(2), 3–9 (2018)
14. Toman, D., Weddell, G.E.: On adding inverse features to the description logic
    CFD^∀_nc. In: Proc of PRICAI 2014. pp. 587–599 (2014), Gold Coast, QLD,
    Australia, December 1-5, 2014.
15. Xiao, G., Ding, L., Cogrel, B., Calvanese, D.: Virtual knowledge graphs: An
    overview of systems and use cases. Data Intelligence 1, 201–223 (2019)