=Paper=
{{Paper
|id=Vol-3409/paper11
|storemode=property
|title=Model-Independent Design of Knowledge Graphs (short paper)
|pdfUrl=https://ceur-ws.org/Vol-3409/paper11.pdf
|volume=Vol-3409
|authors=Luigi Bellomarini,Andrea Gentili,Eleonora Laurenza,Emanuel Sallinger
|dblpUrl=https://dblp.org/rec/conf/amw/BellomariniGLS23
}}
==Model-Independent Design of Knowledge Graphs (short paper)==
Model-Independent Design of Knowledge Graphs
Luigi Bellomarini¹, Andrea Gentili¹, Eleonora Laurenza¹ and Emanuel Sallinger²,³
¹ Bank of Italy (Italy)
² Technische Universität Wien (Austria)
³ University of Oxford (UK)
Abstract
Knowledge Graphs (KGs) can be seen as knowledge bases combining an extensional component, a database of facts, typically a property graph, and an intensional component, a formal specification of the available business experience, to derive new knowledge from those facts, often as new nodes and edges.
Capitalizing on our experience in KGs and model management for the rollout of financial KGs for the Central Bank of Italy, in this work we present KGModel, a model-independent design framework for KGs. The framework adopts a meta-level approach: the data engineer visually designs the extensional component of the KG at a conceptual level and augments it with intensional specifications in MetaLog, a new logical model-independent language. This high-level specification of the KG is then translated into enforceable schema definitions for the target database and executable logical rules for a target reasoner.
Our framework offers (i) a model-independent visual modeling language; (ii) MetaLog, a new language of the Datalog+/- family for the intensional component; (iii) new complementary software tools for the translation of meta-level specifications into their executable versions. We present the main ideas behind KGModel and show the suitability of the framework for real-world scenarios.
This work is a short version of an EDBT 2022 paper.
Keywords
knowledge graphs, Datalog, conceptual design, data modeling, schema and data translation
1. Introduction
Knowledge Graphs (KGs) can be seen as models for knowledge representation and reasoning that combine an extensional component, i.e., a database of facts, typically a property graph (PG) [1], with an intensional component, i.e., a formal specification of business experience used to derive new knowledge from those facts, often as new nodes and edges [2, 3]. Capitalizing on our experience in the construction of large KGs, especially in the financial and economic realms [4, 5, 6, 7], we observe that the need for a KG design methodology is clearly emerging.
15th Alberto Mendelzon International Workshop on Foundations of Data Management, May 22–26, Santiago, Chile
Email: luigi.bellomarini@bancaditalia.it (L. Bellomarini); andrea.gentili@bancaditalia.it (A. Gentili); eleonora.laurenza@bancaditalia.it (E. Laurenza); sallinger@dbai.tuwien.ac.at (E. Sallinger)
ORCID: 0000-0001-6863-0162 (L. Bellomarini); 0000-0001-7441-129X (E. Sallinger)
© 2023 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org), http://ceur-ws.org, ISSN 1613-0073
Figure 1: The KGModel stack (on the left) and the super-model dictionary (right).
   Such a methodology should satisfy five desiderata. (1) It should provide conceptual data models, enabling a simple, non-technical, high-level, visual representation of the domain. (2) It should be implementation-independent, i.e., it should be possible to deploy the extensional component into any graph database management system, relational system, triple store, etc., and to express the intensional component regardless of the target system. (3) It should provide a set of constructs to define graph schemas. (4) It should allow encoding the intensional component with a reasoning language that is graph-ergonomic and expressive enough to handle KGs. Reflecting the literature on UC2RPQs [8], the syntax should intuitively support navigational expressions. Reasoning up to a tractable description logic (e.g., DL-Lite_R) should be feasible; in other words, the language should be expressive enough to cover any SPARQL query over RDF datasets under the OWL 2 QL entailment regime [9]. (5) It should adopt a model-driven approach [10]: an enforceable graph schema and an executable version of the intensional component should be derived directly from a high-level conceptual representation of the domain given by the data engineer.
   To the best of our knowledge, there is no comprehensive methodology for KG design and
none of the existing approaches satisfies the illustrated desiderata.
Contribution. In this short version of a recent EDBT paper [11], we present KGModel, a model-independent framework for Knowledge Graph design, comprising a methodology and a set of support tools. The framework is described in Section 2, and some insights on the designer’s perspective are given in Section 3. For space reasons, we refer the reader to the long version of the paper for a detailed presentation, including a discussion of all the KGModel components, the patterns of the design methodology, and an analysis of the related literature.
2. The KGModel Framework
KGModel adopts a layered approach to data representation, shown in Figure 1 (left-hand side). It is organized into three stacks of representations, namely model, schema, and instance, where each level contains a set of constructs that specialize (or are specialized by) the constructs of the level above (below). The instance stack instantiates the schema stack, which in turn instantiates the model stack.
  In the model stack, we adopt the idea of a super-model grouping the super-constructs that can be used to define different Knowledge Graph models, all of which are specializations of the super-model. Examples are the models of Neo4j PG, Amazon Neptune, OrientDB, or even non-graph-like models. At the highest level, a meta-model contains the foundational meta-constructs, namely MM_Entity (an abstract entity of the domain) and MM_Link (a connection between entities), as well as their properties. The super-model, visualized in Figure 1, contains super-constructs that specialize those of the meta-model and subsumes, i.e., generalizes, any possible KG model. Examples of super-constructs are SM_Node and SM_Edge. The various models comprise constructs that specialize the super-constructs for a specific use. Examples of PG constructs are Node, Relationship, and Label, instantiating SM_Node, SM_Relationship, and SM_Type, respectively.
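As a rough illustration of how the levels relate, the following Python sketch encodes the specialization links as a lookup table and walks up from a model construct, or from a schema element, to its meta-construct. The tables `SPECIALIZES` and `INSTANCE_OF` and both function names are hypothetical. In particular, the pairings SM_Node→MM_Entity and SM_Edge→MM_Link are our assumption, and KGModel keeps these links in its graph dictionaries rather than in Python dictionaries.

```python
# Hypothetical "specializes" table: child construct -> parent construct.
SPECIALIZES: dict[str, str] = {
    "SM_Node": "MM_Entity",  # assumed pairing: super-construct -> meta-construct
    "SM_Edge": "MM_Link",    # assumed pairing: super-construct -> meta-construct
    "Node": "SM_Node",       # PG model construct -> super-construct
}

# Hypothetical super-schema elements, each an instance of a super-construct.
INSTANCE_OF: dict[str, str] = {
    "PhysicalPerson": "SM_Node",
    "LegalPerson": "SM_Node",
}


def generalizations(construct: str) -> list[str]:
    """Walk up SPECIALIZES and return the chain of more general constructs."""
    chain: list[str] = []
    while construct in SPECIALIZES:
        construct = SPECIALIZES[construct]
        chain.append(construct)
    return chain


def meta_construct_of(schema_element: str) -> str:
    """Return the top-most construct that ultimately types a schema element."""
    construct = INSTANCE_OF[schema_element]
    return ([construct] + generalizations(construct))[-1]


print(generalizations("Node"))              # ['SM_Node', 'MM_Entity']
print(meta_construct_of("PhysicalPerson"))  # MM_Entity
```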
   In the schema stack, a schema captures the types of the specific nodes, edges, and properties of a given domain of interest. It is an instance of a model in the same sense that a relational database schema is an instance of the relational model. As the super-model generalizes every model, a schema can be expressed either in a model-dependent way, as an instance of a model, or in a model-independent way, as an instance of the super-model, in which case we call it a super-schema. A super-schema 𝑆1 can be cast into a schema 𝑆2 of a model 𝑀 by a specific set of translation rules, namely mappings. The mappings apply the needed simplifications (where necessary, eliminating the super-model constructs that the target model does not support) and then instantiate the super-constructs as constructs of 𝑀. We define the mappings as MetaLog rules. MetaLog is our new variant of the Vadalog language [12] for graphs. Vadalog extends Datalog with existential quantification and other useful features, while introducing mild syntactic restrictions that guarantee decidability and tractability of the reasoning task. Vadalog programs can be processed by the Vadalog System, a state-of-the-art reasoner. MetaLog inherits the Vadalog (and thus Datalog) semantics and expressive power, enriching its syntax with pattern-matching graph exploration primitives. It is model-independent, as it operates at the meta-level; it is expressive and efficient enough to support ontological reasoning; and it is model-aware and ergonomic, as it incorporates syntactic elements that exploit schema information.
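A mapping can be pictured as two steps: eliminate what the target model does not support, then instantiate the remaining super-constructs as model constructs. The Python sketch below is only an analogy. KGModel writes mappings as MetaLog rules, whereas this sketch is plain Python. The dictionary-based `SuperSchema`, the `PG_CONSTRUCTS` table, the function `to_target_model`, the attribute names, and the choice to eliminate SM_Generalization by copying a parent's attributes into its children are all illustrative assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class SuperSchema:
    """Hypothetical, simplified super-schema."""
    nodes: dict[str, list[str]]  # SM_Node name -> SM_Attribute names
    generalizations: list[tuple[str, str]] = field(default_factory=list)  # (parent, child)


# Assumed correspondence from super-constructs to Property Graph constructs.
PG_CONSTRUCTS = {"SM_Node": "Node"}


def to_target_model(s: SuperSchema) -> dict[str, dict[str, object]]:
    """Cast a super-schema into a PG-like schema that has no generalizations."""
    attributes = {name: list(attrs) for name, attrs in s.nodes.items()}
    # Step 1: eliminate SM_Generalization (assumed unsupported by the target)
    # by copying each parent's attributes into the child. Pairs are expected
    # top-down, so that attributes propagate along multi-level hierarchies.
    for parent, child in s.generalizations:
        inherited = [a for a in attributes[parent] if a not in attributes[child]]
        attributes[child] = inherited + attributes[child]
    # Step 2: instantiate each remaining SM_Node as a target-model Node.
    return {
        name: {"construct": PG_CONSTRUCTS["SM_Node"], "attributes": attrs}
        for name, attrs in attributes.items()
    }


schema = SuperSchema(
    nodes={"Person": ["name"], "PhysicalPerson": ["birthDate"], "LegalPerson": ["vatCode"]},
    generalizations=[("Person", "PhysicalPerson"), ("Person", "LegalPerson")],
)
print(to_target_model(schema)["PhysicalPerson"])
# {'construct': 'Node', 'attributes': ['name', 'birthDate']}
```

Flattening is only one possible way to drop generalizations. A mapping for another target could instead keep the parent as a separate node and link the children to it.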
   The instance stack represents the extensional component, i.e., both the ground data and
those derived by materializing the intensional component.
The Design Approach. With KGModel, we offer the data engineer a model-driven design
approach to KGs. The data engineer is provided with a conceptual visual modeling language,
named Graph Schema Language (GSL) to design a graph schema as a super-schema. A GSL
diagram defines an instance of the super-model, with visual graphemes denoting instances of the
super-constructs. The business knowledge is encoded by the data engineer in the intensional
component, with MetaLog programs acting on the super-model constructs.
   To deploy the designed schemas into the target systems, KGModel translates the super-
schemas provided by the engineer into instances of the target models by applying the translation
mappings. The resulting schemas contain all the information needed for deployment and enforcement, through methods that depend on the target system. For relational systems, for instance, they can be rendered as DDL statements that include the respective constraints, such as keys, foreign keys, and domain constraints. For RDF stores, schemas can be rendered as RDF-S (RDF Schema) documents, to be validated by dedicated tools. For schema-less systems, schemas can be enforced with ad-hoc methodologies [13].
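For the relational case, such a rendering might look as follows. The sketch assumes a hypothetical `Table` description produced by the translation, holding columns, a primary key, foreign keys, and CHECK predicates as domain constraints, and emits standard `CREATE TABLE` statements. The table and column names (`Business`, `Share`, `business_id`, `percentage`) are illustrative only. The generated DDL is then run on an in-memory SQLite database to show that the constraints are actually enforced there.

```python
import sqlite3
from dataclasses import dataclass, field


@dataclass
class Table:
    """Hypothetical relational rendering of one schema element."""
    name: str
    columns: dict[str, str]  # column name -> SQL type
    primary_key: list[str]
    foreign_keys: dict[str, str] = field(default_factory=dict)  # column -> referenced table
    checks: list[str] = field(default_factory=list)  # domain constraints as SQL predicates


def to_ddl(table: Table, catalog: dict[str, Table]) -> str:
    """Render a CREATE TABLE statement with key, foreign-key and CHECK constraints."""
    parts = [f"  {col} {sql_type}" for col, sql_type in table.columns.items()]
    parts.append(f"  PRIMARY KEY ({', '.join(table.primary_key)})")
    for col, ref in table.foreign_keys.items():
        ref_key = ", ".join(catalog[ref].primary_key)
        parts.append(f"  FOREIGN KEY ({col}) REFERENCES {ref} ({ref_key})")
    parts.extend(f"  CHECK ({predicate})" for predicate in table.checks)
    return f"CREATE TABLE {table.name} (\n" + ",\n".join(parts) + "\n);"


business = Table("Business", {"id": "INTEGER"}, ["id"])
share = Table(
    "Share",
    {"id": "INTEGER", "business_id": "INTEGER", "percentage": "REAL"},
    ["id"],
    foreign_keys={"business_id": "Business"},
    checks=["percentage > 0 AND percentage <= 1"],
)
catalog = {t.name: t for t in (business, share)}
ddl = "\n".join(to_ddl(t, catalog) for t in (business, share))
print(ddl)

# Execute the generated DDL; SQLite enforces the CHECK constraint on insert.
conn = sqlite3.connect(":memory:")
conn.executescript(ddl)
```

The tables are emitted in dependency order, with `Business` before `Share`. Some DBMSs require a referenced table to exist before a foreign key can point to it, so a fuller generator would need to sort tables accordingly.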
The Tools. Our framework incorporates (a) Graph Dictionaries: a set of graph databases to store the instances of the super-model and of the models. (b) Knowledge Graph Schema Environment: a tool to graphically design GSL schemas and store them in the super-model dictionary. (c) MetaLog to Vadalog Translator (MTV): a compiler to generate Vadalog programs from MetaLog code. (d) Super-Schema to Schema Translator (SSST): a module that takes as input a super-schema 𝑆, a super-model-level intensional component Σ expressed as MetaLog rules, and a MetaLog mapping ℳ(𝑀) for the translation of a super-schema into a schema of the target model 𝑀. It generates (i) the instance 𝑆′ of 𝑀, i.e., the target schema, and (ii) a new version of the intensional component that can be applied to 𝑆′ instances. SSST uses MTV to compile and run MetaLog.
The Language. MetaLog combines Warded Datalog± [14], the core of Vadalog, with graph pattern matching. A MetaLog program is a set $\Sigma$ of existential rules of the form

$$\varphi(\bar{x}, \bar{y}) \rightarrow \exists \bar{z}\, \psi(\bar{x}, \bar{z}),$$

where $\bar{x}$, $\bar{y}$, and $\bar{z}$ are tuples of variables, $\varphi$ is a conjunction of atoms denoting nodes, path patterns, conditions, and expressions, and $\psi$ is a conjunction of node atoms and path patterns. A path pattern $xRy$ identifies all the pairs of nodes $\langle x, y \rangle$ connected by a semi-path that conforms to the regular language $L(R)$ defined by $R$. The semantics of MetaLog derives from that of Vadalog. Given a graph $G$, for each match $\varphi(\bar{t}, \bar{t}')$ of the body in $G$, that is, a conjunction of paths of $G$, there must exist a tuple $\bar{t}''$ of constants and new symbols satisfying the existential quantification such that the paths $\psi(\bar{t}, \bar{t}'')$ are also in $G$. Given a set $\Sigma$ of MetaLog rules, the chase extends $G$ with new paths until all the rules in $\Sigma$ are satisfied.
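To make the path-pattern semantics concrete, the following Python sketch computes the pairs ⟨x, y⟩ matched by a pattern xRy. Here R is given as a small hand-written NFA over edge labels, and the search runs over the product of graph nodes and automaton states. A symbol prefixed with `^` traverses an edge backwards, which is one possible reading of "semi-path". The NFA encoding, the `^` convention, and the name `match_path_pattern` are assumptions for illustration, not MetaLog syntax. The sketch evaluates a single pattern and does not implement the chase.

```python
from collections import deque

Edge = tuple[str, str, str]  # (source, label, target)
NFA = dict[tuple[int, str], set[int]]  # (state, symbol) -> next states


def match_path_pattern(edges: list[Edge], nfa: NFA, start: int, finals: set[int]) -> set[tuple[str, str]]:
    """Return every pair (x, y) joined by a semi-path whose symbol sequence is in L(R).

    Symbol "L" follows an L-edge forwards; "^L" follows it backwards
    (an assumed encoding of inverse traversal, not MetaLog syntax).
    """
    steps: dict[str, list[tuple[str, str]]] = {}
    for src, label, dst in edges:
        steps.setdefault(src, []).append((label, dst))
        steps.setdefault(dst, []).append(("^" + label, src))
    pairs: set[tuple[str, str]] = set()
    for x in steps:
        # Breadth-first search over the product of graph nodes and NFA states.
        seen = {(x, start)}
        queue = deque(seen)
        while queue:
            node, state = queue.popleft()
            if state in finals:
                pairs.add((x, node))
            for symbol, nxt in steps[node]:
                for next_state in nfa.get((state, symbol), set()):
                    if (nxt, next_state) not in seen:
                        seen.add((nxt, next_state))
                        queue.append((nxt, next_state))
    return pairs


edges = [("A", "OWNS", "B"), ("B", "OWNS", "C")]
owns_plus = {(0, "OWNS"): {1}, (1, "OWNS"): {1}}  # NFA for R = OWNS+
print(sorted(match_path_pattern(edges, owns_plus, start=0, finals={1})))
# [('A', 'B'), ('A', 'C'), ('B', 'C')]
```

An inverse step is useful, for instance, to find co-owners. With R = OWNS ^OWNS, two nodes match if both own the same node.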
3. The Designer’s Perspective
We used KGModel to design the Company KG of the Bank of Italy. Let us narratively simulate a small fragment of that modeling journey. The domain revolves around the notions of physical persons (i.e., individuals) and legal persons. These two entities share many features but participate in different relationships.
   I will capture the structure by introducing distinct SM_Nodes for the two kinds of persons, i.e., PhysicalPerson and LegalPerson, each with its own set of SM_Attributes.
   I will introduce an SM_Generalization, where a Person generalizes and collects the common
features of PhysicalPerson and LegalPerson.
   A person can hold stakes in a company's capital, and multiple persons may have different rights over the same portion of that capital.
   I will introduce a Share SM_Node and the HOLDS and BELONGS_TO SM_Edges, decoupling the owner and owned SM_Nodes, so that multiple Persons can HOLD a Share, each with its own right and percentage.
   Figure 2 shows a portion of the designed KG. The entities of the extensional component are
represented by solid lines. The following program gives an idea of how MetaLog captures the
intensional knowledge (dashed lines). Clearly, for space reasons, here we show only a minimal
fragment of the KG, which actually has tens of entities and many MetaLog snippets.
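$$(x : \text{Business}) \rightarrow \exists c\; (x)[c : \text{CONTROLS}](x) \tag{1}$$

$$(x : \text{Business})[: \text{CONTROLS}](z : \text{Business})[: \text{OWNS};\ \text{percentage} : w](y : \text{Business}),\ v = \text{sum}(w, \langle z \rangle),\ v > 0.5 \rightarrow \exists c\; (x)[c : \text{CONTROLS}](y) \tag{2}$$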
  Rule (1) states that every business controls itself. Rule (2) states that a business 𝑥 controls a business 𝑦 if the sum of the shares 𝑤 of 𝑦 owned by the businesses 𝑧 that 𝑥 controls (possibly including 𝑥 itself, by rule (1)) is above the 50% threshold.
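Rules (1) and (2) can be read as a fixpoint computation. The Python sketch below evaluates them naively. It seeds self-control and then re-applies rule (2) until no new CONTROLS pair appears, in the spirit of the chase. The input format (a dictionary of OWNS percentages expressed as fractions), the function name `controls`, and the sample ownership data are our assumptions. The sketch does not reproduce how the Vadalog System actually evaluates the aggregation.

```python
from collections import defaultdict


def controls(businesses: list[str], owns: dict[tuple[str, str], float]) -> set[tuple[str, str]]:
    """Compute the CONTROLS pairs entailed by rules (1) and (2).

    owns maps (z, y) to the fraction of y's shares owned by z (assumed input format).
    """
    # Rule (1): every business controls itself.
    control = {(x, x) for x in businesses}
    changed = True
    while changed:  # re-apply rule (2) until no new CONTROLS pair is derived
        changed = False
        for x in businesses:
            controlled = {z for (a, z) in control if a == x}
            # Rule (2): for each y, sum the shares w of y owned by businesses z controlled by x.
            totals: dict[str, float] = defaultdict(float)
            for (z, y), w in owns.items():
                if z in controlled:
                    totals[y] += w
            for y, v in totals.items():
                if v > 0.5 and (x, y) not in control:
                    control.add((x, y))
                    changed = True
    return control


owns = {("A", "B"): 0.6, ("A", "C"): 0.3, ("B", "C"): 0.3}
print(sorted(controls(["A", "B", "C"], owns)))
# [('A', 'A'), ('A', 'B'), ('A', 'C'), ('B', 'B'), ('C', 'C')]
```

In this example, A gains control of C only in the second round. By then A controls B, so B's 30% counts towards A's total together with A's own 30%.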
Figure 2: A portion of the Bank of Italy KG designed with KGModel methodology.
4. Conclusion
This example shows how a technology-independent super-model guides the designer through the modeling activity, offering a toolkit of lenses to capture real-world objects, understand their characteristics and relationships, and communicate the design choices to stakeholders. Our
methodology fulfills the presented desiderata and extends existing meta-level approaches [15, 16]
to the KG realm.
5. Acknowledgments
This work has been funded by the Vienna Science and Technology Fund (WWTF) [10.47379/VRG18013,
10.47379/NXT22018, 10.47379/ICT2201]; and the Christian Doppler Research Association (CDG)
JRC LIVE.
References
 [1] R. Angles, The property graph database model, in: AMW, volume 2100 of CEUR Workshop
     Proceedings, CEUR-WS.org, 2018.
 [2] L. Bellomarini, D. Fakhoury, G. Gottlob, E. Sallinger, Knowledge graphs and enterprise AI:
     the promise of an enabling technology, in: ICDE, IEEE, 2019, pp. 26–37.
 [3] A. Hogan, E. Blomqvist, M. Cochez, C. d’Amato, G. de Melo, C. Gutiérrez, S. Kirrane,
     J. E. L. Gayo, R. Navigli, S. Neumaier, A. N. Ngomo, A. Polleres, S. M. Rashid, A. Rula,
     L. Schmelzeisen, J. F. Sequeda, S. Staab, A. Zimmermann, Knowledge graphs, ACM Comput.
     Surv. 54 (2021) 71:1–71:37.
 [4] P. Atzeni, L. Bellomarini, M. Iezzi, E. Sallinger, A. Vlad, Weaving enterprise knowledge
     graphs: The case of company ownership graphs, in: EDBT, OpenProceedings.org, 2020,
     pp. 555–566.
 [5] L. Bellomarini, M. Benedetti, S. Ceri, A. Gentili, R. Laurendi, D. Magnanimi, M. Nissl,
     E. Sallinger, Reasoning on company takeovers during the COVID-19 crisis with knowledge
     graphs, in: RuleML+RR, 2020.
 [6] L. Bellomarini, E. Laurenza, E. Sallinger, Rule-based anti-money laundering in financial
     intelligence units: Experience and vision, in: RuleML+RR (Supplement), volume 2644 of
     CEUR Workshop Proceedings, CEUR-WS.org, 2020, pp. 133–144.
 [7] L. Bellomarini, M. Nissl, E. Sallinger, Rule-based blockchain knowledge graphs: Declarative
     AI for solving industrial blockchain challenges, in: RuleML+RR (To Appear), CEUR
     Workshop Proceedings, CEUR-WS.org, 2021.
 [8] M. Y. Vardi, A theory of regular queries, in: PODS, ACM, 2016, pp. 1–9.
 [9] B. Glimm, C. Ogbuji, S. Hawke, I. Herman, B. Parsia, A. Polleres, A. Seaborne, SPARQL 1.1
     entailment regimes, W3C Recommendation, 21 March 2013.
[10] F. A. Fontana, H. Brunelière, H. A. Müller, C. Raibulet, Guest editors’ introduction to the
     special issue on model driven engineering and reverse engineering: Research and practice,
     J. Syst. Softw. 159 (2020).
[11] L. Bellomarini, A. Gentili, E. Laurenza, E. Sallinger, Model-independent design of knowledge
     graphs - lessons learnt from complex financial graphs, in: EDBT, OpenProceedings.org, 2022,
     pp. 2:524–2:526.
[12] L. Bellomarini, E. Sallinger, G. Gottlob, The Vadalog system: Datalog-based reasoning for
     knowledge graphs, PVLDB 11 (2018).
[13] A. Bonifati, P. Furniss, A. Green, R. Harmer, E. Oshurko, H. Voigt, Schema validation and
     evolution for graph databases, in: ER, volume 11788 of Lecture Notes in Computer Science,
     Springer, 2019, pp. 448–456.
[14] G. Gottlob, A. Pieris, Beyond SPARQL under OWL 2 QL entailment regime: Rules to the
     rescue, in: IJCAI, 2015, pp. 2999–3007.
[15] P. Atzeni, L. Bellomarini, F. Bugiotti, G. Gianforme, MISM: A platform for model-independent
     solutions to model management problems, in: J. Data Semantics, 2009, pp. 133–161.
[16] P. Atzeni, P. Cappellari, P. A. Bernstein, ModelGen: Model independent schema translation,
     in: ICDE, IEEE Computer Society, 2005, pp. 1111–1112.