=Paper= {{Paper |id=Vol-1350/paper-50 |storemode=property |title=Polynomial encoding of ORM conceptual models in CFDI |pdfUrl=https://ceur-ws.org/Vol-1350/paper-50.pdf |volume=Vol-1350 |dblpUrl=https://dblp.org/rec/conf/dlog/FillottraniKT15 }} ==Polynomial encoding of ORM conceptual models in CFDI== https://ceur-ws.org/Vol-1350/paper-50.pdf
Polynomial encoding of ORM conceptual models
                 in CF DI ∀−
                          nc


             Pablo Rubén Fillottrani1,2 , C. Maria Keet3 , David Toman4
 1
      Departamento de Ciencias e Ingenierı́a de la Computación, Universidad Nacional
                  del Sur, Bahı́a Blanca, Argentina, prf@cs.uns.edu.ar
     2
       Comisión de Investigaciones Cientı́ficas, Provincia de Buenos Aires, Argentina
       3
          Department of Computer Science, University of Cape Town, South Africa
                                   mkeet@cs.uct.ac.za
         4
           Cheriton School of Computer Science, University of Waterloo, Canada
                                  david@cs.uwaterloo.ca



         Abstract. The use of conceptual models has long been confined to the
         data analysis stage of software development. In recent years, this has been
         extended to use them at run-time as well, for, among others, querying
         large amounts of data. This brings afore the need to have tractable logic-
         based reconstructions of the conceptual models, i.e., in at most PTIME.
         We provide such a logic-based reconstruction for most of ORM using the
         Description Logic language CFDI ∀−  nc , which has several features impor-
         tant for conceptual models, notably n-ary relationships, complex identi-
         fication constraints, and role subsumption. The encoding captures over
         96% of the constructs used in practice in the set of 33 ORM diagrams
         analysed. The results are easily transferable to EER and UML Class
         diagrams, with an even greater coverage.


1      Introduction
While for many years conceptual models were developed and then shelved upon
implementation or used ‘offline’ for documentation purposes, recent years has
seen an increase in using such precious resources computationally. One strand
of investigation focuses on expressiveness and a logic foundation to compute
satisfiability and detect inconsistencies over the TBox only, among others [2, 5,
8, 16, 24], and several implementations exists using various technologies that are
more or less scalable; e.g., using OCL [16], Alloy [8, 23], or a DL reasoner [19].
Another looks at usage during runtime for a range of purposes, such as using a
conceptual model for scalable test data generation [35] and for designing [6] and
executing [13] queries. Approximations of conceptual models are used also deeper
into the machinery of querying databases, in particular during various stages
of query compilation, e.g., when reasoning about duplicates [39]. The former
purpose requires a logic foundation using a (very) expressive logic, whereas the
latter requires computationally better behaved logics to keep the whole process
feasible. What such a tractable logic language looks like that captures most, or
all, important conceptual data modelling language features, has received little
attention, however. To the best of our knowledge, there are only four efforts to
capture a fragment of which concept or full satisfiability checking is (or is claimed
to be) in polynomial time (or less): one for ER [2], two for UML [1, 25], and one
for ORM [35], with the first one logic-driven and the last one implementation-
driven, and they all differ substantially in coverage of features. The main general
question, thus, is: what is a useful fragment for tractable runtime usage of a
conceptual data model? Then, how to formalise it.
    To answer this question, we choose to focus on ORM first, for it is strictly
more expressive than ER and UML class diagrams, and then facilitating trans-
ferability of the results. Given that there is a lot of insight in computational com-
plexity of Description Logic (DL) languages and which ones are in P, we zoom in
on those for a logic-based reconstruction. It appears that the DL CF DI ∀−     nc [40]
is a good fit and can capture 96.5% of the ORM models in a dataset of 33 ORM
models that are part of the dataset of 101 conceptual models (dataset available
from [28]). This is chiefly thanks to ‘trading’ costly but lesser used covering
constraints for the more often used arbitrary identifiers and n-ary relationships.
    In the remainder of the paper, we first provide background definitions in
Section 2. The main results are presented in Section 3, and are discussed and
compared with related research in Section 4. We summarize our results and
briefly outline future work in Section 5.


2     Preliminaries
The two preliminaries are the DL CF DI ∀−
                                       nc and the ORM language, so as to
keep the paper self-contained.

2.1   The Description Logic CF DI ∀−
                                  nc

All members of the CF D family of DLs are fragments of FOL with underlying
signatures based on disjoint sets of unary predicate symbols called primitive
concepts, constant symbols called individuals and unary function symbols called
features.
Definition 1 (CF DI ∀−  nc Knowledge Bases) Let F, PC and IN be disjoint sets
of (names of) functional features, primitive concepts and individuals, respec-
tively. A path function Pf is a word in F∗ , and we denote the empty word by id
and concatenation by “.”. Metadata and data in a CF DI ∀−  nc knowledge base K
are respectively defined by a TBox T and an ABox A. A TBox T consists of a
finite set of inclusion dependencies of the form

             A v B, A v ¬B, A v ∀f.B, ∀f.A v B, A v ∃f −1 ,
                                 or A v B : Pf 1 , . . . , Pf k → Pf

where A, B ∈ PC, f ∈ F, and Pf i ∈ F∗ . A concept “B : Pf 1 , . . . , Pf k → Pf” that
participates in the last dependency is called a path functional dependency (PFD).
An ABox A consists of a finite set of facts in the form of concept assertions A(a),
and function assertions f (a) = b where A ∈ PC, f ∈ F, and a, b ∈ IN. Any PFD
occurring in T must also satisfy a regularity condition by adhering to one of the
following two forms:

   C : Pf . Pf 1 , Pf 2 , . . . , Pf k → Pf    or    C : Pf . Pf 1 , Pf 2 , . . . , Pf k → Pf .g.   (1)

A PFD is a key if it adheres to the first of these forms.
     The semantics is defined in the standard way with respect to an interpretation
I = (4, (·)I ), where 4 is a domain of “objects” and (·)I an interpretation
function that fixes the interpretation of primitive concepts A to be subsets of
4, features f to be total functions on 4, and individuals a to be elements of
4. The interpretation function is extended to path expressions by interpreting
id , the empty word, as the identity function λx.x, concatenation as function
composition, and to derived concept descriptions as follows:
                            (¬A)I =4 \ AI
                          (∀f.A)I ={x | f I (x) ∈ AI }
                          (∃f −1 )I ={x | ∃y ∈ 4 : f I (y) = x}
                                                     Vk
  (C : Pf 1 , . . . , Pf k → Pf)I ={x | ∀y ∈ CI : ( i=1 Pf Ii (x) = Pf Ii (y))
                                                                  ⇒ Pf I (x) = Pf I (y)}

An interpretation I satisfies an inclusion dependency C v D if CI ⊆ DI , a con-
cept assertion A(a) if aI ∈ AI , and a function assertion f (a) = b if f I (aI ) = bI .
I satisfies a knowledge base K if it satisfies each inclusion dependency and asser-
tion in K. In addition, every CF DI ∀−
                                     nc knowledge base must satisfy the following
two conditions.

 1. (inverse feature and value restriction interaction) If {A v ∃f −1 , ∀f.A0 v
    B} ⊆ T then (a) A v A0 ∈ T , (b) A0 v A ∈ T or (c) A v ¬A0 ∈ T .

 2. (inverse feature and PFD interaction) Any non-key PFD occurring in T that
    involves features used in the ∃f −1 concept in T must also satisfy a stronger
    regularity condition by adhering to the following form:

                                   C : Pf .f, Pf 2 , . . . , Pf k → Pf .g.                          (2)

Proposition 1 ([40]). Consistency of CF DI ∀−
                                           nc knowledge bases is complete
for PTIME.

Other reasoning tasks, such as logical implication and concept consistency reduce
(linearly) to knowledge base consistency. Relaxing either of the aforementioned
conditions leads to EXPTIME and PSPACE completeness, respectively [40].
Note also, that condition (2) imposed on PFDs applies only to non-key PFDs.
Overall, however, the restrictions do not seem to impact the modeling utility
of CF DI ∀−
          nc in relation to keys and functional constraints. Indeed, arbitrary
functional dependencies in relational schema are easily captured. Finally and
for convenience CF DI ∀−
                       nc supports additional syntax, e.g., subsumptions of the
form ∃f −1 v A. This additional syntax is mere syntactic sugar and can be
equivalently expressed in CF DI ∀−
                                 nc as defined above [40].
Fig. 1. Sample ORM2 diagram for a particular university where, among others, each
Support Staff must have exactly one Extension that is identified by a code, Support Staff
and Academics are subtypes of Employee and are disjoint, each Room is identified by
the Building and its RoomNr, each Employee has exactly one Name, and a support staff
that supports a department also works for that department.

2.2    Brief overview of ORM2

Object-Role Modelling as a general term is also known as Fact-Based Model-
ing and comprises several notational variants with varying language features,
ranging from its predecessor NIAM and its recent notation in the proprietary
CogNIAM tool of PNA, to the FCO-IM flavour [4], to ORM as in [21] popu-
larised with VisioModeler 3.1 and ORM2 [22] popularised with the NORMA
tool [15] as plugin for MS Visio, among others. We use Halpin’s notation [22]
in the remainder of the paper for its compactness. As we are interested in the
static, structural components, we ignore deontic constraints5 —they did not oc-
cur in any of the evaluated ORM models anyway [28]—and derived constraints
(ORM’s version of methods). ORM2 has four different kinds of named entities:
  – entity type, which is like an EER entity type and UML class and may be
     objectified, and is denoted with a blue rectangle with round corners;
  – value type, which is an entity type that has a binary fact type (relationship)
     to a data type, denoted with a blue dashed rectangle with round corners;
  – role that each entity/value type plays in a fact type, denoted with a small
     rectangle and connected to the object type or value type;
  – n-ary fact type (n ≥ 1) that relates entity types to each other or entity
     types to value types and they have to be elementary (uniqueness constraint
     spanning n or n − 1 roles), denoted with a rectangle composed of the roles.
Typically, roles and fact types are named automatically, but this can be added,
as indicated with the user-named role [DE] in Fig. 1; note that the text next to
the fact types are ‘readings’ for model verbalisations, not fact type names.
5
    Refer to [36] for a treatment of deontic constraints in the context of DLs and SBVR
    ORM has many constraint types (some of which are used rarely [28]); a non-
exhaustive example is shown in Fig. 1. The ones used in the figure are mandatory
(small blob), uniqueness (line over rectangle), reference schemes (simple identi-
fiers, e.g., .empNr), and external identifiers (circle with double horizontal line),
subtype (arrow) and subset (over roles, fact types, paths), disjointness (encir-
cled cross); others include coverage, constraints on values, and 11 relationship
constraints (a.o., transitive, irreflexive).
    As transformations exist from one ORM model to UML and to ER [22], ORM
is ‘more conceptual’ than the other models for one does not have to commit to
an implementation paradigm upfront. Moreover, the notation is also used for
business rules specifications [32].


3     ORM2cfd fragment into CF DI ∀−
                                  nc

While we primarily focus on ORM to CF DI ∀−    nc mapping here, we also aim for
an easy extension to EER and UML Class diagrams, the ability to use it for
inter-model assertions across models represented in different languages [17], and
for unification of the conceptual modelling language families for a widely used
subset of features. To this end, we first define the syntax and semantics of a
‘generic’ conceptual data modelling language, which we name CM−        com , as it
is, essentially, a proper fragment of CMcom —used as common conceptual data
modelling language for EER, UML, and ORM [30]—without covering and dis-
junctive mandatory constraints, and with limited cardinalities and a more precise
definition of relationship subsumption and disjointness. CM−  com contains those
features mappable into CF DI ∀−  nc (as described in Section 3.3) and captures a
subset of features of ORM, named ORM2cfd . Thus, within the scope of this
paper, one can equate CM−    com and ORM2cfd .


3.1   Syntax
The syntax is introduced first, and subsequently illustrated with an example.
We assume a transformation where an ORM value type is encoded in CM−      com as
an attribute, and note that recursive relationships are allowed such that a class
can participate more than once in a relationship.
Definition 2 (Conceptual Data Model CM−                                     −
                                                    com syntax) A CMcom concep-
tual data model is a tuple Σ = (L, rel, att, cardR , cardA , isaC , isaR , isaU ,
disjC , disjR , id, extid, fd, obj, rex) such that:
1. L is a finite alphabet partitioned into the sets: C (object type symbols), A (at-
   tribute symbols), R (relationship symbols), U (role symbols), and D (domain
   symbols); the tuple (C, A, R, U, D) is the signature of Σ.
2. att is a function that maps a class symbol in C to an A-labeled tuple over D,
   so that att(C) = {A1 : D1 , . . . , Ah : Dh } where h is a non-negative integer.
3. rel is a function that maps a relationship symbol in R to an U-labeled tuple
   over C, rel(R) = {Ui : Ci }hi=1 , h ≥ 1, Ui 6= Uj if i 6= j, h is called the arity of
   R, and if (Ui , Ci ) ∈ rel(R) then player(R, Ui ) = Ci and role(R, Ci ) = Ui .
   The signature of the relation is σR = {Ui }ni=1 .
4. cardR is a partial function cardR : R × U → {0, 1} × {1, ∞} denoting
   cardinality constraints. We denote with cmin(R, U ) and cmax(R, U ) the first
   and second component of cardR .
5. cardA is a partial function cardA : C × A → {0, 1} × {1, ∞} denoting multi-
   plicity constraints for attributes, such that cardA (C, A) is defined iff (A, D) ∈
   att(C) for some D ∈ D. We denote with cmin(C, A) and cmax(C, A) the
   first and second component of cardA .
6. isaC is a binary relationship isaC ⊆ C × C.
7. isaR is a binary relationship isaR ⊆ R × R. isaR between relationships is
   restricted to relationships with compatible signatures.
8. isaU is a binary relationship isaU ⊆ U × U.
9. disjC is a relationship over 2C × C, describing disjointness partitions over
   groups of isa that share the same superclass.
10. disjR is a subset of 2R , describing disjointness over a group of relations
   (with compatible signatures).
11. id is a partial function, id : C → 2A , that maps a class symbol in C to its
   identifier (key) attributes; cardA (C, A) = (1, 1) holds for each A ∈ id(C).
12. extid is called an external uniqueness, which is a partial function that
   maps a class to a set of sets of relation-role pairs or attributes, extid : C →
      (R×U )∪A
   22          , where for any S ∈ extid(C) it holds that at least one pair (R, U ) ∈
   S, and whenever (R, U ) ∈ S then player(R, U ) = C, cardR (R, U ) = (1, 1),
   and when A ∈ S then cardA (C, A) = (1, 1).
                                                                                  U
13. fd is a functional dependency assertion, a partial function fd : R → 22 ×U
   such that fd(R) may be defined only if its arity is ≥ 2 and for all Ui , U ∈ U
   such that ({Ui }, U ) ∈ fd(R) then exist Ci , C such that Ui : Ci ∈ rel(R)
   and U : C ∈ rel(R).
14. obj is an objectification partial function that maps an n-ary relationship
   symbol R ∈ R to a C ∈ C, i.e., obj : R → C. Whenever obj(R) = C and
   Ui : Ci ∈ rel(R), 1 ≤ i ≤ n then there exist n new binary relationships Ri ∈
   R such that rel(Ri ) = {Ui1 : C, Ui2 : Ci }, with Ui1 , Ui2 ∈ U; extid(C) =
   {(Ui , Ci ) | 0 < i ≤ n}; cardR (Ri , Ui1 ) = (1, 1) and cardR (Ri , Ui2 ) = (0, 1).
15. rex is a subset of 2U describing disjointness partitions over a group of roles,
   i.e, if {Ui }hi=1 ∈ rex then exist Ri ∈ R, Ci ∈ C such Ui : Ci ∈ rel(Ri ), and
   all Ri have the same arity.

To link this syntax to ORM’s icons, value types are transformed into attributes,
any unary relationship is translated into a class and a binary relationship with
a Boolean datatype, and a suitable naming scheme for the roles and fact types
is in place.
Example 1 Let us consider a mapping between this CM−       com syntax and some of
the mappable ORM icons of the ORM2cfd fragment, using the diagram in Fig. 1.
The fact type verbalised with works for can be represented with the relationship
   rel(works) = {DE : Department, ED : Employee}
Identification (single attribute, resp. external) of the Employee and Room with
   id(Employee) = {empNr} and extid(Room) = {{locBuildingName, roomNr}}.
Subsumption of ORM entity types and fact types is straightforward as:
   Academic isaC Employee and supports isaR works
and mandatory participation of Department in works as
   cardR (works, DE) = (1, ∞).                                                            ♦

3.2     Semantics
The model-theoretic semantics of CM−         com in the light of ORM2cfd is as follows.
Definition 3 (CMcom Semantics) Let Σ be a CM−
                        −
                                                             com conceptual data model.
An interpretation for the conceptual model Σ is a tuple I = (∆I ∪ ∆ID , ·I ), such
that:
 – ∆I is aSnonempty set of abstract objects disjoint from ∆ID ;
 – ∆ID = Di ∈D ∆Di is the set of basic domain values used in Σ; and
 – ·I is a function that maps:
       • Every basic domain symbol D ∈ D into a set DI = ∆D .
       • Every class C ∈ C to a set C I ⊆ ∆I .
       • Every attribute A ∈ A to a set AI ⊆ ∆I × ∆ID , such that for each A ∈ A
          and C ∈ C with A : D ∈ att(C) for some D ∈ D, then for all o ∈ C I
          there exists d ∈ DI such that (o, d) ∈ AI .
       • Every relationship R ∈ R to a set RI of tuples of U-labeled elements
          of ∆I : for R an n-ary relationship connecting the classes C1 , . . . , Cn ,
          rel(R) = {Ui : Ci }ni=1 , i.e., {Ui : oi }ni=1 ∈ RI implies oi ∈ CiI .
I is called a legal database state or legal application software state if it satisfies
all of the constraints expressed in the conceptual data model:
 – for each C1 , C2 ∈ C: if C1 isaC C2 , then C1I ⊆ C2I .
 – for each R1 , R2 ∈ R (with the same signature): if R1 isaR R2 , then R1I ⊆ R2I .
     In addition, ORM stipulates that the participating role sequences (which in
     this case are all roles of Ri ) of every relationship participating in the R1I ⊆ R2I
     to be role-sequence compatible.6
 – for each U1 , U2 ∈ U: if U1 isaU U2 , then there exist R1 , R2 ∈ R with
     player(R1 , U1 ) = C1 and player(R2 , U2 ) = C1 , and C1 , C2 ∈ C, and
     {o ∈ C1I such that U1 : o ∈ R1I } ⊆ {o ∈ C2I such that U2 : o ∈ R2I }.
 – for each C ∈ C, R ∈ R, U ∈ U: if cardR (R, U ) = (x, y) and player(R, U ) =
     C, then x ≤ #{o ∈ C I such that U : o ∈ RI } ≤ y.
 – for each C ∈ C, A ∈ A: if cardA (C, A) = (x, y), then A : D ∈ att(C) for
     some D ∈ D, and x ≤ #{d ∈ DI such that (o, d) ∈ AI , o ∈ C I } ≤ y.
 – if {Ci }ni=1 ∈ disjC then CiI 6= CjI for all i, j, j 6= i, 1 ≤ i, j ≤ n.
 – if {Ri }ni=1 ∈ disjR then Ri are role-sequence compatible and RjI ∩ RkI = ∅.
 – for each C ∈ C, Ai ∈ A, 1 ≤ i ≤ n: if id(C) = {Ai },Vthen exists Di ∈ D such
                                                                n
     that Ai : Di ∈ att(C), and #{o ∈ C I such that i=1 (o, di ) ∈ AIi } ≤ 1 for
                 I
     any di ∈ Di , 1 ≤ i ≤ n.
 – for each C ∈ C, Ri ∈ R, Ui ∈ U, 1 ≤ i ≤ h, Aj , 1 ≤ j ≤ l: if {(Ri , Ui )}hi=1 ∪
     {Aj }lj=1 ∈ extid(C), then exist Dj ∈ D with Aj : Dj ∈ att(C), and #{o ∈
     C I such that (Ui : o) ∈ RiI and (o, dj ) ∈ AIj } ≤ 1 for any dj ∈ DjI , 1 ≤ j ≤ l.
6
    We assume that the compatibility is enforced explicitly by additional isaC pairs of
    the classes linked to the matching roles in the relationships, e.g., for U : C ∈ rel(R1 )
    and U : D ∈ rel(R2 ) we have C isaC D.
 – for each U, Ui ∈ U, 1 ≤ i ≤ h, R ∈ R: if fd(R) = ({Ui }hi=1 , U ), then exists
    C, Ci ∈ C such that C : U ∈ rel(R), Ci : Ui ∈ rel(R) and #{o ∈
    C I such that {U : o} ∪ {Ui : oi , } ∈ RI } ≤ 1 for any oi ∈ CiI , 1 ≤ i ≤ h.
 – for each R ∈ R, C ∈ C: if obj(R) = C, then exist Ri ∈ R with rel(Ri ) =
    {Ui1 : C, Ui2 : Ci }, for all 1 ≤ i ≤ n, and the following statements are also
    satisfied: extid(C) = {(Ri , Ui2 ) | 0 < i ≤ n}, cardR (Ri , Ui1 ) = (1, 1), and
    cardR (Ri , Ui2 ) = (0, 1).
 – for each Ui ∈ U, 1 ≤ i ≤ h: if {Ui }hi=1 ∈ rex, then exists Ri ∈ R with
    arity m, such that Ui : Ci ∈ rel(Ri ). Then for each o ∈ CiI the following
    condition is true #{k such that {Uk : o} ∈ RkI , 1 ≤ k ≤ h} ≤ 1.
Σ is said to be globally consistent if it admits at least one legal state.


3.3     Mapping of ORM2cfd to CF DI ∀−
                                    nc

We formalize how CF DI ∀−          nc can capture ORM2cfd conceptual models. Let Σ =
(L, rel, att, cardR , cardA , isaC , isaR , isaU , disjC , disjR , id, extid, fd, obj,
rex) be a CM−      com conceptual data model corresponding to ORM2cfd . We map
Σ to a CF DI ∀−  nc TBox TΣ in the vocabulary PC, F using the following rules:
 – Include in the vocabulary one concept name for each ORM2cfd object type
    and datatype, i.e., for each C ∈ C, D ∈ D we have C ∈ PC, D ∈ PC.
 – To map attributes, there are two cases to consider: if the attribute is func-
    tional then it is mapped as a function symbol and two concepts that reify
    each of the roles; otherwise it is reified as a new concept, with the corre-
    sponding cardinality constraints. For each C ∈ C such that att(C) = {Ai :
    Di }ni=1 , and for each i:
      • if cardA (C, Ai ) = (1, 1), then we introduce two new concepts UA                         1 , U2 ∈
                                                                                                     i    Ai
                                                                                                 Ai      Ai
         PC, a new function symbol ai ∈ F, and {C v ∀ai .Di , C v U1 , U1 v
         C, UA 2 v ∃ai
                i           −1
                               , ∃ai −1 v UA  2 } ∈ TΣ .
                                                i


      • otherwise we introduce new concept symbols Ai , UA                                A
                                                                                  i,1 , Ui,2 ∈ PC, and
         two new function symbols ai,1 , ai,2 ∈ F, with {Ai v ∀ai,1 .UA                            i,1 , Ai v
                                                                     −1                   −1
         ∀ai,2 .UAi,2 , U A
                          i,1 v   C,  U A
                                        i,2 v  D  i , U A
                                                        i,1 v ∃a i,1    , U A
                                                                            i,2 v ∃a  i,2    , ∀a        A
                                                                                                  i,1 .Ui,1 v
         Ai , ∀ai,2 .UA i,2 v Ai , Ai v Ai : ai,1 , ai,2 → id} ⊆ TΣ . If cmin(C, Ai ) = 1,
         then also C v UA         i,1 ∈ TΣ ; if if cmax(C, Ai ) = 1, then Ai v Ai : ai,1 →
         id ∈ TΣ .
 – The mapping of relationships (ORM2cfd fact types) is similar as the map-
    ping of attributes. If the relationship is binary and one of its roles has a
    (1, 1) constraint, then it is mapped as an attribute; in the other cases the
    relationship and its roles are reified as a new concepts, with new attributes
    from the reified relationship to the reified roles. The reified roles are then
    subconcepts of the concepts originally participating in the relationship. For
    each R ∈ R such that rel(R) = {Ui : Ci }ni=1 then
      • if n = 2 and cardR (R, U1 ) = (1, 1), then we introduce a new function
         symbol u1 ∈ F, and {C1 v ∀u1 .C2 , C1 v C1 : u1 → id} ∈ TΣ . Similarly
         if cardR (R, U2 ) = (1, 1) then we introduce function symbol u2 ∈ F and
         {C2 v ∀u2 .C1 , C2 v C2 : u2 → id} ∈ TΣ .
     • otherwise we add new concept symbols R, UR                       i ∈ PC, 1 ≤ i ≤ n , new
         function symbols uR     i  ∈  F,  1    ≤   i  ≤   n,    and   then  {R v ∀uR      R   R
                                                                                      i .Ui , Ui v
                          −1
         Ci , U R       R
                  i v ∃ui    , ∀uR   R            n                  R        R
                                 i .Ui v R}i=1 ∪ {R : u1 , . . . , un → id} ⊆ TΣ . Addi-
         tionally for each i, 1 ≤ i ≤ n if cmin(R, Ui ) = 1 then also Ci v UR               i ∈ TΣ ;
         and if cmax(R, Ui ) = 1, then R v R : uR                i  →  id  ∈ TΣ .
 – for each C1 , C2 ∈ C such that C1 isaC C2 , then C1 v C2 ∈ TΣ .
 – for each R1 , R2 ∈ R such that R1 isaR R2 , then R1 v R2 ∈ TΣ . In order
   to define relationship inheritance in ORM2cfd , the types of the participating
   concepts must be compatible, therewith adhering to the syntax restriction
   of ORM (as aside: without this condition, the reconstruction in a PTIME
   language is not possible).
 – for each U1 , U2 ∈ U such that U1 isaU U2 then U1 v U2 ∈ TΣ .
 – for each Ci , C ∈ C such that {Ci }ni=1 disjC C, then {Ci v C}ni=1 ∪ {Ci v
   ¬Cj }ni6=j,i,j=1 ⊆ TΣ , stating the concepts are pairwise disjoint.
 – for each Ri ∈ R such that {Ri }ni=1 ∈ disjR , then {Ri v ¬Rj , Ri v Rj :
   σRj → id}ni6=j,i,j=1 ⊆ TΣ . Again, ORM has the condition that relationship
   exclusion must be defined over compatible types for the participating con-
   cepts (which also happens to be a necessary condition for the efficiency of
   the translation).
 – for each C ∈ C, Ai ∈ A such that id(C) = {Ai }ni=1 , then C v C :
   a1 , . . . , an → id ∈ TΣ . In this case since the attributes are keys then they
   must be in (1, 1) constraint with C, so they are mapped as features in TΣ
   by the first point of the respective rule, described above.
 – for each C ∈ C, Ri ∈ R, Ui ∈ U, Aj ∈ A such that {(Ri , Ui )}hi=1 ∪ {Aj }kj=1 ∈
   extid(C), then C v C : uR                       R
                                      1 , . . . , uh , a1 , . . . , ak → id ∈ TΣ . Here the only
   possibility is that Ui and Aj belonging to the external identifier have a (1, 1)
   constraint, so they are mapped with features.
 – for each U, Ui ∈ U, R ∈ R: if fd(R) = ({Ui }hi=1 , U ), then CR v CR :
   uR             R    R
     1 , . . . , uh → u . This case is similar as the previous one: we ensure the roles
   and attributes belonging to the external identifier are mapped as features in
   TΣ because the are (1, 1) to C.
 – for each R ∈ R, C ∈ C: if obj(R) = C, then we have the mappings for
   extid(C), cardR (Ri , Ui1 ) = (1, 1), and cardR (Ri , Ui2 ) = (0, 1).
 – for each Ui ∈ U: if {Ui }hi=1 ∈ rex, then exists Ri ∈ R, Ci ∈ C with arity m,
   such that (Ui : Ci ) ∈ rel(Ri ), 1 ≤ i ≤ h. Then {UR                           R h
                                                                          i v ¬Uj }i6=j,i,j=1 ⊆ TΣ .
   Note that rex requires optional participation in the role, and therewith uses
   the second case of the relationship mapping above.
Case analysis of the translation combined with Proposition 1 yields the following:

Theorem 1. Let Σ be an ORM2cfd conceptual data model. A class C is con-
sistent in Σ if and only if the knowledge base (TΣ , {C(aC )}) is consistent. Σ is
globally consistent if and only if TΣ is consistent with A = {C(aC ) | C ∈ C}.

Example 2 Consider again the running example with Fig. 1 and the illustration
of the syntax (Example 1), the corresponding CF DI ∀−
                                                   nc knowledge base contains,
among others: the translation where empNr is an attribute of Employee with
cardA (empNr, Employee) = (1, 1), the following CF DI ∀−
                                                      nc axioms:

    {Employee v ∀empNr.Integer, Employee v UempNr
                                            1     , UempNr
                                                     1     v Employee,
                              UempNr
                                2    v ∃empNr−1 , ∃empNr−1 v UempNr
                                                                2     }.
Then, to represent the identifier, we add Employee v Employee : empNr → id.
The mapping of rel(works) with cardinalities cardR (works, DE) = (1, ∞) and
cardR (works, ED) = (0, ∞) is the set of axioms
                                                                               −1
{ Works v ∀deworks .DEworks , DEworks v Department, DEworks v ∃deworks ,
  ∀deworks .DEworks v Works, Works v Works : deworks , edworks → id,
  Department v DEworks , Works v ∀edworks .EDworks , EDworks v Employee,
                     −1
  EDworks v ∃edworks , ∀edworks .EDworks v Works                         },
and likewise for the remainder of the diagram in Figure 1.                          ♦

4    Discussion
There are many papers with logic-based reconstructions of ORM, EER, and
UML; we discuss a subset relevant to the scope of this paper.

Comparison with other ORM2 Encodings. Fairly expressive logic-based
reconstructions of ORM fragments exist, including ORM 2zero in the EXPTIME-
complete ALCQI [20], ORM2− [30] in the EXPTIME-complete DLRifd [11], an
ORM2 fragment (e.g., [41]) in SROIQ that is N2EXPTIME-Complete [27], and
ORM in the undecidable FOL [21]. An Alloy encoding and a numeric model as
encoding for ORM are proposed in [23], which are experimentally compared to
unsatisfiability pattern checks, showing that the latter two far outperform the
Alloy approach (seconds vs. hours and timeouts), but complexity results are not
provided. Their ORM fragment of the number encoding does include ‘costly’ fea-
tures, such as covering constraints, disjunctive mandatory, arbitrary frequency
(with uniqueness check), external identifiers, and value constraints, but it is un-
clear what was used in the test ORM diagrams. The only ORM fragment claimed
in PTIME is Smaragdakis et al.’s ORM− [35] that also uses a number model. It
includes non-overlapping uniqueness constraints over n-ary relationships, simple
mandatory, non-overlapping frequency constraints (cardinalities > 1), value con-
straints, and subtype constraints. Arbitrary frequency constraints (like arbitrary
projections in a relational table) cause undecidability, but, though not specified
in [35], one could assume it always occurs in conjunction with a suitable unique-
ness constraint in order to regain decidability, as discussed in [23, 29]. Their value
constraints are not constrained either, i.e., value ranges of integers, floats, and
enumerations are allowed, and have no constraints, such as so-called “safe” data
types [3]. Most problematic, however, is the integer bound propagation in step 2
of Algorithm 2 in [35], which has recently shown to be NP-Complete [7], hence,
Smaragdakis et al.’s solution seems to be at least NP-Hard. To the best of our
knowledge, the here provided logic-based reconstruction of the ORM2cfd ORM
fragment in the PTIME CF DI ∀−    nc is the first tractable encoding of ORM, yet
still capturing most of the entities encountered in extant ORM models.
Extensibility to EER and UML Class Diagrams. As ORM is more ex-
pressive than UML Class Diagrams and EER, and an ORM diagram can be
translated into UML and into ER [22], the results obtained should be at least as
good for those. A quick matching thanks to availing of the unifying metamodel
[31, 18] reveals that that is the case, where CF DI ∀−   nc can encode over 97% of
the 34 UML models and over 99% of the 34 (E)ER models in our dataset. This
can be of use here as well, as also most UML Class Diagram encodings focus
on expressiveness, using, among others, DLRifd [5] as well, or an OCL-lite en-
coding matching ALCI [34], which is still EXPTIME-complete. Kaneiwa and
Satoh claim to have some fragments of UML Class Diagrams in P and PSpace
for full satisfiability checking (all classes must be satisfiable) [25, 26], but this has
been proven otherwise by Artale et al. [1]. Focusing on coverage of features, the
smallest restricted fragment is shown to be NLogSpace [1] when disallowing isa
on associations and completeness on subclasses, using approximations of reified
binaries (i.e., missing extid, and thus also no qualified associations). An ini-
tial analysis shows that this might still capture almost 96% of the UML models
in our dataset, and might thus also be a worthwhile fragment of UML. Such
high coverage can be obtained partially due to the changes to UML v2.4.1 [33]
where relational properties (asymmetry and transitivity) for aggregation have
been dropped, with aggregation taking up about a quarter of all associations,
and not all UML features in the standard are implemented in modelling tools.
    The story is similar for (E)ER. Various encodings exist [14, 37, 38], (partially
due to the absence of a standard), which either use a language in the EXPTIME-
complete DLR family [9–11] or DL-Lite family [12] with different computational
complexities for different EER fragments [2]. Trading functionality for gaining
a little in computational complexity [2], however, is certainly not an option if
coverage is also an aim, especially due to all the identifiers in EER and ORM
(which can be represented in CF DI ∀−    nc ).




5    Conclusions

A logic-based reconstruction for most of ORM using the PTIME Description
Logic language CF DI ∀− nc has been presented, covering 96.5% of a set of extant
ORM models. This is the first tractable encoding of ORM, which includes fea-
tures important for conceptual models, notably n-ary relationships and complex
identification constraints. Future work includes working toward implementations
of the scenarios alluded to in the introduction, and we also expect to apply this
encoding to facilitate inter-model interoperability [17] as the results are easily
transferable to EER and UML Class diagrams.


Acknowledgments. This work is based upon research supported by the Na-
tional Research Foundation of South Africa (Project UID: 90041), the Argen-
tinian Ministry of Science and Technology (PRF and CMK), and NSERC (DT).
References
 1. Artale, A., Calvanese, D., Ibáñez-Garcı́a, Y.: Full satisfiability of UML class dia-
    grams. In: Parsons, J., et al. (eds.) Proceedings of the 29th International Conference
    on Conceptual Modeling (ER’10). LNCS, vol. 6412, pp. 317–331. Springer (2010)
 2. Artale, A., Calvanese, D., Kontchakov, R., Ryzhikov, V., Zakharyaschev, M.: Rea-
    soning over extended ER models. In: Parent, C., Schewe, K.D., Storey, V.C., Thal-
    heim, B. (eds.) Proceedings of the 26th International Conference on Conceptual
    Modeling (ER’07). LNCS, vol. 4801, pp. 277–292. Springer (2007), auckland, New
    Zealand, November 5-9, 2007
 3. Artale, A., Ryzhikov, V., Kontchakov, R.: DL-Lite with attributes and datatypes.
    In: Proceeding of the 20th European Conference on Artificial Intelligence
    (ECAI’12). pp. 61–66. IOS Press (2012)
 4. Bakema, G., Zwart, J.P., van der Lek, H.: Volledig Communicatiegeoriënteerde
    Informatiemodellering FCO-IM. Academic Service (2005)
 5. Berardi, D., Calvanese, D., De Giacomo, G.: Reasoning on UML class diagrams.
    Artificial Intelligence 168(1-2), 70–118 (2005)
 6. Bloesch, A.C., Halpin, T.A.: Conceptual Queries using ConQuer-II. In: Proceedings
    of ER’97: 16th International Conference on Conceptual Modeling. LNCS, vol. 1331,
    pp. 113–126. Springer (1997)
 7. Bordeaux, L., Katsirelos, G., Narodytska, N., Vardi, M.Y.: The complexity of in-
    teger bound propagation. Journal of Artificial Intelligence Research 40, 657–676
    (2011)
 8. Braga, B.F.B., Almeida, J.P.A., Guizzardi, G., Benevides, A.B.: Transforming On-
    toUML into Alloy: towards conceptual model validation using a lightweight formal
    methods. Innovations in Systems and Software Engineering 6(1-2), 55–63 (2010)
 9. Calvanese, D., De Giacomo, G., Lenzerini, M.: On the decidability of query contain-
    ment under constraints. In: Proc. of the 17th ACM SIGACT SIGMOD SIGART
    Sym. on Principles of Database Systems (PODS’98). pp. 149–158 (1998)
10. Calvanese, D., De Giacomo, G., Lenzerini, M.: Reasoning in expressive description
    logics with fixpoints based on automata on infinite trees. In: Proc. of the 16th Int.
    Joint Conf. on Artificial Intelligence (IJCAI’99). pp. 84–89 (1999)
11. Calvanese, D., De Giacomo, G., Lenzerini, M.: Identification constraints and func-
    tional dependencies in description logics. In: Nebel, B. (ed.) Proc. of the 17th Int.
    Joint Conf. on Artificial Intelligence (IJCAI 2001). pp. 155–160. Morgan Kaufmann
    (2001), seattle, Washington, USA, August 4-10, 2001
12. Calvanese, D., Giacomo, G.D., Lembo, D., Lenzerini, M., Rosati, R.: Tractable
    reasoning and efficient query answering in description logics: The DL-Lite family.
    Journal of Automated Reasoning 39(3), 385–429 (2007)
13. Calvanese, D., Keet, C.M., Nutt, W., Rodrı́guez-Muro, M., Stefanoni, G.: Web-
    based graphical querying of databases through an ontology: the WONDER system.
    In: Shin, S.Y., Ossowski, S., Schumacher, M., Palakal, M.J., Hung, C.C. (eds.)
    Proceedings of ACM Symposium on Applied Computing (ACM SAC’10). pp. 1389–
    1396. ACM (2010), 22-26 March, 2010, Sierre, Switzerland
14. Chen, P.P.: The entity-relationship model—toward a unified view of data. ACM
    Transactions on Database Systems 1(1), 9–36 (1976)
15. Curland, M., Halpin, T.: Model driven development with NORMA. In: Proceedings
    of the 40th International Conference on System Sciences (HICSS-40). pp. 286a–
    286a. IEEE Computer Society (2007), los Alamitos, Hawaii
16. Farré, C., Queralt, A., Rull, G., Teniente, E., Urpı́, T.: Automated reasoning on
    UML conceptual schemas with derived information and queries. Information and
    Software Technology 55(9), 1529 – 1550 (2013)
17. Fillottrani, P.R., Keet, C.M.: Conceptual model interoperability: a metamodel-
    driven approach. In: Bikakis, A., et al. (eds.) Proceedings of the 8th International
    Web Rule Symposium (RuleML’14). LNCS, vol. 8620, pp. 52–66. Springer (2014),
    august 18-20, 2014, Prague, Czech Republic
18. Fillottrani, P.R., Keet, C.M.: KF metamodel formalisation. Technical report
    1412.6545v1 (September 2014), arxiv.org
19. Fillottrani, P.R., Franconi, E., Tessaris, S.: The ICOM 3.0 intelligent conceptual
    modelling tool and methodology. Semantic Web Journal 3(3), 293–306 (2012)
20. Franconi, E., Mosca, A.: The formalisation of ORM2 and its encoding in
    OWL2. Technical Report KRDB12-2, KRDB Research Centre, Faculty of
    Computer Science, Free University of Bozen-Bolzano, Italy (March 2012),
    http://www.inf.unibz.it/krdb/pub/TR/KRDB12-2.pdf
21. Halpin, T.: A logical analysis of information systems: static aspects of the data-
    oriented perspective. Ph.D. thesis, University of Queensland, Australia (1989)
22. Halpin, T., Morgan, T.: Information modeling and relational databases. Morgan
    Kaufmann, 2nd edn. (2008)
23. Jahangard-Rafsanjani, A., Mirian-Hosseinabadi, S.H.: Lightweight formalization
    and validation of ORM models. Journal of Logical and Algebraic Methods in Pro-
    gramming p. http://dx.doi.org/10.1016/j.jlamp.2015.03.001 (2015)
24. Jahangard Rafsanjani, A., Mirian-Hosseinabadi, S.H.: A Z approach to formaliza-
    tion and validation of ORM models. In: Ariwa, E., El-Qawasmeh, E. (eds.) Digital
    Enterprise and Information Systems, Communications in Computer and Informa-
    tion Science, vol. 194, pp. 513–526. Springer (2011)
25. Kaneiwa, K., Satoh, K.: Consistency checking algorithms for restricted UML class
    diagrams. In: Proceedings of the 4th International Symposium on Foundations of
    Information and Knowledge Systems (FoIKS’06). Springer Verlag (2006)
26. Kaneiwa, K., Satoh, K.: On the complexities of consistency checking for restricted
    UML class diagrams. Theoretical Computer Science 411, 301–323 (2010)
27. Kazakov, Y.: RIQ and SROIQ Are Harder than SHOIQ. In: Brewka, G., Lang, J.
    (eds.) 11th International Conference on Principles of Knowledge Representation
    and Reasoning (KR’08). pp. 274–284. AAAI Press (2008), 16-19 September, 2008,
    Sydney, Australia
28. Keet, C.M., Fillottrani, P.R.: Dataset of publicly available conceptual models and
    its analysis (2015), http://www.meteck.org/SAAR.html
29. Keet, C.M.: Prospects for and issues with mapping the Object-Role Modeling
    language into DLRif d . In: 20th International Workshop on Description Logics
    (DL’07). CEUR-WS, vol. 250, pp. 331–338 (2007), 8-10 June 2007, Bressanone,
    Italy
30. Keet, C.M.: Ontology-driven formal conceptual data modeling for biological data
    analysis. In: Elloumi, M., Zomaya, A.Y. (eds.) Biological Knowledge Discovery
    Handbook: Preprocessing, Mining and Postprocessing of Biological Data, chap. 6,
    pp. 129–154. Wiley (2013)
31. Keet, C.M., Fillottrani, P.R.: Toward an ontology-driven unifying metamodel for
    UML class diagrams, EER, and ORM2. In: Ng, W., Storey, V.C., Trujillo, J. (eds.)
    32nd International Conference on Conceptual Modeling (ER’13). LNCS, vol. 8217,
    pp. 313–326. Springer (2013), 11-13 November, 2013, Hong Kong
32. Object Management Group: Semantics of Business Vocabulary and Rules
    (SBVR) – OMG released versions of SBVR, formal/2008-01-02 (January 2008),
    http://www.omg.org/spec/SBVR/1.0
33. Object Management Group: Superstructure specification. Standard 2.4.1, Object
    Management Group (2012), http://www.omg.org/spec/UML/2.4.1/
34. Queralt, A., Artale, A., Calvanese, D., Teniente, E.: OCL-Lite: Finite reasoning on
    UML/OCL conceptual schemas. Data & Knowledge Engineering 73, 1–22 (2012)
35. Smaragdakis, Y., Csallner, C., Subramanian, R.: Scalable satisfiability checking
    and test data generation from modeling diagrams. Automation in Software Engi-
    neering 16, 73–99 (2009)
36. Solomakhin, D., Franconi, E., Mosca, A.: Logic-based reasoning support for SBVR.
    Fundamenta Informaticae 124(4), 543–560 (2013)
37. Song, I.Y., Chen, P.P.: Entity relationship model. In: Liu, L., Özsu, M.T. (eds.)
    Encyclopedia of Database Systems, vol. 1, pp. 1003–1009. Springer (2009)
38. Thalheim, B.: Extended entity relationship model. In: Liu, L., Özsu, M.T. (eds.)
    Encyclopedia of Database Systems, vol. 1, pp. 1083–1091. Springer (2009)
39. Toman, D., Weddell, G.E.: Fundamentals of Physical Design and Query Compi-
    lation. Synthesis Lectures on Data Management, Morgan & Claypool Publishers
    (2011)
40. Toman, D., Weddell, G.E.: On adding inverse features to the description logic
    CFD ∀ nc . In: PRICAI 2014: Trends in Artificial Intelligence - 13th Pacific Rim
    International Conference on Artificial Intelligence, Gold Coast, QLD, Australia,
    December 1-5, 2014. pp. 587–599 (2014)
41. Wagih, H.M., Zanfaly, D.S.E., Kouta, M.M.: Mapping Object Role Modeling 2
    schemes into SROIQ(D) Description Logic. International Journal of Computer
    Theory and Engineering 5(2), 232–237 (2013)