=Paper=
{{Paper
|id=Vol-1350/paper-50
|storemode=property
|title=Polynomial
encoding of ORM conceptual models in CFDI
|pdfUrl=https://ceur-ws.org/Vol-1350/paper-50.pdf
|volume=Vol-1350
|dblpUrl=https://dblp.org/rec/conf/dlog/FillottraniKT15
}}
==Polynomial
encoding of ORM conceptual models in CFDI==
Polynomial encoding of ORM conceptual models in CF DI ∀− nc Pablo Rubén Fillottrani1,2 , C. Maria Keet3 , David Toman4 1 Departamento de Ciencias e Ingenierı́a de la Computación, Universidad Nacional del Sur, Bahı́a Blanca, Argentina, prf@cs.uns.edu.ar 2 Comisión de Investigaciones Cientı́ficas, Provincia de Buenos Aires, Argentina 3 Department of Computer Science, University of Cape Town, South Africa mkeet@cs.uct.ac.za 4 Cheriton School of Computer Science, University of Waterloo, Canada david@cs.uwaterloo.ca Abstract. The use of conceptual models has long been confined to the data analysis stage of software development. In recent years, this has been extended to use them at run-time as well, for, among others, querying large amounts of data. This brings afore the need to have tractable logic- based reconstructions of the conceptual models, i.e., in at most PTIME. We provide such a logic-based reconstruction for most of ORM using the Description Logic language CFDI ∀− nc , which has several features impor- tant for conceptual models, notably n-ary relationships, complex identi- fication constraints, and role subsumption. The encoding captures over 96% of the constructs used in practice in the set of 33 ORM diagrams analysed. The results are easily transferable to EER and UML Class diagrams, with an even greater coverage. 1 Introduction While for many years conceptual models were developed and then shelved upon implementation or used ‘offline’ for documentation purposes, recent years has seen an increase in using such precious resources computationally. One strand of investigation focuses on expressiveness and a logic foundation to compute satisfiability and detect inconsistencies over the TBox only, among others [2, 5, 8, 16, 24], and several implementations exists using various technologies that are more or less scalable; e.g., using OCL [16], Alloy [8, 23], or a DL reasoner [19]. Another looks at usage during runtime for a range of purposes, such as using a conceptual model for scalable test data generation [35] and for designing [6] and executing [13] queries. Approximations of conceptual models are used also deeper into the machinery of querying databases, in particular during various stages of query compilation, e.g., when reasoning about duplicates [39]. The former purpose requires a logic foundation using a (very) expressive logic, whereas the latter requires computationally better behaved logics to keep the whole process feasible. What such a tractable logic language looks like that captures most, or all, important conceptual data modelling language features, has received little attention, however. To the best of our knowledge, there are only four efforts to capture a fragment of which concept or full satisfiability checking is (or is claimed to be) in polynomial time (or less): one for ER [2], two for UML [1, 25], and one for ORM [35], with the first one logic-driven and the last one implementation- driven, and they all differ substantially in coverage of features. The main general question, thus, is: what is a useful fragment for tractable runtime usage of a conceptual data model? Then, how to formalise it. To answer this question, we choose to focus on ORM first, for it is strictly more expressive than ER and UML class diagrams, and then facilitating trans- ferability of the results. Given that there is a lot of insight in computational com- plexity of Description Logic (DL) languages and which ones are in P, we zoom in on those for a logic-based reconstruction. It appears that the DL CF DI ∀− nc [40] is a good fit and can capture 96.5% of the ORM models in a dataset of 33 ORM models that are part of the dataset of 101 conceptual models (dataset available from [28]). This is chiefly thanks to ‘trading’ costly but lesser used covering constraints for the more often used arbitrary identifiers and n-ary relationships. In the remainder of the paper, we first provide background definitions in Section 2. The main results are presented in Section 3, and are discussed and compared with related research in Section 4. We summarize our results and briefly outline future work in Section 5. 2 Preliminaries The two preliminaries are the DL CF DI ∀− nc and the ORM language, so as to keep the paper self-contained. 2.1 The Description Logic CF DI ∀− nc All members of the CF D family of DLs are fragments of FOL with underlying signatures based on disjoint sets of unary predicate symbols called primitive concepts, constant symbols called individuals and unary function symbols called features. Definition 1 (CF DI ∀− nc Knowledge Bases) Let F, PC and IN be disjoint sets of (names of) functional features, primitive concepts and individuals, respec- tively. A path function Pf is a word in F∗ , and we denote the empty word by id and concatenation by “.”. Metadata and data in a CF DI ∀− nc knowledge base K are respectively defined by a TBox T and an ABox A. A TBox T consists of a finite set of inclusion dependencies of the form A v B, A v ¬B, A v ∀f.B, ∀f.A v B, A v ∃f −1 , or A v B : Pf 1 , . . . , Pf k → Pf where A, B ∈ PC, f ∈ F, and Pf i ∈ F∗ . A concept “B : Pf 1 , . . . , Pf k → Pf” that participates in the last dependency is called a path functional dependency (PFD). An ABox A consists of a finite set of facts in the form of concept assertions A(a), and function assertions f (a) = b where A ∈ PC, f ∈ F, and a, b ∈ IN. Any PFD occurring in T must also satisfy a regularity condition by adhering to one of the following two forms: C : Pf . Pf 1 , Pf 2 , . . . , Pf k → Pf or C : Pf . Pf 1 , Pf 2 , . . . , Pf k → Pf .g. (1) A PFD is a key if it adheres to the first of these forms. The semantics is defined in the standard way with respect to an interpretation I = (4, (·)I ), where 4 is a domain of “objects” and (·)I an interpretation function that fixes the interpretation of primitive concepts A to be subsets of 4, features f to be total functions on 4, and individuals a to be elements of 4. The interpretation function is extended to path expressions by interpreting id , the empty word, as the identity function λx.x, concatenation as function composition, and to derived concept descriptions as follows: (¬A)I =4 \ AI (∀f.A)I ={x | f I (x) ∈ AI } (∃f −1 )I ={x | ∃y ∈ 4 : f I (y) = x} Vk (C : Pf 1 , . . . , Pf k → Pf)I ={x | ∀y ∈ CI : ( i=1 Pf Ii (x) = Pf Ii (y)) ⇒ Pf I (x) = Pf I (y)} An interpretation I satisfies an inclusion dependency C v D if CI ⊆ DI , a con- cept assertion A(a) if aI ∈ AI , and a function assertion f (a) = b if f I (aI ) = bI . I satisfies a knowledge base K if it satisfies each inclusion dependency and asser- tion in K. In addition, every CF DI ∀− nc knowledge base must satisfy the following two conditions. 1. (inverse feature and value restriction interaction) If {A v ∃f −1 , ∀f.A0 v B} ⊆ T then (a) A v A0 ∈ T , (b) A0 v A ∈ T or (c) A v ¬A0 ∈ T . 2. (inverse feature and PFD interaction) Any non-key PFD occurring in T that involves features used in the ∃f −1 concept in T must also satisfy a stronger regularity condition by adhering to the following form: C : Pf .f, Pf 2 , . . . , Pf k → Pf .g. (2) Proposition 1 ([40]). Consistency of CF DI ∀− nc knowledge bases is complete for PTIME. Other reasoning tasks, such as logical implication and concept consistency reduce (linearly) to knowledge base consistency. Relaxing either of the aforementioned conditions leads to EXPTIME and PSPACE completeness, respectively [40]. Note also, that condition (2) imposed on PFDs applies only to non-key PFDs. Overall, however, the restrictions do not seem to impact the modeling utility of CF DI ∀− nc in relation to keys and functional constraints. Indeed, arbitrary functional dependencies in relational schema are easily captured. Finally and for convenience CF DI ∀− nc supports additional syntax, e.g., subsumptions of the form ∃f −1 v A. This additional syntax is mere syntactic sugar and can be equivalently expressed in CF DI ∀− nc as defined above [40]. Fig. 1. Sample ORM2 diagram for a particular university where, among others, each Support Staff must have exactly one Extension that is identified by a code, Support Staff and Academics are subtypes of Employee and are disjoint, each Room is identified by the Building and its RoomNr, each Employee has exactly one Name, and a support staff that supports a department also works for that department. 2.2 Brief overview of ORM2 Object-Role Modelling as a general term is also known as Fact-Based Model- ing and comprises several notational variants with varying language features, ranging from its predecessor NIAM and its recent notation in the proprietary CogNIAM tool of PNA, to the FCO-IM flavour [4], to ORM as in [21] popu- larised with VisioModeler 3.1 and ORM2 [22] popularised with the NORMA tool [15] as plugin for MS Visio, among others. We use Halpin’s notation [22] in the remainder of the paper for its compactness. As we are interested in the static, structural components, we ignore deontic constraints5 —they did not oc- cur in any of the evaluated ORM models anyway [28]—and derived constraints (ORM’s version of methods). ORM2 has four different kinds of named entities: – entity type, which is like an EER entity type and UML class and may be objectified, and is denoted with a blue rectangle with round corners; – value type, which is an entity type that has a binary fact type (relationship) to a data type, denoted with a blue dashed rectangle with round corners; – role that each entity/value type plays in a fact type, denoted with a small rectangle and connected to the object type or value type; – n-ary fact type (n ≥ 1) that relates entity types to each other or entity types to value types and they have to be elementary (uniqueness constraint spanning n or n − 1 roles), denoted with a rectangle composed of the roles. Typically, roles and fact types are named automatically, but this can be added, as indicated with the user-named role [DE] in Fig. 1; note that the text next to the fact types are ‘readings’ for model verbalisations, not fact type names. 5 Refer to [36] for a treatment of deontic constraints in the context of DLs and SBVR ORM has many constraint types (some of which are used rarely [28]); a non- exhaustive example is shown in Fig. 1. The ones used in the figure are mandatory (small blob), uniqueness (line over rectangle), reference schemes (simple identi- fiers, e.g., .empNr), and external identifiers (circle with double horizontal line), subtype (arrow) and subset (over roles, fact types, paths), disjointness (encir- cled cross); others include coverage, constraints on values, and 11 relationship constraints (a.o., transitive, irreflexive). As transformations exist from one ORM model to UML and to ER [22], ORM is ‘more conceptual’ than the other models for one does not have to commit to an implementation paradigm upfront. Moreover, the notation is also used for business rules specifications [32]. 3 ORM2cfd fragment into CF DI ∀− nc While we primarily focus on ORM to CF DI ∀− nc mapping here, we also aim for an easy extension to EER and UML Class diagrams, the ability to use it for inter-model assertions across models represented in different languages [17], and for unification of the conceptual modelling language families for a widely used subset of features. To this end, we first define the syntax and semantics of a ‘generic’ conceptual data modelling language, which we name CM− com , as it is, essentially, a proper fragment of CMcom —used as common conceptual data modelling language for EER, UML, and ORM [30]—without covering and dis- junctive mandatory constraints, and with limited cardinalities and a more precise definition of relationship subsumption and disjointness. CM− com contains those features mappable into CF DI ∀− nc (as described in Section 3.3) and captures a subset of features of ORM, named ORM2cfd . Thus, within the scope of this paper, one can equate CM− com and ORM2cfd . 3.1 Syntax The syntax is introduced first, and subsequently illustrated with an example. We assume a transformation where an ORM value type is encoded in CM− com as an attribute, and note that recursive relationships are allowed such that a class can participate more than once in a relationship. Definition 2 (Conceptual Data Model CM− − com syntax) A CMcom concep- tual data model is a tuple Σ = (L, rel, att, cardR , cardA , isaC , isaR , isaU , disjC , disjR , id, extid, fd, obj, rex) such that: 1. L is a finite alphabet partitioned into the sets: C (object type symbols), A (at- tribute symbols), R (relationship symbols), U (role symbols), and D (domain symbols); the tuple (C, A, R, U, D) is the signature of Σ. 2. att is a function that maps a class symbol in C to an A-labeled tuple over D, so that att(C) = {A1 : D1 , . . . , Ah : Dh } where h is a non-negative integer. 3. rel is a function that maps a relationship symbol in R to an U-labeled tuple over C, rel(R) = {Ui : Ci }hi=1 , h ≥ 1, Ui 6= Uj if i 6= j, h is called the arity of R, and if (Ui , Ci ) ∈ rel(R) then player(R, Ui ) = Ci and role(R, Ci ) = Ui . The signature of the relation is σR = {Ui }ni=1 . 4. cardR is a partial function cardR : R × U → {0, 1} × {1, ∞} denoting cardinality constraints. We denote with cmin(R, U ) and cmax(R, U ) the first and second component of cardR . 5. cardA is a partial function cardA : C × A → {0, 1} × {1, ∞} denoting multi- plicity constraints for attributes, such that cardA (C, A) is defined iff (A, D) ∈ att(C) for some D ∈ D. We denote with cmin(C, A) and cmax(C, A) the first and second component of cardA . 6. isaC is a binary relationship isaC ⊆ C × C. 7. isaR is a binary relationship isaR ⊆ R × R. isaR between relationships is restricted to relationships with compatible signatures. 8. isaU is a binary relationship isaU ⊆ U × U. 9. disjC is a relationship over 2C × C, describing disjointness partitions over groups of isa that share the same superclass. 10. disjR is a subset of 2R , describing disjointness over a group of relations (with compatible signatures). 11. id is a partial function, id : C → 2A , that maps a class symbol in C to its identifier (key) attributes; cardA (C, A) = (1, 1) holds for each A ∈ id(C). 12. extid is called an external uniqueness, which is a partial function that maps a class to a set of sets of relation-role pairs or attributes, extid : C → (R×U )∪A 22 , where for any S ∈ extid(C) it holds that at least one pair (R, U ) ∈ S, and whenever (R, U ) ∈ S then player(R, U ) = C, cardR (R, U ) = (1, 1), and when A ∈ S then cardA (C, A) = (1, 1). U 13. fd is a functional dependency assertion, a partial function fd : R → 22 ×U such that fd(R) may be defined only if its arity is ≥ 2 and for all Ui , U ∈ U such that ({Ui }, U ) ∈ fd(R) then exist Ci , C such that Ui : Ci ∈ rel(R) and U : C ∈ rel(R). 14. obj is an objectification partial function that maps an n-ary relationship symbol R ∈ R to a C ∈ C, i.e., obj : R → C. Whenever obj(R) = C and Ui : Ci ∈ rel(R), 1 ≤ i ≤ n then there exist n new binary relationships Ri ∈ R such that rel(Ri ) = {Ui1 : C, Ui2 : Ci }, with Ui1 , Ui2 ∈ U; extid(C) = {(Ui , Ci ) | 0 < i ≤ n}; cardR (Ri , Ui1 ) = (1, 1) and cardR (Ri , Ui2 ) = (0, 1). 15. rex is a subset of 2U describing disjointness partitions over a group of roles, i.e, if {Ui }hi=1 ∈ rex then exist Ri ∈ R, Ci ∈ C such Ui : Ci ∈ rel(Ri ), and all Ri have the same arity. To link this syntax to ORM’s icons, value types are transformed into attributes, any unary relationship is translated into a class and a binary relationship with a Boolean datatype, and a suitable naming scheme for the roles and fact types is in place. Example 1 Let us consider a mapping between this CM− com syntax and some of the mappable ORM icons of the ORM2cfd fragment, using the diagram in Fig. 1. The fact type verbalised with works for can be represented with the relationship rel(works) = {DE : Department, ED : Employee} Identification (single attribute, resp. external) of the Employee and Room with id(Employee) = {empNr} and extid(Room) = {{locBuildingName, roomNr}}. Subsumption of ORM entity types and fact types is straightforward as: Academic isaC Employee and supports isaR works and mandatory participation of Department in works as cardR (works, DE) = (1, ∞). ♦ 3.2 Semantics The model-theoretic semantics of CM− com in the light of ORM2cfd is as follows. Definition 3 (CMcom Semantics) Let Σ be a CM− − com conceptual data model. An interpretation for the conceptual model Σ is a tuple I = (∆I ∪ ∆ID , ·I ), such that: – ∆I is aSnonempty set of abstract objects disjoint from ∆ID ; – ∆ID = Di ∈D ∆Di is the set of basic domain values used in Σ; and – ·I is a function that maps: • Every basic domain symbol D ∈ D into a set DI = ∆D . • Every class C ∈ C to a set C I ⊆ ∆I . • Every attribute A ∈ A to a set AI ⊆ ∆I × ∆ID , such that for each A ∈ A and C ∈ C with A : D ∈ att(C) for some D ∈ D, then for all o ∈ C I there exists d ∈ DI such that (o, d) ∈ AI . • Every relationship R ∈ R to a set RI of tuples of U-labeled elements of ∆I : for R an n-ary relationship connecting the classes C1 , . . . , Cn , rel(R) = {Ui : Ci }ni=1 , i.e., {Ui : oi }ni=1 ∈ RI implies oi ∈ CiI . I is called a legal database state or legal application software state if it satisfies all of the constraints expressed in the conceptual data model: – for each C1 , C2 ∈ C: if C1 isaC C2 , then C1I ⊆ C2I . – for each R1 , R2 ∈ R (with the same signature): if R1 isaR R2 , then R1I ⊆ R2I . In addition, ORM stipulates that the participating role sequences (which in this case are all roles of Ri ) of every relationship participating in the R1I ⊆ R2I to be role-sequence compatible.6 – for each U1 , U2 ∈ U: if U1 isaU U2 , then there exist R1 , R2 ∈ R with player(R1 , U1 ) = C1 and player(R2 , U2 ) = C1 , and C1 , C2 ∈ C, and {o ∈ C1I such that U1 : o ∈ R1I } ⊆ {o ∈ C2I such that U2 : o ∈ R2I }. – for each C ∈ C, R ∈ R, U ∈ U: if cardR (R, U ) = (x, y) and player(R, U ) = C, then x ≤ #{o ∈ C I such that U : o ∈ RI } ≤ y. – for each C ∈ C, A ∈ A: if cardA (C, A) = (x, y), then A : D ∈ att(C) for some D ∈ D, and x ≤ #{d ∈ DI such that (o, d) ∈ AI , o ∈ C I } ≤ y. – if {Ci }ni=1 ∈ disjC then CiI 6= CjI for all i, j, j 6= i, 1 ≤ i, j ≤ n. – if {Ri }ni=1 ∈ disjR then Ri are role-sequence compatible and RjI ∩ RkI = ∅. – for each C ∈ C, Ai ∈ A, 1 ≤ i ≤ n: if id(C) = {Ai },Vthen exists Di ∈ D such n that Ai : Di ∈ att(C), and #{o ∈ C I such that i=1 (o, di ) ∈ AIi } ≤ 1 for I any di ∈ Di , 1 ≤ i ≤ n. – for each C ∈ C, Ri ∈ R, Ui ∈ U, 1 ≤ i ≤ h, Aj , 1 ≤ j ≤ l: if {(Ri , Ui )}hi=1 ∪ {Aj }lj=1 ∈ extid(C), then exist Dj ∈ D with Aj : Dj ∈ att(C), and #{o ∈ C I such that (Ui : o) ∈ RiI and (o, dj ) ∈ AIj } ≤ 1 for any dj ∈ DjI , 1 ≤ j ≤ l. 6 We assume that the compatibility is enforced explicitly by additional isaC pairs of the classes linked to the matching roles in the relationships, e.g., for U : C ∈ rel(R1 ) and U : D ∈ rel(R2 ) we have C isaC D. – for each U, Ui ∈ U, 1 ≤ i ≤ h, R ∈ R: if fd(R) = ({Ui }hi=1 , U ), then exists C, Ci ∈ C such that C : U ∈ rel(R), Ci : Ui ∈ rel(R) and #{o ∈ C I such that {U : o} ∪ {Ui : oi , } ∈ RI } ≤ 1 for any oi ∈ CiI , 1 ≤ i ≤ h. – for each R ∈ R, C ∈ C: if obj(R) = C, then exist Ri ∈ R with rel(Ri ) = {Ui1 : C, Ui2 : Ci }, for all 1 ≤ i ≤ n, and the following statements are also satisfied: extid(C) = {(Ri , Ui2 ) | 0 < i ≤ n}, cardR (Ri , Ui1 ) = (1, 1), and cardR (Ri , Ui2 ) = (0, 1). – for each Ui ∈ U, 1 ≤ i ≤ h: if {Ui }hi=1 ∈ rex, then exists Ri ∈ R with arity m, such that Ui : Ci ∈ rel(Ri ). Then for each o ∈ CiI the following condition is true #{k such that {Uk : o} ∈ RkI , 1 ≤ k ≤ h} ≤ 1. Σ is said to be globally consistent if it admits at least one legal state. 3.3 Mapping of ORM2cfd to CF DI ∀− nc We formalize how CF DI ∀− nc can capture ORM2cfd conceptual models. Let Σ = (L, rel, att, cardR , cardA , isaC , isaR , isaU , disjC , disjR , id, extid, fd, obj, rex) be a CM− com conceptual data model corresponding to ORM2cfd . We map Σ to a CF DI ∀− nc TBox TΣ in the vocabulary PC, F using the following rules: – Include in the vocabulary one concept name for each ORM2cfd object type and datatype, i.e., for each C ∈ C, D ∈ D we have C ∈ PC, D ∈ PC. – To map attributes, there are two cases to consider: if the attribute is func- tional then it is mapped as a function symbol and two concepts that reify each of the roles; otherwise it is reified as a new concept, with the corre- sponding cardinality constraints. For each C ∈ C such that att(C) = {Ai : Di }ni=1 , and for each i: • if cardA (C, Ai ) = (1, 1), then we introduce two new concepts UA 1 , U2 ∈ i Ai Ai Ai PC, a new function symbol ai ∈ F, and {C v ∀ai .Di , C v U1 , U1 v C, UA 2 v ∃ai i −1 , ∃ai −1 v UA 2 } ∈ TΣ . i • otherwise we introduce new concept symbols Ai , UA A i,1 , Ui,2 ∈ PC, and two new function symbols ai,1 , ai,2 ∈ F, with {Ai v ∀ai,1 .UA i,1 , Ai v −1 −1 ∀ai,2 .UAi,2 , U A i,1 v C, U A i,2 v D i , U A i,1 v ∃a i,1 , U A i,2 v ∃a i,2 , ∀a A i,1 .Ui,1 v Ai , ∀ai,2 .UA i,2 v Ai , Ai v Ai : ai,1 , ai,2 → id} ⊆ TΣ . If cmin(C, Ai ) = 1, then also C v UA i,1 ∈ TΣ ; if if cmax(C, Ai ) = 1, then Ai v Ai : ai,1 → id ∈ TΣ . – The mapping of relationships (ORM2cfd fact types) is similar as the map- ping of attributes. If the relationship is binary and one of its roles has a (1, 1) constraint, then it is mapped as an attribute; in the other cases the relationship and its roles are reified as a new concepts, with new attributes from the reified relationship to the reified roles. The reified roles are then subconcepts of the concepts originally participating in the relationship. For each R ∈ R such that rel(R) = {Ui : Ci }ni=1 then • if n = 2 and cardR (R, U1 ) = (1, 1), then we introduce a new function symbol u1 ∈ F, and {C1 v ∀u1 .C2 , C1 v C1 : u1 → id} ∈ TΣ . Similarly if cardR (R, U2 ) = (1, 1) then we introduce function symbol u2 ∈ F and {C2 v ∀u2 .C1 , C2 v C2 : u2 → id} ∈ TΣ . • otherwise we add new concept symbols R, UR i ∈ PC, 1 ≤ i ≤ n , new function symbols uR i ∈ F, 1 ≤ i ≤ n, and then {R v ∀uR R R i .Ui , Ui v −1 Ci , U R R i v ∃ui , ∀uR R n R R i .Ui v R}i=1 ∪ {R : u1 , . . . , un → id} ⊆ TΣ . Addi- tionally for each i, 1 ≤ i ≤ n if cmin(R, Ui ) = 1 then also Ci v UR i ∈ TΣ ; and if cmax(R, Ui ) = 1, then R v R : uR i → id ∈ TΣ . – for each C1 , C2 ∈ C such that C1 isaC C2 , then C1 v C2 ∈ TΣ . – for each R1 , R2 ∈ R such that R1 isaR R2 , then R1 v R2 ∈ TΣ . In order to define relationship inheritance in ORM2cfd , the types of the participating concepts must be compatible, therewith adhering to the syntax restriction of ORM (as aside: without this condition, the reconstruction in a PTIME language is not possible). – for each U1 , U2 ∈ U such that U1 isaU U2 then U1 v U2 ∈ TΣ . – for each Ci , C ∈ C such that {Ci }ni=1 disjC C, then {Ci v C}ni=1 ∪ {Ci v ¬Cj }ni6=j,i,j=1 ⊆ TΣ , stating the concepts are pairwise disjoint. – for each Ri ∈ R such that {Ri }ni=1 ∈ disjR , then {Ri v ¬Rj , Ri v Rj : σRj → id}ni6=j,i,j=1 ⊆ TΣ . Again, ORM has the condition that relationship exclusion must be defined over compatible types for the participating con- cepts (which also happens to be a necessary condition for the efficiency of the translation). – for each C ∈ C, Ai ∈ A such that id(C) = {Ai }ni=1 , then C v C : a1 , . . . , an → id ∈ TΣ . In this case since the attributes are keys then they must be in (1, 1) constraint with C, so they are mapped as features in TΣ by the first point of the respective rule, described above. – for each C ∈ C, Ri ∈ R, Ui ∈ U, Aj ∈ A such that {(Ri , Ui )}hi=1 ∪ {Aj }kj=1 ∈ extid(C), then C v C : uR R 1 , . . . , uh , a1 , . . . , ak → id ∈ TΣ . Here the only possibility is that Ui and Aj belonging to the external identifier have a (1, 1) constraint, so they are mapped with features. – for each U, Ui ∈ U, R ∈ R: if fd(R) = ({Ui }hi=1 , U ), then CR v CR : uR R R 1 , . . . , uh → u . This case is similar as the previous one: we ensure the roles and attributes belonging to the external identifier are mapped as features in TΣ because the are (1, 1) to C. – for each R ∈ R, C ∈ C: if obj(R) = C, then we have the mappings for extid(C), cardR (Ri , Ui1 ) = (1, 1), and cardR (Ri , Ui2 ) = (0, 1). – for each Ui ∈ U: if {Ui }hi=1 ∈ rex, then exists Ri ∈ R, Ci ∈ C with arity m, such that (Ui : Ci ) ∈ rel(Ri ), 1 ≤ i ≤ h. Then {UR R h i v ¬Uj }i6=j,i,j=1 ⊆ TΣ . Note that rex requires optional participation in the role, and therewith uses the second case of the relationship mapping above. Case analysis of the translation combined with Proposition 1 yields the following: Theorem 1. Let Σ be an ORM2cfd conceptual data model. A class C is con- sistent in Σ if and only if the knowledge base (TΣ , {C(aC )}) is consistent. Σ is globally consistent if and only if TΣ is consistent with A = {C(aC ) | C ∈ C}. Example 2 Consider again the running example with Fig. 1 and the illustration of the syntax (Example 1), the corresponding CF DI ∀− nc knowledge base contains, among others: the translation where empNr is an attribute of Employee with cardA (empNr, Employee) = (1, 1), the following CF DI ∀− nc axioms: {Employee v ∀empNr.Integer, Employee v UempNr 1 , UempNr 1 v Employee, UempNr 2 v ∃empNr−1 , ∃empNr−1 v UempNr 2 }. Then, to represent the identifier, we add Employee v Employee : empNr → id. The mapping of rel(works) with cardinalities cardR (works, DE) = (1, ∞) and cardR (works, ED) = (0, ∞) is the set of axioms −1 { Works v ∀deworks .DEworks , DEworks v Department, DEworks v ∃deworks , ∀deworks .DEworks v Works, Works v Works : deworks , edworks → id, Department v DEworks , Works v ∀edworks .EDworks , EDworks v Employee, −1 EDworks v ∃edworks , ∀edworks .EDworks v Works }, and likewise for the remainder of the diagram in Figure 1. ♦ 4 Discussion There are many papers with logic-based reconstructions of ORM, EER, and UML; we discuss a subset relevant to the scope of this paper. Comparison with other ORM2 Encodings. Fairly expressive logic-based reconstructions of ORM fragments exist, including ORM 2zero in the EXPTIME- complete ALCQI [20], ORM2− [30] in the EXPTIME-complete DLRifd [11], an ORM2 fragment (e.g., [41]) in SROIQ that is N2EXPTIME-Complete [27], and ORM in the undecidable FOL [21]. An Alloy encoding and a numeric model as encoding for ORM are proposed in [23], which are experimentally compared to unsatisfiability pattern checks, showing that the latter two far outperform the Alloy approach (seconds vs. hours and timeouts), but complexity results are not provided. Their ORM fragment of the number encoding does include ‘costly’ fea- tures, such as covering constraints, disjunctive mandatory, arbitrary frequency (with uniqueness check), external identifiers, and value constraints, but it is un- clear what was used in the test ORM diagrams. The only ORM fragment claimed in PTIME is Smaragdakis et al.’s ORM− [35] that also uses a number model. It includes non-overlapping uniqueness constraints over n-ary relationships, simple mandatory, non-overlapping frequency constraints (cardinalities > 1), value con- straints, and subtype constraints. Arbitrary frequency constraints (like arbitrary projections in a relational table) cause undecidability, but, though not specified in [35], one could assume it always occurs in conjunction with a suitable unique- ness constraint in order to regain decidability, as discussed in [23, 29]. Their value constraints are not constrained either, i.e., value ranges of integers, floats, and enumerations are allowed, and have no constraints, such as so-called “safe” data types [3]. Most problematic, however, is the integer bound propagation in step 2 of Algorithm 2 in [35], which has recently shown to be NP-Complete [7], hence, Smaragdakis et al.’s solution seems to be at least NP-Hard. To the best of our knowledge, the here provided logic-based reconstruction of the ORM2cfd ORM fragment in the PTIME CF DI ∀− nc is the first tractable encoding of ORM, yet still capturing most of the entities encountered in extant ORM models. Extensibility to EER and UML Class Diagrams. As ORM is more ex- pressive than UML Class Diagrams and EER, and an ORM diagram can be translated into UML and into ER [22], the results obtained should be at least as good for those. A quick matching thanks to availing of the unifying metamodel [31, 18] reveals that that is the case, where CF DI ∀− nc can encode over 97% of the 34 UML models and over 99% of the 34 (E)ER models in our dataset. This can be of use here as well, as also most UML Class Diagram encodings focus on expressiveness, using, among others, DLRifd [5] as well, or an OCL-lite en- coding matching ALCI [34], which is still EXPTIME-complete. Kaneiwa and Satoh claim to have some fragments of UML Class Diagrams in P and PSpace for full satisfiability checking (all classes must be satisfiable) [25, 26], but this has been proven otherwise by Artale et al. [1]. Focusing on coverage of features, the smallest restricted fragment is shown to be NLogSpace [1] when disallowing isa on associations and completeness on subclasses, using approximations of reified binaries (i.e., missing extid, and thus also no qualified associations). An ini- tial analysis shows that this might still capture almost 96% of the UML models in our dataset, and might thus also be a worthwhile fragment of UML. Such high coverage can be obtained partially due to the changes to UML v2.4.1 [33] where relational properties (asymmetry and transitivity) for aggregation have been dropped, with aggregation taking up about a quarter of all associations, and not all UML features in the standard are implemented in modelling tools. The story is similar for (E)ER. Various encodings exist [14, 37, 38], (partially due to the absence of a standard), which either use a language in the EXPTIME- complete DLR family [9–11] or DL-Lite family [12] with different computational complexities for different EER fragments [2]. Trading functionality for gaining a little in computational complexity [2], however, is certainly not an option if coverage is also an aim, especially due to all the identifiers in EER and ORM (which can be represented in CF DI ∀− nc ). 5 Conclusions A logic-based reconstruction for most of ORM using the PTIME Description Logic language CF DI ∀− nc has been presented, covering 96.5% of a set of extant ORM models. This is the first tractable encoding of ORM, which includes fea- tures important for conceptual models, notably n-ary relationships and complex identification constraints. Future work includes working toward implementations of the scenarios alluded to in the introduction, and we also expect to apply this encoding to facilitate inter-model interoperability [17] as the results are easily transferable to EER and UML Class diagrams. Acknowledgments. This work is based upon research supported by the Na- tional Research Foundation of South Africa (Project UID: 90041), the Argen- tinian Ministry of Science and Technology (PRF and CMK), and NSERC (DT). References 1. Artale, A., Calvanese, D., Ibáñez-Garcı́a, Y.: Full satisfiability of UML class dia- grams. In: Parsons, J., et al. (eds.) Proceedings of the 29th International Conference on Conceptual Modeling (ER’10). LNCS, vol. 6412, pp. 317–331. Springer (2010) 2. Artale, A., Calvanese, D., Kontchakov, R., Ryzhikov, V., Zakharyaschev, M.: Rea- soning over extended ER models. In: Parent, C., Schewe, K.D., Storey, V.C., Thal- heim, B. (eds.) Proceedings of the 26th International Conference on Conceptual Modeling (ER’07). LNCS, vol. 4801, pp. 277–292. Springer (2007), auckland, New Zealand, November 5-9, 2007 3. Artale, A., Ryzhikov, V., Kontchakov, R.: DL-Lite with attributes and datatypes. In: Proceeding of the 20th European Conference on Artificial Intelligence (ECAI’12). pp. 61–66. IOS Press (2012) 4. Bakema, G., Zwart, J.P., van der Lek, H.: Volledig Communicatiegeoriënteerde Informatiemodellering FCO-IM. Academic Service (2005) 5. Berardi, D., Calvanese, D., De Giacomo, G.: Reasoning on UML class diagrams. Artificial Intelligence 168(1-2), 70–118 (2005) 6. Bloesch, A.C., Halpin, T.A.: Conceptual Queries using ConQuer-II. In: Proceedings of ER’97: 16th International Conference on Conceptual Modeling. LNCS, vol. 1331, pp. 113–126. Springer (1997) 7. Bordeaux, L., Katsirelos, G., Narodytska, N., Vardi, M.Y.: The complexity of in- teger bound propagation. Journal of Artificial Intelligence Research 40, 657–676 (2011) 8. Braga, B.F.B., Almeida, J.P.A., Guizzardi, G., Benevides, A.B.: Transforming On- toUML into Alloy: towards conceptual model validation using a lightweight formal methods. Innovations in Systems and Software Engineering 6(1-2), 55–63 (2010) 9. Calvanese, D., De Giacomo, G., Lenzerini, M.: On the decidability of query contain- ment under constraints. In: Proc. of the 17th ACM SIGACT SIGMOD SIGART Sym. on Principles of Database Systems (PODS’98). pp. 149–158 (1998) 10. Calvanese, D., De Giacomo, G., Lenzerini, M.: Reasoning in expressive description logics with fixpoints based on automata on infinite trees. In: Proc. of the 16th Int. Joint Conf. on Artificial Intelligence (IJCAI’99). pp. 84–89 (1999) 11. Calvanese, D., De Giacomo, G., Lenzerini, M.: Identification constraints and func- tional dependencies in description logics. In: Nebel, B. (ed.) Proc. of the 17th Int. Joint Conf. on Artificial Intelligence (IJCAI 2001). pp. 155–160. Morgan Kaufmann (2001), seattle, Washington, USA, August 4-10, 2001 12. Calvanese, D., Giacomo, G.D., Lembo, D., Lenzerini, M., Rosati, R.: Tractable reasoning and efficient query answering in description logics: The DL-Lite family. Journal of Automated Reasoning 39(3), 385–429 (2007) 13. Calvanese, D., Keet, C.M., Nutt, W., Rodrı́guez-Muro, M., Stefanoni, G.: Web- based graphical querying of databases through an ontology: the WONDER system. In: Shin, S.Y., Ossowski, S., Schumacher, M., Palakal, M.J., Hung, C.C. (eds.) Proceedings of ACM Symposium on Applied Computing (ACM SAC’10). pp. 1389– 1396. ACM (2010), 22-26 March, 2010, Sierre, Switzerland 14. Chen, P.P.: The entity-relationship model—toward a unified view of data. ACM Transactions on Database Systems 1(1), 9–36 (1976) 15. Curland, M., Halpin, T.: Model driven development with NORMA. In: Proceedings of the 40th International Conference on System Sciences (HICSS-40). pp. 286a– 286a. IEEE Computer Society (2007), los Alamitos, Hawaii 16. Farré, C., Queralt, A., Rull, G., Teniente, E., Urpı́, T.: Automated reasoning on UML conceptual schemas with derived information and queries. Information and Software Technology 55(9), 1529 – 1550 (2013) 17. Fillottrani, P.R., Keet, C.M.: Conceptual model interoperability: a metamodel- driven approach. In: Bikakis, A., et al. (eds.) Proceedings of the 8th International Web Rule Symposium (RuleML’14). LNCS, vol. 8620, pp. 52–66. Springer (2014), august 18-20, 2014, Prague, Czech Republic 18. Fillottrani, P.R., Keet, C.M.: KF metamodel formalisation. Technical report 1412.6545v1 (September 2014), arxiv.org 19. Fillottrani, P.R., Franconi, E., Tessaris, S.: The ICOM 3.0 intelligent conceptual modelling tool and methodology. Semantic Web Journal 3(3), 293–306 (2012) 20. Franconi, E., Mosca, A.: The formalisation of ORM2 and its encoding in OWL2. Technical Report KRDB12-2, KRDB Research Centre, Faculty of Computer Science, Free University of Bozen-Bolzano, Italy (March 2012), http://www.inf.unibz.it/krdb/pub/TR/KRDB12-2.pdf 21. Halpin, T.: A logical analysis of information systems: static aspects of the data- oriented perspective. Ph.D. thesis, University of Queensland, Australia (1989) 22. Halpin, T., Morgan, T.: Information modeling and relational databases. Morgan Kaufmann, 2nd edn. (2008) 23. Jahangard-Rafsanjani, A., Mirian-Hosseinabadi, S.H.: Lightweight formalization and validation of ORM models. Journal of Logical and Algebraic Methods in Pro- gramming p. http://dx.doi.org/10.1016/j.jlamp.2015.03.001 (2015) 24. Jahangard Rafsanjani, A., Mirian-Hosseinabadi, S.H.: A Z approach to formaliza- tion and validation of ORM models. In: Ariwa, E., El-Qawasmeh, E. (eds.) Digital Enterprise and Information Systems, Communications in Computer and Informa- tion Science, vol. 194, pp. 513–526. Springer (2011) 25. Kaneiwa, K., Satoh, K.: Consistency checking algorithms for restricted UML class diagrams. In: Proceedings of the 4th International Symposium on Foundations of Information and Knowledge Systems (FoIKS’06). Springer Verlag (2006) 26. Kaneiwa, K., Satoh, K.: On the complexities of consistency checking for restricted UML class diagrams. Theoretical Computer Science 411, 301–323 (2010) 27. Kazakov, Y.: RIQ and SROIQ Are Harder than SHOIQ. In: Brewka, G., Lang, J. (eds.) 11th International Conference on Principles of Knowledge Representation and Reasoning (KR’08). pp. 274–284. AAAI Press (2008), 16-19 September, 2008, Sydney, Australia 28. Keet, C.M., Fillottrani, P.R.: Dataset of publicly available conceptual models and its analysis (2015), http://www.meteck.org/SAAR.html 29. Keet, C.M.: Prospects for and issues with mapping the Object-Role Modeling language into DLRif d . In: 20th International Workshop on Description Logics (DL’07). CEUR-WS, vol. 250, pp. 331–338 (2007), 8-10 June 2007, Bressanone, Italy 30. Keet, C.M.: Ontology-driven formal conceptual data modeling for biological data analysis. In: Elloumi, M., Zomaya, A.Y. (eds.) Biological Knowledge Discovery Handbook: Preprocessing, Mining and Postprocessing of Biological Data, chap. 6, pp. 129–154. Wiley (2013) 31. Keet, C.M., Fillottrani, P.R.: Toward an ontology-driven unifying metamodel for UML class diagrams, EER, and ORM2. In: Ng, W., Storey, V.C., Trujillo, J. (eds.) 32nd International Conference on Conceptual Modeling (ER’13). LNCS, vol. 8217, pp. 313–326. Springer (2013), 11-13 November, 2013, Hong Kong 32. Object Management Group: Semantics of Business Vocabulary and Rules (SBVR) – OMG released versions of SBVR, formal/2008-01-02 (January 2008), http://www.omg.org/spec/SBVR/1.0 33. Object Management Group: Superstructure specification. Standard 2.4.1, Object Management Group (2012), http://www.omg.org/spec/UML/2.4.1/ 34. Queralt, A., Artale, A., Calvanese, D., Teniente, E.: OCL-Lite: Finite reasoning on UML/OCL conceptual schemas. Data & Knowledge Engineering 73, 1–22 (2012) 35. Smaragdakis, Y., Csallner, C., Subramanian, R.: Scalable satisfiability checking and test data generation from modeling diagrams. Automation in Software Engi- neering 16, 73–99 (2009) 36. Solomakhin, D., Franconi, E., Mosca, A.: Logic-based reasoning support for SBVR. Fundamenta Informaticae 124(4), 543–560 (2013) 37. Song, I.Y., Chen, P.P.: Entity relationship model. In: Liu, L., Özsu, M.T. (eds.) Encyclopedia of Database Systems, vol. 1, pp. 1003–1009. Springer (2009) 38. Thalheim, B.: Extended entity relationship model. In: Liu, L., Özsu, M.T. (eds.) Encyclopedia of Database Systems, vol. 1, pp. 1083–1091. Springer (2009) 39. Toman, D., Weddell, G.E.: Fundamentals of Physical Design and Query Compi- lation. Synthesis Lectures on Data Management, Morgan & Claypool Publishers (2011) 40. Toman, D., Weddell, G.E.: On adding inverse features to the description logic CFD ∀ nc . In: PRICAI 2014: Trends in Artificial Intelligence - 13th Pacific Rim International Conference on Artificial Intelligence, Gold Coast, QLD, Australia, December 1-5, 2014. pp. 587–599 (2014) 41. Wagih, H.M., Zanfaly, D.S.E., Kouta, M.M.: Mapping Object Role Modeling 2 schemes into SROIQ(D) Description Logic. International Journal of Computer Theory and Engineering 5(2), 232–237 (2013)