Modeling and Reasoning with Multirelations, and their encoding in Alloy Peiyuan Sun, Zinovy Diskin, Michał Antkiewicz, and Krzysztof Czarnecki University of Waterloo {p25sun,zdiskin,mantkiew,kczarnec}@gsd.uwaterloo.ca Abstract. Multisets and multirelations arise naturally in modeling. In this paper, we present a sound and practical mathematical framework, which encodes multisets and multirelations using only ordinary sets and total functions. We implement the encoding as a multiconcepts library in Alloy, which is declarative, compatible with ordinary sets and relations, and can be incorporated into existing models seamlessly. 1 Introduction Graphical notations are good for direct intuitive modeling but may be difficult to formalize and lack precision. Text-based modeling languages are well-amenable to formalization and formal reasoning, but the meaning of a textual specification may be not immediate and difficult to grasp. Integration of the two frameworks is a challenge; it needs understanding of how they can interact and how their mutual translation could be mediated. In this paper, we investigate a special instance of the story with the following two parties in the two roles. The graphical modeling party is represented by the UML-style relational modeling, in which a relationship (an association in the UML parlance) between two classes is seen as a set of links represented by arrows. As in general the same two objects can be related by multiple links (the association ends are non-unique), we call such a relationship a multirelation. Note that even if we compose two ordinary relations, R1 :A→B and R2 :B →C, the result is, in general, a multirelation because there may be multiple b∈B such that (a,b)∈R1 and (b,c)∈R2. Converting this multirelation into an ordinary one means discarding some data and may be seen artificial and unsatisfactory from the modeling perspective. The textual modeling party is Alloy, which is specially tailored for relational modeling and analysis. However, Alloy does not provide a direct way to work with multirelations, which requires the modeler to look for a workaround and makes our story intriguing for an Alloy user. Indeed, considering the universe of ordinary relations closed under composition is, as mentioned above, an artificial and often counter-intuitive imposition on relational modeling. To integrate, we, first, formalize multirelations and operations over them in a category-theoretical framework of spans and span operations. Although the framework is well-known in category theory, its accurate presentation in the MDE literature seems to be novel and makes a theoretical contribution of the paper. (For instance, 73 the fact that the disjoint union of two sets is generally a multiset, and the notion of multiset product, in which elements’ roles are freely combined, have not yet been considered in the UML literature up to our knowledge of it.) Second, as the span framework for relational modeling is based on sets and total functions that are readily modeled in Alloy, spans open the door for a seemingly natural Alloy encoding of multirelational modeling. However, implementing the entire multirelational algebra in Alloy as a proper (but conservative) extension of the ordinary relational algebra turned out non-trivial and required solving numerous specific problems of Alloy encoding of elementary operations over sets and functions. To ease using our solution for an Alloy user, we implemented a library of utility modules allowing the user to seamlessly integrate multirelations into existing models; conservativity then ensures that the ordinary relation part of the model is not broken and can be used further if needed. Our plan for the paper is as follows. In Sect. 2 we present a simple running example of multirelational modeling, and show the main notions at work. Sect. 3 describes the mathematical framework. In Sect. 4, we demonstrate the usage of our Alloy module library by building an Alloy model for the running example. Sect. 5 presents the details of our encoding in Alloy, and in Sect. 6 we discuss how we evaluate the Alloy encoding. Related work is discussed in Sect. 7, and Sect. 8 concludes. 2 Multiconcepts naturally arise in modeling We use multiconcept as a generic term referring to multisets, multirelations, and specific multioperations over them. In this section, we present a simple modeling scenario showing how naturally multiconcepts can appear in modeling. We then formulate several basic requirements for a multiconcept modeling framework. 2.1 Running example: seasonal sales and multirelations The manager of a grocery store asks employee to prepare some bundles for the coming seasonal sale. A bundle contains several food items, and each item belongs to a certain product category. The manager imposes some rules on the bundle content, e.g., the following three: (0) Every bundle must have (contains) at least two items (1) Every bundle must have at least two dairy products; (2) Every bundle must have items from at least two product categories. To model the scenario, we identify three classes of objects: Bundle, Item, and Category, and two relations: contains:Bundle→Item and belongsTo:Item→Category as shown in Fig. 1(a) (ignore dashed blue arrows for a while). Note that a bundle may have repeated items according to the label ’non-unique’ at the respective association end. A simple instance of this model is shown in Fig. 1(b). In this instance, the relation contains consists of five links ci shown by arrows. Having two links relating the same pair (B1,Bread) means that the bundle B1 has two breads; we also say that the pair (B1,Bread) occurs twice in the relation, and the latter is then called a multirelation. The relation belongsTo is an ordinary relation consisting of three links/pairs bi (by default, it is considered bearing the ’unique’ label). 74 contains 2..* belongsTo 1 Bundle Item Category non-unique 2..* contains ; belongsTo non-unique (a) contains ;; belongsTo Item Bundle Category c11 Bread b1 c12 B1 Bakery c2 Butter c3 b2 B2 Dairy c4 b3 Milk contains belongsTo (b) Fig. 1: Bundling: a model (a) and its instance (b) In order to check the bundling rules described above, we need to compose the two relations and compute a new relation contains;;belongsTo:Bundle→Category, (dashed lower arrow in Fig. 1a), where we use double semi-colon to denote the composition operation. For the instance in Fig. 1b, the composition would consists of five composed links shown by dashed blue arrows, e.g., c11;b1, c12;b1, etc.. As the bundle B1 has only one composed link to Dairy, it violates rule (1), while bundle B2 has two Dairy links (c3;b2 and c4;b3), and thus satisfies (1). As for rule (2), we can see that bundle B1 satisfies whereas bundle B2 violates it. A bundle containing one bread, one butter, and one milk would have satisfied both rules. Now suppose that the bundle B1 has only one bread and hence contains would be an ordinary relation. Nevertheless, composition contains;;belongsTo of two ordinary relations would still be a multirelation containing two links from B2 to Diary. Hence, there are two types of composition of ordinary relations. One is precise and counts each pair of composable links as a separate composed link as we did above by creating two links from B2 to Diary. We call this composition multijoin and denote it by double semi-column. The other relational composition only deals with reachability, and links two elements (bundle B2 and Diary in the example) as soon as there is at least one composable pair of links between those elements, but only retains one link irrespective to the number of paths. We call this composition ordinary join and denote it by single semi-column. Clearly, ordinary join loses some information, which may be important to consider, e.g., for checking rule (1). Actually, this is a typical situation: computing the price of a bundle composed from prices of its items price:Item→int, or the weight of a bundle, or other aggregate queries (as they are called in the database literature), requires multijoins and multirelations. On the other 75 hand, there are situations whereby we do need the ordinary join, e.g., checking rule (2) in our example, where we are only concerned with reachability. 2.2 Seasonal sales and multisets Unary multirelations or multisets is also a natural modeling concept. For example, in Fig. 1(b), mapping contains defines bundle B1 as a multiset of items, in which item Bread occurs twice, and we write B1={{Bread,Bread,Butter}}, where double brackets indicate that we deal with multisets rather than sets. Importantly, as a unary multirelation, bundle B1 is to be seen as a multi-subset of set Item so that the expression above implicitly states that item Milk is not included into B1. Hence, to be accurate, we should specify the type of a multiset and write B1:MSub(Item), where MSub(Item) denotes the set of all multisubsets of Item. Similarly, bundle B2 is another multisubset {{Butter,Milk}} of the same type, which happens to be an ordinary (sub)set. The italicized reservation is important if, for example, we need to consider the union of two bundles, which should include item Butter twice: B1 ] B2 = {{Bread,Bread,Butter,Butter,Milk}}. Thus, union of two ordinary sets seen as multisets can result in a multiset (exactly like composition of two ordinary relations seen as multirelations can be a multirelation). On the other hand, in some contexts we may need an ordinary (non-counting) union operation, in which B1∪B2= {{Bread,Butter,Milk}}. To distinguish these two operations, we will call them ordinary union and multiunion (similarly to contrasting ordinary join and multijoin in relational composition). Thus, the universe of ordinary (sub)sets is closed under ordinary union, but is not closed under multiunion. Multiunion is often called disjoint union, but the fact that disjoint union of two sets can result in a multiset is often not recognized. 2.3 Requirements The discussion above leads to the following requirements for an effective multiconcepts framework. A modeler should be able to do the following on top of the ordinary sets and relations: 1. directly declare a multi(sub)set or a multirelation; 2. perform operations over multiconcepts; 3. control whether the result of an operation should be ordinary or multi. For example, it should be possible to compute the multijoin of an ordinary relation with a multirelation, or the multijoin of two ordinary relations, or the multiunion of a set and a multiset. Furthermore, the syntax and semantics of the extended framework should be conservative w.r.t. the ordinary operations over ordinary objects, so that the modelers could add multiconcepts to their existing models without having to restructure them. 3 Mathematical framework The diagram in Fig. 1(b) is intuitively simple: three boxes are sets, and two block arrows represent mappings between them. To make the diagram formal, we need 76 to formalize (i) the notion of mapping as a collection of links, and (ii) mapping composition. There are two ways of doing this. The first is well-known: a multirelation is a relation, in which each pair of elements is assigned with an integer called the multiplicity of the pair; correspondingly, multioperations amount to operations over numbers (see the accompanying TR [12] for details). We refer to this framework as multiplicity-based or numeric, but it is beyond our goals in the paper. The second approach is borrowed from category theory and is based on reification of links as indices, we call the approach index-based and present it below. We will begin with index-based formalization of relations, because it allows us to introduce the approach in a natural way by discussing the example in Sect. 2. In this way we come to our major construct of span. Then we introduce the index-based version of multisets—the notion of a family of elements. We will also define operations over spans, operations over families, and mixed compositions of spans with families. 3.1 Relations as spans and operations over them. We formalize the mapping contains in Fig. 1 by reifying all its constituent links as separate objects. This gives us a set Contains consisting of five elements ci as shown in Fig. 2. The special nature of these elements (they represent links) is formalized by mapping each of them to the source and the target elements of the respective link. For example, as element c1 ∈Contains reifies link c1 from B1 to Bread in Fig. 1, it is linked to B1 by the source link c1s, and to Bread by the target link c1t. All source links make a function slegContains :Bundle←Contains called the source leg, and all target links make a function tlegContains :Contains→Item called the target leg. The triple (Contains,slegContains ,tlegContains ) is called a span with head set Contains and legs as above. Similarly, relation belongsTo is formalized by the span with the head set BelongsTo (see Fig. 2). Definition 1 (span). A (finite) span R from a set A to a set B is a triple (headR , slegR ,tlegR ) with headR a set called the head, and slegR :A←headR , tlegR :headR →B two functions called legs. In formal diagrams, we will often use a shorter notation (HR ,sR ,tR ) for span’s components. We denote a span by a stroked arrow R:A9B, and will often use the same name for both the span and its head. If a span represents a total single-valued relation, i.e., a function f :A→B, its source leg is a bijection (e.g., the relation belongsTo in Fig. 1 is such). As the choice of the index set is arbitrary, we can take set A to be the head, and its identity idA :A→A as the source leg. Then the target leg is the function f itself. Note also that sets A and B in Def. 1 are actually placeholders (formal pa- rameters) for sets rather than actual sets. For example, if we want to specify a multirelation R = Spouse on a set Person1, then we define A = Person, B = Person, headSpouse = marriageContract, slegSpouse = spouse1 and tlegSpouse = spouse2, where functions spouse1 and spouse2 map a contract to the two spouses it binds. Formally, this procedure can be described as binding formal parameters, A, slegR etc. to actual 1 the same two persons can be multiply related, if, e.g., they got diversed and then re-married again, and hence may have several marriage contracts 77 tlegContains slegBelongsTo Contains Item BelongsTo slegContains c1 tlegBelongsTo c1t Bread b1 c2 c1s b2 Butter c3 c4 b3 Bundle Milk Category c5 B1 Bakery B2 slegiResult tlegiResult Dairy iResult r1 r2 r3 r4 slegResult tlegResult r5 Result Fig. 2: Indexed-based formalization of Fig. 1 values Person, spouse1 etc. Nodes and arrows in diagrams used in all our formal definitions below are formal parameters to be substituted by actual values (sets for nodes and functions for arrows), when applied in modeling situations. Now we proceed to the index-based formalization of sequential composition of relations. Our discussion of Fig. 1 showed that the core process is composition of links. As links now are reified as elements in the heads of the spans involved, first of all we need to find composable pairs of links. It is clear that links ci ∈ Contains and bj ∈ BelongsTo are composable iff the target of ci is the source of bj , i.e., tlegContains (ci)=slegBelongsTo(bj ). If this condition holds, the two links are composable, and we can create a new link from slegContains (ci)∈Bundle to tlegBelongsTo(bj )∈Category. Thus, the set of all composable links iResult (where i stands for ’inner’, and later we will also build an outer span)) is given by the following formula:  iResult= (ci,bj ): tlegContains (ci)=slegBelongsTo(bj ) . This set is equipped with two projection functions selecting, resp., the first or the second element of pair (ci,bj ), and we obtain a span iResult shown in Fig. 2 as the inner span. To finish the composition and obtain the resulting outer span Result, we need function composition: slegResult =slegiResult;slegContains and tlegResult = tlegiResult;tlegBelongsTo (see Fig. 2). Below we present an abstract formal specification of the procedure. Selection of the composable links is provided by the operation called (in category theory) pullback of functions. 78 Definition 2 (pullback). The pullback of two functions with a common target, f1 :A1 →Y and f2 :Y ←A2 is a span f1 AO 1 /Y O with head P ={(a1,a2)∈A1×A2 : f1(a1)=f2(a2)}. The legs of the span are projections pi :P →Ai with pi(a1,a2)=ai, i=1,2. p1 f2 The diagram on the right shows the respective pullback square P / A2 (note the angle near P denoting such squares). p2 It is easy to check that such a pullback square is commutative: p1;f1 =p2;f2. Now we can define span composition. Definition 3 (span composition). Let R1 :A9B, R2 :B 9C be two composable spans. Their (sequential) composition is a span B AO R1;;R2 =(R,s,t):A9C defined as shown on s1 the right, where arc arrows (s and t) denote t1 functions composed from two functions spanned s RO 1 /B O by arcs (we will use this convention further on). p1 s2 In more detail, we first compute the pullback of functions t1 and s2 to select all pairs of com- R p2 / R2 t2 7/ C posable links. Then we build the outer legs by composing p1;s1 and p2;t2. t If span R2 represents a function (total single-valued relation), we can per- form span composition more effectively by taking R2’s source leg to be the iden- tity, and hence tlegR2 =R2. Then we set headR1 ;;R2 =headR1 , slegR1 ;;R2 =slegR1 , and tlegR1 ;;R2 =tlegR1 ;tlegR2 . This span is isomorphic to any other span representing relation R2 with an arbitrary index set I bijective to B (because the relation is total and single-valued). For example, as relation belongsTo in our running example is a function, we can compose Contains and BelongsTo in this simpler way by identifying b1,2,3 with Bread, Butter and Milk resp. The result will be isomorphic to the span specified in Fig. 1 in the following sense. Definition 4 (span isomorphism). Two parallel spans R1 :A9B, R2 :A9B are isomorphic if there is a bijection sR2 AO o H between their heads, b:HR1 →HR2 , such that sR1 = b;sR2 9 R2 and tR1 = b;tR2 . The two commutativity conditions ensure sR1 b tR2 that if a link h∈HR1 is mapped to a link b(h)∈HR2 , then /B  HR1 both links have the same source and target. tR1 Thus, span composition is defined up to span isomorphism. Moreover, the process of relation indexing by a span is also defined up to isomorphism: we are free in choosing objects reifying links, but the source and target projection links of these objects are uniquely determined by the link being reified. In fact, everything in the indexing world is defined up to natural isomorphisms (bijections between index sets commuting with the respective functions). In this paper, we take particular representatives of equivalence classes defined by isomorphism like above, and perform operations over them. General results of category theory, in which standard operations over sets and functions are redefined via so called limits (e.g., pullback) and colimits (e.g., merge) up to isomorphism ensure that applying these operations to different but isomorphic representatives produces isomorphic results (see, e.g., [5]). 79 3.2 Multisets as families and operations over them. We use the same indexing idea by reifying element occurrences as unique indices to distinguish the multiple occurrences of the same element. In a sense, Definition 5 (families). Let A be a set (perhaps, infinite). A (finite) family over A is a function f :I →A from a finite index set I to A. The latter is called the ground set of family f, while the range set of function f, i.e., set {f(i): i∈I}⊂A, is often called the active domain of f (it is always finite as set @A^ f0 I is such). To avoid confusion, we will use the term ’range set’ f rather than ’active domain’. b / I0 Two families, f :I →A and f 0 :I 0 →A, over the same ground I set A are called isomorphic, and we write f ∼ = f 0, if there is a bijection b:I →I such that b;f =f. 0 0 Definition 6 (multiunion/sum/merge). Given two fam- ilies over A, fk :Ik →A, k =1,2, their multiunion (or sum, or f1 merge) f1 +f2 :I1]I2 →A is given by the diagonal arrow in I1 /A O the commutative diagram on the right, where ] denotes the i1 f2 operation of disjoint union, arrows i1, i2 are canonic injections,  z i2 and the diagonal function is defined by (f1 +f2)(i)=f1(i) if I1]I2 o I2 i∈I1, and (f1 +f2)(i)=f2(i) if i∈I2. We can use the operation of family multiunion for building disjoint union of sets (not surprisingly, as multiunion uses disjoint union of index sets). Given sets A and B, we form their union U =A∪B with two injections i:A→U and j :B →U. These injections can be seen as two families over the same ground set. Summing them gives us a family u:A]B →U, whose index set is the disjoint union of A and B. 3.3 Mixed setting: Families and spans, ordinary and multi Connections between ordinary and multi(sub)sets are realized via the following two functions. Given a set A, we have function dropA :MSub(A)→Sub(A) from multi- to ordinary subsets of A that ignores indexes and only cares about the range set of a family. Conversely, a (trivial) function liftA :MSub(A)←Sub(A) makes an ordinary subset X ⊂A into a multiset by taking I =X and f(x)=x for any index x. Clearly, dropA.liftAX = X for any X ⊂ A, but liftA.dropAf = 6 f as dropping discards the information about family f (where dot denotes function composition). Similarly, we have functions dropAB :MRel(A,B)→Rel(A,B) from the universe of all multirelations from A to B to the universe of all ordinary relations from A to B, and, conversely, liftAB :MRel(A,B)←Rel(A,B) defined in an obvious way. It is easy to see that the ordinary union of two ordinary sets X,Y can be pre- sented as drop(liftX +liftY ). Similarly, ordinary composition of two ordinary relations X ⊂A×B, Y ⊂B×C is drop(liftX;;liftY ). Of course, an effective implementation of the ordinary operations should not follow these patterns. In ordinary relational modleing, given a relation R:A9B, we often need to build the R-image of a subset X ⊂ A, and the R-preimage of a subset Y ⊂ B. In 80 multi-modeling, subsets are families and relations are spans, hence, we need the notions of the image and preimage of family w.r.t. a given span. Remarkably, both notions can be formally defined via pullbacks as described below. f Definition 7 (composing a family with a span). IO /A O Given a family f :I →A and a span S :A9B, their com- p 1 s position is a family f;;S :P →B defined by the diagram on p2 the right (recall that arc arrows denote functions obtained P /S t /7 B by composition of two functions spanned by the arc). This family is also called the S-image of f. f;;S The (inverse) composition of span S :A9B with a family g :J →B is a family S;;g :P →A defined by a diagram dual to the above: we begin with pullback of g and t, which gives us a pair of arrows (q1,q2), and then set S;;g =q1;s. This family is called the S-preimage of g. 3.4 Composition as navigation Span composition can be also defined in a more navigational way. Given a span R:A9B, we can represent it as a function φ:A→MSub(B) by represeting an el- ement a ∈ A as a “family” a∗ :{∗}→A with a∗(∗) = a, and defining φ(a) = a∗;;R. This is nothing but a special case of the general image-operation described in Def. 7. The latter can be described as a function φM R :MSub(A)→MSub(B). Now, if we have spans R1:A9B and R2:B 9C, we represent them as functions φR1 :A→MSub(B) and φR2 :B →MSub(C) resp. Span composition is then represented by the function composition φR1.φM R2 , which is not difficult to show equals to φR1;;R2 (see e.g. [3] for an elementary proof). That is, given an element a∈A, its R1;;R2-image (which is a family (multiset) over C) can be computed either relationally as (a∗;;R1);;R2=a∗;;(R1;;R2), or navigationally as a.φR1.φM R2 . Note also that the special case when multirelation R1 is not defined on a is well treated without exclusion, because then multiset φR1(a) will be empty (i.e., an empty family given by an empty function ∅:∅→B), and φM R2 (∅)=∅. The description above actually shows that span composition effectively prevents the NULL-navigation safety issue occurring in many textual languages such as OCL [13]. Indeed, we use empty multisets to represent the case of non-existence similarly to how other languages use NULLs. Any composition with an empty multiset results in an empty multiset without special treatment. Hence, the navigation is always safe. 4 Demonstration In this section, we demonstrate the usage of our index-based multiconcepts library in Alloy to model the scenario introduced in Section 2. The commented Alloy code below is the full model. 1 open multi // a utility module providing the function drop 2 3 open mrel[Bundle, Item] as Contains // declare a multirelation from Bundle to Item 4 open mrel[Item, Category] as BelongsTo // declare a multirelation from Item to Category 5 open mrel[Bundle, Category] as Result // to be the result of Contains;;BelongsTo 6 7 abstract sig Bundle {} // Bundle class 81 8 one sig B1, B2 extends Bundle {} // Bundle instances 9 abstract sig Item { // Item class 10 belongsTo: set Category // declare an ordinary relation from Item to Category 11 } 12 one sig Bread, Butter, Milk extends Item {} // Item instances 13 fact { 14 Bread.belongsTo = Bakery // Set up belongsTo relation 15 Butter.belongsTo = Dairy 16 Milk.belongsTo = Dairy 17 } 18 abstract sig Category {} // Category class 19 one sig Dairy, Bakery extends Category {} // Category instances 20 21 fact { 22 BelongsTo/liftedFrom[belongsTo] // lift the ordinary relation to a multirelation 23 Result/composedFrom[Contains/get, BelongsTo/get] // perform the multijoin 24 // rules 25 all b : Bundle | #(b<:Result/get:>Dairy) >= 2 // at least two dairy products 26 all b : Bundle | #(b.(drop[Result/get])) >= 2 // at least two categories 27 } 28 29 assert AllHaveBread { // based on our rules, all bundles should contain bread 30 all b: Bundle | some drop[b<:Contains/get].Bread 31 } 32 33 run {} for 6 Contains/Head, 3 BelongsTo/Head, 20 Result/Head 34 check AllHaveBread for 6 Contains/Head, 3 BelongsTo/Head, 20 Result/Head Listing 1.1: The bundling model The goal of this example is to show two typical usages of Alloy: instance finding and assertion checking. For instance finding, we ask the solver to find instances of the multirelation Contains (line 3) such that the rules (lines 25-26) are satisfied. Based on the rules, each bundle must contain some bread; the assertion AllHaveBread (line 29-31) is to check this fact. The module mrel allows for declaring multirelations for the given domain and range signatures (lines 3-5). Such a multi-relation can be lifted from an ordinary relation by assigning each tuple in the ordinary relation a unique index. In our example, the ordinary relation belongsTo declared on line 10 is fixed on lines 13-17. The multirelation BelongsTo (declared at line 4) represents the same information as the ordinary relation belongsTo by lifting (line 22). We need to multijoin Contains and BelongsTo to observe the categories of products in each bundle. Line 23 shows how an unconstrained multirelation Con- tains is multijoined with BelongsTo; the result is stored in Result. We then use the built-in domain and range restriction operators <: and :> and the function drop (drop the indices in a multirelation to make it ordinary) to state the rules “every bundle must contain at least two dairy products” (we care about multiplicity in this case) and “every bundle must contain items covering at least two product categories” (we do not care about multiplicity in this case). Fig. 3 shows a bundling instance generated by Alloy analyzer which satisfied the two rules. 5 Index-based multiconcepts library in Alloy In this section, we present how we encode the index-based formalization of multicon- cepts introduced in Section 3. We demonstate how we leverage language features of 82 Fig. 3: A bundling instance generated by Alloy analyzer Alloy to design a multiconcept library following the mathematical framework while em- phasizing clarity and ease of use. The library comes in three parts: two parametric mod- ules named mset and mrel for declaring multisets and multirelations; one plain mod- ule named multi providing operations over multiconcepts and a few utility functions. 5.1 Encoding of family and span Family and span are naturally generic structures. By specifying the type of the ground set in a family or the types of the source and target in a span, we can obtain concrete family or span of specific types. Such polymorphism is provided by parametric modules in Alloy. We employ this language feature to encode family and span: 1 module mrel [s, t] 1 module mset [g] 2 2 3 sig Head { sLeg: one s, tLeg: one t } 3 sig Idx { f : one g } 4 fun get[]: s -> Head -> t 4 fun get[] : Idx -> one g { f } 5 { a:s, h:Head, b:t| h.sLeg=a && h.tLeg=b } The first line of both listings declares a parametric module: module mset has one type parameter g standing for the ground set of a family; module mrel has two type parameters s and t standing for the source and target of a span. Family is encoded by the index set sig Idx, which has a totally-defined single- valued relation f (restricted by the quantifier one) to ground set g. Span is encoded by the head set sig Head, which has two totally-defined single-valued relations (restricted by the quantifier one) sLeg and tLeg to source set s and target set t. A family can be viewed as a totally-defined single-valued binary relation; a span can be viewed as a ternary relation by constructing a triple in which head set sits in the middle column along with source and target on the sides. Based on the views above, each module exposes a function get which returns its internal structure as a binary or a ternary relation (line 4). Returning them in such form has two benefits. First, in the module multi, we provide a few generic predicates and functions, which 83 work on both families and spans. These predicates and functions can be generic because we represent families and spans as ordinary binary and ternary relations and we can type the parameters of these predicates and functions as univ->univ or univ->univ->univ. Second, our encoding of multiconcepts is also automatically compatible with ordinary sets and relations. For example, a span can be effectively composed with a total single-valued relation by joining its target leg with that relation. Since we have returned the span as a ternary relation, the effective composition can simply be done by the built-in relational join operator (.). 5.2 Encoding of composition The very important operation of composition over families and spans is pullback, implemented as a predicate in the following listing which takes all the sets and relations in the pullback diagram as input parameters. 1 // in module multi 2 pred pullback[ X1: univ, X2: univ, f1: X1 -> one univ, f2: X2 -> one univ, f1 3 P: univ, p1: P -> X1, p2: P -> X2 ] { AO 1 /Y 4 (no X1 or no X2) implies { no P } else { O 5 all x1: X1 | all x2: X2 | x1.f1 = x2.f2 implies p1 f2 6 { one p: P | p.p1 = x1 && p.p2 = x2 } 7 #P = #(f1.~f2) 8 } P / A2 p2 9 } In the pullback diagram above, we assume the sets X1, X2 and Y with the func- tions f1 and f2 are given. By setting proper constraints, the solver will generate all correct instances in set P together with functions p1 and p2 to form a pullback square. We first deal with the case when set X1 or set X2 is empty (line 4), which implies set P is also empty. The constraint in line 5 and 6 simply obeys the definition of pullback. We iterate the elements in set X1 and X2. If elements x1:X1 and x2:X2 satisfy the condition x1.f1 = x2.f2, it implies that there must be one element p in set P which makes the pullback diagram commute by applying the constraint p.p1=x1 && p.p2=x2. This constraint sets a lower bound to the solution space. With this constraint solely, the solver will generate all the correct instances in set P, and possibly some garbage instances due to the lack of an upper bound constraint. We set an upper bound to the solution space using the fact that the cardinality of set P is equal to the cardinality of the composition result f1.˜f2 according to the definition of pullback. We use the pullback predicate to implement the composition. Before we go into details, we should notice that set P is fresh and disjoint with any other set in the pullback diagram, this is why pullback cannot be encoded as a function in Alloy because we cannot return a set that has not been declared before in a function. This is also the reason that we need to manually declare a multiset or a multirelation to hold the result of composition. 84 1 // in module mset [g] 1 // in module mrel [s, t] 2 open multi as m 2 open multi as m 3 // current family is composed from fami & span 3 // current span is composed from span1 & span2 4 4 pred composedFrom[ span1: s -> univ -> univ, 5 pred composedFrom[fami: univ -> one univ, 5 span2: univ -> univ -> t ] { 6 span: univ -> univ -> g ] { 6 let Hd1=span1.head, Hd2=span2.head, 7 let I = fami.idx, Hd = span.head, 7 sLeg1=span1.sleg, tLeg1=span1.tleg, 8 sLeg = span.sleg, tLeg = span.tleg 8 sLeg2=span2.sleg, tLeg2=span2.tleg 9 | some p1: Idx->I, p2: Idx-> Hd 9 | some p1: Head -> Hd1, p2: Head -> Hd2 10 | pullback[I, Hd, fami, sLeg, Idx, p1, p2] 10 | pullback[Hd1,Hd2, tLeg1,sLeg2, Head,p1,p2] 11 && f = p2.tLeg 11 && sLeg = p1.sLeg1 && tLeg = p2.tLeg2 12 } 12 } Predicates composedFrom encodes the composition in Def. 3 and 6 respectively. They restrict the current family or span to be the result of the composition of the given input parameters. We use the utility functions to extract the components of a family or a span. We then apply the pullback predicate on the corresponding sets and relations (line 10) and get the result family or span (line 11). We rely on skolemization performed by Alloy which enables higher-order quantification with an existential quantifier to generate the relations p1 and p2 on the fly without explicitly declaring them (line 9). 5.3 Other multioperations In our encoding, multiunion (merge) is simply a union of two families (assuming their index sets are disjoint): 1 // in module multi 2 fun merge[f1, f2: univ->one univ] : univ->one univ { f1 + f2 } Since we transform a span to a ternary relation, it is straightforward to implement the operation inverse by simply flipping the first and third column of the ternary relation; the built-in domain and range restriction operators are directly applicable to the returned ternary relation. 1 // flip the 1st and 3rd column 2 fun inverse[span: univ->univ->univ] : span { 3 ter/flip13[span] 4 } 5 // For the domain and range restriction, we can use the built-in operators <: and :> As discussed in Section 3.3, an ordinary set can be viewed as a multiset by assigning each element a unique index. In the same way, an ordinary relation can be seen as a multirelation. Therefore, we can transform an ordinary set or relation to a multiset or multirelation in our encoding. The operation lift is implemented as follows: 1 // in module mset [g] 1 // in module mrel [s, t] 2 2 3 pred liftedFrom[ G: g ] { 3 pred liftedFrom[r: s -> t] { 4 Idx.f = G && #Idx = #G 4 (~sleg).tleg = r && #Head = #r 5 } 5 } Conversely, we can transform a multiset or multirelation to an ordinary set or relation by dropping the indices, which is encoded as follows: 1 // in module multi 2 fun drop[f: univ -> one univ] : univ { rel/ran[f] } 85 3 4 fun drop[span: univ -> univ -> univ] : univ -> univ { 5 ter/select13[span] 6 } We can see that our library fullfils the requirements from Section 2.3: it provides 1) a way to directly declare a multiset or a multirelation; 2) an algebra of operations over multiconcepts; and 3) the lift and drop operations to control whether the result is ordinary or multi. 6 Evaluation 6.1 Correctness We assess the correctness of our encoding through testing, which uses assertions to check if the result obtained from the encoded operation is equal to the expected result we manually compute. By running the assertions in Alloy, if a counter-example is found, that means the encoding of the operation is incorrect, otherwise, the encoding passes the test. All the encoding of operations have passed the tests which increases the confidence that the implementation is correct, and we have released the test suite together with the library [12]. 6.2 Performance One of the core of the Alloy reasoning framework is instance finding, but dealing with multiconcepts significantly increases the search space. In a nutshell, suppose we are given sets A,B and a relationship R:A9B. If R is ordinary, then for every pair (a,b)∈A×B, there are only two possibilities: either (a,b)∈R and hence there is a link from a to b, or R does not include such a link. In contrast, if R is a multirelation, for every pair (a,b) there are infinitely many possibilities for the number of links from a to b. Hence, multiplicity constraints play an important role in narrowing the search space in order to gain reasonal performance on reasoning. We use the bundling model in Sect. 4 as the benchmark model. By manually controlling the scope of Contain/Head (Line 33), which sets an upper bound of the multiplicity of multirelation Contain, we investigate how the performance of Alloy analyzer’s instance finding on the model relates with the multiplicity of a multirelation. The results are shown in Table 1 and the visualization is shown in Fig. 4. The data includes the size of the model’s CNF encoding (vars and clauses) and the total analysis time. The Alloy analyzer is configured to use the MiniSAT solver and runs on a 2.4Ghz Core i7 machine with 8GB RAM. According to the diagram, the size of the model increases linearly with the scope of Contains/Head. The analysis time is fluctuating with the increase of scope but the overall analysis time is momentary when the scope is under 20. This result shows our Alloy multiconcept encoding is practical given a reasonable scope. 86 scope vars primary vars clauses analysis time (ms) 5 6860 314 13011 38 6 7496 340 14701 46 7 8128 366 16418 47 8 8756 392 18157 81 9 9378 418 19909 82 10 10000 444 21701 27 15 13012 574 31228 379 20 15748 704 41150 68 Table 1: Benchmark results (not exhaustively listed) and visualization 45000 450 40000 400 35000 350 30000 300 Time (ms) Amount 25000 250 20000 200 15000 150 10000 100 5000 50 0 0 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 Scope of the head of span contain Vars Primary Vars Clauses Solve time Fig. 4: Visualization of the benchmark result 7 Related Work Translating UML class diagrams with OCL constraints (involving bags) into Alloy has been studied in [2] and [9]. In the former paper, bags/multisets were encoded in Alloy as sequences, which is inadequate as bags are not ordered. Neither of the papers considered translation of multirelations, which, as our example in Sect. 2 shows, restricts the practical applicability of the studies. From the Alloy world side, the closest related work is a question: “Are there multisets in Alloy?” [1] on the website Stack Overflow. The selected answer to this question proposes to represent a multiset as a function from a set of elements to the natural numbers. For a given element, the function returns the multiplicity of the ele- ment. The answer also provides implementation of merge, intersection, and difference, which compute the new multiplicities of the results (add the multiplicities for the merge, subtract them for difference, and take their minimum for intersection). The answer does not provide any information about multirelations and other operations over them such as composition. There is a large body of work about multisets treated numerically via multiplicities. Blizard presents a historical survey of the multiset theory [6], and traces back the idea of multisets as families to [11]. A category theoretic treatment of multisets and multirelations can be found in, resp., [10] and [7]. 87 8 Conclusion We have shown that simple categorical notions of family, span, and their composition via pullback can bridge the gap between graphical multirelational modeling and textual modeling in Alloy. Alloy users can now use multiconcepts in their models by importing our library module. There is previous work about transforming Alloy model to SMT [8]. Our encoding is also a blueprint for implementing multiconcept support for Satis- fiability Modulo Theories (SMT) solvers, whereby only total functions are available. In the future, we will use the presented encoding to implement support for mul- ticoncepts as first-class constructs in the modeling language Clafer [4], which uses Alloy as one of the backends. In Clafer, modelers can directly declare multisets and multirelations; however, both the syntax of Clafer has to be extended with the new operators over multi-concepts and the translation to Alloy has to implement the correct semantics. References 1. Are there multisets in Alloy? (2014), http://stackoverflow.com/questions/ 23579928/are-there-multisets-in-alloy, last accessed in Aug 2016 2. Anastasakis, K., Bordbar, B., Georg, G., Ray, I.: On challenges of model transformation from UML to Alloy. Software & Systems Modeling 9(1), 69–86 (2010) 3. Bąk, K., Diskin, Z., Antkiewicz, M., Czarnecki, K., Wąsowski, A.: Partial instances via subclassing. In: International Conference on Software Language Engineering. pp. 344–364. Springer (2013) 4. Bąk, K., Diskin, Z., Antkiewicz, M., Czarnecki, K., Wąsowski, A.: Clafer: unifying class and feature modeling. Software & Systems Modeling (2014) 5. Barr, M., Wells, C.: Category theory for computing science, vol. 49. Prentice Hall New York (1990) 6. Blizard, W.D., et al.: The development of multiset theory. Modern logic 1(4), 319–352 (1991) 7. Bruni, R., Gadducci, F.: Some algebraic laws for spans. Electr. Notes Theor. Comput. Sci. 44(3), 175–193 (2001) 8. El Ghazi, A.A., Taghdiri, M.: Relational reasoning via smt solving. In: International Symposium on Formal Methods. pp. 133–148. Springer (2011) 9. Maoz, S., Ringert, J.O., Rumpe, B.: CD2Alloy: Class diagrams analysis using alloy revisited. In: MoDELS, pp. 592–607. Springer (2011) 10. Monro, G.: The concept of multiset. Mathematical Logic Quarterly 33(2), 171–178 (1987) 11. Rado, R.: The cardinal module and some theorems on families of sets. Annali di Matematica pura ed applicata 102(1), 135–154 (1975) 12. Sun, P., Diskin, Z., Antkiewicz, M., Czarnecki, K.: Modeling and reasoning with multisets and multirelations in Alloy. Tech. Rep. GSDLAB-TR 2016-01-22, Generative Software Development Lab, University of Waterloo (January 2016), http://gsd.uwaterloo.ca/node/658 13. Willink, E.D.: Safe navigation in ocl. In: OCL 2015–15th International Workshop on OCL and Textual Modeling: Tools and Textual Model Transformations Workshop Proceedings. p. 81 (2015) 88