Modeling and Reasoning
    with Multirelations, and their encoding in Alloy

     Peiyuan Sun, Zinovy Diskin, Michał Antkiewicz, and Krzysztof Czarnecki

                              University of Waterloo
              {p25sun,zdiskin,mantkiew,kczarnec}@gsd.uwaterloo.ca


       Abstract. Multisets and multirelations arise naturally in modeling. In this
       paper, we present a sound and practical mathematical framework, which
       encodes multisets and multirelations using only ordinary sets and total
       functions. We implement the encoding as a multiconcepts library in Alloy,
       which is declarative, compatible with ordinary sets and relations, and can
       be incorporated into existing models seamlessly.


1    Introduction

Graphical notations are good for direct intuitive modeling but may be difficult to
formalize and lack precision. Text-based modeling languages are well-amenable to
formalization and formal reasoning, but the meaning of a textual specification may be
not immediate and difficult to grasp. Integration of the two frameworks is a challenge;
it needs understanding of how they can interact and how their mutual translation
could be mediated. In this paper, we investigate a special instance of the story with
the following two parties in the two roles.
    The graphical modeling party is represented by the UML-style relational modeling,
in which a relationship (an association in the UML parlance) between two classes
is seen as a set of links represented by arrows. As in general the same two objects
can be related by multiple links (the association ends are non-unique), we call such
a relationship a multirelation. Note that even if we compose two ordinary relations,
R1 :A→B and R2 :B →C, the result is, in general, a multirelation because there may
be multiple b∈B such that (a,b)∈R1 and (b,c)∈R2. Converting this multirelation
into an ordinary one means discarding some data and may be seen artificial and
unsatisfactory from the modeling perspective.
    The textual modeling party is Alloy, which is specially tailored for relational
modeling and analysis. However, Alloy does not provide a direct way to work with
multirelations, which requires the modeler to look for a workaround and makes
our story intriguing for an Alloy user. Indeed, considering the universe of ordinary
relations closed under composition is, as mentioned above, an artificial and often
counter-intuitive imposition on relational modeling.
    To integrate, we, first, formalize multirelations and operations over them in a
category-theoretical framework of spans and span operations. Although the framework
is well-known in category theory, its accurate presentation in the MDE literature
seems to be novel and makes a theoretical contribution of the paper. (For instance,


                                        73
the fact that the disjoint union of two sets is generally a multiset, and the notion
of multiset product, in which elements’ roles are freely combined, have not yet been
considered in the UML literature up to our knowledge of it.) Second, as the span
framework for relational modeling is based on sets and total functions that are readily
modeled in Alloy, spans open the door for a seemingly natural Alloy encoding of
multirelational modeling. However, implementing the entire multirelational algebra in
Alloy as a proper (but conservative) extension of the ordinary relational algebra turned
out non-trivial and required solving numerous specific problems of Alloy encoding of
elementary operations over sets and functions. To ease using our solution for an Alloy
user, we implemented a library of utility modules allowing the user to seamlessly
integrate multirelations into existing models; conservativity then ensures that the
ordinary relation part of the model is not broken and can be used further if needed.
    Our plan for the paper is as follows. In Sect. 2 we present a simple running
example of multirelational modeling, and show the main notions at work. Sect. 3
describes the mathematical framework. In Sect. 4, we demonstrate the usage of our
Alloy module library by building an Alloy model for the running example. Sect. 5
presents the details of our encoding in Alloy, and in Sect. 6 we discuss how we evaluate
the Alloy encoding. Related work is discussed in Sect. 7, and Sect. 8 concludes.


2     Multiconcepts naturally arise in modeling

We use multiconcept as a generic term referring to multisets, multirelations, and
specific multioperations over them. In this section, we present a simple modeling
scenario showing how naturally multiconcepts can appear in modeling. We then
formulate several basic requirements for a multiconcept modeling framework.


2.1   Running example: seasonal sales and multirelations

The manager of a grocery store asks employee to prepare some bundles for the coming
seasonal sale. A bundle contains several food items, and each item belongs to a certain
product category. The manager imposes some rules on the bundle content, e.g., the
following three:
(0) Every bundle must have (contains) at least two items
(1) Every bundle must have at least two dairy products;
(2) Every bundle must have items from at least two product categories.
     To model the scenario, we identify three classes of objects: Bundle, Item, and
Category, and two relations: contains:Bundle→Item and belongsTo:Item→Category
as shown in Fig. 1(a) (ignore dashed blue arrows for a while). Note that a bundle may
have repeated items according to the label ’non-unique’ at the respective association
end. A simple instance of this model is shown in Fig. 1(b). In this instance, the
relation contains consists of five links ci shown by arrows. Having two links relating
the same pair (B1,Bread) means that the bundle B1 has two breads; we also say
that the pair (B1,Bread) occurs twice in the relation, and the latter is then called
a multirelation. The relation belongsTo is an ordinary relation consisting of three
links/pairs bi (by default, it is considered bearing the ’unique’ label).


                                       74
                              contains        2..*                      belongsTo        1
                  Bundle                                     Item                                Category
                                     non-unique
                                                                                         2..*
                                                     contains ; belongsTo                   non-unique


        (a)
                                                     contains ;; belongsTo


                                                     Item
                Bundle                                                                       Category
                                     c11                    Bread
                                                                                b1
                                     c12
                     B1                                                                         Bakery
                                         c2
                                                            Butter
                                     c3                                         b2

                     B2                                                                          Dairy
                                     c4                                         b3
                                                             Milk


                                  contains                                   belongsTo
       (b)

                  Fig. 1: Bundling: a model (a) and its instance (b)

   In order to check the bundling rules described above, we need to compose the
two relations and compute a new relation

                           contains;;belongsTo:Bundle→Category,

(dashed lower arrow in Fig. 1a), where we use double semi-colon to denote the
composition operation. For the instance in Fig. 1b, the composition would consists
of five composed links shown by dashed blue arrows, e.g., c11;b1, c12;b1, etc.. As the
bundle B1 has only one composed link to Dairy, it violates rule (1), while bundle
B2 has two Dairy links (c3;b2 and c4;b3), and thus satisfies (1). As for rule (2), we
can see that bundle B1 satisfies whereas bundle B2 violates it. A bundle containing
one bread, one butter, and one milk would have satisfied both rules.
     Now suppose that the bundle B1 has only one bread and hence contains would be
an ordinary relation. Nevertheless, composition contains;;belongsTo of two ordinary
relations would still be a multirelation containing two links from B2 to Diary. Hence,
there are two types of composition of ordinary relations. One is precise and counts
each pair of composable links as a separate composed link as we did above by creating
two links from B2 to Diary. We call this composition multijoin and denote it by
double semi-column. The other relational composition only deals with reachability,
and links two elements (bundle B2 and Diary in the example) as soon as there is
at least one composable pair of links between those elements, but only retains one
link irrespective to the number of paths. We call this composition ordinary join
and denote it by single semi-column. Clearly, ordinary join loses some information,
which may be important to consider, e.g., for checking rule (1). Actually, this is a
typical situation: computing the price of a bundle composed from prices of its items
price:Item→int, or the weight of a bundle, or other aggregate queries (as they are
called in the database literature), requires multijoins and multirelations. On the other


                                                75
hand, there are situations whereby we do need the ordinary join, e.g., checking rule
(2) in our example, where we are only concerned with reachability.

2.2   Seasonal sales and multisets
Unary multirelations or multisets is also a natural modeling concept. For example,
in Fig. 1(b), mapping contains defines bundle B1 as a multiset of items, in which
item Bread occurs twice, and we write B1={{Bread,Bread,Butter}}, where double
brackets indicate that we deal with multisets rather than sets. Importantly, as a
unary multirelation, bundle B1 is to be seen as a multi-subset of set Item so that
the expression above implicitly states that item Milk is not included into B1. Hence,
to be accurate, we should specify the type of a multiset and write B1:MSub(Item),
where MSub(Item) denotes the set of all multisubsets of Item. Similarly, bundle
B2 is another multisubset {{Butter,Milk}} of the same type, which happens to be
an ordinary (sub)set. The italicized reservation is important if, for example, we
need to consider the union of two bundles, which should include item Butter twice:
B1 ] B2 = {{Bread,Bread,Butter,Butter,Milk}}. Thus, union of two ordinary sets
seen as multisets can result in a multiset (exactly like composition of two ordinary
relations seen as multirelations can be a multirelation). On the other hand, in some
contexts we may need an ordinary (non-counting) union operation, in which B1∪B2=
{{Bread,Butter,Milk}}. To distinguish these two operations, we will call them ordinary
union and multiunion (similarly to contrasting ordinary join and multijoin in relational
composition). Thus, the universe of ordinary (sub)sets is closed under ordinary union,
but is not closed under multiunion. Multiunion is often called disjoint union, but the
fact that disjoint union of two sets can result in a multiset is often not recognized.

2.3   Requirements
The discussion above leads to the following requirements for an effective multiconcepts
framework. A modeler should be able to do the following on top of the ordinary sets
and relations:
 1. directly declare a multi(sub)set or a multirelation;
 2. perform operations over multiconcepts;
 3. control whether the result of an operation should be ordinary or multi.
    For example, it should be possible to compute the multijoin of an ordinary relation
with a multirelation, or the multijoin of two ordinary relations, or the multiunion of a
set and a multiset. Furthermore, the syntax and semantics of the extended framework
should be conservative w.r.t. the ordinary operations over ordinary objects, so that
the modelers could add multiconcepts to their existing models without having to
restructure them.


3     Mathematical framework
The diagram in Fig. 1(b) is intuitively simple: three boxes are sets, and two block
arrows represent mappings between them. To make the diagram formal, we need


                                        76
to formalize (i) the notion of mapping as a collection of links, and (ii) mapping
composition. There are two ways of doing this. The first is well-known: a multirelation
is a relation, in which each pair of elements is assigned with an integer called the
multiplicity of the pair; correspondingly, multioperations amount to operations over
numbers (see the accompanying TR [12] for details). We refer to this framework as
multiplicity-based or numeric, but it is beyond our goals in the paper. The second
approach is borrowed from category theory and is based on reification of links as
indices, we call the approach index-based and present it below.
    We will begin with index-based formalization of relations, because it allows us
to introduce the approach in a natural way by discussing the example in Sect. 2. In
this way we come to our major construct of span. Then we introduce the index-based
version of multisets—the notion of a family of elements. We will also define operations
over spans, operations over families, and mixed compositions of spans with families.


3.1    Relations as spans and operations over them.

We formalize the mapping contains in Fig. 1 by reifying all its constituent links as
separate objects. This gives us a set Contains consisting of five elements ci as shown
in Fig. 2. The special nature of these elements (they represent links) is formalized
by mapping each of them to the source and the target elements of the respective
link. For example, as element c1 ∈Contains reifies link c1 from B1 to Bread in Fig. 1,
it is linked to B1 by the source link c1s, and to Bread by the target link c1t. All
source links make a function slegContains :Bundle←Contains called the source leg, and
all target links make a function tlegContains :Contains→Item called the target leg. The
triple (Contains,slegContains ,tlegContains ) is called a span with head set Contains and legs
as above. Similarly, relation belongsTo is formalized by the span with the head set
BelongsTo (see Fig. 2).
Definition 1 (span). A (finite) span R from a set A to a set B is a triple (headR ,
slegR ,tlegR ) with headR a set called the head, and slegR :A←headR , tlegR :headR →B
two functions called legs. In formal diagrams, we will often use a shorter notation
(HR ,sR ,tR ) for span’s components. We denote a span by a stroked arrow R:A9B,
and will often use the same name for both the span and its head.
    If a span represents a total single-valued relation, i.e., a function f :A→B, its
source leg is a bijection (e.g., the relation belongsTo in Fig. 1 is such). As the choice
of the index set is arbitrary, we can take set A to be the head, and its identity
idA :A→A as the source leg. Then the target leg is the function f itself.
    Note also that sets A and B in Def. 1 are actually placeholders (formal pa-
rameters) for sets rather than actual sets. For example, if we want to specify a
multirelation R = Spouse on a set Person1, then we define A = Person, B = Person,
headSpouse = marriageContract, slegSpouse = spouse1 and tlegSpouse = spouse2, where
functions spouse1 and spouse2 map a contract to the two spouses it binds. Formally,
this procedure can be described as binding formal parameters, A, slegR etc. to actual
1
    the same two persons can be multiply related, if, e.g., they got diversed and then
    re-married again, and hence may have several marriage contracts


                                          77
                                                  tlegContains               slegBelongsTo
                                   Contains                      Item                        BelongsTo
                 slegContains                c1                                                                 tlegBelongsTo
                                                    c1t             Bread                         b1


                                             c2
                         c1s
                                                                                                  b2
                                                                    Butter
                                             c3


                                             c4                                                   b3
        Bundle                                                        Milk                                                      Category

                                             c5
            B1
                                                                                                                                  Bakery


            B2
                                    slegiResult                                                tlegiResult                         Dairy


                                                                   iResult

                                                                        r1


                                                                        r2


                                                                        r3


                                                                        r4

                                slegResult                                                             tlegResult
                                                                        r5


                                                                   Result


                        Fig. 2: Indexed-based formalization of Fig. 1


values Person, spouse1 etc. Nodes and arrows in diagrams used in all our formal
definitions below are formal parameters to be substituted by actual values (sets for
nodes and functions for arrows), when applied in modeling situations.
    Now we proceed to the index-based formalization of sequential composition of
relations. Our discussion of Fig. 1 showed that the core process is composition of
links. As links now are reified as elements in the heads of the spans involved, first
of all we need to find composable pairs of links. It is clear that links ci ∈ Contains
and bj ∈ BelongsTo are composable iff the target of ci is the source of bj , i.e.,
tlegContains (ci)=slegBelongsTo(bj ). If this condition holds, the two links are composable,
and we can create a new link from slegContains (ci)∈Bundle to tlegBelongsTo(bj )∈Category.
Thus, the set of all composable links iResult (where i stands for ’inner’, and later we
will also build an outer span)) is given by the following formula:
                          
                  iResult= (ci,bj ): tlegContains (ci)=slegBelongsTo(bj ) .

This set is equipped with two projection functions selecting, resp., the first or
the second element of pair (ci,bj ), and we obtain a span iResult shown in Fig. 2
as the inner span. To finish the composition and obtain the resulting outer span
Result, we need function composition: slegResult =slegiResult;slegContains and tlegResult =
tlegiResult;tlegBelongsTo (see Fig. 2). Below we present an abstract formal specification
of the procedure.
    Selection of the composable links is provided by the operation called (in category
theory) pullback of functions.


                                                                 78
Definition 2 (pullback). The pullback of two functions
with a common target, f1 :A1 →Y and f2 :Y ←A2 is a span
                                                                           f1
                                                                    AO 1          /Y
                                                                                    O
with head P ={(a1,a2)∈A1×A2 : f1(a1)=f2(a2)}. The legs of
the span are projections pi :P →Ai with pi(a1,a2)=ai, i=1,2.      p1                  f2

The diagram on the right shows the respective pullback square       P           / A2
(note the angle near P denoting such squares).                             p2

    It is easy to check that such a pullback square is commutative: p1;f1 =p2;f2.
Now we can define span composition.
Definition 3 (span composition).               Let
R1 :A9B, R2 :B 9C be two composable
spans. Their (sequential) composition is a span            B AO
R1;;R2 =(R,s,t):A9C defined as shown on                  s1
the right, where arc arrows (s and t) denote                       t1
functions composed from two functions spanned         s   RO 1              /B
                                                                              O
by arcs (we will use this convention further on).        p1                     s2
In more detail, we first compute the pullback
of functions t1 and s2 to select all pairs of com-           R     p2
                                                                          / R2
                                                                                       t2
                                                                                            7/ C
posable links. Then we build the outer legs by
composing p1;s1 and p2;t2.
                                                                             t

    If span R2 represents a function (total single-valued relation), we can per-
form span composition more effectively by taking R2’s source leg to be the iden-
tity, and hence tlegR2 =R2. Then we set headR1 ;;R2 =headR1 , slegR1 ;;R2 =slegR1 , and
tlegR1 ;;R2 =tlegR1 ;tlegR2 . This span is isomorphic to any other span representing
relation R2 with an arbitrary index set I bijective to B (because the relation is total
and single-valued). For example, as relation belongsTo in our running example is a
function, we can compose Contains and BelongsTo in this simpler way by identifying
b1,2,3 with Bread, Butter and Milk resp. The result will be isomorphic to the span
specified in Fig. 1 in the following sense.
Definition 4 (span isomorphism). Two parallel spans
R1 :A9B, R2 :A9B are isomorphic if there is a bijection
                                                                                  sR2
                                                                       AO o                H
between their heads, b:HR1 →HR2 , such that sR1 = b;sR2                                   9 R2
and tR1 = b;tR2 . The two commutativity conditions ensure sR1                        b
                                                                                                 tR2

that if a link h∈HR1 is mapped to a link b(h)∈HR2 , then                                   /B
                                                                                               
                                                                      HR1
both links have the same source and target.                                        tR1

    Thus, span composition is defined up to span isomorphism. Moreover, the process
of relation indexing by a span is also defined up to isomorphism: we are free in
choosing objects reifying links, but the source and target projection links of these
objects are uniquely determined by the link being reified. In fact, everything in
the indexing world is defined up to natural isomorphisms (bijections between index
sets commuting with the respective functions). In this paper, we take particular
representatives of equivalence classes defined by isomorphism like above, and perform
operations over them. General results of category theory, in which standard operations
over sets and functions are redefined via so called limits (e.g., pullback) and colimits
(e.g., merge) up to isomorphism ensure that applying these operations to different
but isomorphic representatives produces isomorphic results (see, e.g., [5]).


                                             79
3.2   Multisets as families and operations over them.

We use the same indexing idea by reifying element occurrences as unique indices to
distinguish the multiple occurrences of the same element. In a sense,
Definition 5 (families). Let A be a set (perhaps, infinite).
A (finite) family over A is a function f :I →A from a finite
index set I to A. The latter is called the ground set of family
f, while the range set of function f, i.e., set {f(i): i∈I}⊂A,
is often called the active domain of f (it is always finite as set    @A^
                                                                           f0
I is such). To avoid confusion, we will use the term ’range set’
                                                                    f

rather than ’active domain’.                                           b      / I0
Two families, f :I →A and f 0 :I 0 →A, over the same ground I
set A are called isomorphic, and we write f ∼    = f 0, if there is
a bijection b:I →I such that b;f =f.
                     0             0


Definition 6 (multiunion/sum/merge). Given two fam-
ilies over A, fk :Ik →A, k =1,2, their multiunion (or sum, or               f1
merge) f1 +f2 :I1]I2 →A is given by the diagonal arrow in             I1          /A
                                                                                    O
the commutative diagram on the right, where ] denotes the               i1            f2
operation of disjoint union, arrows i1, i2 are canonic injections,      z i2
and the diagonal function is defined by (f1 +f2)(i)=f1(i) if        I1]I2 o        I2
i∈I1, and (f1 +f2)(i)=f2(i) if i∈I2.
     We can use the operation of family multiunion for building disjoint union of sets
(not surprisingly, as multiunion uses disjoint union of index sets). Given sets A and
B, we form their union U =A∪B with two injections i:A→U and j :B →U. These
injections can be seen as two families over the same ground set. Summing them gives
us a family u:A]B →U, whose index set is the disjoint union of A and B.


3.3   Mixed setting: Families and spans, ordinary and multi

Connections between ordinary and multi(sub)sets are realized via the following two
functions. Given a set A, we have function dropA :MSub(A)→Sub(A) from multi- to
ordinary subsets of A that ignores indexes and only cares about the range set of a
family. Conversely, a (trivial) function liftA :MSub(A)←Sub(A) makes an ordinary
subset X ⊂A into a multiset by taking I =X and f(x)=x for any index x. Clearly,
dropA.liftAX = X for any X ⊂ A, but liftA.dropAf =      6 f as dropping discards the
information about family f (where dot denotes function composition). Similarly, we
have functions dropAB :MRel(A,B)→Rel(A,B) from the universe of all multirelations
from A to B to the universe of all ordinary relations from A to B, and, conversely,
liftAB :MRel(A,B)←Rel(A,B) defined in an obvious way.
     It is easy to see that the ordinary union of two ordinary sets X,Y can be pre-
sented as drop(liftX +liftY ). Similarly, ordinary composition of two ordinary relations
X ⊂A×B, Y ⊂B×C is drop(liftX;;liftY ). Of course, an effective implementation of
the ordinary operations should not follow these patterns.
    In ordinary relational modleing, given a relation R:A9B, we often need to
build the R-image of a subset X ⊂ A, and the R-preimage of a subset Y ⊂ B. In


                                       80
multi-modeling, subsets are families and relations are spans, hence, we need the
notions of the image and preimage of family w.r.t. a given span. Remarkably, both
notions can be formally defined via pullbacks as described below.
                                                                  f
Definition 7 (composing a family with a span). IO                      /A
                                                                         O
Given a family f :I →A and a span S :A9B, their com- p
                                                            1          s
position is a family f;;S :P →B defined by the diagram on
                                                                  p2
the right (recall that arc arrows denote functions obtained P          /S      t   /7 B
by composition of two functions spanned by the arc). This
family is also called the S-image of f.                                f;;S

    The (inverse) composition of span S :A9B with a family g :J →B is a family
S;;g :P →A defined by a diagram dual to the above: we begin with pullback of g
and t, which gives us a pair of arrows (q1,q2), and then set S;;g =q1;s. This family
is called the S-preimage of g.

3.4     Composition as navigation
Span composition can be also defined in a more navigational way. Given a span
R:A9B, we can represent it as a function φ:A→MSub(B) by represeting an el-
ement a ∈ A as a “family” a∗ :{∗}→A with a∗(∗) = a, and defining φ(a) = a∗;;R.
This is nothing but a special case of the general image-operation described in Def. 7.
The latter can be described as a function φM   R :MSub(A)→MSub(B). Now, if we have
spans R1:A9B and R2:B 9C, we represent them as functions φR1 :A→MSub(B)
and φR2 :B →MSub(C) resp. Span composition is then represented by the function
composition φR1.φM R2 , which is not difficult to show equals to φR1;;R2 (see e.g. [3] for an
elementary proof). That is, given an element a∈A, its R1;;R2-image (which is a family
(multiset) over C) can be computed either relationally as (a∗;;R1);;R2=a∗;;(R1;;R2),
or navigationally as a.φR1.φM  R2 . Note also that the special case when multirelation R1
is not defined on a is well treated without exclusion, because then multiset φR1(a) will
be empty (i.e., an empty family given by an empty function ∅:∅→B), and φM         R2 (∅)=∅.
    The description above actually shows that span composition effectively prevents the
NULL-navigation safety issue occurring in many textual languages such as OCL [13].
Indeed, we use empty multisets to represent the case of non-existence similarly to
how other languages use NULLs. Any composition with an empty multiset results in
an empty multiset without special treatment. Hence, the navigation is always safe.


4      Demonstration
In this section, we demonstrate the usage of our index-based multiconcepts library
in Alloy to model the scenario introduced in Section 2. The commented Alloy code
below is the full model.
1   open multi                                    // a utility module providing the function drop
2
3   open mrel[Bundle, Item] as Contains         // declare a multirelation from Bundle to Item
4   open mrel[Item, Category] as BelongsTo    // declare a multirelation from Item to Category
5   open mrel[Bundle, Category] as Result           // to be the result of Contains;;BelongsTo
6
7   abstract sig Bundle {}                                                        // Bundle class


                                             81
8    one sig B1, B2 extends Bundle {}                                        // Bundle instances
9    abstract sig Item {                                                           // Item class
10     belongsTo: set Category             // declare an ordinary relation from Item to Category
11   }
12   one sig Bread, Butter, Milk extends Item {}                               // Item instances
13   fact {
14     Bread.belongsTo = Bakery                                     // Set up belongsTo relation
15     Butter.belongsTo = Dairy
16     Milk.belongsTo = Dairy
17   }
18   abstract sig Category {}                                                  // Category class
19   one sig Dairy, Bakery extends Category {}                             // Category instances
20
21   fact {
22     BelongsTo/liftedFrom[belongsTo]          // lift the ordinary relation to a multirelation
23     Result/composedFrom[Contains/get, BelongsTo/get]                 // perform the multijoin
24     // rules
25     all b : Bundle | #(b<:Result/get:>Dairy) >= 2              // at least two dairy products
26     all b : Bundle | #(b.(drop[Result/get])) >= 2                  // at least two categories
27   }
28
29   assert AllHaveBread {               // based on our rules, all bundles should contain bread
30     all b: Bundle | some drop[b<:Contains/get].Bread
31   }
32
33   run {} for 6 Contains/Head, 3 BelongsTo/Head, 20 Result/Head
34   check AllHaveBread for 6 Contains/Head, 3 BelongsTo/Head, 20 Result/Head

                             Listing 1.1: The bundling model

    The goal of this example is to show two typical usages of Alloy: instance finding
and assertion checking. For instance finding, we ask the solver to find instances of the
multirelation Contains (line 3) such that the rules (lines 25-26) are satisfied. Based
on the rules, each bundle must contain some bread; the assertion AllHaveBread
(line 29-31) is to check this fact.
    The module mrel allows for declaring multirelations for the given domain and
range signatures (lines 3-5). Such a multi-relation can be lifted from an ordinary
relation by assigning each tuple in the ordinary relation a unique index. In our
example, the ordinary relation belongsTo declared on line 10 is fixed on lines 13-17.
The multirelation BelongsTo (declared at line 4) represents the same information
as the ordinary relation belongsTo by lifting (line 22).
    We need to multijoin Contains and BelongsTo to observe the categories of
products in each bundle. Line 23 shows how an unconstrained multirelation Con-
tains is multijoined with BelongsTo; the result is stored in Result. We then use
the built-in domain and range restriction operators <: and :> and the function
drop (drop the indices in a multirelation to make it ordinary) to state the rules
“every bundle must contain at least two dairy products” (we care about multiplicity
in this case) and “every bundle must contain items covering at least two product
categories” (we do not care about multiplicity in this case). Fig. 3 shows a bundling
instance generated by Alloy analyzer which satisfied the two rules.


 5      Index-based multiconcepts library in Alloy
 In this section, we present how we encode the index-based formalization of multicon-
 cepts introduced in Section 3. We demonstate how we leverage language features of


                                             82
                      Fig. 3: A bundling instance generated by Alloy analyzer


    Alloy to design a multiconcept library following the mathematical framework while em-
    phasizing clarity and ease of use. The library comes in three parts: two parametric mod-
    ules named mset and mrel for declaring multisets and multirelations; one plain mod-
    ule named multi providing operations over multiconcepts and a few utility functions.

    5.1   Encoding of family and span
    Family and span are naturally generic structures. By specifying the type of the ground
    set in a family or the types of the source and target in a span, we can obtain concrete
    family or span of specific types. Such polymorphism is provided by parametric
    modules in Alloy. We employ this language feature to encode family and span:

                                               1   module mrel [s, t]
1   module mset [g]
                                               2
2
                                               3   sig Head { sLeg: one s, tLeg: one t }
3   sig Idx { f : one g }
                                               4   fun get[]: s -> Head -> t
4   fun get[] : Idx -> one g { f }
                                               5    { a:s, h:Head, b:t| h.sLeg=a && h.tLeg=b }

        The first line of both listings declares a parametric module: module mset has
    one type parameter g standing for the ground set of a family; module mrel has
    two type parameters s and t standing for the source and target of a span.
        Family is encoded by the index set sig Idx, which has a totally-defined single-
    valued relation f (restricted by the quantifier one) to ground set g. Span is encoded
    by the head set sig Head, which has two totally-defined single-valued relations
    (restricted by the quantifier one) sLeg and tLeg to source set s and target set t.
        A family can be viewed as a totally-defined single-valued binary relation; a span
    can be viewed as a ternary relation by constructing a triple in which head set sits
    in the middle column along with source and target on the sides. Based on the views
    above, each module exposes a function get which returns its internal structure as
    a binary or a ternary relation (line 4). Returning them in such form has two benefits.
    First, in the module multi, we provide a few generic predicates and functions, which


                                             83
    work on both families and spans. These predicates and functions can be generic
    because we represent families and spans as ordinary binary and ternary relations
    and we can type the parameters of these predicates and functions as univ->univ
    or univ->univ->univ. Second, our encoding of multiconcepts is also automatically
    compatible with ordinary sets and relations. For example, a span can be effectively
    composed with a total single-valued relation by joining its target leg with that relation.
    Since we have returned the span as a ternary relation, the effective composition can
    simply be done by the built-in relational join operator (.).


    5.2   Encoding of composition

    The very important operation of composition over families and spans is pullback,
    implemented as a predicate in the following listing which takes all the sets and
    relations in the pullback diagram as input parameters.
1   // in module multi
2   pred pullback[ X1: univ, X2: univ, f1: X1 -> one univ, f2: X2 -> one univ,        f1
3                   P: univ, p1: P -> X1, p2: P -> X2 ] {                      AO 1        /Y
4     (no X1 or no X2) implies { no P } else {
                                                                                             O
5     all x1: X1 | all x2: X2 | x1.f1 = x2.f2 implies                         p1                 f2
6       { one p: P | p.p1 = x1 && p.p2 = x2 }
7     #P = #(f1.~f2)
8     }                                                                         P          / A2
                                                                                      p2
9   }


        In the pullback diagram above, we assume the sets X1, X2 and Y with the func-
    tions f1 and f2 are given. By setting proper constraints, the solver will generate
    all correct instances in set P together with functions p1 and p2 to form a pullback
    square.
        We first deal with the case when set X1 or set X2 is empty (line 4), which implies
    set P is also empty. The constraint in line 5 and 6 simply obeys the definition of
    pullback. We iterate the elements in set X1 and X2. If elements x1:X1 and x2:X2
    satisfy the condition x1.f1 = x2.f2, it implies that there must be one element p
    in set P which makes the pullback diagram commute by applying the constraint
    p.p1=x1 && p.p2=x2.
        This constraint sets a lower bound to the solution space. With this constraint
    solely, the solver will generate all the correct instances in set P, and possibly some
    garbage instances due to the lack of an upper bound constraint. We set an upper
    bound to the solution space using the fact that the cardinality of set P is equal to the
    cardinality of the composition result f1.˜f2 according to the definition of pullback.
        We use the pullback predicate to implement the composition. Before we go into
    details, we should notice that set P is fresh and disjoint with any other set in the
    pullback diagram, this is why pullback cannot be encoded as a function in Alloy
    because we cannot return a set that has not been declared before in a function. This
    is also the reason that we need to manually declare a multiset or a multirelation to
    hold the result of composition.


                                               84
1    // in module mset [g]                           1 // in module mrel [s, t]
2    open multi as m                                 2 open multi as m
3    // current family is composed from fami & span 3 // current span is composed from span1 & span2
4                                                    4 pred composedFrom[ span1: s -> univ -> univ,
5    pred composedFrom[fami: univ -> one univ,       5                     span2: univ -> univ -> t ] {
6                       span: univ -> univ -> g ] { 6     let   Hd1=span1.head,   Hd2=span2.head,
7       let I = fami.idx, Hd = span.head,            7        sLeg1=span1.sleg, tLeg1=span1.tleg,
8           sLeg = span.sleg, tLeg = span.tleg       8        sLeg2=span2.sleg, tLeg2=span2.tleg
9       | some p1: Idx->I, p2: Idx-> Hd              9    | some p1: Head -> Hd1, p2: Head -> Hd2
10      | pullback[I, Hd, fami, sLeg, Idx, p1, p2] 10     | pullback[Hd1,Hd2, tLeg1,sLeg2, Head,p1,p2]
11     && f = p2.tLeg                               11   && sLeg = p1.sLeg1 && tLeg = p2.tLeg2
12   }                                              12 }


          Predicates composedFrom encodes the composition in Def. 3 and 6 respectively.
     They restrict the current family or span to be the result of the composition of the given
     input parameters. We use the utility functions to extract the components of a family or
     a span. We then apply the pullback predicate on the corresponding sets and relations
     (line 10) and get the result family or span (line 11). We rely on skolemization performed
     by Alloy which enables higher-order quantification with an existential quantifier to
     generate the relations p1 and p2 on the fly without explicitly declaring them (line 9).


     5.3   Other multioperations

     In our encoding, multiunion (merge) is simply a union of two families (assuming their
     index sets are disjoint):
1    // in module multi
2    fun merge[f1, f2: univ->one univ] : univ->one univ { f1 + f2 }

         Since we transform a span to a ternary relation, it is straightforward to implement
     the operation inverse by simply flipping the first and third column of the ternary
     relation; the built-in domain and range restriction operators are directly applicable
     to the returned ternary relation.
1    // flip the 1st and 3rd column
2    fun inverse[span: univ->univ->univ] : span {
3      ter/flip13[span]
4    }
5    // For the domain and range restriction, we can use the built-in operators <: and :>

         As discussed in Section 3.3, an ordinary set can be viewed as a multiset by
     assigning each element a unique index. In the same way, an ordinary relation can be
     seen as a multirelation. Therefore, we can transform an ordinary set or relation to a
     multiset or multirelation in our encoding. The operation lift is implemented as follows:

1    // in module mset [g]                           1   // in module mrel [s, t]
2                                                    2
3    pred liftedFrom[ G: g ] {                       3   pred liftedFrom[r: s -> t] {
4      Idx.f = G && #Idx = #G                        4     (~sleg).tleg = r && #Head = #r
5    }                                               5   }

         Conversely, we can transform a multiset or multirelation to an ordinary set or
     relation by dropping the indices, which is encoded as follows:
1    // in module multi
2    fun drop[f: univ -> one univ] : univ { rel/ran[f] }


                                                85
3
4   fun drop[span: univ -> univ -> univ] : univ -> univ {
5     ter/select13[span]
6   }


        We can see that our library fullfils the requirements from Section 2.3: it provides
    1) a way to directly declare a multiset or a multirelation; 2) an algebra of operations
    over multiconcepts; and 3) the lift and drop operations to control whether the result
    is ordinary or multi.


    6     Evaluation

    6.1   Correctness

    We assess the correctness of our encoding through testing, which uses assertions to
    check if the result obtained from the encoded operation is equal to the expected result
    we manually compute. By running the assertions in Alloy, if a counter-example is
    found, that means the encoding of the operation is incorrect, otherwise, the encoding
    passes the test. All the encoding of operations have passed the tests which increases
    the confidence that the implementation is correct, and we have released the test suite
    together with the library [12].


    6.2   Performance

    One of the core of the Alloy reasoning framework is instance finding, but dealing
    with multiconcepts significantly increases the search space. In a nutshell, suppose we
    are given sets A,B and a relationship R:A9B. If R is ordinary, then for every pair
    (a,b)∈A×B, there are only two possibilities: either (a,b)∈R and hence there is a link
    from a to b, or R does not include such a link. In contrast, if R is a multirelation, for
    every pair (a,b) there are infinitely many possibilities for the number of links from
    a to b. Hence, multiplicity constraints play an important role in narrowing the search
    space in order to gain reasonal performance on reasoning.
        We use the bundling model in Sect. 4 as the benchmark model. By manually
    controlling the scope of Contain/Head (Line 33), which sets an upper bound of the
    multiplicity of multirelation Contain, we investigate how the performance of Alloy
    analyzer’s instance finding on the model relates with the multiplicity of a multirelation.
        The results are shown in Table 1 and the visualization is shown in Fig. 4. The
    data includes the size of the model’s CNF encoding (vars and clauses) and the total
    analysis time. The Alloy analyzer is configured to use the MiniSAT solver and runs
    on a 2.4Ghz Core i7 machine with 8GB RAM.
        According to the diagram, the size of the model increases linearly with the scope
    of Contains/Head. The analysis time is fluctuating with the increase of scope but
    the overall analysis time is momentary when the scope is under 20. This result shows
    our Alloy multiconcept encoding is practical given a reasonable scope.


                                              86
       scope                        vars          primary vars                                  clauses                  analysis time (ms)
         5                         6860               314                                        13011                           38
         6                         7496               340                                        14701                           46
         7                         8128               366                                        16418                           47
         8                         8756               392                                        18157                           81
         9                         9378               418                                        19909                           82
         10                        10000              444                                        21701                            27
         15                        13012              574                                        31228                           379
         20                        15748              704                                        41150                            68
       Table 1: Benchmark results (not exhaustively listed) and visualization


                   45000                                                                                                               450

                   40000                                                                                                               400

                   35000                                                                                                               350

                   30000                                                                                                               300


                                                                                                                                             Time (ms)
          Amount


                   25000                                                                                                               250

                   20000                                                                                                               200

                   15000                                                                                                               150

                   10000                                                                                                               100

                   5000                                                                                                                50

                      0                                                                                                                0
                           5   6    7   8   9          10     11      12       13       14       15       16       17   18   19   20
                                                            Scope of the head of span contain

                                                Vars          Primary Vars          Clauses           Solve time


                               Fig. 4: Visualization of the benchmark result


7    Related Work

Translating UML class diagrams with OCL constraints (involving bags) into Alloy
has been studied in [2] and [9]. In the former paper, bags/multisets were encoded
in Alloy as sequences, which is inadequate as bags are not ordered. Neither of the
papers considered translation of multirelations, which, as our example in Sect. 2
shows, restricts the practical applicability of the studies.
    From the Alloy world side, the closest related work is a question: “Are there
multisets in Alloy?” [1] on the website Stack Overflow. The selected answer to this
question proposes to represent a multiset as a function from a set of elements to the
natural numbers. For a given element, the function returns the multiplicity of the ele-
ment. The answer also provides implementation of merge, intersection, and difference,
which compute the new multiplicities of the results (add the multiplicities for the
merge, subtract them for difference, and take their minimum for intersection). The
answer does not provide any information about multirelations and other operations
over them such as composition.
    There is a large body of work about multisets treated numerically via multiplicities.
Blizard presents a historical survey of the multiset theory [6], and traces back the
idea of multisets as families to [11]. A category theoretic treatment of multisets and
multirelations can be found in, resp., [10] and [7].


                                                                   87
8    Conclusion
We have shown that simple categorical notions of family, span, and their composition
via pullback can bridge the gap between graphical multirelational modeling and textual
modeling in Alloy. Alloy users can now use multiconcepts in their models by importing
our library module. There is previous work about transforming Alloy model to SMT [8].
Our encoding is also a blueprint for implementing multiconcept support for Satis-
fiability Modulo Theories (SMT) solvers, whereby only total functions are available.
    In the future, we will use the presented encoding to implement support for mul-
ticoncepts as first-class constructs in the modeling language Clafer [4], which uses
Alloy as one of the backends. In Clafer, modelers can directly declare multisets and
multirelations; however, both the syntax of Clafer has to be extended with the new
operators over multi-concepts and the translation to Alloy has to implement the
correct semantics.


References
 1. Are there multisets in Alloy? (2014), http://stackoverflow.com/questions/
    23579928/are-there-multisets-in-alloy, last accessed in Aug 2016
 2. Anastasakis, K., Bordbar, B., Georg, G., Ray, I.: On challenges of model transformation
    from UML to Alloy. Software & Systems Modeling 9(1), 69–86 (2010)
 3. Bąk, K., Diskin, Z., Antkiewicz, M., Czarnecki, K., Wąsowski, A.: Partial instances
    via subclassing. In: International Conference on Software Language Engineering. pp.
    344–364. Springer (2013)
 4. Bąk, K., Diskin, Z., Antkiewicz, M., Czarnecki, K., Wąsowski, A.: Clafer: unifying class
    and feature modeling. Software & Systems Modeling (2014)
 5. Barr, M., Wells, C.: Category theory for computing science, vol. 49. Prentice Hall New
    York (1990)
 6. Blizard, W.D., et al.: The development of multiset theory. Modern logic 1(4), 319–352
    (1991)
 7. Bruni, R., Gadducci, F.: Some algebraic laws for spans. Electr. Notes Theor. Comput.
    Sci. 44(3), 175–193 (2001)
 8. El Ghazi, A.A., Taghdiri, M.: Relational reasoning via smt solving. In: International
    Symposium on Formal Methods. pp. 133–148. Springer (2011)
 9. Maoz, S., Ringert, J.O., Rumpe, B.: CD2Alloy: Class diagrams analysis using alloy
    revisited. In: MoDELS, pp. 592–607. Springer (2011)
10. Monro, G.: The concept of multiset. Mathematical Logic Quarterly 33(2), 171–178 (1987)
11. Rado, R.: The cardinal module and some theorems on families of sets. Annali di
    Matematica pura ed applicata 102(1), 135–154 (1975)
12. Sun, P., Diskin, Z., Antkiewicz, M., Czarnecki, K.: Modeling and reasoning
    with multisets and multirelations in Alloy. Tech. Rep. GSDLAB-TR 2016-01-22,
    Generative Software Development Lab, University of Waterloo (January 2016),
    http://gsd.uwaterloo.ca/node/658
13. Willink, E.D.: Safe navigation in ocl. In: OCL 2015–15th International Workshop on
    OCL and Textual Modeling: Tools and Textual Model Transformations Workshop
    Proceedings. p. 81 (2015)


                                         88