Towards Flexible Model Analysis and
             Constraint Development:
    A Small Demo Based on Large Real-Life Data

                    Matthias Sedlmeier and Martin Gogolla

             Database Systems Group, University of Bremen, Germany
                   {ms|gogolla}@informatik.uni-bremen.de


      Abstract. This contribution discusses the handling of a larger model on
      an abstract level employing the standardized modeling languages UML
      and OCL and an accompanying tool. We represent real-world data in the
      form of a large object model and perform data analysis, verification,
      and validation tasks on the modeling level in order to obtain feedback
      about the original data. The tool allows larger object diagrams to be
      explored interactively and flexibly through a combination of visual and
      textual techniques. Furthermore, model invariants can be created in a
      versatile fashion by iteratively considering relevant object diagram
      fractions and evolving OCL expressions.
      Keywords. Working with large models, Transformation and validation
      of large models, Visualization techniques for large models.


1    Introduction

Modeling and model management approaches have found their way into main-
stream software production. Thus they are applied to large and complex systems,
and approaches and tools have to deal with large models. In this contribution,
we discuss an approach handling larger object models representing real-world
data in a design tool originally elaborated for model validation and verification.
The aim is to perform analysis, verification, and validation on the modeling level
in terms of the languages UML [5] and OCL [8].
Working on the modeling level (in contrast to working on the code level or
working with “production” systems such as databases or programming languages)
gives a high degree of abstraction through the available advanced modeling and
validation options. In contrast to traditional (relational query) languages,
UML and OCL have the advantage of providing abstraction support for (a)
different collection kinds (such as sequences or ordered sets), (b) powerful
operations (such as closure or iterate), and (c) a mixture of textual and
visual exploration mechanisms. Our approach gives interactive and direct
feedback through the combination of textual and visual aspects, in particular
for exploring model properties and model constraints. For large object models
we support the exploration of queries and invariants that involve aggregation
functions like the minimum or the average (which are of relevance in larger
states only), as these functions are fully integrated into OCL.
Regarding larger object models, [4] identifies queries and expressions (e.g.,
in the form of OCL) as an important field of activity. Our demo could further
be part of an MDE tool benchmark focusing on queries [1]. Furthermore,
structuring and slicing large class and object models as in [7] is applicable
in our context.
The rest of the paper is structured as follows. Section 2 describes our
process for obtaining our example model, in particular a large object model.
Section 3 discusses the exploration of large object models by combining visual
and textual techniques. Section 4 puts forward an interactive process for
constraint determination. Finally, Section 5 concludes our work and indicates
further research.


2   Development Process for Class and Object Model

This section describes a real-life example for studying a large model. We
wanted to obtain a model containing a UML class diagram and a large UML object
diagram that can be handled interactively and flexibly in UML modeling and
verification tools. We employ the UML and OCL design tool USE (UML-based
Specification Environment) [3]. The tool supports model validation and
verification for various UML diagram kinds, among them class and object
diagrams, and OCL (Object Constraint Language) constraints in the form of
class invariants and operation contracts. Recently, the capabilities of USE
have been improved, in particular for handling large object diagrams (more
efficient handling of object collections and evaluation of OCL expressions).
Our goal was not to compete w.r.t. size with production tools such as database
systems. We aimed at an interactive development process exploring properties
of the underlying data (in technical terms, of the employed object diagram)
and exploring model constraints that have to be applied.
The German Federal Statistical Office (Destatis)¹ provides a municipal
directory called GV100 of about 20,000 German administrative units, e.g.,
towns or villages, including about 60,000 structural connections, based on a
census in 2011. This directory can be officially obtained from the Destatis
web portal as a monolithic column-oriented ASCII file together with record
descriptions in natural language required to interpret the provided data².
The ASCII records contain information about federal states (German:
Bundesland), districts (Bezirk), counties (Kreis), municipalities associations
(Gemeindeverband) as well as municipalities (Gemeinde). Furthermore, the
hierarchical connections between those units are given as well as the places
of administration. We have complemented this data with additional geographical
position information provided by the OpenGeoDB project³ and obtain
municipality records providing information about area, population, male
population, female population, zip code as well as geographical information in
the form of longitude and latitude.

¹ https://www.destatis.de/DE/Startseite.html
² https://www.destatis.de/DE/ZahlenFakten/LaenderRegionen/Regionales/Gemeindeverzeichnis/Administrativ/Archiv/GV100ADQ/GV100AD3108.html
³ http://opengeodb.org/wiki/OpenGeoDB


          Fig. 1. Development process for UML class and object diagram
Figure 1 shows the process that we have applied in order to obtain an initial
version of the class and object diagram. We started with the manual extraction
of a data schema based on the Destatis GV100 record descriptions (1). As an
intermediate representation, we have created this schema with the graph editor
yEd in a UML/ER-like modeling language (as in [6]). Moreover, we consider
geographical information from the OpenGeoDB project.
In the next step, we perform an automatic transformation of the UML/ER
schema into a valid USE specification (2) describing a USE class model (3).
Figure 2 shows the central elements of the resulting UML class diagram. This
model contains those classes and associations required for instantiating an object
model. We have enriched the USE specification by OCL constraints and queries
for verification and validation purposes (4), which are manually elaborated by
analyzing the given record descriptions as well as the concrete ASCII records.
For the schema at hand, we semi-automatically extract the instance data as
CSV files by means of Destatis record descriptions (5). The CSV data is then
automatically transformed into USE data manipulation statements formulated
in SOIL (6), which are interpreted by USE to instantiate the object model (7).
SOIL (‘Simple OCL-like Imperative Language’) is the USE “programming” language
based on OCL. From this point on, we are able to work interactively and
flexibly with the class model including the object diagram, employing the USE
graphical interface, the command line interface, or a combination of both in
order to perform verification and validation tasks based on OCL constraints
and queries.
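
Step (6) of the process can be illustrated by a minimal sketch. It is written in Python purely for illustration and is not the transformation actually used: the function names, the CSV column names, the object naming scheme and the quote escaping are assumptions. Only the general shape of the generated statements (object creation followed by attribute assignments) is meant to be conveyed.

```python
import csv
import io


def soil_literal(value, ocl_type):
    """Render one CSV cell as a literal; empty or missing cells yield None."""
    if not value:
        return None
    if ocl_type == "String":
        # Assumption: single quotes inside strings are escaped with a backslash.
        return "'" + value.replace("'", "\\'") + "'"
    return value  # Integer, Real and Boolean cells are passed through unchanged


def csv_to_soil(csv_text, class_name, attr_types, id_column):
    """Translate CSV rows into object creation and attribute assignment statements."""
    statements = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        obj = f"{class_name.lower()}_{row[id_column]}"  # hypothetical naming scheme
        statements.append(f"!create {obj} : {class_name}")
        for attr, ocl_type in attr_types.items():
            literal = soil_literal(row.get(attr), ocl_type)
            if literal is not None:  # missing values stay undefined in the object model
                statements.append(f"!set {obj}.{attr} := {literal}")
    return statements


# Hypothetical two-row extract; the second row has no latitude.
sample = "id,zip_code,population,latitude\n1,12345,1000,50.5\n2,54321,0,\n"
types = {"zip_code": "String", "population": "Integer", "latitude": "Real"}
for line in csv_to_soil(sample, "Municipality", types, "id"):
    print(line)
```

Leaving missing cells unset rather than writing a default value keeps absent coordinates undefined, which matters for invariants that have to distinguish missing values from wrong ones.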
[Figure 2: UML class diagram. Classes and attributes recoverable from the figure:

  FederalState
  District             native_fs_id : String
  County               native_fs_id, native_district_id : String
  MunicipalitiesAssoc  native_fs_id, native_district_id, native_county_id : String
  Municipality         area, population, male_population : Integer
                       zip_code : String
                       zip_code_exemplary : Boolean
                       native_fs_id, native_district_id, native_county_id,
                       native_ma_id : String
                       latitude, longitude : Real

 Association role names: federalState, district, county, municipalitiesAssoc,
 municipality; administeredFederalState, administeredDistrict,
 administeredCounty, administeredMunicipalitiesAssoc;
 federalStateAdministration, districtAdministration, countyAdministration,
 municipalitiesAssocAdministration.]

                     Fig. 2. Central elements of the UML class model for GV100


3   Exploring Object Models Visually and Textually

This section explains the interactive yet powerful options that USE offers to
explore a large object diagram. The considered object diagram contains about
80,000 objects and links, as indicated in Fig. 3. Naturally, these objects do
not fit on a single screen. USE allows specific instances to be chosen in
order to create clearly arranged object model fractions. USE offers various
object selection mechanisms [2], which effectively help to filter the desired
subset. The techniques from [2] have been substantially extended in order to
cope with object models of the example’s complexity. In USE, one can obtain
fractions of the overall state fitting the developer’s needs.


                           Fig. 3. Number of objects and links in an object model
First, it is possible to start with an empty object diagram to prevent long
display times. We can then choose to show objects successively, either by
their type or by their properties.
Second, in addition to this simple filtering method, USE offers sophisticated
object selection based on link path length, OCL query expressions, and views.
The last of these enables the developer to select specific objects within a
table view. The chosen set of objects can then be shown, hidden or cropped. A
corresponding example is shown in Fig. 4, where a specific FederalState object
(Schleswig-Holstein) and a Municipality object (Kiel) are manually selected by
check boxes. In addition to the objects, all connecting links are shown.


              Fig. 4. Object selection based on tables, visually presented
Third, selection via link path length enables the designer to show only those
objects that are reachable from a given object within a specified number of
link segments.
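
The idea behind selection by link path length can be sketched as a breadth-first traversal that stops at a given depth. The following Python fragment is an illustrative sketch under the assumption that links are undirected pairs of object names; it does not reproduce the implementation in USE, and the function name is hypothetical.

```python
from collections import deque


def within_path_length(links, start, max_length):
    """Return all objects reachable from start over at most max_length links."""
    neighbours = {}
    for a, b in links:  # assumption: links are treated as undirected
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    distance = {start: 0}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if distance[current] == max_length:
            continue  # do not expand beyond the requested path length
        for nxt in neighbours.get(current, ()):
            if nxt not in distance:
                distance[nxt] = distance[current] + 1
                queue.append(nxt)
    return set(distance)


# Links between some of the units discussed for the county Merzig-Wadern.
links = [
    ("Saarland", "Merzig-Wadern"),
    ("Merzig-Wadern", "Mettlach"),
    ("Merzig-Wadern", "Losheim am See"),
]
print(sorted(within_path_length(links, "Saarland", 1)))
```

With a path length of 1 only the federal state and the county are selected; a path length of 2 additionally brings in the municipalities of the county.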
Fourth, the most flexible selection mechanism is provided via OCL query
expressions, which allow the developer to show, hide or crop arbitrary sets of
objects based on OCL query results, which are either single objects or object
collections. Figure 5 shows an object model fraction containing only those
federal states holding municipalities that have a population above 100,000 and
are located east of a fixed longitude and south of a fixed latitude. The
example OCL expression returns only 3 objects; depending on how restrictive
the expression is, much larger object collections can be obtained.
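
The logic of such a selection query can be sketched outside of OCL as follows. The Python fragment is an illustration only: the record type, the parameter names and the sample data are hypothetical, and the concrete thresholds used for Fig. 5 are not reproduced. "East of" is read as a larger longitude and "south of" as a smaller latitude.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Municipality:
    name: str
    federal_state: str
    population: int
    latitude: Optional[float]
    longitude: Optional[float]


def states_with_large_municipality(municipalities, min_population,
                                   min_longitude, max_latitude):
    """Federal states holding a populous municipality east of min_longitude
    and south of max_latitude; municipalities without coordinates are skipped."""
    return {
        m.federal_state
        for m in municipalities
        if m.population > min_population
        and m.longitude is not None and m.longitude > min_longitude
        and m.latitude is not None and m.latitude < max_latitude
    }


# Hypothetical sample state, not taken from the GV100 data.
sample = [
    Municipality("A-Stadt", "State 1", 150_000, 48.0, 12.0),
    Municipality("B-Dorf", "State 1", 900, 48.1, 12.1),
    Municipality("C-Stadt", "State 2", 200_000, 53.5, 9.0),
    Municipality("D-Stadt", "State 3", 300_000, None, None),
]
print(states_with_large_municipality(sample, 100_000, 11.0, 50.0))
```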

Figure 6 illustrates a more detailed scenario, where some subordinate
administrative units of the county Merzig-Wadern (where the Informatics
meeting center “Dagstuhl” is located) are displayed. The example explores the
administrative connections between all involved units: the county
Merzig-Wadern belongs to the federal state Saarland and contains the
municipalities association Wadern, Stadt, which is at the same time also
defined as a municipality; the county Merzig-Wadern also contains other
municipalities like Merzig, Mettlach or Losheim am See.
          Fig. 5. Object selection by OCL expression visually represented


           Fig. 6. Result of combining textual and visual object selection
Thus the model exploration options available in USE allow a combination of
textual and visual techniques. The example object model display in Fig. 6 was
created by evaluating multiple OCL expressions exploring the shown objects.
USE makes it possible to construct object model fractions stepwise with
different OCL queries. This flexibility of switching between visual selection
and evaluation of textual OCL expressions is, to the best of our knowledge,
unique to USE.


4   Interactive Determination of Invariants

This section discusses how OCL model invariants can be developed interactively
by considering the system state, formulating a hypothetical invariant,
checking the assumed invariant against the state, and possibly revising the
formulation of the invariant when the system state does not satisfy it and
thus the invariant cannot be validated. We follow an empirical approach, where
we first make assumptions about rules that should apply in the model
application context. To this end, we explore the schema information as well as
the system state. Based on these assumptions we formulate an invariant, which
we check against the system state in USE. If the invariant holds, we accept it
as part of our USE model. If it does not hold, we analyze the result in order
to determine whether our condition is invalid due to wrong premises or whether
the given object model (in the example determined by the Destatis information)
is actually inconsistent. In the first case, we check whether it is reasonable
to adapt the invariant and start again. In the second case, we accept that the
underlying information (in the example the Destatis data) is possibly faulty
to a certain degree and extend the USE specification with a modified invariant
that prevents the faulty information from entering the validation process
(e.g., by weakening the invariant through an implication with a weakening
premise).
In the example, the longitude and latitude attributes are not part of the
original Destatis data; this information was taken from the OpenGeoDB project.
This does not have any effect on the evaluation of the GV100 records. In fact,
we would be able to partly check the OpenGeoDB data, too.


4.1   Invariant popGreaterThanZero

We assume that all municipalities have a population (pop) greater than zero:

context Municipality inv popGreaterThanZero:
  population > 0

We find that this invariant does not hold on first inspection. USE allows us to
check which instances violate invariants, either through the Class Invariant View
or through the interactive shell command check -d. This is again an example where
USE supports the developer through visual and textual techniques.
Through the analysis, we recognize that all objects breaking our condition have the
same municipality type assigned. This type explicitly classifies all assigned
municipalities as uninhabited territory (German: ‘gemeindefreies Gebiet, unbewohnt’).
Based on this insight, we adapt our condition and accept it as part of our USE
specification.

context Municipality inv popGreaterThanZero:
  (type.name <> 'gemeindefreies Gebiet, unbewohnt') implies
  (population > 0) and (male_population > 0)

This invariant holds in the system state, from which we can conclude that the Destatis
data is valid in this specific context.
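
The check-and-weaken cycle of this subsection can be mimicked in a few lines. The following Python sketch is illustrative only and does not show how USE evaluates invariants: the dictionary-based object representation, the helper names, the type name 'Stadt' and the sample state are assumptions. An implication "premise implies conclusion" is written as "not premise or conclusion".

```python
UNINHABITED = "gemeindefreies Gebiet, unbewohnt"


def violators(objects, invariant):
    """Return the objects for which the invariant evaluates to False."""
    return [o for o in objects if not invariant(o)]


def pop_greater_than_zero(m):
    """First, too strict formulation."""
    return m["population"] > 0


def pop_greater_than_zero_weakened(m):
    """Weakened formulation: uninhabited territories are exempt."""
    return m["type"] == UNINHABITED or (
        m["population"] > 0 and m["male_population"] > 0
    )


# Hypothetical system state with one uninhabited territory.
state = [
    {"name": "A", "type": "Stadt", "population": 1200, "male_population": 590},
    {"name": "B", "type": UNINHABITED, "population": 0, "male_population": 0},
]
print([m["name"] for m in violators(state, pop_greater_than_zero)])           # ['B']
print([m["name"] for m in violators(state, pop_greater_than_zero_weakened)])  # []
```

Inspecting the violators of the first formulation is what reveals the common municipality type and thus motivates the weakening premise.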


4.2   Invariant malePopLessPop

Our premise is that the male population must always be less than the overall
population. This assumption is valid, because we explored the system state and
verified that there are no “all-male municipalities” in Germany (the best pro-male
ratio is 1:3.47 in Freistatt, Niedersachsen, and the best pro-female ratio is 1:2.13
in Hamm, Rheinland-Pfalz). Computations like the best pro-male ratio can be
formulated as OCL queries. In this case the ratio is determined by the following OCL
expression:

-- each tuple: state, name, population, male, female, male/female ratio
Municipality.allInstances()->collectNested(m|
  Sequence{m.federalState.name, m.name, m.population, m.male_population,
           m.population-m.male_population})
  ->collectNested(t|
    Sequence{t->at(1), t->at(2), t->at(3), t->at(4), t->at(5),
             t->at(4).oclAsType(Integer) / t->at(5).oclAsType(Integer)})
  ->asSequence()
  ->sortedBy(t|t->at(6).oclAsType(Real))
  ->select(t|t->at(3).oclAsType(Integer) > 0)->last()
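
For readers less familiar with OCL collection operations, the same computation can be sketched in Python. This is an illustration under assumptions, not part of the USE model: the dictionary keys and the sample data are hypothetical, and, unlike the OCL expression, the sketch explicitly skips entries without female population in order to avoid a division by zero.

```python
def best_male_to_female_ratio(municipalities):
    """Return (state, name, population, male, female, ratio) for the municipality
    with the highest male/female ratio, or None if no entry qualifies."""
    rows = []
    for m in municipalities:
        female = m["population"] - m["male_population"]
        if m["population"] > 0 and female > 0:  # skip uninhabited and all-male entries
            rows.append((m["federal_state"], m["name"], m["population"],
                         m["male_population"], female,
                         m["male_population"] / female))
    return max(rows, key=lambda row: row[5]) if rows else None


# Hypothetical sample state, not taken from the GV100 data.
sample = [
    {"federal_state": "State 1", "name": "X", "population": 100, "male_population": 60},
    {"federal_state": "State 2", "name": "Y", "population": 200, "male_population": 90},
    {"federal_state": "State 2", "name": "Z", "population": 0, "male_population": 0},
]
print(best_male_to_female_ratio(sample))
```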

Taking the last invariant into account, we define:

context Municipality inv malePopLessPop:
  type.name <> 'gemeindefreies Gebiet, unbewohnt' implies
  population > male_population

The invariant is true in the considered system state. We accept it and conclude that
the Destatis data is correct in this context.


4.3    Invariant zipCodeExemplary

The Municipality class has a Boolean attribute called zip_code_exemplary. This
attribute indicates that a single municipality can have multiple zip codes, of which
only one exemplary zip code is held in the attribute zip_code. However, we expect
that there must be municipalities having only one zip code, which we know from
everyday life.

context Municipality inv notAllZipCodeExemplary:
  not Municipality.allInstances()->forAll(m | m.zip_code_exemplary)


4.4    Invariants Based on Geographical Information

As mentioned above, we enrich the Destatis data with geographical information. We
try to assign longitude and latitude values to each municipality by matching
names. This method is neither exact nor does it cover all instances. We also do not
know whether the OpenGeoDB coordinates are always correct, since we did not compare
them with third-party sources. However, we may assume that all coordinates must
lie within the most northern, southern, eastern and western points of Germany.
Table 1 lists the corresponding limits for all compass directions in the World
Geodetic System 1984 format⁴.

 direction      location                                     latitude   longitude
 most northern  List, Sylt, Schleswig-Holstein               55.050000  08.400000
 most southern  Haldenwanger Eck, Oberstdorf, Bavaria        47.270108  10.178319
 most eastern   Deschka, Neißeaue, Saxony                    51.266667  15.033333
 most western   Isenbruch, Selfkant, North Rhine-Westphalia  51.050000  05.866667
                          Table 1. Border points of Germany

Based on this information, we postulate that all longitude values must lie in the
stated interval. This should hold analogously for the latitude values. Given that,
we derive 4 similar invariants respecting the fact that longitude and latitude
values are not always present. All 4 invariants hold in the system state and we add
them to the USE model.

⁴ http://earth-info.nga.mil/GandG/publications/tr8350.2/wgs84fin.pdf

context Municipality inv northLimit: -- analogously for south, east, west
  latitude <> null implies latitude <= 55.05
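
The four limit checks can be summarized in a short sketch using the values of Table 1. The Python fragment is an illustration only; the invariant names other than northLimit and the function name are assumptions. As in the OCL invariant above, missing coordinates never count as a violation.

```python
# Limits taken from Table 1 (degrees, WGS 84).
NORTH_LAT, SOUTH_LAT = 55.05, 47.270108
EAST_LON, WEST_LON = 15.033333, 5.866667


def violated_limits(latitude, longitude):
    """Names of the limit invariants violated by one municipality."""
    violated = []
    if latitude is not None:
        if latitude > NORTH_LAT:
            violated.append("northLimit")
        if latitude < SOUTH_LAT:
            violated.append("southLimit")  # hypothetical invariant name
    if longitude is not None:
        if longitude > EAST_LON:
            violated.append("eastLimit")   # hypothetical invariant name
        if longitude < WEST_LON:
            violated.append("westLimit")   # hypothetical invariant name
    return violated


print(violated_limits(56.0, 5.0))    # ['northLimit', 'westLimit']
print(violated_limits(None, None))   # []
```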


4.5   Derived Properties Employing Aggregate Functions

Instead of using longitude and latitude coordinates as constants, we can compute the
bordering rectangle of a federal state with derived attributes as indicated in Fig. 7.
The derived attributes for a federal state make use of the association with role names
federalState and municipality. This association indicates the municipalities that
constitute the federal state. All municipalities of a federal state are considered, and
the most western (eastern, northern, southern) coordinates are computed. Another,
independent association is the one with role name federalStateAdministration. In a
constraint one can now check that the coordinates of a federal state administration are
consistent with the bordering rectangle of the federal state.


           Fig. 7. Derived properties employing aggregate functions in OCL
Another use of the bordering rectangle is a plausibility check of the bordering
rectangles of different federal states. One can determine whether or not two federal
states possess common points (overlapping bordering rectangles). With that, one can
check that the coordinates actually present are such that, for example, the most
northern state ‘Schleswig-Holstein’ and the most southern states
‘Baden-Wuerttemberg’ and ‘Bavaria’ do not have common points. We refrain from showing
the detailed OCL expressions. But we emphasize that the derived attributes and the
accompanying constraints only make sense in the presence of large object models, as
the attributes and constraints rely on aggregate functions (here min() and max()) that
are relevant in particular for large object collections.
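
Although the OCL expressions are not shown, the underlying computation can be sketched independently of USE. The following Python fragment is a minimal illustration under assumptions: the function names, the tuple layout (south, west, north, east) and the sample coordinates are hypothetical and do not reproduce the derived attributes of Fig. 7.

```python
def bounding_rectangle(coordinates):
    """(south, west, north, east) over all (latitude, longitude) pairs;
    None if no municipality has coordinates."""
    present = [(lat, lon) for lat, lon in coordinates
               if lat is not None and lon is not None]
    if not present:
        return None
    lats = [lat for lat, _ in present]
    lons = [lon for _, lon in present]
    return (min(lats), min(lons), max(lats), max(lons))


def contains(rect, latitude, longitude):
    """True if the point lies inside the rectangle (borders included)."""
    south, west, north, east = rect
    return south <= latitude <= north and west <= longitude <= east


def overlap(rect_a, rect_b):
    """True if the two rectangles share at least one point."""
    south_a, west_a, north_a, east_a = rect_a
    south_b, west_b, north_b, east_b = rect_b
    return (south_a <= north_b and south_b <= north_a
            and west_a <= east_b and west_b <= east_a)


# Hypothetical coordinates of the municipalities of two federal states.
northern = bounding_rectangle([(54.0, 9.0), (54.8, 10.5), (None, None)])
southern = bounding_rectangle([(47.5, 10.0), (49.0, 12.5)])
print(northern, southern, overlap(northern, southern))
```

The function contains corresponds to the consistency check for a federal state administration, and overlap corresponds to the plausibility check between two federal states.
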
5    Conclusion and Future Work
The practical experience gained in our demo shows the capability of USE to handle
larger object models with good performance. All discussed expressions evaluate within
at most a few minutes. We have shown how a large UML object model was employed for
analyzing real-world data. The demo shows that analyzing data on the modeling level
brings advantages w.r.t. the ease of formulating expressions due to the available
abstraction mechanisms. We were able to browse and analyze the model and perform
primary validation tasks. We used the provided textual and visual object exploration
and selection mechanisms including OCL queries. Admittedly, our approach has an
initial setup cost, but we obtain insight on a formal level with standardized modeling
languages.
Further experiments will identify limitations with regard to the number of objects and
links. Future work will also elaborate on fragmentation and slicing techniques for
object models in order to handle representative slices of the original data. Another
topic is the improvement of the visualization options for large object models, e.g.,
selection mechanisms by link kind (only association, binary and ternary associations,
aggregation, composition). Naturally, more case studies and comparisons with existing
solutions must give feedback about the applicability of the proposal.

References
1. A. Benelallam, M. Tisi, I. Ráth, B. Izsó, and D. S. Kolovos. Towards an open set
   of real-world benchmarks for model queries and transformations. In D. S. Kolovos,
   D. D. Ruscio, N. D. Matragkas, J. de Lara, I. Ráth, and M. Tisi, editors, Proc. 2nd
   WS Scalability in Model Driven Engineering BigMDE@STAF2014, volume 1206 of
   CEUR Workshop Proceedings, pages 14–22. CEUR-WS.org, 2014.
2. M. Gogolla, L. Hamann, J. Xu, and J. Zhang. Exploring (Meta-)Model Snapshots
   by Combining Visual and Textual Techniques. In F. Gadducci and L. Mariani,
   editors, Proc. Workshop Graph Transformation and Visual Modeling Techniques
   (GTVMT’2011). ECEASST, Electronic Communications,
   journal.ub.tu-berlin.de/eceasst/issue/view/53, 2011.
3. M. Gogolla and F. Hilken. Model Validation and Verification Options in a Con-
   temporary UML and OCL Analysis Tool. In A. Oberweis and R. Reussner, editors,
   Proc. Modellierung, pages 203–218. GI, LNI 254, 2016.
4. D. S. Kolovos, L. M. Rose, N. Matragkas, R. F. Paige, E. Guerra, J. S. Cuadrado,
   J. de Lara, I. Ráth, D. Varró, M. Tisi, and J. Cabot. A research roadmap towards
   achieving scalability in model driven engineering. In Proc. 1st WS Scalability in
   Model Driven Engineering (BigMDE 2013). ACM, 2013.
5. J. Rumbaugh, I. Jacobson, and G. Booch. The Unified Modeling Language 2.0
   Reference Manual. Addison-Wesley, Reading, 2003.
6. M. Sedlmeier and M. Gogolla. Model Driven ActiveRecord with yEd. In T. Welzer,
   H. Jaakkola, B. Thalheim, Y. Kiyoki, and N. Yoshida, editors, Proc. 25th Int.
   Conf. Information Modelling and Knowledge Bases (EJC’2015), pages 65–76. IOS
   Press, Amsterdam, 2015.
7. D. Strüber, M. Selter, and G. Taentzer. Tool support for clustering large meta-
   models. In D. D. Ruscio, D. S. Kolovos, and N. Matragkas, editors, Proc. Workshop
   Scalability in Model Driven Engineering (BigMDE 2013). ACM, 2013.
8. J. Warmer and A. Kleppe. The Object Constraint Language: Precise Modeling with
   UML. Addison-Wesley, 2003. 2nd Edition.