The Problems of Data Modeling in Software Practice

Harald Huber
USU Softwarehaus, Spitalhof, D-71696 Möglingen

Abstract

This paper presents, from the author's perspective, the problems that occur in practice during data modelling. The author's experiences are the result of a considerable number of projects which he carried out in his consultancy role at USU Softwarehaus in Möglingen (Germany).

These projects concerned the following themes:

• Corporate Datamodelling
• Comparing Datamodels
• Project (Application)-related Datamodelling

In all cases, E/R-notation was the chosen form of representation. From these experiences, the author formed an impression of the problems that occur in practice when defining a data model. These problems have, however, also led to the author's increased interest in knowledge representation (KR), in turn leading to his use of KR-methods in practice. This has shown itself to be quite effective.

Sections 2 and 3 briefly illustrate the recommendations and the experiences arising from their use in projects.

1 Datamodelling in Practice - the Problems

Until recently, datamodelling was the buzzword with which one believed the software crisis could be solved. CASE products concentrated on this area, meta-databases were created using a data-modelling process (E/R), and large companies invested millions in order to acquire a corporate data model. Although this trend has subsided a little, the theme in general is still of current interest. What Chen already recognised as an important benefit when presenting the E/R-Model is still seen today as a key effect of a data model: the representation provides a standard basis for communication, with which understanding between DP and users is more easily accomplished.

Unfortunately, this seems to hold only for small data models. For larger areas of attention, the methodology starts to become ineffective and no longer provides the overview required. Apparently, there are just a few 'gurus' who are able to create a complete, complex data model. Often such a data model quickly decreases in value as soon as that person leaves the company. There are directors' offices in which the corporate data model hangs behind glass - however, this is regrettably the only place in which the data model is noticed or paid heed to.

The following problems, among others, have been recognised:

1.1 Low Expressiveness of a Data Model in E/R-Form

During the analysis phase, many of the organisation's interdependencies and processes are identified. These are subsequently abstracted and generalised in the relatively inadequate language of the E/R-Model. This often requires a change in terminology; in other words, a unified, formal language is compulsory. What many authors (e.g. Vetter) see as an advantage of data modelling (exactly this coming-into-being of a corporate, unified terminology) often turns out to be a disadvantage: the terms used in the data model are not understood by the user departments. To make matters worse, these terms are mostly held in commentary form (if at all). The cross-reference from the new, unified terminology to the terms used in the departments is, in most cases, not documented at all. This makes understanding the data model afterwards very difficult (see 1.6).

1.2 The Development of the Data Model is not Documented

A model undergoes many changes during the modelling phase. Requirements, ideas and practical examples from the user department contribute to the permanent extension and improvement of the model. Consequently, variations in the business processes are represented by generalisations, and classes (e.g. subtypes) are created in order to denote similar 'things' in the model. The problem is that in nearly every case the documentation of this development is missing, i.e. the reasons and reflections on which the model's structures and elements are founded are lost after a short time. This results in difficulties if the model is changed due to further development or new requirements.

1.3 The Ideal Model is Developed

Although the user departments are consulted during the analysis phase, in practice one is often left with the impression that the DP-staff's ideal model is developed. This trend is strengthened by the fact that the creation of the data model requires a change in terminology and a certain generalisation (see 1.1). The user department staff therefore usually see themselves as incapable of effectively contradicting the 'high-flying' ideas of their DP-colleagues. The result is mostly a model which gives the impression of absolute perfection, but which neither makes the day-to-day business its priority, nor is understandable enough for the user-departments to work with it.

1.4 Weak Methodology of the Developer

The possibilities of graphical development tools and the resulting excellent representation often disguise the weakness in the developer's understanding of the methodology. In this way, entities such as 'Total Turnover' and 'Turnover per Customer' can actually be modelled. Most developers tend to model concepts as entities, instead of taking the expressive character of entities in general into account. (This behaviour is also to be seen in a completely different form, where the developers come from a very technical background and mean tables or files instead of entities. Let's leave this point for the moment - it will be touched upon again in point 1.8.) Another weakness is the missing experience in interview technique. Very often, the interviewer's question is formulated like "And how can I show that in E/R?" instead of "Which process statuses occur in practice - let's leave E/R out of it for the moment?".

1.5 Exceptional Cases Become the Core of the Model

Since the daily business of a company is in most cases comparatively simple to represent, data modelling projects often rush headlong into attempting to build every case imaginable into the model, as if the knowledge for treating each of these cases really had to be documented. The effect of this is that the models quickly become too detailed and difficult to understand - so much so that the user-departments, who really should judge the model's 'correctness', more or less make this judgement on the basis of 'gut-feeling'. If they see well-known terms and recognise relationships between them that are held to be necessary, then the model seems to them to be complete and correct, even though in many cases they cannot follow it through to the lowest detail.

1.6 Assumption of Understanding

The relatively low expressiveness of an E/R-model all too seldom leads to recognition of this 'inadequacy in meaning'. Often this inadequacy is compensated for by an overkill of interpretation, which means that the model, which really should be the basis for a common understanding, often becomes a problem of understanding. The real world is then no longer the topic of discussion (in which the question of understanding certainly arises) - rather, one discusses entities and relationships, whose meanings are comparatively trivial and thereby are a matter of interpretation and alteration when trying to understand the 'fact-content' behind them.

1.7 Missionary Character of DP

DP tends to over-estimate itself in many organisations. This misjudgement concerns not so much the importance of DP for the organisation's success (which could certainly be the subject of heated discussion both in theory and in an organisation's leadership). It affects the implementation of standards and norms much more. The standardisation of terminology (mentioned under point 1.1) which the DP-Department carries out during data modelling is an excellent example here. It implies, however, that 0.5 - 2 % of the company can dictate the terminology of the remaining employees. This over-estimation, together with the problem outlined in 1.3, means that DP doesn't model according to requirements, but rather uses its own ideas as the basis for the 'ideal model'.

1.8 Too much Technical Thinking

Since most modellers come from a technical background (e.g. Application Development), they find it extremely difficult to ignore this technical knowledge when modelling. In the past, many cases occurred where performance considerations were incorporated into the E/R-Diagram. The problem, however, goes much deeper than that. Most modellers cannot imagine any way to represent the characteristics of entities other than with attributes. Two entities with the same attributes are hastily made one, without considering that they express a classification on a logical level.

2 Suggestion for a Solution

The approach this solution takes is basically to use to best effect the developer's (and the user department's) tendency to express himself in concepts. This means that in the initial data modelling phase, one creates a model of these concepts in the form of a semantic network. It's quite possible that other, more modern, representations are more suitable for this task. However, since the author has his roots in the data modelling world, moving towards semantic networks was the easier way for him to come to terms with knowledge representation methodology.

The author makes the following suggestion for the development of a data model (relational or E/R):

• Creation of various semantic networks for parts of the total area of attention. These semantic networks contain all statements-of-fact and requirements issued by the user-department, in order not to let any information fall by the wayside. Representative questions from the user-departments can also be noted here.

• Consolidation of the various networks. The aforementioned networks are consolidated. Synonyms and homonyms are not 'cleaned up'. This means that there is no unification of language necessary. Rather, the individual terms are cross-referenced to one another.

• Generation of an E/R-model. The user department requirements can be generated using all of the semantic networks. The E/R-Model can be worked on using this basis and can be tested using the requirements represented in the networks. This model is then the basis for the creation of the relational model.

To make possible the consolidation of several semantic networks developed by several developers, a standardized, unified representation of the networks is suggested. This means that only two types of associations are allowed to be represented by lines; all other relevant concepts and associations appear as nodes. This restriction forces the unified representation necessary for the consolidation. The following two types of associations are allowed to be represented by lines:

• Type 1, which describes just the extension of a concept
• Type 2, which defines the intension.

Note that these associations are not defined by their symbolic meaning, but rather by a relatively formal context. This has the advantage that the semantics of these associations are not interpretation-dependent.

3 Experiences from Projects

The suggested methodology solves the aforementioned problems.

• The interviewer's interview technique is positively affected, because his annotation is not subject to the restrictions of the E/R-model. The development of the model is also documented, since the supplementary information discovered during the analysis phase is held in the model.

• The tendency towards strong generalisation and 'artificial terms' is restricted - the terminology can still be understood by the user department.

• The selection process (what's an entity?) can be re-created and checked in reviews. The user-department staff can concentrate more on the model's content, thereby avoiding 'ideal structures'.

• The capability to consolidate the various parts means that the model in the user-department stays relatively small.

There are, however, also disadvantages. If one uses a strictly formal representation, as suggested above, the model becomes difficult to grasp in its entirety. Furthermore, during the interviews, the interviewer requires considerable concentration in order to express the facts in the required manner. In practice, however, a somewhat less formal representation is chosen during the interview, which is subsequently translated into a formal model. Note that the principal elements of the model are concepts, and not other elements such as entities, even if a less formal notation is used.
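
As an illustration of the consolidation step suggested in Section 2, the following Python sketch shows one way in which partial networks could be merged without unifying their terminology. It is a minimal sketch under stated assumptions, not the notation or tooling used in the projects: the names `SemanticNetwork`, `consolidate`, `EXTENSION` and `INTENSION`, the direction of the lines, and the example terms ('Customer', 'Debtor') are all hypothetical. Each term is kept under its original name, qualified by the network it came from, and correspondences between terms are recorded as explicit cross-references.

```python
from dataclasses import dataclass, field

# Hypothetical labels for the only two line types permitted in a network.
EXTENSION = "type1_extension"  # Type 1: describes the extension of a concept
INTENSION = "type2_intension"  # Type 2: defines the intension of a concept


@dataclass
class SemanticNetwork:
    """One partial network, e.g. the result of one series of interviews."""
    source: str                               # department or developer (assumed)
    nodes: set = field(default_factory=set)   # concepts and named associations
    lines: set = field(default_factory=set)   # (from_node, line_type, to_node)

    def add_line(self, from_node, line_type, to_node):
        # Everything that is not one of the two line types must be a node.
        if line_type not in (EXTENSION, INTENSION):
            raise ValueError("only Type 1 and Type 2 lines are allowed")
        self.nodes.update((from_node, to_node))
        self.lines.add((from_node, line_type, to_node))


def consolidate(networks, cross_references):
    """Merge partial networks while keeping every original term.

    Nodes are qualified as (source, term), so synonyms and homonyms remain
    visible instead of being 'cleaned up'. `cross_references` is a list of
    pairs of qualified terms that were judged to correspond to one another.
    """
    nodes, lines = set(), set()
    for net in networks:
        src = net.source
        nodes |= {(src, n) for n in net.nodes}
        lines |= {((src, a), t, (src, b)) for a, t, b in net.lines}
    unknown = [p for pair in cross_references for p in pair if p not in nodes]
    if unknown:
        raise KeyError(f"cross-reference to unknown term(s): {unknown}")
    return nodes, lines, set(map(frozenset, cross_references))


# Usage with two invented departmental networks.
sales = SemanticNetwork("sales")
sales.add_line("Key Account", EXTENSION, "Customer")
sales.add_line("Customer", INTENSION, "places order")

accounting = SemanticNetwork("accounting")
accounting.add_line("Debtor", INTENSION, "receives invoice")

merged_nodes, merged_lines, merged_refs = consolidate(
    [sales, accounting],
    [(("sales", "Customer"), ("accounting", "Debtor"))],
)
```

In this sketch 'Customer' and 'Debtor' both survive consolidation, linked only by a cross-reference, so each department can still find its own vocabulary in the merged model.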