Visual Variables in UML: a First Empirical Assessment

Yosser El Ahmar∗§, Xavier Le Pallec§, Sébastien Gérard∗ and Truong Ho-Quang¶
∗ CEA, LIST, Laboratory of Model Driven Engineering for Embedded Systems, P.C. 174, Gif-sur-Yvette, 91191, France
{yosser.ELAHMAR, Sebastien.GERARD}@cea.fr
§ University of Lille, CRIStAL Lab UMR 9189, 59650 Villeneuve d’Ascq, France
Email: xavier.le-pallec@univ-lille1.fr
¶ Chalmers, Göteborg University, Göteborg, Sweden
Email: truongh@chalmers.se

Abstract: This paper presents results of an empirical research study of the use of the Unified Modeling Language (UML) in practice. We employed a selective range of research methodologies, including in-depth semi-structured interviews and quantitative analysis of > 3500 UML diagrams related to open source projects in GitHub. The aim of the study is to provide greater understanding about the use of UML and particularly to shed light on the use of the visual variables (i.e., color, size, brightness, texture/grain, shape and orientation) in practice. The theoretical perspective of the study is to explore the usefulness of the visual variables in UML. When effectively employed, these variables are highly significant in reducing the cognitive load of human beings. As with all qualitative studies, findings should be carefully interpreted; they should be seen as providing better understanding about the aforementioned scopes. We conclude with a discussion of the obtained results and lessons learned for future research.

I. INTRODUCTION

By the 1990s, numerous graphical modeling languages were used by the software engineering community. This was due to the increased acceptance of modeling and the emergence of Object Oriented systems. Each graphical modeling language uses its own graphic signs and meanings. That helped decrease intra-community ambiguities but led to problems of interoperability between tool vendors. Consequently, the Object Management Group (OMG) [1] standardized the Unified Modeling Language (UML) [2] as an attempt to resolve the latter interoperability issue. It took advantage of the numerous graphical representations and defined UML as a visual language for specifying, constructing and documenting software-intensive systems. The OMG exhaustively describes the UML graphic signs via a concrete syntax and their meanings (i.e., semantics) via an abstract syntax.

UML takes advantage of the high performances of the graphical system. This bi-dimensional system has a major interest compared to linear ones like the audio system or the textual one. In the audio system, there are two variations: the sound and the time. In the same time unit, the human ear hears only one variation: one sound. In the graphical system, by contrast, there are three possible variations: the X, Y planar dimensions and the visual aspect of a graphic sign, like its color or its shape. In the same time unit, the human eye might perceive all the relationships between the three variations. [3] subdivided the variations of the visual aspects into six variables called retinal variables: size, brightness, texture/grain, color, shape and orientation. The X, Y planar axes (position) and the retinal variables are together called visual variables [3]. The retinal variables are very significant in highlighting information. They are rapidly perceived because the reader’s eye can detect their variation without moving the visual brush; signals received on the retina are sufficient. The use of the retinal variables allows the human eye to perceive in a third dimension (i.e., depth perception). Depth perception requires cognitive processing neither in the working memory nor in the long-term memory (pre-attentive perception). Hence, it considerably reduces the cognitive load of human beings.

UML mainly uses the shape visual variable to visually encode semantics (e.g., ellipses, rectangles, circles). The other visual variables are less employed, despite their aforementioned great performances. That leaves the opportunity to explore the usefulness of the other visual variables as a possible means to enhance the use of UML in practice. This possibility has already been recognized as advantageous in software engineering via the Cognitive Dimensions framework [4] and the Physics of Notations framework [5].

The exploration of the visual variables in UML requires understanding of: (i) details about the situations of the use of UML in practice; (ii) details about the actual state of use (or not) of the visual variables in UML. While empirical studies treat the first scope (i) by investigating the use of UML in practice, fewer investigate the use of the visual variables (ii). The latter mainly focus on the position visual variable to find effective layouts [6] [7] [8], with occasional studies on colors [9]. (i) and (ii) are strongly related to each other and we deem that they have to be treated concomitantly. In fact, the use of the visual variables might depend on the way each practitioner uses UML. To fill this gap, we conducted a qualitative exploratory empirical study using interviews as the strategy of inquiry. The purpose of the present study is to create more and better understanding about the situations of UML use in practice. A situation refers to the activities, the stakeholders that are involved in each activity, the practices of UML users in employing UML and the purposes of such usages. In the captured situations, the study aims at discovering the need for the visual variables in practice. If such a need exists, we want to gain a better understanding about the kinds of visual annotation that UML practitioners perform, their purposes and the ways they do so.

As a triangulation method, we analyzed the use of the visual variables in > 3500 diagrams related to open source projects in GitHub [10][11]. That aims at finding quantitative data that might reinforce the results of the interviews. The obtained results might help ongoing research exploring the benefits of the visual system as a means of resolving problems that the present study might reveal. They can also help in studying the usefulness of the visual variables in enhancing the effectiveness of UML in the captured situations of use. Finally, they might help tool vendors enhance the usability of their tools by making visual automation more ergonomic.

This paper presents results from the qualitative study that we have conducted with 8 experts and practitioners of UML. Then, it describes results of the analysis of the use of the visual variables in > 3500 UML diagrams from the models repository [11]. The study takes a deliberately broad interpretation of results from both methods, as it is meant to be exploratory.

II. DESIGN METHODOLOGY

We used a selective range of research techniques to gather data for our study. We used both a qualitative study via in-depth interviews and a quantitative study via the analysis of UML models related to open source projects in GitHub [11]. Using a variety of types of data helps us ensure better coverage and greater understanding of our two research questions: (i) What are the details about the situations of the use of UML in practice and, particularly, the information that practitioners need to visualize? (ii) What are the details about the actual state of use (or not) of the visual variables in the previously captured situations? The qualitative in-depth interviews with 8 experts and practitioners of UML help us gain understanding about the use of UML and the use of the visual variables in practice. They allow us to understand the relationships between both kinds of use. The analysis of the UML models related to open source projects helps us gather quantitative data, particularly about the use of the visual variables. It mainly allows us to answer the second research question (ii). Conclusions about the extent of the use of the visual variables were drawn from a sample of > 3500 UML diagrams. That also enables us to build conclusions about the effectiveness (or not) of such usages based on existing theories [3]. The present study is conducted as an attempt to help us achieve our mid-term goal of exploring the high performances of the visual variables to enhance the effectiveness of UML in practice.

A. Interpretation of results

We need to be particularly careful about how we analyze the results of our study and the conclusions that will be drawn from it. In fact, the first intent of this work is to create better understanding about the use of UML and the use of the visual variables in practice. It is not meant to verify hypotheses or generalize findings. It mainly serves as an exploratory study to help ongoing research around UML.

Analysis of our interview data has been carried out using the ‘grounded theory’ approach [12]. For that, we began by manually transcribing the interviews from audio to textual form. We read through the data and identified themes and descriptions. We tried to interrelate them using the grounded theory approach, then we interpreted the findings. The analysis of the UML models related to open source projects involved some basic enumerations and simple statistical calculations to get an overall sense of the use of the visual variables in UML. The major effort was spent on the manual classification of the different diagrams based on the different usages (or not) of each visual variable. That helped us report on the state of practices of UML modelers in using the visual variables.

B. Data collection procedures

1) Interviews: We conducted a series of semi-structured in-depth interviews with 8 participants (6 from industry and 2 researchers). 7 interviews were carried out by phone and one face to face. As the first intent of the present work is to understand in depth the use of UML in practice, we were particularly interested in practitioners of UML. They come from a variety of backgrounds and have a range of expertise in UML. The interviews lasted approximately 30-60 minutes and began with a brief announcement of the goal of the study. We also explained that interviews would be anonymous and asked permission to record them. Then, we asked participants about their current position and level of experience with modeling using UML. We followed with questions about the situations of UML use in practice to answer our first research question. That included the purposes of the use of UML, the activities done with UML diagrams, the employed diagrams, the reasons for using a particular diagram, the sought information and the ways of using UML in a project from the beginning until the final steps. Then, we asked questions about their current use (or not) of the visual variables in practice. That concerned the identification of the utility of the visual variables in practice, the most used visual variables and the manners of their use in practice. As for any semi-structured interview, we identified a number of topics that had to be covered in each interview. However, we strongly encouraged participants to explain the details of their claims by pointing out that even minor details are very important for our study. All the interviews were conducted in the form of a discussion where the interviewer followed the logic and the reasoning of the participants. Finally, the interviews were recorded and transcribed with the permission of the participants.

2) Models database analysis: We manually analyzed the use of the visual variables in > 3500 UML models related to open source projects in GitHub [11] [10]. Most of these diagrams are class diagrams (exactly 3328); 392 are sequence diagrams (the models repository is already biased towards structural (class) models [11]). This aims at gathering quantitative data that might reinforce the interview results. To that end, we first identified the visual variables that we would study, notably: size, brightness, color, texture and orientation. Then, we manually classified the UML diagrams based on their containment of a particular visual variable (in case of two or more visual variables, we created a new dedicated folder). For each visual variable (i.e., for each folder), we classified the diagrams based on the nature of its use. In fact, we observed that each visual variable might be applied differently to UML elements: on the border, text, background, edges, heads and/or compartments. We also differentiated significant visual variable variations from non-significant ones. A visual variable variation is considered significant if there are different categories of that variable in the same diagram (e.g., blue, green and red are different categories of the color visual variable). This kind of variation is very important for our study. In fact, it means that the authors of the corresponding diagrams wanted to highlight information using a particular visual variation. We will further concentrate on the analysis of such variations to understand their use in depth and answer our second research question. Non-significant variations refer to the use of a single category of a visual variable (e.g., all the classes are yellow).

III. INTERVIEWS

We identified interviewees by focusing on their levels of experience in modeling using UML and by ensuring their practice of UML. We asked all our contacts in order to identify industrial practitioners who might be willing to be interviewed. We first made an announcement on mailing lists containing potential practitioners of UML: the Papyrus tool developers and users community. We received two answers to that announcement. The first one was discarded because the corresponding profile did not match our target population. The second one was retained because he had the adequate target profile: practitioner and UML expert. Then, we sent direct mails to experienced industrial practitioners in the MDE community. We asked them to participate in our study or to ask other persons who might be interested in, and interesting for, our study. We contacted 11 persons, of whom: 6 persons accepted our request, one person suggested another one that he deemed more interesting for our study and who was retained, two persons did not answer our mails and finally two indirect contacts declined to participate in our study because they are not experts and practitioners of UML. In total, we carried out 8 interviews with 8 participants, all experts and practitioners of UML. Roles of the interviewees include requirement manager, software architect, software designer, consultant and software engineer. They work in different domains: transportation, aerospace engineering and defense, avionics, telecommunication, E-commerce, insurance, banking, etc. 5.5 hours of interviews were recorded and manually transcribed.

A. Analysis

1) Situations of UML use in practice:

a) Purposes of the use of UML: Results of the present qualitative study revealed that communication is the first purpose of using UML in practice. All 8 participants confirmed their use of UML diagrams as a communication vehicle. Communications might be held internally within the project teams or with customers. This finding is also confirmed by previous empirical studies in this field [13][14]. The next paragraph further focuses on our results about UML use in communications. The second purpose of using UML is code generation. This finding contradicts previous empirical studies [15], where code generation generally appears in the last ranks. However, that seems logical in our case because most of our interviewees are practitioners of MDE approaches. They use models from early design steps until maintenance tasks. The third purpose of using UML is to draw the participants’ own understanding in an informal way, where UML diagrams are considered as a “map of the system”. That might be done using a pen and paper or on a white board. In this kind of use, participants do not care about the conformity of their diagrams to the UML standard. Their goal mainly concerns the comprehension of the system to be built and its conformity to the client’s needs. Finally, UML diagrams are less employed for model execution and model analysis.

b) UML and communications: We asked our practitioners about their practices of using UML diagrams for communications. We distinguished two types of audiences: persons who are familiar with UML (e.g., technical team) and persons who are not familiar with UML (e.g., customers). We found out that none of our practitioners modify (i.e., contextualize) their diagrams for communications with persons who are familiar with UML (Figure 1). They argue that all the stakeholders already know and understand the language. However, when it is about discussing with customers, they react differently. Most of the practitioners don’t modify their diagrams but try to adapt their speech to the audience. Following are two claims from our practitioners:

“...We kind of read the diagram to them then we say our interpretation and they just hear what we say and they agree or not with that...” (Transcript 3)

“I didn’t ask him to learn all of UML but like for the class diagram I would explain the class you know what the class is, the attributes and relationships that takes only a few minutes and then... the subject matter he is really familiar when he sees that these boxes as you know class called solution or column or pump and types of things that are easier to work with...” (Transcript 4)

[Fig. 1: The need to contextualize UML diagrams for communications. (a) Communication with familiars with UML (categories: Do modify diagrams, Don’t modify diagrams); (b) Communications with non-familiars with UML (categories: Filter info., Adapt the speech, Include textual info, Don’t use UML).]

Other interviewees prefer to filter some information from their diagrams to keep only what is interesting for their communications. To that end, they omit technical details that don’t really matter to their customers. They try to keep diagrams simple to better communicate.

“...we actually try to simplify as much as possible in our ... UML model because they aren’t UML experts so we try to filter out all... We try not to overload our diagrams with labels everywhere that non UML experts will not understand” (Transcript 7)

Finally, one practitioner prefers not to use UML when discussing with persons who are not familiar with UML.

Generally, all our practitioners were aware of the unsuitability of UML for all types of communications. They try to find different manners to facilitate such use. Few practitioners mentioned the recurrent use of the visual variables to adapt the diagrams to communications. This fact is mainly due to problems with tools (see Section 6).

c) Used UML diagrams: Interviews showed that class diagrams and sequence diagrams are the most used ones in practice (Figure 2). Different reasons are given to justify the choice of such particular diagrams. A software engineer argues that the class diagram is the most expressive notation in UML for modeling data. A software designer uses the class diagram to have a design of the database. A software architect pointed out that the class diagram is used to divide the work among the different teams involved in the same project. Class diagrams are also employed to draw the business entities of the systems and to represent the functional relationships between them. Sequence diagrams are mostly used to define the interaction between the classes and the interactions between users and the solution (i.e., the common definition of a sequence diagram). Sequence diagrams are also used in the definition of the white box part of a solution and to realize specific use cases. The use case diagrams and the state machine diagrams are ranked second among the most used UML diagrams in practice. The purpose of creating use case diagrams is to enumerate the functions to develop and to specify actors and the interactions between them, which is the very definition of a use case diagram. Our requirement manager justifies his use of the use case diagrams by the fact that such use is recommended by the safety requirement standard. Use cases are also used to drive our software engineer’s thinking; they then become part of the documentation. State machines are mostly used to design the behavior of the systems to be built or as an executable model. Activity and structure diagrams are the fourth most used diagrams by our practitioners. Activity diagrams are mostly seen as an elaboration of the use cases and a representation of the system’s features. They are also used for business process modeling and as a communication vehicle with customers. Then come the interface, component and interaction diagrams as the less used ones. These findings are coherent with previous empirical studies in this area [15][14].

[Fig. 2: UML diagrams used in practice. Categories: Interaction, Components, Interface, Structure, Activities, State machines, Use cases, Sequence, Class; horizontal axis from 1 to 5.]

d) Pattern of UML use in practice: We asked the interviewees to describe in detail their practices in using UML to build a system or a project. We analyzed the answers to this question and were able to identify a pattern in the use of UML by our interviewees. All our practitioners begin by gathering the requirements from the customer. That might be in a textual form or via a modeling session.

“The three people working on the project for example, we interview users who want the system and we understand from them what the requirements are, then we translate these requirements. It is like we have a modeling session, we sit with them the three of us and we interview that, what do you imagine blablabla. And then we capture the use cases and we start populating a use case diagram...” (Transcript 3)

At this level, the models serve as a support for communication with the customer and among the technical team members. This step allows our participants to draw the big picture of the systems to be built. One interviewee mentioned the advantages of representing the system in a visual form instead of text.

“Drawing the system instead of writing is a good tool to communicate and share mind viewpoint. The vision goes more quickly, we can decide more quickly about the architecture, the architecting stuff.” (Transcript 6)

Then, participants move to an understanding session where they review and check the requirements of the customer to ensure they correspond to their needs.

“We want to represent the system as it is and we want to understand the needs may be to understand the way to go to the system to be. So we used different diagrams offered, provided by UML to draw the big picture of the – context to deeply understand what is the need.” (Transcript 6)

To that end, they might need to go back to the customer and review the requirements in another modeling session. Once they have ensured that their models match the requirements of the customer, they split the work among the persons involved in that project. To that end, UML might be employed as a discussion vehicle via the class and use case diagrams. Finally, each participant continues using UML for his particular needs: model simulation and execution, where the models represent the code. They might generate code from them or continue coding the system and keep the created models in the documentation. In all cases, the models will populate the documentation that describes each project or system. To these ends, most of our practitioners use a modeling tool in their practice. One interviewee pointed out that the use of a modeling tool depends on his needs. If it is about gaining his own understanding, he settles for a pen and paper. Otherwise, if it is about automation, he does use a modeling tool.

e) Searched information: We asked our practitioners about the information that they need to visualize in practice. We distinguish two types of information. First, we find the semantic information (i.e., what is modeled in a diagram). Following are examples of semantic information mentioned by our interviewees: input and output statements for the requirements, to see the communication in a sequence diagram to understand the logic, to see which system does what, to search the across functions, to see the interactions of a practitioner’s own system, to search for references for specific signals or events in the model, specific signal in a specific protocol or interface that trigger a state machine. Second, we find what we called extra-semantic information. It consists of non-semantic information that can nevertheless be extracted from a UML model. Examples of extra-semantic information are: level of implementation of the classes, bugs in the model in the case of model execution.

We observe that practitioners need to visualize information on their diagrams. Before going to the documentation, UML diagrams are subject to many visualizations where practitioners need to search for important information to accomplish their tasks. If we link this finding to the previous results about the purposes of using UML in practice, most of the searched information belongs to the “drawing of understanding” purpose. Practitioners visually navigate in their diagrams to find accurate information to build the mental map of their systems or projects.

2) Visual variables in practice:

a) Color: We asked our practitioners about their need for colors in practice and about examples of information they needed to highlight using them. Again, we distinguish two types of information: semantic information and extra-semantic information. Only two kinds of semantic information were mentioned, by a single practitioner: important features like inheritance or interface, and elements that have the same semantics. Most of the interviewees used color to highlight extra-semantic information. The progress of the implementation of classes was mentioned by three practitioners. They want to visualize the progress of the development of their classes directly on the diagrams. Examples of extra-semantic information mentioned are:
• Role in the design (criticality, parts of patterns (especially MVC), parts of layers, levels of security).
• Status in development (progress in implementation, testing, execution).
• Distribution of tasks between the stakeholders (ownership of each class).

Besides, one practitioner mentioned that colors must not be used to highlight semantic information. He argues that the diagram should be understood without coloring, which might disappear in case of the use of black and white printers.

“We have discussed and said that we should avoid coloring. At least if the colors have a specific semantic I mean you should be able to understand the diagram without the colors we can’t put any semantic meaning into the colors because if you lose the colors when you print into black and white printers I mean it is pretty fundamental to still have the same semantic of the diagram”. (Transcript 5)

Furthermore, we observe that most of the examples of information highlighted using colors are “selective” information: practitioners want to highlight together UML nodes belonging to the same group (e.g., MVC elements, elements that have the same semantics). They also need to highlight “ordered” information (e.g., progress of implementation, important features).

[Fig. 3: Utility of colors in practice (Were colors helpful?). Categories: No; Yes but problems with tools; Yes.]

b) Utility of colors in practice: We asked our practitioners if the previously mentioned use of colors has been helpful. We found out that most of them agree on the added value of colors and that their use was helpful in practice. Figure 3 details the answers of the participants. 3 interviewees totally agree on the utility of colors in practice. The same number of interviewees confirm that colors are helpful in practice but that there are problems with modeling tools that deteriorate such use. Besides, they express their need for an automatic and efficient tool and propose some recommendations that will be discussed in Section 6. One interviewee stresses that colors are helpful but only for communications.

c) How to use colors?: In case of the use of colors, we wanted to understand how practitioners choose colors. We found out that only two practitioners use some internal conventions of their companies. Following are examples of conventions used within two different companies:

“Non-tested functions: Blue; safety functions: yellow...” (Transcript 1)

“To communicate the green means we have it, yellow means in progress, red means we –” (Transcript 3)

The majority of interviewees do not have internal conventions; they follow their own tastes.

“In my domain which are in general embedded systems we can use blue for that software functional related, I use orange for everything that software platform related to framework system, drivers, etc and red for everything that is material, hardware related.” (Transcript 7)

“I avoid red because red means mistake and green is nice because it means correct.” (Transcript 1)

One practitioner says that they have internal conventions but that these are used in an ad-hoc manner.

“Unfortunately, this (internal reference documents) is used in an ad-hoc manner. We have just documents to follow but no body follows them in a formal manner.” (Transcript 6)

Then, we wanted to know if practitioners add legends/keys when they use colors. We found out that the majority of practitioners do add legends or would like to do so: 2 practitioners confirm that if they use color, they add keys. Two other practitioners would like to add keys but there are limitations in the used modeling tools (Figure 4).

[Fig. 4: The need for keys in practice. Categories: No; Yes/would like to add them.]

At this level, we observe that most of the practitioners neither follow internal conventions nor add keys when they use color. Such behavior is not effective because keys are essential if at least one visual variation exists [3]. Their absence might create ambiguities in understanding the diagrams in question (e.g., for the author himself after a long time or for another team member who might need to read them during maintenance tasks).

d) The other visual variables: We asked an open-ended question about the utility of the other visual variables (i.e., size, brightness, texture/grain and orientation) in practice. The majority of our practitioners confirm that the use of the visual variables might help using UML in practice (Figure 5). In parallel, they stressed the effectiveness and usability of the employed tools for that purpose. The utility of the visual variables directly depends on the efficiency and usability of the tools. One interviewee argues that these visual variables might be helpful only for communications. If it is about understanding or using his own diagrams (i.e., those that he creates), he will not use them.

[Fig. 5: The need for the visual variables in practice. Categories: Problems; Yes only for communication; Yes but problem with tools.]

“If I have to model a function, I AM the designer, I am modeling this function, so I don’t see how I should use visual annotations.” (Transcript 1)

Another interviewee thinks that the use of the visual variables might create a mess. The size visual variable might make big diagrams less readable. Texture also might create problems of readability and printing issues.

“Size: No because most of the time, the models are so complex. So having classes bigger than others make the diagram less readable.” (Transcript 8)

“Texture: The diagrams are printed and stuck in the wall so using texture... to me it is making the model less readable... it could be more beautiful for business people. For technical people I don’t think it will be added value.” (Transcript 8)

The use of the visual variables also depends on the size of the working teams. In large organizations, the use of the different visual variables might create a mess.

“In smaller teams, probably they are perfectly well where you can align and decide the coloring rules and so on but as long as get a little bit bigger, then going and using different visual variables... just creates a mess.” (Transcript 5)

Ongoing research about the use of the visual variables in UML should take into account these claims and provide effective material (i.e., via theories and convenient tools) to handle the aforementioned problems.

IV. MODELS REPOSITORY ANALYSIS

[Fig. 6: Analysis of the visual variations in the models repository.]

As mentioned in Section 3, we distinguish significant variations of a particular visual variable from non-significant ones. In that context, results of the analysis of the models database show that 22% of the diagrams present a significant visual variation (Figure 6.1). That means that modelers did need to highlight information and used the visual variables to that end. As depicted in Figure 6.2, color, brightness and size are the three most used visual variables. We found out that only one diagram uses the texture visual variable and that orientation is never used. 67% of the diagrams present non-significant variations (Figure 6.1). Such non-significant variations refer to the default configurations of the used modeling tools (e.g., by default, all the classes might be yellow, blue, green, gray, etc.). 11% of the diagrams are purely black and white and do not present visual variations.

A. Color

Color is the most used visual variable for expressing significant visual variations (80%, Figure 6.2). We analyzed details about the use of colors and observed that they are applied differently to UML elements: background, borders, edges, text, heads and compartments. We found out that colors are applied to the background of the UML elements (i.e., classes or lifelines) in 57% of the diagrams that present significant color variations (Figure 7). 10% of the diagrams present a color variation of the contained text of a UML element: class name, attributes, methods or even text related to comments. 9% of the diagrams present color variations of the borders of UML elements. Modelers vary only the border of packages and classes. Finally, we observed that modelers add information in their diagrams using colored text or arrows. We looked further into detail to find out the information that modelers wanted to highlight in the UML diagrams. However, it was difficult to identify them. The latter difficulty is due to the [...]

[Fig. 7: How are colors applied to UML elements? Background 63%, Text 10%, Borders 9%, Edges 7%, Head 6%, Annotations 3%, Combinations 2%.]

[...] they want to highlight (e.g., attributes, methods or just a part of them). Once, the different sizes of the text have been used.

V. STATE OF THE ART

A. UML use in practice

Several empirical studies investigating the use of UML in practice exist in the literature. [16] evaluates the costs and benefits of modeling in practice via discussions with 38 professionals at a developer-community meeting. The authors found out that the three main advantages of UML are: the ability to handle the growing complexity of software development by working at higher levels of abstraction, traceability from requirements to low level design and more efficient communications. They pointed out problems that should be addressed. In fact, participants claimed the need to efficiently communicate using UML diagrams. They emphasized the importance of keeping diagrams focused and as simple as possible. They also pointed out the difficulty of performing a search to find relevant occurrences, spatial layout problems and other issues. [13] presents results of a survey of 113 software practitioners that studied the motivations of using code-centric versus modeling-centric approaches.
They found lack of keys or any information that designate the meaning out that UML is the most used notation in practice and each color variation. In fact, only 4% of the diagrams that quality of generated code is one of the biggest problems with present a color variation do contain keys or simply meanings modeling tools. [13] Wojciech and al. conducted a controlled of the applied visual variations (Figure 8). 14% of these experiment to assess the benefits and costs of using UML keys are not up-to-date with the corresponding diagram. That particularly while maintenance tasks. They showed that UML might occur because the used tool does not automatically diagrams helped participants fixing changes but increased the update the keys. This finding joins the interviews results where development time due to the overhead of updating the UML practitioners have raised their need to add keys and pointed diagrams. [15] discusses results of a web survey with analysts out the limitations of tools to add these latter. We analyzed the that are familiar with UML. It investigates the purposes of us- Present Keys ing UML in practice, the used diagrams for each purpose and 4% the degree of success UML has in facilitating communications within development teams. All of these works gather different kinds of data (i.e., quantitative and/or qualitative) to explore the UML use in practice. They try to answer diverse initially No keys fixed research questions about UML. They analyze data from 96% different angles/perspectives. However, no work has attacked Fig. 8: Do modelers add keys/legends? the angle of investigating the situations where practitioners need to visualize information, the sought information and highlighted information when keys are available. 28% of the eventually the use of the visual variables in practice. We deem diagrams present the Model, View and Controller elements inescapable to conduct the present empirical research to further as highlighted information. 
They use the following sets of study the usefulness of the visual variables in UML. colors: (pink, yellow and mauve), (green, yellow and mauve), (blue, orange and green) and (yellow, green and red). All B. The visual variables in practice the highlighted information are selective ones where different If some works study the impact of the visual variables groups of elements can be visually grouped together. use on UML, they focus only on the following two visual variables: position via layouts and colors. The impact of the B. Brightness and size other visual variables has not been studied, despite their great As mentioned above, the brightness is the second used performances in reducing the cognitive load of human beings. visual variable. As colors, brightness is mostly employed to A lot of researches aiming at finding effective layouts have highlight selective information. Modelers chose different levels been conducted. Finding the effective layouts was and still is of brightness of a particular color or ranges of white and an important topic in software engineering field. [6] [7] [8] gray. Brightness is always applied to the background of UML aim at finding effective layouts based on diagram comprehen- elements. For the size variations, significant ones are mostly sions (i.e., ask comprehension questions about diagrams with applied to text. Modelers change the thickness of the text that different layouts) and user preferences (i.e., ask participants to mention their preferences on diagrams with different layouts). and sometimes, they are not up-to-date with the corresponding The use of colors has been less investigated. [17] evaluates diagrams. The latter finding confirms the interviews results effective layouts based on class diagrams comprehension via about the need to an automatic tool. Existing theories like an experiment. It uses colors to highlight information on the [3] prove that keys are mandatory when at least one visual diagrams. 
The authors found out that color helped participants variation does exist in a graphical representation. It helps to answer questions. [9] uses eye tracking to evaluate the use of reading and understanding the meanings of such variations. colors, layouts and stereotypes in the comprehensions of UML Indeed, when we tried to analyze the models in the repository class diagrams. The latter experiments showed that colors [11], we encountered problems to understand the meanings were helpful for participants. However, all of these works of the visual variations applied by modelers. That might be are quantitative studies. No qualitative research have been problematic in practice when modelers want to understand conducted to present better understanding about the actual the diagrams containing some significant visual variations. state of use of the visual variables in practice. They are all In addition, we observed that the used visual variables are controlled experiments, they don’t reflect the practices of UML differently applied to UML elements: border, background, text, users and their opinions about such usages. No exploratory etc. In that context, there are absolutely implementations that filed study does exist in this area. are more effective than others. Ongoing researches should better explore the effective ones [18]. VI. D ISCUSSION AND LESSONS LEARNED Via both research methods, we observed that colors are mostly Results of the interviews show that UML diagrams are employed to express selective information. Based on [3], such employed in several situations (e.g., communication, drawing use is effective. Selectivity is one of the perceptive properties of understanding, analysis) using different diagrams (e.g., of colors, the human eye can rapidly select groups of elements classes, activities, state machines). These situations involve having the same color together. 
However, we also noticed that many visualization tasks where practitioners need to research practitioners use colors to express ordered information (e.g., information important to accomplish their work. These infor- the progress of the implementation of a project). Such use is mation might be semantic or extra-semantic ones. Interviews non effective [3]. In fact, the human eye can not order colors also show that colors are sometimes used in practice. Such use but it can spontaneously and rapidly order different levels of has been recognized as helpful when used by our practitioners. brightness (i.e., from dark to light and vice versa). Concerning the other visual variables (i.e., size, brightness, texture/grain and orientation), practitioners do not actually VII. C ONCLUSION use them but deem that they might be helpful and useful The present empirical study provides understanding about in practice. However, such usefulness directly depends on the use of UML and the visual variables in practice. 8 the usability of modeling tools. To reinforce their claims, interviews have been carried out with experts and practitioners practitioners mention recommendations about effective ones. of UML. In addition, + 3500 UML diagrams were analyzed to First, they express the need to an automatic tool that updates discover the employed visual variables and discuss the ways of the visual variations in case a highlighted information does their usages. Interviews show that UML diagrams are used in change or evolve. In the extra-semantic information about the different situations where practitioners need to visualize infor- progress of the implementation, classes should automatically mation. Results from both research methods show that color be updated when a class status moves from in progress is the most used visual variable. It is differently employed status to implemented. Practitioners have also raised the need (borders, text, edges, compartment, etc,.) 
to express selective to add keys when they use colors. They pointed out that information. But, it is also employed to express ordered infor- not all modeling tools present such feature. In that context, mation which is not effective based on [3]. Furthermore, keys they suggest to have an interactive legend that enables, for are primordial when at least one visual variation does exist instance, the possible update of the visual variations in the [3]. However, our practitioners and the analysis of the UML UML diagram directly in the keys and vice versa. They models show that they are not often added. This is mainly due also recommended the possibility to define rules that map to problems with modeling tools. These results might help the information to highlight and the corresponding visual ongoing researches providing theories to effectively employ variables. Furthermore, practitioners stress on the subtlety of the visual variables (e.g., effective implementations, rules of the used visual variations. The visual variables have to be efficiency to map information to the most adequate visual associated to particular meanings. They also stress on the variable). Second, effective tools, that respect the practitioners necessity to consider large organizations where a big number recommendations for instance, must be provided. Besides, we of persons collaborate on the same models: the tool should begun developing such tool in Papyrus [19]. handle the conflicts that might appear. In the future, other empirical studies should be held to rein- As with the interviews, results of the quantitative analysis force the findings of the present research. That might be done of the UML models show that color is the most used visual via surveys, experiments or discussion panels. The contexts of variable. But, concerning the other visual variables, it shows the use of the visual variables in the models repository might that brightness and size are also used to highlight information. 
be collected to further link the performed visual variations to In the models repository, only 4% of the diagrams present keys the real situations of use. ACKNOWLEDGEMENT We would like to gratefully thank all the practitioners who have accepted to participate to the interviews. R EFERENCES [1] “Object management Group, howpublished= http://www.omg.org/.” [2] “UML specification, howpublished= http://www.omg.org/spec/uml/.” [3] J. Bertin, “Semiology of graphics: diagrams, networks, maps,” 1983. [4] T. R. G. Green and M. Petre, “Usability analysis of visual programming environments: a cognitive dimensions framework,” Journal of Visual Languages & Computing, vol. 7, no. 2, pp. 131–174, 1996. [5] D. L. Moody, “The physicss of notations: toward a scientific basis for constructing visual notations in software engineering,” Software Engineering, IEEE Transactions on, vol. 35, no. 6, pp. 756–779, 2009. [6] K. Wong and D. Sun, “On evaluating the layout of UML diagrams for program comprehension,” Software Quality Journal, vol. 14, no. 3, pp. 233–259, 2006. [7] B. Sharif and J. I. Maletic, “An empirical study on the comprehension of stereotyped UML class diagram layouts,” in Program Comprehension, 2009. ICPC’09. IEEE 17th International Conference on. IEEE, 2009, pp. 268–272. [8] H. C. Purchase, L. Colpoys, D. Carrington, and M. McGill, “UML class diagrams: an empirical study of comprehension,” in Software Visualization. Springer, 2003, pp. 149–178. [9] S. Yusuf, H. Kagdi, and J. I. Maletic, “Assessing the comprehension of UML class diagrams via eye tracking,” in 15th IEEE International Conference on Program Comprehension (ICPC’07). IEEE, 2007, pp. 113–122. [10] R. Hebig, T. Ho-Quang, G. Robles, M. Fernandez, and M. R. V. Chaudron, “The quest for open source projects that use uml: mining github,” in Proceedings of the ACM/IEEE 19th International Conference on Model Driven Engineering Languages and Systems. ACM, 2016, pp. 173–183. 
[11] “UML repository,” http://oss.models-db.com/.
[12] F. Shull, J. Singer, and D. I. Sjøberg, Guide to advanced empirical software engineering. Springer, 2007.
[13] A. Forward, T. C. Lethbridge, and O. Badreddin, “Perceptions of Software Modeling: A Survey of Software Practitioners,” University of Ottawa, Tech. Rep., 2010.
[14] W. J. Dzidek, E. Arisholm, and L. C. Briand, “A realistic empirical evaluation of the costs and benefits of UML in software maintenance,” Software Engineering, IEEE Transactions on, vol. 34, no. 3, pp. 407–432, 2008.
[15] B. Dobing and J. Parsons, “How UML is used,” Commun. ACM, vol. 49, no. 5, pp. 109–113, May 2006. [Online]. Available: http://doi.acm.org/10.1145/1125944.1125949
[16] M. R. V. Chaudron, W. Heijstek, and A. Nugroho, “How effective is UML modeling?” Software & Systems Modeling, vol. 11, no. 4, pp. 571–580, 2012.
[17] O. Andriyevska, N. Dragan, B. Simoes, and J. I. Maletic, “Evaluating UML class diagram layout based on architectural importance,” in Visualizing Software for Understanding and Analysis, 2005. VISSOFT 2005. 3rd IEEE International Workshop on. IEEE, 2005, pp. 1–6.
[18] Y. El Ahmar, X. Le Pallec, and S. Gérard, “Empirical activity: Assessing the perceptual properties of the size visual variation in UML sequence diagram.”
[19] S. Gérard, C. Dumoulin, P. Tessier, and B. Selic, “Papyrus: A UML2 tool for domain-specific language modeling,” in Model-Based Engineering of Embedded Real-Time Systems. Springer, 2010, pp. 361–368.