On the Impact of Diagram Layout:
              How Are Models Actually Read?

    Harald Störrle (1), Nick Baltsen (1), Henrik Christoffersen (1), Anja M. Maier (2)
        (1) Dept. of Applied Mathematics and Computer Science
        (2) Dept. of Management Engineering, Technical University of Denmark


      Abstract. This poster presents the latest results from a very large eye
      tracking study (n=29) that explores how modelers read UML diagrams.
      We find that various factors like layout quality, modeler experience, and
      diagram type lead to significant differences in diagram reading strate-
      gies. We derive elements of a theory of diagram reading behavior from
      our findings. This paper presents only late breaking results: all findings
      presented, theories constructed, and conclusions drawn are of a prelim-
      inary nature. This paper does not present the amount and degree of
      evidence that would allow us to consider the contents as being scientifi-
      cally validated, yet.


1   Introduction
Practical experience suggests that usage and understanding of UML diagrams
are greatly affected by the quality of their layout: it is a lot harder to spot errors
and weaknesses in a cluttered diagram, and it is a lot easier to understand the—
quite literally—overall picture of a nicely laid out diagram. Previous research
has identified what constitutes good and bad practices of layout (see e.g. [1]),
but has failed to provide conclusive evidence in support of this hypothesis. In
fact,
    “We set up an experiment to study the impact of [layout] rules on under-
standability [...] operationalized in terms of [...] faults and [...] time. We could
not identify a statistically significant relation between [them].”[1]
    On the other hand, our own previous work provided substantial evidence to
this effect [4, 5]. We could demonstrate, for instance, that error rate, time used,
and subjectively assessed cognitive load increased for UML diagrams with bad
layout, and that this effect is likely independent of the diagram type (we studied
the five most commonly used diagram types of UML).
    Since there is such a stark contrast between our results and the previous
literature, we deemed it necessary to add to the validity of our study by a
replication of the results. The replication varies the previous studies in three
aspects.
 – First, obviously, different participants were recruited for the present study.
   They are of a comparable population, though, so this amounts to a minor
   variation point only.
 – Second, the study is designed and conducted by two graduate students (the
   second and third authors), so as to reduce impact and bias through the
   person of the experimenter of the previous studies (the first author). Again,
   this was expected to be of secondary importance due to the previous study
   setup that had reduced personal interactions almost entirely.
 – Third, we use a different method (eye tracking) with objective, physiologi-
   cal measurements of cognitive load, thus addressing critique that subjective
   assessments of cognitive load are not sufficiently reliable. We also used the
   measurements previously employed (score, duration, subjective assessment)
   to establish a point of reference.
     Next we were curious whether or not we would be able to observe any
identifiable behavior patterns that amount to reading strategies. If such reading
strategies do exist, would there be differences under the various treatments, e.g.,
would novices exhibit different reading strategies than experts, say? Ideally, this
should lead to elements of a theory of how UML models (and similar artifacts)
are processed by human modelers, and what we can do to support this process.
Since these are subtle questions, and since we have not used eye tracking as a
research method before, we have developed and refined our study design and
experimental procedure in a pilot study, see [3] (n=4). Here, we report initial
results from the main study following on the pilot study (n=29, one excluded
due to technical errors in the measurement process).


2   Research Objective 1: Replication of previous results
By and large, the present study produced the same results as previous studies:
the impact of layout quality on modeler performance could be replicated, though
it showed more in the objective measures (score, time) than in subjective effort
assessment – in our earlier studies, we found a stronger effect in cognitive load
than in objective achievement. The differences are not so big as to be
inexplicable, though, which leads us to attribute them to the different
experimental setting and the different populations. As in earlier experiments,
the effect appears independent of diagram type, but is influenced by the
expertise level and the size of diagrams.
    Previously, we had only used subjective assessments of cognitive load which
have been criticized as less reliable than physiological indicators like heart rate
or skin conductivity. So we instrumented our stimuli accordingly and measured
the pupil dilation, fixation duration, and blink rate associated with diagram
elements and easily localizable layout problems (e.g., line crossings, line bends).
We could clearly show that subjective assessments correlate closely with
physiological measures, which is in line with existing literature on the subject [2].
We also found small, yet statistically significant evidence that one objective
performance indicator (score) increases with decreasing layout quality.
Obviously, this is counter-intuitive and in contrast to all previous findings. So,
while we lack an explanation for this observation so far, we tend to attribute
this to a mistake in the experimental procedure.
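
    The comparison of subjective and physiological load measures described above
amounts to a correlation analysis. The following is a minimal sketch of such a
check, assuming one subjective rating and one mean pupil dilation value per
participant; the function name, the variable names, and the sample values are
illustrative assumptions and are not taken from the study.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of co-deviations and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / sqrt(var_x * var_y)


# Made-up illustrative values, NOT data from the study.
subjective_rating = [1, 2, 3, 4, 5]            # hypothetical self-assessed load
mean_pupil_dilation = [3.0, 3.5, 4.0, 4.5, 5.0]  # hypothetical values in mm

r = pearson(subjective_rating, mean_pupil_dilation)
print(f"r = {r:.2f}")
```

A value of r close to 1 would indicate that both measures rank the load of the
stimuli in nearly the same way; a significance test would still be needed
before drawing conclusions from real data.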

3   Research Objective 2: Existence and identification of
    behavioral patterns

We could clearly identify behavioral patterns in the subjects reading diagrams.
In order to study these patterns, we recorded and analyzed the starting point
and progress of participants’ scan paths. We found three different behavioral
patterns in them.

 – Branch Following From Anchor (BFFA): An anchor would be estab-
   lished either at the top left corner, the largest element, or the center of the
   diagram. Starting from there, the graph structure of the diagram would be
   followed in a systematic way.
 – Left/Right Top/Bottom (LRTB): starting at the top left corner, proceed
   downwards and to the right irrespective of the direction of arrows in the
   diagram, i.e., as if reading a text in the normal reading direction for western
   languages.
 – Random Walk (RW): Starting at any point, continue to any other point
   and so on, at no discernible pattern.
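
    The three patterns above could be told apart automatically along the following
lines. This is only a heuristic sketch and not the procedure used in the study:
it assumes that each fixation has already been mapped to a diagram element with
screen coordinates (y growing downward), and the thresholds `threshold` and
`row_tol` are arbitrary illustrative values.

```python
def classify_scan_path(fixations, edges, threshold=0.8, row_tol=20.0):
    """Label a scan path as 'BFFA', 'LRTB', or 'RW' (heuristic sketch).

    fixations: list of (element_id, x, y) in viewing order
    edges:     iterable of (element_id, element_id) pairs of the diagram graph
    """
    undirected = {frozenset(edge) for edge in edges}
    # Only transitions between different elements are informative.
    moves = [(a, b) for a, b in zip(fixations, fixations[1:]) if a[0] != b[0]]
    if not moves:
        raise ValueError("scan path contains no transition between elements")

    def follows_edge(a, b):
        # The gaze moved between two elements connected in the diagram.
        return frozenset((a[0], b[0])) in undirected

    def in_reading_order(a, b):
        # The gaze moved to a lower row, or rightwards within the same row.
        dx, dy = b[1] - a[1], b[2] - a[2]
        return dy > row_tol or (abs(dy) <= row_tol and dx > 0)

    edge_share = sum(follows_edge(a, b) for a, b in moves) / len(moves)
    order_share = sum(in_reading_order(a, b) for a, b in moves) / len(moves)

    if edge_share >= threshold:
        return "BFFA"
    if order_share >= threshold:
        return "LRTB"
    return "RW"


# Hypothetical four-element diagram laid out on a 2x2 grid.
pos = {"A": (0, 0), "B": (100, 0), "C": (0, 100), "D": (100, 100)}
path = [(e, *pos[e]) for e in ["A", "B", "C", "D"]]
print(classify_scan_path(path, [("A", "B"), ("B", "C"), ("C", "D")]))
```

Checking the edge criterion first means that a path satisfying both criteria is
reported as BFFA; this ordering is a design choice of the sketch.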

    LRTB coincides with BFFA for diagrams of a diagram type with natural
direction (e.g., Activity Diagrams), when the actual diagram layout agrees with
the reading direction. In the literature, this has been described as “diagram
flow”, a notion that could be formalized as the dominant direction of directed
elements in a diagram. This definition does not cover other kinds of flow like
circular or radial, but our study did not consider examples of these layout types.
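
    One possible reading of "dominant direction of directed elements" is sketched
below, under the assumption that every directed element is given as a straight
segment from source to target in screen coordinates (y growing downward). The
function name and the coarse four-way classification are our illustrative
choices, not an established definition.

```python
from collections import Counter


def dominant_flow(edges):
    """Return (direction, share) of the most frequent edge direction.

    edges: list of ((x1, y1), (x2, y2)), directed from source to target.
    """
    counts = Counter()
    for (x1, y1), (x2, y2) in edges:
        dx, dy = x2 - x1, y2 - y1
        if dx == 0 and dy == 0:
            continue  # degenerate edge without direction
        # Classify by the larger component of the displacement.
        if abs(dx) >= abs(dy):
            counts["right" if dx > 0 else "left"] += 1
        else:
            counts["down" if dy > 0 else "up"] += 1
    if not counts:
        return None, 0.0
    direction, n = counts.most_common(1)[0]
    return direction, n / sum(counts.values())


# Hypothetical activity diagram with mostly downward control flow.
edges = [
    ((0, 0), (0, 100)),
    ((0, 100), (0, 200)),
    ((0, 200), (100, 200)),
    ((0, 100), (10, 200)),
]
print(dominant_flow(edges))  # ('down', 0.75)
```

Under this reading, LRTB and BFFA would be expected to coincide when the
dominant direction is "right" or "down" and its share is high.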
    Interestingly, all participants (i.e., both experts and novices) followed the
BFFA or LRTB strategies for diagrams with good layout, irrespective of diagram
type. For diagrams with bad layout, however, novices tended to use the RW
strategy while experts continued to use the BFFA/LRTB strategies.


4   Research Objective 3: Analysis of underlying cognitive
    processes

The behavioral patterns we have observed were relatively stable across
different individuals, and even more so for advanced modelers. This might be
interpreted to show that while behavior exhibits individual variation in less skilled
modelers, increasing experience results in a convergence of behavior. If this is so, it
is likely that one and the same cognitive process is effective universally, in all
modelers—but what can we know about this process?
     In a recent study [6], we could show that an important factor for decreasing
modeler performance (and increasing difficulty) is the size of the diagram, as
measured by the number of model elements. Our present study showed, however,
that diagram flaws like line crossings exhibit cognitive load profiles similar to
those of shapes (i.e., diagram elements representing classes, use cases, and so on),
as measured by pupil dilation, fixation duration, and blink rate. That means that
increasing (or decreasing) the number of diagram elements has the same effect
on cognitive processing as increasing (or decreasing) the number of layout
flaws. This seems to suggest that diagram size and layout quality are just two
sides of a coin, and the same cognitive process is affected by both.
    The literature on layout quality, however, frequently discusses two distinct
layers: one concerned with low-level layout flaws like line crossings, and one
concerned with higher-level aspects like flow (this latter level is sometimes called
diagram architecture). And indeed, as we have seen above, there are clearly
discernible reading patterns that obviously are independent of diagram size, but
do depend on the overall layout structure. So, there is some indication that
diagram reading might involve at least two different cognitive processes, but it
is not quite clear whether and how they are connected.
    For diagram types with a “natural” flow like Activity Diagrams, it is fairly
obvious to assume that following the reading direction (strategy LRTB) leads to
good layouts. So we might be tempted to consider “layout flow” as a layout
quality measure. But what does flow mean for diagrams with no natural di-
rectionality? Also, our sample did not contain diagrams with radial or circular
layout where different flow might be observed. Finally, it might be that flow is
the wrong notion to begin with, and we should rather consider flow as a special
kind of symmetry, and explore this notion instead. This will require experiments
on specifically created layouts that highlight one or the other aspect, and to see
whether these layout styles give rise to different behavior.


5   Interpretation

We conduct a study on the impact of layout quality, expertise level, and diagram
type on the understanding of UML diagrams. Like in earlier studies, we find
that the diagram understanding outcome (measured in time, errors, and load)
is affected by layout quality, even though in the present experiment, we used
eye tracking instead of the subjective assessments of cognitive load we had used
before. We conclude that our initial study and our replication corroborate each
other, underlining the validity of the evidence presented.
    But it is not just the outcome that is affected by layout quality: we also
observe changes to the diagram reading behavior (measured as start location and
subsequent evolution of the scan path). Even though the results presented here
are of preliminary nature they clearly indicate that there are several distinct
reading strategies. Surprisingly, we find the same reading strategies for different
diagram types: one would expect that diagram types with a natural directional-
ity might be read in a different way than diagrams without a “built-in” direction.
More research is needed to explore this question further; possibly the
experimental setup of our study (in particular, the respective layouts) limits the
insight it can provide.
    We have replaced the global and undifferentiated assessment of cognitive load
used in earlier experiments by three concurrent, objective measurements with
high temporal and spatial resolution: pupil dilation, fixation duration, and blink
rate. This allows us to identify the cognitive load of individual diagram elements,
and we observe that shapes (i.e., diagram elements representing classes, use
cases, actions, and so on) provoke a cognitive load profile similar to that of layout
flaws such as line crossings and line bends. In other words: increasing the number
of diagram elements has the same effect on cognitive processing as increasing
the number of layout flaws. This explains in a very elegant way our previous
finding that diagram size seems to be more important than layout quality: the
number of diagram flaws can (often) be reduced, the diagram size (measured in
the number of elements, [6]) cannot.


6   Summary and Future Work
We replicate earlier experiments on the impact of layout quality (and other fac-
tors) on UML diagram understanding by using eye tracking instead of subjective
assessment. We observe characteristic behavior patterns and study several
influencing factors. We observe a great uniformity in the reading strategies applied
(at least for experienced modelers), to the degree that we are almost certain we are
observing a universal cognitive process. If it is indeed universal, we might be able to find a
physiological substrate to the observed behavior. Obviously, our present study
is not suited to explore this thread further, and we have to defer that to future
research employing different machinery, in particular brain imaging.


References
1. Holger Eichelberger and K. Schmid. Guidelines on the aesthetic quality of UML
   class diagrams. Information and Software Technology, 51(12):1686–1698, 2009.
2. Daniel Gopher and Rolf Braune. On the Psychophysics of Workload: Why Bother
   with Subjective Measures? Human Factors, 26(5):519–532, 1984.
3. Anja M. Maier, Nick Baltsen, Henrik Christoffersen, and Harald Störrle.
   Towards Diagram Understanding: A Pilot-Study Measuring Cognitive Workload
   Through Eye-Tracking. In Proc. Intl. Conf. Human Behavior in Design, 2014.
4. Harald Störrle. On the Impact of Layout Quality to Understanding UML
   Diagrams. In Proc. IEEE Symp. Visual Languages and Human-Centric Computing
   (VL/HCC’11), pages 135–142. IEEE Computer Society, 2011.
5. Harald Störrle. On the Impact of Layout Quality to Understanding UML
   Diagrams: Diagram Type and Expertise. In Gennaro Costagliola, Andrew Ko, Allen
   Cypher, Jeffrey Nichols, Christopher Scaffidi, Caitlin Kelleher, and Brad My-
   ers, editors, Proc. IEEE Symp. Visual Languages and Human-Centric Computing
   (VL/HCC’12), pages 195–202. IEEE Computer Society, 2012.
6. Harald Störrle. On the Impact of Layout Quality to Understanding UML
   Diagrams: Size Matters. In Jürgen Dingel, Wolfram Schulte, Isidro Ramos, Silvia
   Abrahão, and Emilio Insfran, editors, Proc. 17th Intl. Conf. Model Driven
   Engineering Languages and Systems (MoDELS’14), number 8767 in LNCS, pages 518–534.
   Springer Verlag, 2014.

